Respair commited on
Commit
adb9517
·
verified ·
1 Parent(s): 06c2080

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +7 -3
README.md CHANGED
@@ -9,10 +9,14 @@ tags:
9
 
10
  # Model Card for Model ID
11
 
 
 
 
 
12
  This Vocoder, is a combination of [HiFTnet](https://github.com/yl4579/HiFTNet) and [Ringformer](https://github.com/seongho608/RingFormer). it supports Ring Attention, Conformer and Neural Source Filtering etc.
13
  This repository is experimental, expect some bugs and some hardcoded params.
14
 
15
- The default setting is 44.1khz - 128 Mel bin. if you want to change it to 24khz, copy the config from HiFTnet (make sure to copy its pitch extractor, both the model + the checkpoint.), then change 128 to 80 in LN-384 of the models.py. then uncomment the "multiscale_subband_cfg" for the 24khz version.
16
 
17
  Huge Thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 Fork](https://github.com/duerig/StyleTTS2/tree/main).
18
 
@@ -33,9 +37,9 @@ pip install -r requirements.txt
33
 
34
  ## Training
35
  ```bash
36
- CUDA_VISIBLE_DEVICES=0,1 accelerate launch train.py --config config_v1.json --[args]
37
  ```
38
  For the F0 model training, please refer to [yl4579/PitchExtractor](https://github.com/yl4579/PitchExtractor). This repo includes a pre-trained F0 model on a Mixture of Multilingual data for the previously mentioned configuration. I'm going to quote the HiFTnet's Author: "Still, you may want to train your own F0 model for the best performance, particularly for noisy or non-speech data, as we found that F0 estimation accuracy is essential for the vocoder performance."
39
 
40
  ## Inference
41
- Please refer to the notebook [inference.ipynb](https://github.com/Respaired/HiFormer_Vocoder/blob/main/RingFormer/inference.ipynb) for details.
 
9
 
10
  # Model Card for Model ID
11
 
12
+ [HuggingFace 🤗 - Repository](https://huggingface.co/Respair/HiFormer_Vocoder)
13
+
14
+ **DDP is very un-stable, please use the single-gpu training script** - if you still want to do it, I suggest uncommenting the grad clipping lines; that should help a lot.
15
+
16
  This Vocoder, is a combination of [HiFTnet](https://github.com/yl4579/HiFTNet) and [Ringformer](https://github.com/seongho608/RingFormer). it supports Ring Attention, Conformer and Neural Source Filtering etc.
17
  This repository is experimental, expect some bugs and some hardcoded params.
18
 
19
+ The default setting is 44.1khz - 128 Mel bins. if you want to change it to 24khz, copy the config from HiFTnet (make sure to copy its pitch extractor, both the model + the checkpoint.), then change 128 to 80 in LN-384 of the models.py. then uncomment the "multiscale_subband_cfg" for the 24khz version.
20
 
21
  Huge Thanks to [Johnathan Duering](https://github.com/duerig) for his help. I mostly implemented this based on his [STTS2 Fork](https://github.com/duerig/StyleTTS2/tree/main).
22
 
 
37
 
38
  ## Training
39
  ```bash
40
+ CUDA_VISIBLE_DEVICES=0 python train_single_gpu.py --config config_v1.json --[args]
41
  ```
42
  For the F0 model training, please refer to [yl4579/PitchExtractor](https://github.com/yl4579/PitchExtractor). This repo includes a pre-trained F0 model on a Mixture of Multilingual data for the previously mentioned configuration. I'm going to quote the HiFTnet's Author: "Still, you may want to train your own F0 model for the best performance, particularly for noisy or non-speech data, as we found that F0 estimation accuracy is essential for the vocoder performance."
43
 
44
  ## Inference
45
+ Please refer to the notebook [inference.ipynb](https://github.com/Respaired/HiFormer_Vocoder/blob/main/RingFormer/inference.ipynb) for details.