---
license: cc0-1.0
datasets:
- mah92/Khadijah-FA_EN-Public-Phone-Audio-Dataset
language:
- fa
- en
pipeline_tag: text-to-speech
---
# In the name of God, the Most Gracious, the Most Merciful - it is the key to the door of the Wise One's treasure
# Model Card for Khadijah(SA)
This is the first Persian/English text-to-speech model built on the recent Matcha-TTS architecture.
It is much faster than VITS and produces better-sounding speech.
It works best with the UNIVERSAL_V1_22050Hz HiFi-GAN vocoder.
You can test this model [here](https://huggingface.co/spaces/k2-fsa/text-to-speech) under the Persian+English section.
Enjoy!
## Usage with the Sherpa-onnx repo
Remember to add the metadata to the ONNX file, as in:
https://github.com/k2-fsa/icefall/blob/master/egs/ljspeech/TTS/matcha/export_onnx.py#L174
## Usage with the Matcha-TTS repo
1) In matcha/text/cleaners.py, in the phonemizer.backend.EspeakBackend part, set:
```
language="fa",
```
2) Install piper-phonemize: `pip install piper-phonemize`
3) In cleaners.py, add the `persian_cleaners_piper` function below:
```
import piper_phonemize


def persian_cleaners_piper(text):
    """Pipeline for Persian text, including abbreviation expansion. + punctuation + stress"""
    # text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_abbreviations(text)
    phonemes = "".join(piper_phonemize.phonemize_espeak(text=text, voice="fa")[0])
    phonemes = collapse_whitespace(phonemes)
    # Remove unwanted symbols (e.g., '1')
    unwanted_symbols = {'1', '-'}  # Add any other unwanted symbols here
    filtered_phonemes = "".join([char for char in phonemes if char not in unwanted_symbols])
    return filtered_phonemes
```
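The last stage of `persian_cleaners_piper` (whitespace collapsing followed by symbol filtering) can be tried on its own, without installing piper-phonemize. The sketch below is only an illustration: `filter_phonemes` is a hypothetical helper name, and the sample string is made up rather than real espeak output.

```python
import re

# Same set as in persian_cleaners_piper.
UNWANTED_SYMBOLS = {"1", "-"}


def filter_phonemes(phonemes: str) -> str:
    """Collapse whitespace runs, then drop unwanted symbols (hypothetical helper)."""
    collapsed = re.sub(r"\s+", " ", phonemes)
    return "".join(ch for ch in collapsed if ch not in UNWANTED_SYMBOLS)


# Made-up input for illustration; real input would come from phonemize_espeak.
sample = "sa1l-ˈɒːm   donjˈɒː"
print(filter_phonemes(sample))
```

If other stray characters show up in the phoneme output, adding them to the set is enough; any symbol that stays must also exist in the tokens file.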
4) In matcha/text/cleaners.py, change the line that calls `text_to_sequence` to:
```
intersperse(text_to_sequence(text, ["persian_cleaners_piper"])[0], 0),
```
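In this line, `intersperse` places a blank token (id 0) before, between, and after the symbol ids returned by `text_to_sequence`. A minimal re-implementation for illustration, assuming it behaves like the Matcha-TTS helper of the same name; the ids are made up:

```python
def intersperse(ids, blank):
    """Return ids with `blank` placed before, between, and after each element."""
    result = [blank] * (len(ids) * 2 + 1)
    result[1::2] = ids  # original ids land on the odd positions
    return result


print(intersperse([5, 12, 7], 0))  # [0, 5, 0, 12, 0, 7, 0]
```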
5) Also set the cleaner in configs/data/custom.yaml: `cleaners: [persian_cleaners_piper]`
6) Replace the contents of symbols.py with the code below (adjust the path to your own tokens file):
```
def read_tokens():
    tokens = []
    with open("/home/oem/Basir/TTS/Matcha/Matcha-TTS/configs/tokens/tokens_sherpa_with_fa.txt", "r", encoding="utf-8") as f:
        for line in f:
            # Remove the newline character at the end
            line = line.rstrip("\n")
            # Split into token and number, preserving whitespace
            if " " in line:
                token = line[:line.index(" ")]  # Extract everything before the first space
                if len(token) == 0:  # White-space
                    token = ' '
            else:
                token = line  # If there's no space, the entire line is the token
            tokens.append(token)
    return tokens


symbols = read_tokens()
```
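The parsing rule in `read_tokens` takes everything before the first space as the token, and treats a line that starts with a space as the space token itself. The sketch below applies the same rule to an in-memory sample so the behaviour can be checked without the real file; `parse_tokens` is a hypothetical name and the sample content is made up.

```python
import io


def parse_tokens(lines):
    """Same parsing rule as read_tokens, applied to any iterable of lines."""
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if " " in line:
            token = line[: line.index(" ")]
            if len(token) == 0:  # line starts with a space: the token is ' '
                token = " "
        else:
            token = line
        tokens.append(token)
    return tokens


# Made-up sample in "<token> <number>" form; the real file is tokens_sherpa_with_fa.txt.
sample_file = io.StringIO("_ 0\n  1\na 2\nˈ 3\n")
print(parse_tokens(sample_file))  # ['_', ' ', 'a', 'ˈ']
```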
7) If `save_figure_to_numpy` raises errors, replace it with:
```
import io

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image


def save_figure_to_numpy(fig):
    # Render the figure to an in-memory PNG, then read it back as an array
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
    buf.seek(0)
    img = Image.open(buf)
    data = np.array(img)
    buf.close()
    return data
```
8) After exporting to ONNX, add the Sherpa metadata if you want to use the model with sherpa-onnx:
```
python3 ./add_sherpa_metadata_to_matcha.py
```
## Training results
Figure: Training Results
## Credits
Trained by Ali Mahmoudi (@mah92)
Special thanks to Masoud Azizi (@Mablue), Amirreza Ramezani (@brightening-eyes), and Dr. Hamid Jafari (Khaneh Noor Iranian Basir).
Special thanks to the people of the @ttsfarsi channel.
I also thank @csukuangfj from Xiaomi Corporation for the help and care in the icefall and sherpa-onnx repos.
And we are nothing except by the mercy of our Lord.