---
license: cc0-1.0
datasets:
- mah92/Khadijah-FA_EN-Public-Phone-Audio-Dataset
language:
- fa
- en
pipeline_tag: text-to-speech
---
# In the name of God, the Most Gracious, the Most Merciful: the key to the treasury of the Wise
# Model Card for Khadijah(SA)

This is the first Persian/English text-to-speech model built on the recently released Matcha-TTS model.

It is much faster than VITS and gives better results.

It works best with the UNIVERSAL_V1_22050Hz HiFi-GAN vocoder.

You can try this model [here](https://huggingface.co/spaces/k2-fsa/text-to-speech) under the Persian+English section.

Enjoy!

## Usage with the Sherpa-onnx repo

Remember to add metadata to the ONNX file, as done in:
https://github.com/k2-fsa/icefall/blob/master/egs/ljspeech/TTS/matcha/export_onnx.py#L174
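
As a rough illustration of what "adding metadata" means, the sketch below writes key/value pairs into an ONNX file's `metadata_props` using the `onnx` Python package. The helper names `build_meta_data` and `add_meta_data`, and every key and value in the dictionary, are assumptions chosen for illustration; the linked export script is the authoritative source for the fields sherpa-onnx expects.

```python
from typing import Any, Dict


def build_meta_data(pad_id: int, num_ode_steps: int) -> Dict[str, str]:
    """Build a metadata dictionary with string values.

    All keys and values below are illustrative assumptions; check the
    icefall export script for the fields sherpa-onnx actually requires.
    """
    meta: Dict[str, Any] = {
        "model_type": "matcha-tts",
        "language": "Persian+English",
        "voice": "fa",
        "n_speakers": 1,
        "sample_rate": 22050,
        "pad_id": pad_id,
        "num_ode_steps": num_ode_steps,
    }
    # ONNX metadata entries are stored as strings, so convert every value.
    return {key: str(value) for key, value in meta.items()}


def add_meta_data(filename: str, meta_data: Dict[str, str]) -> None:
    """Overwrite the metadata of the ONNX file at `filename` in place."""
    import onnx  # imported here so the helper above works without onnx installed

    model = onnx.load(filename)

    # Drop any metadata that is already present.
    while len(model.metadata_props):
        model.metadata_props.pop()

    # Append one key/value entry per metadata field.
    for key, value in meta_data.items():
        entry = model.metadata_props.add()
        entry.key = key
        entry.value = value

    onnx.save(model, filename)
```

With a hypothetical exported file name, this would be used as `add_meta_data("model-steps-3.onnx", build_meta_data(pad_id=0, num_ode_steps=3))`.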

## Usage with the Matcha-TTS repo
1) In matcha/text/cleaners.py, set the language in the phonemizer.backend.EspeakBackend call:
```
    language="fa",
```

2) pip install piper-phonemize

3) In cleaners.py, add the persian_cleaners_piper function below:
```
import piper_phonemize

def persian_cleaners_piper(text):
    """Pipeline for Persian text: lowercasing, abbreviation expansion, and
    espeak phonemization via piper (keeps punctuation and stress marks)."""
    # text = convert_to_ascii(text)  # disabled: ASCII conversion would destroy Persian script
    text = lowercase(text)
    text = expand_abbreviations(text)
    # phonemize_espeak returns one phoneme list per sentence; only the first is used here
    phonemes = "".join(piper_phonemize.phonemize_espeak(text=text, voice="fa")[0])
    phonemes = collapse_whitespace(phonemes)

    # Remove unwanted symbols (e.g., '1')
    unwanted_symbols = {'1', '-'}  # Add any other unwanted symbols here
    filtered_phonemes = "".join(char for char in phonemes if char not in unwanted_symbols)

    return filtered_phonemes
```

4) In the inference code (matcha/cli.py in the upstream repo), change the text_to_sequence line to:
```
    intersperse(text_to_sequence(text, ["persian_cleaners_piper"])[0], 0),
```

5) Also set the cleaner in configs/data/custom.yaml:
cleaners: [persian_cleaners_piper]

6) Replace the contents of symbols.py with the following (adjust the token file path to your own setup):
```
def read_tokens():
    tokens = []
    with open("/home/oem/Basir/TTS/Matcha/Matcha-TTS/configs/tokens/tokens_sherpa_with_fa.txt", "r", encoding="utf-8") as f:
        for line in f:
            # Remove the newline character at the end
            line = line.rstrip("\n")
            # Split into token and number, preserving whitespace
            if " " in line:
                token = line[:line.index(" ")]  # Extract everything before the first space
                if len(token) == 0: # White-space
                    token = ' '
            else:
                token = line  # If there's no space, the entire line is the token
            tokens.append(token)
    return tokens

symbols = read_tokens()
```

7) If save_figure_to_numpy raises errors, replace it with:
```
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import io

def save_figure_to_numpy(fig):
    buf = io.BytesIO()
    fig.savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
    buf.seek(0)
    img = Image.open(buf)
    data = np.array(img)
    buf.close()
    
    return data
```

8) After exporting to ONNX, add the sherpa-onnx metadata if you want to use the model with sherpa-onnx:
```
python3 ./add_sherpa_metadata_to_matcha.py
```
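
To make the token file handling in step 6 more concrete, here is a minimal sketch of the same parsing logic applied to in-memory lines instead of a file. The sample lines are hypothetical and only assume the layout `<token> <id>` per line, with the space token appearing as a line that starts with a space. Like the code in step 6, it ignores the numeric id and relies on line order, which assumes the ids in the file are consecutive and start at 0.

```python
from typing import Dict, Iterable, List


def parse_tokens(lines: Iterable[str]) -> List[str]:
    """Extract the token from each '<token> <id>' line, in file order."""
    tokens = []
    for line in lines:
        line = line.rstrip("\n")
        if " " in line:
            # Everything before the first space is the token.
            token = line[: line.index(" ")]
            if token == "":
                # The line starts with a space, so the token itself is a space.
                token = " "
        else:
            # No separator: treat the whole line as the token.
            token = line
        tokens.append(token)
    return tokens


# Hypothetical file contents, for illustration only.
SAMPLE_LINES = ["_ 0\n", "^ 1\n", "$ 2\n", "  3\n", "a 4\n"]

symbols = parse_tokens(SAMPLE_LINES)
# Map each symbol to its position, which is the id the model sees.
symbol_to_id: Dict[str, int] = {symbol: index for index, symbol in enumerate(symbols)}

print(symbols)  # ['_', '^', '$', ' ', 'a']
```

If the ids in a real token file are not consecutive, the position-based mapping above would no longer match the file, and the id column would need to be parsed as well.
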

## Training results
![Training Results](khadijah-22050.png)

## Credits

Trained by Ali Mahmoudi (@mah92)

Special thanks to Masoud Azizi (@Mablue), Amirreza Ramezani (@brightening-eyes), and Dr. Hamid Jafari (Khaneh Noor Iranian Basir).

Special thanks to the people from the @ttsfarsi channel.

Thanks also to @csukuangfj from Xiaomi Corporation for the help and support in the icefall and sherpa-onnx repos.

And we are nothing except through the mercy of our Lord.