---
language:
- vi
- en
pipeline_tag: text-to-speech
license: apache-2.0
tags:
- tts
- text-to-speech
- vietnamese
- speech-synthesis
- speech
- viet-tts
- viettts
---
<!-- # VietTTS: An Open-Source Vietnamese Text to Speech -->
<p align="center">
  <img src="https://github.com/dangvansam/viet-tts/blob/main/assets/viet-tts-medium.png?raw=true" style="width: 200px">
  <h1 align="center" style="color: white; font-weight: bold; font-family:roboto"><span style="color: white; font-weight: bold; font-family:roboto">VietTTS</span>: An Open-Source Vietnamese Text to Speech</h1>
</p>
<p align="center">
  <a href="https://github.com/dangvansam/viet-tts"><img src="https://img.shields.io/github/stars/dangvansam/viet-tts?style=social"></a>
  <a href="LICENSE"><img src="https://img.shields.io/github/license/dangvansam/viet-tts"></a>
  <a href="https://huggingface.co/dangvansam/viet-tts/blob/main/README_VN.md"><img src="https://img.shields.io/badge/README-Tiếng Việt-blue"></a>
</p>

**VietTTS** is an open-source toolkit providing the community with a powerful Vietnamese TTS model, capable of natural voice synthesis and robust voice cloning. Designed for effective experimentation, **VietTTS** supports research and application in Vietnamese voice technologies.

## ⭐ Key Features
- **TTS**: Text-to-Speech generation with any voice via prompt audio
- **OpenAI-API-compatible**: Compatible with OpenAI's Text-to-Speech API format

## 🛠️ Installation

VietTTS can be installed via a Python installer (Linux only, with Windows and macOS support coming soon) or Docker.

### Python Installer (Python>=3.10)
```bash
git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# (Optional) Create a Python environment with conda; virtualenv also works
conda create --name viettts python=3.10
conda activate viettts

# Install
pip install -e . && pip cache purge
```

### Docker

1. Install [Docker](https://docs.docker.com/get-docker/), [NVIDIA Driver](https://www.nvidia.com/download/index.aspx), [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html), and [CUDA](https://developer.nvidia.com/cuda-downloads).

2. Run the following commands:
```bash
git clone https://github.com/dangvansam/viet-tts.git
cd viet-tts

# Build docker images
docker compose build

# Run with docker compose - starts the server at http://localhost:8298
docker compose up -d

# Or run with docker run - starts the server at http://localhost:8298
docker run -itd --gpus all -p 8298:8298 -v ./pretrained-models:/app/pretrained-models --name viet-tts-service viet-tts:latest viettts server --host 0.0.0.0 --port 8298
```
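After the container starts, the server may need some time before it accepts requests. The following is a minimal readiness-check sketch using only the Python standard library. It assumes that the `/v1/voices` endpoint (used in the CURL examples of this README) answers with HTTP 200 once the server is ready; `http_probe` and `wait_for_server` are hypothetical helper names, not part of VietTTS.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_for_server(url, timeout=120.0, interval=2.0, probe=http_probe, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_server("http://localhost:8298/v1/voices")` returns `True` as soon as the probe succeeds and `False` if the timeout expires first. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.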

## 🚀 Usage

### Built-in Voices 🤠
You can use the built-in voices below to synthesize speech.
<details>
  <summary>Expand</summary>

| ID  | Voice                  | Gender | Play Audio                                        |
|-----|-----------------------|--------|--------------------------------------------------|
| 1   | nsnd-le-chuc          | 👨     | <audio controls src="samples/nsnd-le-chuc.mp3"></audio>  |
| 2   | speechify_10          | 👩     | <audio controls src="samples/speechify_10.wav"></audio>  |
| 3   | atuan                 | 👨     | <audio controls src="samples/atuan.wav"></audio>         |
| 4   | speechify_11          | 👩     | <audio controls src="samples/speechify_11.wav"></audio>  |
| 5   | cdteam                | 👨     | <audio controls src="samples/cdteam.wav"></audio>       |
| 6   | speechify_12          | 👩     | <audio controls src="samples/speechify_12.wav"></audio>  |
| 7   | cross_lingual_prompt  | 👩     | <audio controls src="samples/cross_lingual_prompt.wav"></audio>  |
| 8   | speechify_2           | 👩     | <audio controls src="samples/speechify_2.wav"></audio>   |
| 9   | diep-chi              | 👨     | <audio controls src="samples/diep-chi.wav"></audio>      |
| 10  | speechify_3           | 👩     | <audio controls src="samples/speechify_3.wav"></audio>   |
| 11  | doremon               | 👨     | <audio controls src="samples/doremon.mp3"></audio>       |
| 12  | speechify_4           | 👩     | <audio controls src="samples/speechify_4.wav"></audio>   |
| 13  | jack-sparrow          | 👨     | <audio controls src="samples/jack-sparrow.mp3"></audio>  |
| 14  | speechify_5           | 👩     | <audio controls src="samples/speechify_5.wav"></audio>   |
| 15  | nguyen-ngoc-ngan      | 👩     | <audio controls src="samples/nguyen-ngoc-ngan.wav"></audio>  |
| 16  | speechify_6           | 👩     | <audio controls src="samples/speechify_6.wav"></audio>   |
| 17  | nu-nhe-nhang          | 👩     | <audio controls src="samples/nu-nhe-nhang.wav"></audio>  |
| 18  | speechify_7           | 👩     | <audio controls src="samples/speechify_7.wav"></audio>   |
| 19  | quynh                 | 👩     | <audio controls src="samples/quynh.wav"></audio>         |
| 20  | speechify_8           | 👩     | <audio controls src="samples/speechify_8.wav"></audio>   |
| 21  | speechify_9           | 👩     | <audio controls src="samples/speechify_9.wav"></audio>   |
| 22  | son-tung-mtp          | 👨     | <audio controls src="samples/son-tung-mtp.wav"></audio>  |
| 23  | zero_shot_prompt      | 👩     | <audio controls src="samples/zero_shot_prompt.wav"></audio>  |
| 24  | speechify_1           | 👩     | <audio controls src="samples/speechify_1.wav"></audio>   |

  <div>
  </div>
</details>

### Command Line Interface (CLI)
The VietTTS Command Line Interface (CLI) allows you to quickly generate speech directly from the terminal. Here's how to use it:
```bash
# Usage
viettts --help

# Start API Server
viettts server --host 0.0.0.0 --port 8298

# List all built-in voices
viettts show-voices

# Synthesize speech from text with built-in voices
viettts synthesis --text "Xin chào" --voice 0 --output test.wav

# Clone voice from a local audio file
viettts synthesis --text "Xin chào" --voice Download/voice.wav --output cloned.wav
```
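To script several syntheses, the `viettts synthesis` command can be driven from Python. This is a minimal sketch that uses only the `--text`, `--voice`, and `--output` flags shown above and assumes `viettts` is on `PATH`; `build_synthesis_command` and `synthesize_batch` are hypothetical helpers, not part of VietTTS.

```python
import subprocess
from pathlib import Path


def build_synthesis_command(text, voice, output):
    """Build the argv list for one `viettts synthesis` call."""
    return [
        "viettts", "synthesis",
        "--text", text,
        "--voice", str(voice),
        "--output", str(output),
    ]


def synthesize_batch(sentences, voice, out_dir, run=subprocess.run):
    """Synthesize each sentence to out_dir/0001.wav, 0002.wav, ... and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, text in enumerate(sentences, start=1):
        output = out_dir / f"{index:04d}.wav"
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        run(build_synthesis_command(text, voice, output), check=True)
        outputs.append(output)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems with Vietnamese text. Each call starts a new process, so for larger batches the API server is probably the better fit.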

### API Client
#### Python (OpenAI Client)
You need to set environment variables for the OpenAI Client:
```bash
# Set base_url and API key as environment variables
export OPENAI_BASE_URL=http://localhost:8298
export OPENAI_API_KEY=viet-tts # not used in the current version
```
To create speech from input text:
```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()

output_file_path = Path(__file__).parent / "speech.wav"

with client.audio.speech.with_streaming_response.create(
  model='tts-1',
  voice='cdteam',
  input='Xin chào Việt Nam.',
  speed=1.0,
  response_format='wav'
) as response:
  response.stream_to_file(output_file_path)
```

#### CURL
```bash
# Get all built-in voices
curl --location http://localhost:8298/v1/voices

# OpenAI format (built-in voices)
curl http://localhost:8298/v1/audio/speech \
  -H "Authorization: Bearer viet-tts" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Xin chào Việt Nam.",
    "voice": "son-tung-mtp"
  }' \
  --output speech.wav

# API with voice from local file
curl --location http://localhost:8298/v1/tts \
  --form 'text="xin chào"' \
  --form 'audio_file=@"/home/viettts/Downloads/voice.mp4"' \
  --output speech.wav
```
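The OpenAI-format call above is a plain HTTP POST with a JSON body, so it can also be sent without the OpenAI client. This is a minimal sketch using only the Python standard library; it mirrors the endpoint, headers, and fields of the CURL example. `build_speech_request` and `synthesize` are hypothetical helper names, and the sketch assumes the response body is the raw audio, as `--output speech.wav` in the CURL example suggests.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:8298", api_key="viet-tts"):
    """Build a POST request for /v1/audio/speech, mirroring the CURL example."""
    payload = {"model": "tts-1", "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, voice, output_path, **kwargs):
    """Send the request and write the returned bytes to output_path."""
    request = build_speech_request(text, voice, **kwargs)
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())
```

For example, `synthesize("Xin chào Việt Nam.", "son-tung-mtp", "speech.wav")` sends the same request as the CURL command, provided the server is running locally.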

#### Node
```js
import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.wav");

async function main() {
  // Request speech for the input text
  const response = await openai.audio.speech.create({
    model: "tts-1",
    voice: "1",
    input: "Xin chào Việt Nam.",
  });
  console.log(speechFile);
  // Write the returned audio bytes to disk
  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();
```

## 🙏 Acknowledgement
- 💡 Borrowed code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
- 🎙️ VAD model from [silero-vad](https://github.com/snakers4/silero-vad)
- 📝 Text normalization with [Vinorm](https://github.com/v-nhandt21/Vinorm)

## 📜 License
The **VietTTS** source code is released under the **Apache 2.0 License**. Pre-trained models and audio samples are licensed under the **CC BY-NC License** because they are based on an in-the-wild dataset. We apologize for any inconvenience this may cause.

## ⚠️ Disclaimer
The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

## 💬 Contact 
- Facebook: https://fb.com/sam.rngd
- GitHub: https://github.com/dangvansam
- Email: [email protected]