# EGTTS V0.1

EGTTS V0.1 is a cutting-edge text-to-speech (TTS) model specifically designed for Egyptian Arabic. Built on the XTTS v2 architecture, it transforms written Egyptian Arabic text into natural-sounding speech, enabling seamless communication in various applications such as voice assistants, educational tools, and chatbots.

## Quick Start

### Dependencies to install

```bash
pip install git+https://github.com/coqui-ai/TTS
pip install transformers
pip install deepspeed
```
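
The inference snippets below expect local copies of the model files (`config.json`, `vocab.json`, the checkpoint directory, and a short reference recording of the target speaker). If the checkpoint is hosted on the Hugging Face Hub, one way to fetch everything is `huggingface_hub.snapshot_download` — a minimal sketch; the repo id here is a placeholder, not the confirmed location:

```python
# Sketch: fetch the checkpoint files from the Hugging Face Hub.
# "OmarSamir/EGTTS-V0.1" is a placeholder repo id -- substitute the real one.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="OmarSamir/EGTTS-V0.1")
# model_dir now holds config.json, vocab.json, and the model weights,
# so the path constants in the next snippet can point inside it.
print(model_dir)
```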

### Inference

#### Load the model

```python
import torch
import torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Point these at the downloaded model files and a short reference
# recording of the speaker whose voice should be cloned.
CONFIG_FILE_PATH = 'path/to/config.json'
VOCAB_FILE_PATH = 'path/to/vocab.json'
MODEL_PATH = 'path/to/model'
SPEAKER_AUDIO_PATH = 'path/to/speaker.wav'

print("Loading model...")
config = XttsConfig()
config.load_json(CONFIG_FILE_PATH)
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir=MODEL_PATH, use_deepspeed=True, vocab_path=VOCAB_FILE_PATH)
model.cuda()

# Condition the model on the reference speaker.
print("Computing speaker latents...")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=[SPEAKER_AUDIO_PATH])
```
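
The snippet above assumes a CUDA GPU and DeepSpeed. As a hedged fallback (not tested here), the same checkpoint should also load without DeepSpeed, at the cost of slower inference:

```python
# Sketch: loading without DeepSpeed -- slower, but drops the dependency.
model.load_checkpoint(config, checkpoint_dir=MODEL_PATH, use_deepspeed=False, vocab_path=VOCAB_FILE_PATH)
model.cuda()  # omit this line to keep the model on CPU
```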

#### Run the model

```python
from IPython.display import Audio, display

text = "صباح الخير"  # "Good morning"
print("Inference...")
out = model.inference(
    text,
    "ar",
    gpt_cond_latent,
    speaker_embedding,
    temperature=0.75,
)

# XTTS outputs audio at 24 kHz.
torchaudio.save("xtts_audio.wav", torch.tensor(out["wav"]).unsqueeze(0), 24000)
display(Audio("xtts_audio.wav", autoplay=True))
```
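
XTTS v2 also exposes a chunked streaming interface, which may be useful for low-latency playback. A minimal sketch, assuming the same model and speaker latents as above (the generation parameters are the library defaults, not values tuned for EGTTS):

```python
# Sketch: streaming inference via XTTS v2's inference_stream API.
# Chunks arrive as 1-D torch tensors and can be played or buffered
# as they are produced.
chunks = model.inference_stream(
    text,
    "ar",
    gpt_cond_latent,
    speaker_embedding,
)

wav_chunks = []
for chunk in chunks:
    wav_chunks.append(chunk)
wav = torch.cat(wav_chunks, dim=0)
torchaudio.save("xtts_audio_stream.wav", wav.unsqueeze(0).cpu(), 24000)
```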

```bibtex
@misc{omarsamir,
  author = {Omar Samir, Youssef Waleed, Youssef Tamer, and Amir Mohamed},