F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)
Generate Vietnamese speech from text and reference audio
Determine GPU requirements for large language models