F5 TTS - MLX

F5 TTS for the MLX framework.

This model is reshaped for MLX from the original weights and is designed for use with f5-tts-mlx.

F5 TTS is a non-autoregressive, zero-shot text-to-speech system using a flow-matching mel spectrogram generator with a diffusion transformer (DiT).

You can listen to a sample here; it was generated in about 11 seconds on an M3 Max MacBook Pro.

See F5-TTS for the original checkpoint.

Installation

pip install f5-tts-mlx

Usage

python -m f5_tts_mlx.generate --text "The quick brown fox jumped over the lazy dog."

If you want to use your own reference audio sample, make sure it's a mono, 24 kHz WAV file of around 5-10 seconds:

python -m f5_tts_mlx.generate \
  --text "The quick brown fox jumped over the lazy dog." \
  --ref-audio /path/to/audio.wav \
  --ref-text "This is the caption for the reference audio."
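
If you would rather drive the command line from a script, the sketch below assembles the same invocation as an argument list. It uses only the flags shown above (`--text`, `--ref-audio`, `--ref-text`). The helper names `build_generate_command` and `run_generate` are hypothetical, and the rule that the reference audio and its caption must be supplied together is an assumption based on the example, not documented behavior.

```python
import subprocess
import sys


def build_generate_command(text, ref_audio=None, ref_text=None):
    """Build the argument list for `python -m f5_tts_mlx.generate`.

    Hypothetical helper: only the flags shown in this README are used.
    """
    # Assumption: reference audio and its caption are passed as a pair.
    if (ref_audio is None) != (ref_text is None):
        raise ValueError("ref_audio and ref_text should be given together")

    cmd = [sys.executable, "-m", "f5_tts_mlx.generate", "--text", text]
    if ref_audio is not None:
        cmd += ["--ref-audio", ref_audio, "--ref-text", ref_text]
    return cmd


def run_generate(text, ref_audio=None, ref_text=None):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_generate_command(text, ref_audio, ref_text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or spaces.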

You can convert an audio file to the correct format with ffmpeg like this:

ffmpeg -i /path/to/audio.wav -ac 1 -ar 24000 -sample_fmt s16 -t 10 /path/to/output_audio.wav
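
To check whether a file already meets these requirements before converting it, a minimal sketch using Python's standard `wave` module is shown below. The function name `check_reference_wav` is hypothetical, and the duration bounds are treated as hard limits here purely for illustration, since the guidance above says only "around 5-10 seconds".

```python
import wave


def check_reference_wav(path, sample_rate=24000, min_seconds=5.0, max_seconds=10.0):
    """Return a list of problems found; an empty list means the file looks usable.

    Hypothetical helper: the thresholds mirror the guidance in this README.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / float(rate)

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if rate != sample_rate:
        problems.append(f"expected {sample_rate} Hz, found {rate} Hz")
    # Assumption: "around 5-10 seconds" is enforced as a strict range.
    if not (min_seconds <= duration <= max_seconds):
        problems.append(
            f"expected {min_seconds}-{max_seconds} s, found {duration:.1f} s"
        )
    return problems
```

If the returned list is not empty, the ffmpeg command above should produce a file that passes the channel and sample-rate checks.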

See here for more options to customize generation.


You can load a pretrained model from Python like this:

from f5_tts_mlx.generate import generate

audio = generate(text = "Hello world.", ...)

Model size: 99.4M params (Safetensors). Tensor types: BF16, F32, U32.