Vikhr Salt: Speech And Language Transformer


Vikhr Salt is a multimodal model built on a pre-trained large language model and extended with new audio tokens, so that a single model handles both TTS (text-to-speech) and ASR (automatic speech recognition). Two audio encoding variants are used, Encodec and SpeechTokenizer, and training is kept stable by adjusting the numerical precision settings. This approach lets Vikhr Salt reuse the knowledge of the pre-trained LLM while generating and understanding speech.
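To make the idea of "new audio tokens" concrete, here is a minimal sketch of how discrete codes from an audio codec could be mapped to token IDs placed after an LLM's text vocabulary. It is an illustration, not the Vikhr Salt implementation: the constants `TEXT_VOCAB_SIZE`, `NUM_CODEBOOKS`, and `CODEBOOK_SIZE` and all function names are hypothetical, and the actual layout used by the model may differ.

```python
from typing import List, Tuple

# Hypothetical sizes, chosen only for illustration; the real model may differ.
TEXT_VOCAB_SIZE = 32_000   # size of the original text vocabulary (assumed)
NUM_CODEBOOKS = 2          # codebooks produced by the audio codec (assumed)
CODEBOOK_SIZE = 1_024      # entries per codebook (assumed)
EXTENDED_VOCAB_SIZE = TEXT_VOCAB_SIZE + NUM_CODEBOOKS * CODEBOOK_SIZE


def audio_code_to_token_id(codebook: int, code: int) -> int:
    """Map a (codebook, code) pair to an ID located after the text vocabulary."""
    if not 0 <= codebook < NUM_CODEBOOKS:
        raise ValueError(f"codebook index out of range: {codebook}")
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError(f"code out of range: {code}")
    return TEXT_VOCAB_SIZE + codebook * CODEBOOK_SIZE + code


def token_id_to_audio_code(token_id: int) -> Tuple[int, int]:
    """Invert audio_code_to_token_id; fails for IDs that are text tokens."""
    offset = token_id - TEXT_VOCAB_SIZE
    if not 0 <= offset < NUM_CODEBOOKS * CODEBOOK_SIZE:
        raise ValueError(f"not an audio token ID: {token_id}")
    return divmod(offset, CODEBOOK_SIZE)


def encode_audio_frames(frames: List[Tuple[int, ...]]) -> List[int]:
    """Flatten per-frame codes (one code per codebook) into a token ID list."""
    ids: List[int] = []
    for frame in frames:
        if len(frame) != NUM_CODEBOOKS:
            raise ValueError("each frame needs one code per codebook")
        for codebook, code in enumerate(frame):
            ids.append(audio_code_to_token_id(codebook, code))
    return ids


if __name__ == "__main__":
    frames = [(17, 903), (4, 0)]  # made-up codec output for two frames
    ids = encode_audio_frames(frames)
    print(ids)
    print([token_id_to_audio_code(i) for i in ids])
```

Under this scheme, text and audio share one ID space, so the same next-token objective can cover both directions: text followed by audio tokens for TTS, and audio tokens followed by text for ASR.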

Model Authors

Ksenya Sycheva, Konstantin Korolev, Aleksandr Nikolic

Model size: 1.1B params (Safetensors)
Tensor type: BF16

Model repository: Vikhrmodels/salt-116k