It's such an amazing model, but to support multiple languages...
Your models are truly amazing.
Would it be possible to replace Whisper and TTS in your models to support other languages?
I would like to contribute to this as an open-source project.
I'm a beginner, so I don't know much and wanted to ask.
I'm sorry if this is an uncomfortable question.
I wanted to join your Discord and ask, but it seems like I don't have the permissions.
You can refer to this link, but if you replace the Whisper or TTS module, you will need to retrain the model with a large amount of data.
Thank you for letting me know.
That was helpful to me.
I think I need to change the Whisper model itself to Large v3 Turbo and try training it again.
It seems like it would be implemented as a class...
Is there any documentation or source code available for this?
If it is available, which part of the code should I look at?
@Jongsim The Whisper model should be fine as it is; I believe only the LLM and TTS need to be trained. The Whisper encoder is based on Whisper medium, which supports 99 languages, just like Whisper v3 turbo.
The TTS model only supports Chinese and English right now, and the LLM probably needs to be trained to understand more languages from the Whisper encoder output.
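
To make the suggestion above a bit more concrete, here is a minimal sketch of how one might decide which parameters to train and which to leave frozen. It is only an illustration: the prefixes `whisper_encoder.`, `llm.`, and `tts.` and the helper `split_trainable` are hypothetical names, not taken from this project's code, so check the actual module names in the repository first.

```python
# Hypothetical prefix for the part that stays frozen (the Whisper encoder).
# The real parameter names in the repository may be different.
FROZEN_PREFIXES = ("whisper_encoder.",)


def split_trainable(param_names, frozen_prefixes=FROZEN_PREFIXES):
    """Partition parameter names into (trainable, frozen) lists by name prefix."""
    trainable, frozen = [], []
    for name in param_names:
        # str.startswith accepts a tuple and matches any of its entries.
        if name.startswith(tuple(frozen_prefixes)):
            frozen.append(name)
        else:
            trainable.append(name)
    return trainable, frozen


# Illustrative names only, standing in for what a model would report.
example_names = [
    "whisper_encoder.blocks.0.attn.query.weight",
    "llm.layers.0.mlp.weight",
    "tts.decoder.out.weight",
]

trainable, frozen = split_trainable(example_names)
print("train:", trainable)
print("freeze:", frozen)
```

In a PyTorch setup, the names would typically come from `model.named_parameters()`, and each parameter in the frozen group would get `requires_grad = False` before the optimizer is built.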