Inquiry on Upcoming Language Support and Fine-Tuning Feasibility
Hi LiveKit team! 👋
First, thank you for your incredible work on the livekit/turn-detector model and the open-source ecosystem around it. The end-of-turn detection capabilities have been a game-changer for our conversational AI projects, especially with the improved accuracy over traditional VAD methods.
I wanted to ask about your plans for expanding language support. I recall seeing a post on X.com suggesting that multilingual support is in the pipeline for the near future. Could you share any updates on this? Many communities would greatly benefit from non-English implementations, and we'd love to know the expected timeline or which languages are being prioritized.
Additionally, if broader language support isn’t imminent, is it feasible to fine-tune the current model on a custom language corpus?
For instance:
- **Training Requirements:** What dataset format and size do you recommend (e.g., conversational transcripts with turn boundaries labeled)? See the sketch after this list for the kind of corpus we have in mind.
- **Annotation Guidelines:** Are specific metadata or annotations (e.g., silence duration, speaker roles) needed for training?
- **Architecture Constraints:** Does the ONNX-based inference setup allow for fine-tuning, or would adjustments to the model architecture be necessary? (See the re-export sketch further below.)
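To make the first two questions concrete, here is the kind of labeled corpus we could prepare on our side. This is purely a hypothetical JSONL layout we sketched ourselves; the field names (`context`, `label`, `trailing_silence_ms`) are our own assumptions, not a format documented by the project.

```python
# Hypothetical JSONL schema for labeled turn-boundary data.
# Field names are our own assumptions, not a format documented by the project.
import json

examples = [
    {
        # Running chat context, most recent utterance last.
        "context": [
            {"role": "user", "text": "can you book me a table for"},
        ],
        "label": 0,                  # 0 = turn not finished (speaker will continue)
        "trailing_silence_ms": 300,  # optional: silence observed after the utterance
    },
    {
        "context": [
            {"role": "user", "text": "can you book me a table for two at seven"},
        ],
        "label": 1,                  # 1 = end of turn, agent may respond
        "trailing_silence_ms": 900,
    },
]

# Write a small sample file in the proposed format.
with open("turn_boundaries.sample.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

If the model actually expects a different schema (e.g., raw chat transcripts with an end-of-utterance marker), we're happy to adapt our annotation to match.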
We’re prepared to collaborate on preparing a training set for our target language and would appreciate guidance on best practices.
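On the third question, our working assumption (please correct us if it's off) is that the ONNX artifact is inference-only, so fine-tuning would happen on the original PyTorch/Transformers checkpoint and the result would then be re-exported to ONNX. Below is a minimal sketch of the re-export step we'd expect to run, assuming a standard Hugging Face classification checkpoint and the `optimum` library; the checkpoint name and output paths are placeholders, and the actual turn-detector architecture may differ.

```python
# Hypothetical re-export step after fine-tuning -- assumes the fine-tuned
# checkpoint is a standard Hugging Face sequence-classification model,
# which may not match the actual turn-detector architecture.
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

checkpoint = "our-org/turn-detector-xx-finetuned"  # placeholder fine-tuned checkpoint

# export=True converts the PyTorch weights to an ONNX graph at load time.
ort_model = ORTModelForSequenceClassification.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Save the ONNX graph and tokenizer so they could be dropped into an
# ONNX-based inference setup (the directory layout here is our guess,
# not the plugin's actual contract).
ort_model.save_pretrained("turn-detector-xx-onnx")
tokenizer.save_pretrained("turn-detector-xx-onnx")
```

If fine-tuning is instead expected to happen against the ONNX graph itself, or if the upstream training code isn't public, any pointers on the intended workflow would help us a lot.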
Thanks again for your transparency and dedication to advancing real-time communication tools! Looking forward to your insights.