RAG techniques continue to evolve, improving LLM response accuracy by retrieving relevant external data during generation. Keeping pace with current AI trends, new RAG variants incorporate deep step-by-step reasoning, tree search, citations, multimodality, and other effective techniques.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step-by-step, refining it along the way, and decides how much compute to spend at test time. When needed, it reformulates the query; see the sketch below.
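A minimal sketch of that retrieve-reason-reformulate loop, assuming placeholder `retrieve` and `llm` callables and a simple `NEXT_QUERY:` convention for reformulation; this illustrates the idea, not the paper's exact procedure:

```python
def corag_answer(question, retrieve, llm, max_steps=4):
    """Chain-of-retrieval sketch: retrieve evidence step by step, let the model
    reformulate the query between steps, and cap test-time compute via max_steps.
    `retrieve` and `llm` are hypothetical callables supplied by the caller."""
    query, chain = question, []
    for _ in range(max_steps):
        docs = retrieve(query)                        # step-wise retrieval
        sub = llm(f"Question: {question}\nEvidence: {docs}\n"
                  f"Answer, or reply 'NEXT_QUERY: <query>' if more evidence is needed.")
        chain.append((query, sub))
        if not sub.startswith("NEXT_QUERY:"):         # model judged evidence sufficient
            break
        query = sub.removeprefix("NEXT_QUERY:").strip()  # reformulated query
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in chain)
    return llm(f"Question: {question}\nRetrieval chain:\n{context}\nFinal answer:")
```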
Colox, a reasoning AI model. I am currently working on a model smarter than OpenAI o1 that thinks before it speaks. It is coming tomorrow afternoon.
G2P (grapheme-to-phoneme conversion) is an underrated piece of small TTS models: like offensive linemen, it does a bunch of work and gets no credit.
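For context, a toy illustration of what a G2P front end does, using a tiny hand-written lexicon (an assumption for the example; real systems combine large pronunciation dictionaries with learned rules for out-of-vocabulary words):

```python
# Hypothetical mini-lexicon mapping written words (graphemes) to phoneme strings.
LEXICON = {
    "hello": "həˈloʊ",
    "world": "ˈwɜːld",
}

def g2p(text: str) -> str:
    """Look each word up in the lexicon; fall back to the raw spelling."""
    return " ".join(LEXICON.get(w, w) for w in text.lower().split())

print(g2p("Hello world"))  # -> həˈloʊ ˈwɜːld
```

The acoustic model then consumes phonemes instead of raw text, so it never has to learn English spelling quirks itself.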
Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.
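A minimal sketch of that pattern, with hypothetical names and sizes: a decoder-style transformer predicts discrete tokens from a learned audio codebook, conditioned on a text prefix, and a separate codec decoder (not shown) turns those tokens into a waveform:

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000      # assumed text vocabulary size
CODEBOOK_SIZE = 1024     # assumed size of the learned audio codebook
D_MODEL = 1024

class AudioTokenLM(nn.Module):
    """Predicts the next audio codec token given a text prefix. Text and audio
    tokens share one embedding table; audio ids are offset by TEXT_VOCAB."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TEXT_VOCAB + CODEBOOK_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=16, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=12)
        self.head = nn.Linear(D_MODEL, CODEBOOK_SIZE)

    def forward(self, ids):
        mask = nn.Transformer.generate_square_subsequent_mask(ids.shape[1])
        h = self.backbone(self.embed(ids), mask=mask)   # causal self-attention
        return self.head(h[:, -1])                      # logits over the codebook

@torch.no_grad()
def generate_audio_tokens(model, text_ids, n_tokens=200):
    seq = text_ids
    for _ in range(n_tokens):
        nxt = model(seq).argmax(-1, keepdim=True)       # greedy next audio token
        seq = torch.cat([seq, nxt + TEXT_VOCAB], dim=1) # offset into audio range
    # raw codec tokens; a codec decoder / vocoder maps these to audio samples
    return seq[:, text_ids.shape[1]:] - TEXT_VOCAB
```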
Kokoro instead relies on G2P preprocessing and, at 82M parameters, needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, that excellent audio quality and lack of background noise help explain why Kokoro is so competitive in single-voice TTS Arenas.