RAG techniques continue to evolve, improving LLM response accuracy by retrieving relevant external data during generation. Keeping pace with current AI trends, new RAG variants incorporate deep step-by-step reasoning, tree search, citations, multimodality, and other effective techniques.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342) Retrieves information step-by-step, refining it along the way, and decides how much compute to spend at test time. When needed, it reformulates the query; see the sketch below.
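A minimal sketch of that retrieve-reason-reformulate loop, assuming placeholder `retrieve` and `llm` callables and a simple `NEXT_QUERY:` convention for reformulation; this illustrates the idea, not the paper's exact procedure:

```python
def corag_answer(question, retrieve, llm, max_steps=4):
    """Chain-of-retrieval sketch: retrieve evidence step by step, let the model
    reformulate the query between steps, and cap test-time compute via max_steps.
    `retrieve` and `llm` are hypothetical callables supplied by the caller."""
    query, chain = question, []
    for _ in range(max_steps):
        docs = retrieve(query)                        # step-wise retrieval
        sub = llm(f"Question: {question}\nEvidence: {docs}\n"
                  f"Answer, or reply 'NEXT_QUERY: <query>' if more evidence is needed.")
        chain.append((query, sub))
        if not sub.startswith("NEXT_QUERY:"):         # model judged evidence sufficient
            break
        query = sub.removeprefix("NEXT_QUERY:").strip()  # reformulated query
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in chain)
    return llm(f"Question: {question}\nRetrieval chain:\n{context}\nFinal answer:")
```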
Colox, a reasoning AI model. I am currently working on a model smarter than OpenAI o1 that thinks before it speaks. It is coming tomorrow afternoon.
G2P (grapheme-to-phoneme conversion) is an underrated piece of small TTS models: like offensive linemen, it does a bunch of work and gets no credit.
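For context, a toy illustration of what a G2P front end does, using a tiny hand-written lexicon (an assumption for the example; real systems combine large pronunciation dictionaries with learned rules for out-of-vocabulary words):

```python
# Hypothetical mini-lexicon mapping written words (graphemes) to phoneme strings.
LEXICON = {
    "hello": "həˈloʊ",
    "world": "ˈwɜːld",
}

def g2p(text: str) -> str:
    """Look each word up in the lexicon; fall back to the raw spelling."""
    return " ".join(LEXICON.get(w, w) for w in text.lower().split())

print(g2p("Hello world"))  # -> həˈloʊ ˈwɜːld
```

The acoustic model then consumes phonemes instead of raw text, so it never has to learn English spelling quirks itself.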
Instead of relying on explicit G2P, larger speech models implicitly learn this task by eating many thousands of hours of audio data. They often use a 500M+ parameter LLM at the front to predict latent audio tokens over a learned codebook, then decode these tokens into audio.
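A minimal sketch of that pattern, with hypothetical names and sizes: a decoder-style transformer predicts discrete tokens from a learned audio codebook, conditioned on a text prefix, and a separate codec decoder (not shown) turns those tokens into a waveform:

```python
import torch
import torch.nn as nn

TEXT_VOCAB = 32_000      # assumed text vocabulary size
CODEBOOK_SIZE = 1024     # assumed size of the learned audio codebook
D_MODEL = 1024

class AudioTokenLM(nn.Module):
    """Predicts the next audio codec token given a text prefix. Text and audio
    tokens share one embedding table; audio ids are offset by TEXT_VOCAB."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TEXT_VOCAB + CODEBOOK_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=16, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=12)
        self.head = nn.Linear(D_MODEL, CODEBOOK_SIZE)

    def forward(self, ids):
        mask = nn.Transformer.generate_square_subsequent_mask(ids.shape[1])
        h = self.backbone(self.embed(ids), mask=mask)   # causal self-attention
        return self.head(h[:, -1])                      # logits over the codebook

@torch.no_grad()
def generate_audio_tokens(model, text_ids, n_tokens=200):
    seq = text_ids
    for _ in range(n_tokens):
        nxt = model(seq).argmax(-1, keepdim=True)       # greedy next audio token
        seq = torch.cat([seq, nxt + TEXT_VOCAB], dim=1) # offset into audio range
    # raw codec tokens; a codec decoder / vocoder maps these to audio samples
    return seq[:, text_ids.shape[1]:] - TEXT_VOCAB
```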
Kokoro instead relies on G2P preprocessing and, at 82M parameters, needs less audio to learn. Because of this, we can cherry-pick high-fidelity audio for training data and deliver solid speech for those voices. In turn, that excellent audio quality and lack of background noise help explain why Kokoro is so competitive in single-voice TTS Arenas.