
Quentin Gallouédec

qgallouedec

https://gallouedec.com



qgallouedec's activity

published a model about 3 hours ago

qgallouedec/Qwen2.5-1.5B-Open-R1-GRPO


reacted to lewtun's post with 🔥 1 day ago


We are reproducing the full DeepSeek R1 data and training pipeline so everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!

🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.

🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training.

Follow along: https://github.com/huggingface/open-r1


upvoted an article 4 days ago

Open-R1: Update #1

5 days ago • 237

upvoted an article 6 days ago

Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial

6 days ago • 29

updated a dataset 7 days ago

qgallouedec/trl-metrics

Updated 7 days ago • 76.7k • 327 • 1

updated a dataset 9 days ago

trl-lib/documentation-images

Updated 9 days ago • 1 • 60.4k

replied to merve's post 13 days ago

reacted to merve's post with 🔥 13 days ago


Oof, what a week! 🥵 So many things have happened; let's recap! merve/jan-24-releases-6793d610774073328eac67a9

Multimodal 💬
- We have released SmolVLM: the tiniest VLMs, in 256M and 500M sizes, along with its retrieval models ColSmol for multimodal RAG 💗
- UI-TARS: new models by ByteDance that unlock agentic GUI control 🤯, in 2B, 7B, and 72B sizes
- Alibaba DAMO lab released VideoLlama3, new video LMs in 2B and 7B sizes
- MiniMaxAI released MiniMax-VL-01, whose decoder is based on the long-context MiniMax-Text-01 456B MoE model
- Dataset: Yale released a new benchmark called MMVU
- Dataset: CAIS released Humanity's Last Exam (HLE), a new, challenging multimodal benchmark

LLMs 📖
- DeepSeek-R1 & DeepSeek-R1-Zero: gigantic 671B reasoning models by DeepSeek, plus six distilled dense models, on par with o1 and released under an MIT license! 🤯
- Qwen2.5-Math-PRM: new math process reward models by Qwen, in 7B and 72B sizes
- NVIDIA released AceMath and AceInstruct, a new family of models along with their datasets (including SFT and reward datasets!)

Audio 🗣️
- Llasa is a new Llama-based speech synthesis model in 1B, 3B, and 8B sizes
- TangoFlux is a new audio generation model trained from scratch and aligned with CRPO

Image/Video/3D Generation ⏯️
- Flex.1-alpha is a new 8B pretrained diffusion model by ostris, similar to Flux
- Tencent released Hunyuan3D-2, a new model for generating 3D assets from images