Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published 3 days ago • 34
view article Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • 6 days ago • 29
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 24 days ago • 89
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Paper • 2407.21787 • Published Jul 31, 2024 • 12
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 41
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 47
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8, 2024 • 39
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 106
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 566
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 136
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 506
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models Paper • 2408.02442 • Published Aug 5, 2024 • 21
Generative Verifiers: Reward Modeling as Next-Token Prediction Paper • 2408.15240 • Published Aug 27, 2024 • 13