SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 1 day ago • 37
view article Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 564
view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference 22 days ago • 63
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models Paper • 2411.19477 • Published Nov 29, 2024 • 6
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 79
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models Paper • 1610.02424 • Published Oct 7, 2016 • 1
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 126
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 132
Solving math word problems with process- and outcome-based feedback Paper • 2211.14275 • Published Nov 25, 2022 • 8
Self-Consistency Improves Chain of Thought Reasoning in Language Models Paper • 2203.11171 • Published Mar 21, 2022 • 3
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated about 2 hours ago • 213