SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737
Solving everything with diffusion models!
diffusers 🧨
bitsandbytes as the official backend, but using others like torchao is already very simple.
enable_model_cpu_offload()