SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • arXiv:2502.02737
ACECODER: Acing Coder RL via Automated Test-Case Synthesis • Paper • arXiv:2502.01718
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models • Paper • arXiv:2502.01061
TinySwallow • Collection • Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models • Paper • arXiv:2501.16937
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer • Paper • arXiv:2501.15570
Qwen2.5-VL • Collection • Vision-language model series based on Qwen2.5 • 3 items
Qwen2.5-1M • Collection • The long-context version of Qwen2.5, supporting 1M-token context lengths • 2 items