SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 1 day ago • 37
YuE Collection YuE: Open Full-song Generation Foundation Model • 9 items • Updated 9 days ago • 17
Tulu 3 Models Collection All models released with Tulu 3 -- state of the art open post-training recipes. • 10 items • Updated 8 days ago • 86
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 10 days ago • 320
Article SmolVLM Grows Smaller – Introducing the 250M & 500M Models! 15 days ago • 119
video-effects Collection Fine-tunes of open video generation models like CogVideoX to emulate cool video effects like "squish", "dissolve", "cakeify", etc. Pika inspired. • 4 items • Updated 9 days ago • 3
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 15 days ago • 79
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces Paper • 2501.12909 • Published 15 days ago • 63
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 15 days ago • 298
Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 19 items • Updated 29 days ago • 87
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published Jan 6 • 52
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 10 items • Updated Dec 13, 2024 • 50
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published Jan 2 • 49
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99