view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control 3 days ago • 69
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 21 days ago • 33
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1 • 99
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125