- Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 68
- VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Paper • 2501.00599 • Published • 41
- OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints
Paper • 2501.03841 • Published • 53
- Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Paper • 2501.04003 • Published • 25
Collections
Collections including paper arxiv:2501.03841
- Agents for self-driving laboratories applied to quantum computing
Paper • 2412.07978 • Published • 1
- Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges
Paper • 2412.11427 • Published • 1
- AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions
Paper • 2411.18015 • Published • 1
- LLM4SR: A Survey on Large Language Models for Scientific Research
Paper • 2501.04306 • Published • 33

- MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Paper • 2501.02955 • Published • 40
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 99
- MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
Paper • 2501.12380 • Published • 81
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos
Paper • 2501.09781 • Published • 24

- OpenVLA: An Open-Source Vision-Language-Action Model
Paper • 2406.09246 • Published • 37
- CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Paper • 2411.19650 • Published
- Octo: An Open-Source Generalist Robot Policy
Paper • 2405.12213 • Published • 26
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
Paper • 2412.03293 • Published

- GRUtopia: Dream General Robots in a City at Scale
Paper • 2407.10943 • Published • 24
- Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion
Paper • 2407.10973 • Published • 10
- Cross Anything: General Quadruped Robot Navigation through Complex Terrains
Paper • 2407.16412 • Published • 6
- RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands
Paper • 2408.11048 • Published • 4

- RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Paper • 2406.02523 • Published • 11
- UniT: Unified Tactile Representation for Robot Learning
Paper • 2408.06481 • Published • 10
- Latent Action Pretraining from Videos
Paper • 2410.11758 • Published • 2
- Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 4

- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 22
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 28
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 130
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 36