Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2501.03841

Cosmos World Foundation Model Platform for Physical AI

Paper • 2501.03575 • Published Jan 7 • 68
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 41
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published Jan 7 • 53
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

Paper • 2501.04003 • Published Jan 7 • 25

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published Jan 7 • 53

Structured Agent Operations

Agents for self-driving laboratories applied to quantum computing

Paper • 2412.07978 • Published Dec 10, 2024 • 1
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges

Paper • 2412.11427 • Published Dec 16, 2024 • 1
AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions

Paper • 2411.18015 • Published Nov 27, 2024 • 1
LLM4SR: A Survey on Large Language Models for Scientific Research

Paper • 2501.04306 • Published Jan 8 • 33

Robotic 🤖🔧

OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints

Paper • 2501.03841 • Published Jan 7 • 53
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation

Paper • 2501.01895 • Published Jan 3 • 50

2025 LLM Papers on Hugging Face with Japanese Memos

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

Paper • 2501.02955 • Published Jan 6 • 40
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published 21 days ago • 81
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published 26 days ago • 24

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 37
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 26
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

GRUtopia: Dream General Robots in a City at Scale

Paper • 2407.10943 • Published Jul 15, 2024 • 24
Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

Paper • 2407.10973 • Published Jul 15, 2024 • 10
Cross Anything: General Quadruped Robot Navigation through Complex Terrains

Paper • 2407.16412 • Published Jul 23, 2024 • 6
RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

Paper • 2408.11048 • Published Aug 20, 2024 • 4

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Paper • 2406.02523 • Published Jun 4, 2024 • 11
UniT: Unified Tactile Representation for Robot Learning

Paper • 2408.06481 • Published Aug 12, 2024 • 10
Latent Action Pretraining from Videos

Paper • 2410.11758 • Published Oct 15, 2024 • 2
Neural Fields in Robotics: A Survey

Paper • 2410.20220 • Published Oct 26, 2024 • 4

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 22
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

Paper • 2405.10300 • Published May 16, 2024 • 28
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 130
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Paper • 2405.11143 • Published May 20, 2024 • 36

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs