Submitted by akhaliq · 90 · Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training · 6 authors · 2
Submitted by yilunzhao · 81 · MMVU: Measuring Expert-Level Multi-Discipline Video Understanding · 19 authors · 2
Submitted by QwQZh · 63 · Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models · 10 authors · 2
Submitted by akhaliq · 48 · UI-TARS: Pioneering Automated GUI Interaction with Native Agents · 35 authors · 5
Submitted by avoin · 46 · TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space · 9 authors · 2
Submitted by myownskyW7 · 39 · InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model · 13 authors · 3
Submitted by akhaliq · 33 · Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation · 71 authors · 4
Submitted by xhyandwyy · 27 · Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks · 8 authors · 2
Submitted by akhaliq · 23 · Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments · 6 authors · 2
Submitted by akhaliq · 22 · Video Depth Anything: Consistent Depth Estimation for Super-Long Videos · 7 authors · 2
Submitted by akhaliq · 20 · Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise · 13 authors · 3
Submitted by zsytony · 14 · Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement · 8 authors · 2
Submitted by lucaskingjade · 12 · EMO2: End-Effector Guided Audio-Driven Avatar Video Generation · 5 authors · 4
Submitted by RunpeiDong · 10 · Taming Teacher Forcing for Masked Autoregressive Video Generation · 11 authors · 2
Submitted by vkarthik095 · 8 · The Geometry of Tokens in Internal Representations of Large Language Models · 5 authors · 2
Submitted by THEATLAS · 6 · Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation · 5 authors · 2
Submitted by hasanar1f · 4 · Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model · 5 authors · 2