Daily Papers

by AK and the research community

Jan 22

Submitted by

akhaliq

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

·
6 authors

Submitted by

yilunzhao

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

·
19 authors

Submitted by

QwQZh

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

·
10 authors

Submitted by

akhaliq

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

·
35 authors

Submitted by

avoin

TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

·
9 authors

Submitted by

myownskyW7

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

·
13 authors

Submitted by

akhaliq

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

·
71 authors

Submitted by

akhaliq

Reasoning Language Models: A Blueprint

·
18 authors

Submitted by

xhyandwyy

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

·
8 authors

Submitted by

akhaliq

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

·
6 authors

Submitted by

akhaliq

Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

·
7 authors

Submitted by

akhaliq

Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

·
13 authors

Submitted by

zsytony

Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement

·
8 authors

Submitted by

lucaskingjade

EMO2: End-Effector Guided Audio-Driven Avatar Video Generation

·
5 authors

Submitted by

chfeng

GPS as a Control Signal for Image Generation

·
5 authors

Submitted by

RunpeiDong

Taming Teacher Forcing for Masked Autoregressive Video Generation

·
11 authors

Submitted by

vkarthik095

The Geometry of Tokens in Internal Representations of Large Language Models

·
5 authors

Submitted by

felfri

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

·
22 authors

Submitted by

THEATLAS

Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation

·
5 authors

Submitted by

hasanar1f

Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model

·
5 authors