EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion Paper • 2501.13452 • Published 14 days ago • 7
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 16 days ago • 48
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published 17 days ago • 90
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published 21 days ago • 24
UniFL: Improve Stable Diffusion via Unified Feedback Learning Paper • 2404.05595 • Published Apr 8, 2024 • 24
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published 23 days ago • 32
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published Jan 5 • 11