OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published 3 days ago • 149
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 32
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 43
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published Sep 26, 2024 • 47
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published Sep 26, 2024 • 34
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 93
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference Paper • 2407.14057 • Published Jul 19, 2024 • 45