Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing · 7 authors · submitted by akhaliq · 66 upvotes · 5 comments
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing · 6 authors · submitted by yulunliu · 51 upvotes · 2 comments
MotionClone: Training-Free Motion Cloning for Controllable Video Generation · 9 authors · submitted by myownskyW7 · 40 upvotes · 4 comments
PowerInfer-2: Fast Large Language Model Inference on a Smartphone · 6 authors · submitted by yixinsong · 37 upvotes · 5 comments
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion · 6 authors · submitted by Liuff23 · 35 upvotes · 4 comments
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs · 11 authors · submitted by lixin4ever · 34 upvotes · 2 comments
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination · 7 authors · submitted by jedyang97 · 28 upvotes · 2 comments
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos · 14 authors · submitted by akhaliq · 25 upvotes
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters · 7 authors · submitted by yixinsong · 24 upvotes · 2 comments
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation · 8 authors · submitted by GlyphByT5 · 19 upvotes
Discovering Preference Optimization Algorithms with and for Large Language Models · 7 authors · submitted by chrlu · 14 upvotes
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation · 5 authors · submitted by akhaliq · 14 upvotes
Hierarchical Patch Diffusion Models for High-Resolution Video Generation · 4 authors · submitted by akhaliq · 13 upvotes
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models · 7 authors · submitted by yifanzhang114 · 11 upvotes · 2 comments
Large Language Model Unlearning via Embedding-Corrupted Prompts · 4 authors · submitted by chrisliu298 · 7 upvotes
Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models · 3 authors · submitted by AliBehrouz · 7 upvotes · 1 comment