Submitted by akhaliq · 40 upvotes · SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models · 8 authors
Submitted by Ningyu · 34 upvotes · Knowledge Mechanisms in Large Language Models: A Survey and Perspective · 13 authors
Submitted by akhaliq · 34 upvotes · NNsight and NDIF: Democratizing Access to Foundation Model Internals · 20 authors
Submitted by grafft · 21 upvotes · POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation · 6 authors
Submitted by teowu · 20 upvotes · LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding · 4 authors
Submitted by yulunliu · 17 upvotes · BoostMVSNeRFs: Boosting MVS-based NeRFs to Generalizable View Synthesis in Large-scale Scenes · 6 authors
Submitted by akhaliq · 14 upvotes · Artist: Aesthetically Controllable Text-Driven Stylization without Training · 2 authors
Submitted by akhaliq · 12 upvotes · HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions · 5 authors
Submitted by akhaliq · 11 upvotes · Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models · 7 authors
Submitted by xhyandwyy · 10 upvotes · MIBench: Evaluating Multimodal Large Language Models over Multiple Images · 11 authors
Submitted by akhaliq · 10 upvotes · Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning · 20 authors
Submitted by Ori · 9 upvotes · AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? · 6 authors
Submitted by akhaliq · 9 upvotes · MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation · 4 authors
Submitted by liuhuohuo · 6 upvotes · CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model · 5 authors
Submitted by akhaliq · 4 upvotes · GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization · 2 authors
Submitted by davidchan · 2 upvotes · Visual Haystacks: Answering Harder Questions About Sets of Images · 7 authors