Submitted by SijieCheng · 48 · VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI · 9 authors · 3
Submitted by zfj1998 · 43 · HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks · 9 authors · 2
Submitted by wanderkid · 32 · DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception · 4 authors · 2
Submitted by Sicong · 31 · The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio · 10 authors · 2
Submitted by feifeiobama · 19 · Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models · 2 authors · 3
Submitted by luping-liu · 15 · Improving Long-Text Alignment for Text-to-Image Diffusion Models · 6 authors · 2
Submitted by andrewyates · 13 · DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities · 6 authors · 2
Submitted by zsytony · 13 · ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · 6 authors · 2
Submitted by jackzhang · 12 · Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements · 5 authors · 2
Submitted by kpzhang996 · 12 · ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression · 7 authors · 3
Submitted by youngsheen · 8 · Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective · 6 authors · 2
Submitted by Minbyul · 8 · ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains · 6 authors · 3
Submitted by shanchen · 5 · WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation · 16 authors · 2
Submitted by skrishna · 4 · Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL · 4 authors · 2
Submitted by IAMJB · 1 · From Commands to Prompts: LLM-based Semantic File System for AIOS · 12 authors · 1