Submitted by myownskyW7 34 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models · 10 authors 3
Submitted by shenxq 26 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding · 17 authors 2
Submitted by Zhoues 18 WorldSimBench: Towards Video Generation Models as World Simulators · 13 authors 2
Submitted by kiaia 15 Scaling Diffusion Language Models via Adaptation from Autoregressive Models · 12 authors 2
Submitted by shyamgopal 14 Scalable Ranked Preference Optimization for Text-to-Image Generation · 6 authors 2
Submitted by shayekh 11 M-RewardBench: Evaluating Reward Models in Multilingual Settings · 10 authors 3
Submitted by kpzhang996 6 TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts · 4 authors 1
Submitted by coast01 3 LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias · 9 authors 2
Submitted by IAMJB 1 Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance · 4 authors 1