Submitted by akhaliq 54 Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey · 27 authors 2
Submitted by ZehanWang 20 Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models · 6 authors 4
Submitted by ynhe 18 Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment · 12 authors 2
Submitted by KyleLin 15 From Elements to Design: A Layered Approach for Automatic Graphic Design Composition · 6 authors 2
Submitted by BestWishYsh 13 VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models · 9 authors 2
Submitted by mskrt 12 The Superposition of Diffusion Models Using the Itô Density Estimator · 5 authors 2
Submitted by jacksukk 8 Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging · 6 authors 2
Submitted by yanlinf 6 CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era · 3 authors 2
Submitted by risashinoda 5 SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images · 5 authors 2