Submitted by viettmab 109 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance · 7 authors 2
Submitted by leo1117 31 TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation · 10 authors 3
Submitted by jingtan 27 Imagine360: Immersive 360 Video Generation from Perspective Anchor · 7 authors 2
Submitted by SYZhang0805 26 Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion · 10 authors 2
Submitted by KangsanKim71 22 VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding · 5 authors 2
Submitted by kimyoungjune 20 VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models · 4 authors 2
Submitted by xiangjun-xj 20 One Shot, One Talk: Whole-body Talking Avatar from a Single Image · 6 authors 2
Submitted by ChenDY 19 NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training · 4 authors 2
Submitted by ZyZcuhk 19 NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images · 10 authors 3
Submitted by zd11024 17 Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding · 3 authors 2
Submitted by cogwheelhead 16 U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs · 7 authors 2
Submitted by huanngzh 16 MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation · 10 authors 2
Submitted by Dahoas 12 Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models · 20 authors 3
Submitted by BiaoGong 12 Mimir: Improving Video Diffusion Models for Precise Text Understanding · 9 authors 2
Submitted by wjpoom 11 Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning · 10 authors 2
Submitted by Wanfq 10 Weighted-Reward Preference Optimization for Implicit Model Fusion · 5 authors 2
Submitted by xyxingx 7 LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting · 5 authors 3