Submitted by akhaliq 31 TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones · 3 authors 6
Submitted by akhaliq 28 Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action · 8 authors 2
Submitted by akhaliq 26 Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math · 3 authors 11
Submitted by akhaliq 20 MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices · 11 authors 2
Submitted by akhaliq 16 DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision · 20 authors 4
Submitted by akhaliq 14 City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web · 2 authors 1
Submitted by akhaliq 14 I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models · 11 authors 1
Submitted by akhaliq 10 Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis · 4 authors 2
Submitted by akhaliq 8 Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks · 6 authors 1
Submitted by akhaliq 7 SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation · 11 authors 1
Submitted by akhaliq 7 PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion · 10 authors 1
Submitted by akhaliq 6 DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors · 5 authors 1