Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 5 days ago • 19
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published 18 days ago • 28
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration Paper • 2406.18516 • Published Jun 26, 2024 • 3
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 19 days ago • 23
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 20 days ago • 79
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published 27 days ago • 20
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published 27 days ago • 15
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published 29 days ago • 49
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 25
FRNet: Frustum-Range Networks for Scalable LiDAR Segmentation Paper • 2312.04484 • Published Dec 7, 2023
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes Paper • 2501.04004 • Published Jan 7 • 1
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving Paper • 2501.04005 • Published Jan 7
OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies Paper • 2501.00326 • Published Dec 31, 2024 • 1
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7 • 23