AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published 3 days ago • 31
Multimodal foundation world models for generalist embodied agents Paper • 2406.18043 • Published Jun 26, 2024 • 1