Submitted by akhaliq 89 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks · 9 authors 6
Submitted by akhaliq 37 JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models · 12 authors 1
Submitted by akhaliq 31 Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model · 10 authors 4
Submitted by akhaliq 28 Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs · 7 authors 2
Submitted by akhaliq 18 Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization · 14 authors 1
Submitted by akhaliq 13 FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores · 4 authors 1
Submitted by akhaliq 11 ADaPT: As-Needed Decomposition and Planning with Language Models · 7 authors 1
Submitted by akhaliq 10 Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities · 6 authors 1
Submitted by akhaliq 6 Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems · 8 authors 1