Daily Papers

by AK and the research community

Mar 15

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training · 31 authors · Submitted by akhaliq

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking · 6 authors · Submitted by akhaliq

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset · 3 authors · Submitted by akhaliq

GiT: Towards Generalist Vision Transformer through Universal Language Interface · 8 authors · Submitted by akhaliq

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control · 4 authors · Submitted by akhaliq

Video Editing via Factorized Diffusion Distillation · 7 authors · Submitted by akhaliq

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences · 9 authors · Submitted by akhaliq

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering · 7 authors · Submitted by akhaliq

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring · 6 authors · Submitted by akhaliq

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding · 10 authors · Submitted by akhaliq

Veagle: Advancements in Multimodal Representation Learning · 9 authors · Submitted by akhaliq

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding · 10 authors · Submitted by akhaliq

3D-VLA: A 3D Vision-Language-Action Generative World Model · 8 authors · Submitted by akhaliq

LocalMamba: Visual State Space Model with Windowed Selective Scan · 6 authors · Submitted by akhaliq