- Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
  Paper • 2404.02905 • Published • 68
- On Speculative Decoding for Multimodal Large Language Models
  Paper • 2404.08856 • Published • 14
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
Collections
Collections including paper arxiv:2404.08856
- Demystifying CLIP Data
  Paper • 2309.16671 • Published • 20
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 10
- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 21
- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 18

- Accelerating LLM Inference with Staged Speculative Decoding
  Paper • 2308.04623 • Published • 25
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
  Paper • 2310.12962 • Published • 13
- The Curious Case of Neural Text Degeneration
  Paper • 1904.09751 • Published • 3
- On Speculative Decoding for Multimodal Large Language Models
  Paper • 2404.08856 • Published • 14

- Speculative Streaming: Fast LLM Inference without Auxiliary Models
  Paper • 2402.11131 • Published • 43
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
  Paper • 2402.13720 • Published • 7
- Recurrent Drafter for Fast Speculative Decoding in Large Language Models
  Paper • 2403.09919 • Published • 21
- On Speculative Decoding for Multimodal Large Language Models
  Paper • 2404.08856 • Published • 14

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 41
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22

- AutoMix: Automatically Mixing Language Models
  Paper • 2310.12963 • Published • 14
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
  Paper • 2310.03094 • Published • 13
- MatFormer: Nested Transformer for Elastic Inference
  Paper • 2310.07707 • Published • 1
- DistillSpec: Improving Speculative Decoding via Knowledge Distillation
  Paper • 2310.08461 • Published • 1