Collections including paper arxiv:2402.19427
Each entry lists a paper title followed by its arXiv ID, publication status, and upvote count (where shown); collections are separated by a dash.

-
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 137
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19
Priority Sampling of Large Language Models for Compilers
Paper • 2402.18734 • Published • 17
-
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 50
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 45
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 23
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 50
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 137
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19
-
Nemotron-4 15B Technical Report
Paper • 2402.16819 • Published • 43
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15
Reformer: The Efficient Transformer
Paper • 2001.04451 • Published
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 608
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 33
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 608
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper • 2403.03507 • Published • 185
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53
ResLoRA: Identity Residual Mapping in Low-Rank Adaption
Paper • 2402.18039 • Published • 11
-
Graph Mamba: Towards Learning on Graphs with State Space Models
Paper • 2402.08678 • Published • 15
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 31
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 54
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 60
-
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19
Linear Transformers with Learnable Kernel Functions are Better In-Context Models
Paper • 2402.10644 • Published • 80
Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 24
Zoology: Measuring and Improving Recall in Efficient Language Models
Paper • 2312.04927 • Published • 2