Collections
Discover the best community collections!
Collections including paper arxiv:2402.03300
-
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper • 2402.04291 • Published • 49 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 115 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 31 -
Scaling Laws for Downstream Task Performance of Large Language Models
Paper • 2402.04177 • Published • 18
-
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Paper • 2402.01739 • Published • 27 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 88 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 23
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 18 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 12 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 88 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 46
-
LongAlign: A Recipe for Long Context Alignment of Large Language Models
Paper • 2401.18058 • Published • 21 -
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 18 -
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 16 -
Rethinking Interpretability in the Era of Large Language Models
Paper • 2402.01761 • Published • 23
-
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
Paper • 2401.17053 • Published • 32 -
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 31 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 88 -
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Paper • 2402.05930 • Published • 39
-
Learning Universal Predictors
Paper • 2401.14953 • Published • 21 -
Anything in Any Scene: Photorealistic Video Object Insertion
Paper • 2401.17509 • Published • 17 -
SymbolicAI: A framework for logic-based approaches combining generative models and solvers
Paper • 2402.00854 • Published • 20 -
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis
Paper • 2401.17093 • Published • 20
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19 -
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
Paper • 2402.15220 • Published • 19 -
Linear Transformers are Versatile In-Context Learners
Paper • 2402.14180 • Published • 6