- YAYI 2: Multilingual Open-Source Large Language Models
  Paper • 2312.14862 • Published • 14
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 57
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47

Collections including paper arxiv:2401.06066

- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
  Paper • 2312.10003 • Published • 38
- Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
  Paper • 2312.10253 • Published • 7
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 70
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157

- SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
  Paper • 2312.07987 • Published • 41
- Interfacing Foundation Models' Embeddings
  Paper • 2312.07532 • Published • 11
- Point Transformer V3: Simpler, Faster, Stronger
  Paper • 2312.10035 • Published • 18
- TheBloke/quantum-v0.01-GPTQ
  Text Generation • Updated • 22 • 2

- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 17
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
  Paper • 2401.04081 • Published • 70
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47

- Ultra-Long Sequence Distributed Transformer
  Paper • 2311.02382 • Published • 3
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 17
- Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
  Paper • 2311.02103 • Published • 17
- Extending Context Window of Large Language Models via Semantic Compression
  Paper • 2312.09571 • Published • 13

- QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
  Paper • 2310.16795 • Published • 27
- Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference
  Paper • 2308.12066 • Published • 4
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1
- EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate
  Paper • 2112.14397 • Published • 1

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 18
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 37
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22