BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024 • 24
A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks Paper • 2212.00720 • Published Nov 16, 2022
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7, 2024 • 4
The Unreasonable Ineffectiveness of the Deeper Layers Paper • 2403.17887 • Published Mar 26, 2024 • 79
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference Paper • 2401.08383 • Published Jan 16, 2024 • 1