GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM Paper • 2403.05527 • Published Mar 8, 2024
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024