Article Process Reinforcement through Implicit Rewards By ganqu and 1 other • Jan 3 • 22
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published Dec 30, 2024 • 35
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference Paper • 2406.10774 • Published Jun 16, 2024 • 2
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models • 15 items • Updated Dec 6, 2024 • 566
NanoFlow: Towards Optimal Large Language Model Serving Throughput Paper • 2408.12757 • Published Aug 22, 2024 • 18
Hydragen: High-Throughput LLM Inference with Shared Prefixes Paper • 2402.05099 • Published Feb 7, 2024 • 20
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 36
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning Paper • 2311.02103 • Published Nov 1, 2023 • 17