Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published 9 days ago • 32
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published 14 days ago • 42
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 29 days ago • 253
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 9
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 44
Fine-tuning LLMs to 1.58bit: extreme quantization made easy Article • Published Sep 18, 2024 • 216
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16, 2024 • 41