- neuralmagic/granite-3.1-2b-instruct-quantized.w4a16
  Text Generation • Updated • 239
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8
  Text Generation • Updated • 208
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16
  Text Generation • Updated • 171 • 1
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8
  Text Generation • Updated • 182
![](https://cdn-avatars.huggingface.co/v1/production/uploads/60466e4b4f40b01b66151416/OSA7VIz8CTnlKb72IdfMM.png)
Neural Magic
AI & ML interests
LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: Enterprise-ready inference system based on the open-source vLLM library, for deploying performant open-source LLMs at scale
- LLM Compressor: HF-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM
- DeepSparse: Inference runtime that offers accelerated performance on CPUs, plus APIs to integrate ML into your application
In this profile we provide accurate model checkpoints, compressed with SOTA methods and ready to run in vLLM, in formats such as W4A16, W8A16, W8A8 (int8 and fp8), and many more! If you would like help quantizing a model, or want to request a checkpoint, please open an issue at https://github.com/vllm-project/llm-compressor.
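The format tags above follow the common W<bits>A<bits> naming convention (weight precision, then activation precision). As a rough sketch, the snippet below reads that tag out of a repository name and estimates the weight-only footprint. The helper names `parse_scheme`, `approx_weight_gib`, and `generate_with_vllm` are hypothetical, the 8e9 parameter count is only the nominal size implied by the "8b" label, and the estimate ignores scales, zero points, and any layers left unquantized. The vLLM call uses the library's offline `LLM` / `SamplingParams` interface and assumes vLLM and suitable hardware are available.

```python
import re

# Matches tags such as "w4a16", "W8A8", or "W4A16-G128" inside a repo name.
_SCHEME = re.compile(r"w(\d+)a(\d+)", re.IGNORECASE)


def parse_scheme(repo_id: str):
    """Return (weight_bits, activation_bits) from a W<x>A<y> tag, or None."""
    match = _SCHEME.search(repo_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def approx_weight_gib(num_params: float, weight_bits: int) -> float:
    """Rough weight-only size in GiB (assumption: every parameter is quantized)."""
    return num_params * weight_bits / 8 / 2**30


def generate_with_vllm(repo_id: str, prompt: str) -> str:
    """Load a checkpoint in vLLM and return one greedy completion."""
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=repo_id)
    params = SamplingParams(temperature=0.0, max_tokens=64)
    return llm.generate([prompt], params)[0].outputs[0].text


if __name__ == "__main__":
    repo = "neuralmagic/granite-3.1-8b-instruct-quantized.w4a16"
    weight_bits, activation_bits = parse_scheme(repo)
    print(f"{repo}: {weight_bits}-bit weights, {activation_bits}-bit activations")
    # 8e9 is the nominal parameter count suggested by the "8b" label.
    print(f"~{approx_weight_gib(8e9, weight_bits):.1f} GiB of weights")
```

Names without a W<bits>A<bits> tag, such as the FP8-dynamic checkpoints, return `None` from `parse_scheme`; their scheme would have to be read from the model card instead.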
Collections
13
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic
  Text Generation • Updated • 55 • 1
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4
  Text Generation • Updated • 26 • 1
- neuralmagic/Sparse-Llama-3.1-8B-2of4
  Text Generation • Updated • 882 • 61
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4
  Text Generation • Updated • 17 • 1
Spaces
8
- Quant Llms Text Generation: Quantized vs. Unquantized LLM: Text Generation Comparison
- Llama 3 8B Chat Deepsparse: Chat with a Llama-3-8B-Instruct model efficiently using text
- Llama 2 Sparse Transfer Chat Deepsparse
- DeepSparse Sentiment Analysis
- DeepSparse Named Entity Recognition
- Sparse Llama Gsm8k: Solve math problems with chat-based guidance
Models
285
- neuralmagic/whisper-large-v2-W4A16-G128
- neuralmagic/Phi-3-vision-128k-instruct-W4A16-G128
- neuralmagic/Mistral-Small-24B-Instruct-2501-FP8-Dynamic
- neuralmagic/granite-3.1-2b-base-quantized.w4a16
- neuralmagic/granite-3.1-8b-base-FP8-dynamic
- neuralmagic/granite-3.1-8b-base-quantized.w8a8
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16
- neuralmagic/granite-3.1-8b-base-quantized.w4a16
- neuralmagic/granite-3.1-2b-base-quantized.w8a8