pcuenq
/

DeepSeek-R1-Distill-Qwen-32B-Q2-6

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

Create README.md

#1

by pcuenq HF staff - opened 1 day ago

base: refs/heads/main

←

from: refs/pr/1

Discussion Files changed

Files changed (1) hide show

README.md +23 -0

README.md ADDED Viewed

	@@ -0,0 +1,23 @@

+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- conversational
+- mlx
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
+---
+# DeepSeek-R1-Distill-Qwen-32B-Q2-6
+This model was converted to MLX from [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B), using mixed 2/6 bit quantization. This scheme preserves quality much more than a standard 2-bit quantization.
+## Use with mlx
+```bash
+pip install mlx-lm
+```
+```bash
+python -m mlx_lm.chat --model pcuenq/DeepSeek-R1-Distill-Qwen-32B-Q2-6 --max-tokens 10000 --temp 0.6 --top-p 0.7
+```