Tags: Text Generation, GGUF, English, All use cases, reasoning, thoughts, deep thinking, deepseek, creative, creative writing, fiction writing, plot generation, sub-plot generation, story generation, scene continue, storytelling, fiction story, science fiction, romance, all genres, story, writing, vivid writing, fiction, bfloat16, swearing, sillytavern, Lmstudio, backyard, horror, Qwen 2.5, context 128k, mergekit, Inference Endpoints, conversational
Update README.md
README.md CHANGED
@@ -36,7 +36,7 @@ pipeline_tag: text-generation
 
 (quants uploading...)
 
-<h2>DeepSeek-R1-Distill-Qwen-25.5B with Brainstorm 40x, (88 layers,
+<h2>DeepSeek-R1-Distill-Qwen-25.5B with Brainstorm 40x, (88 layers, 1043 tensors) </h2>
 
 Context : 128k.
 
@@ -52,7 +52,7 @@ The "thinking/reasoning" tech (for the model at this repo) is from the original
 
 [ https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B ]
 
-In this case, Brainstorm 40x module was grafted directly onto "DeepSeek-R1-Distill-Llama-8B" bringing it up to
+In this case, Brainstorm 40x module was grafted directly onto "DeepSeek-R1-Distill-Llama-8B" bringing it up to 88 layers, 25.5B parameters.
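Since the updated README advertises GGUF quants (still uploading) and a 128k context, here is a minimal sketch of loading one of those quants with llama-cpp-python once they are available. The file name and sampling settings below are placeholders of my own, not actual repo contents or the author's recommended settings; the note about `<think>` output follows the usual DeepSeek-R1 distill convention referenced in the README, not anything specific to this commit.

```python
# Minimal sketch (assumptions: llama-cpp-python installed, a GGUF quant of this
# model downloaded locally; the file name below is a placeholder, since the
# quants were still uploading at the time of this commit).
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-25.5B-Brainstorm40x-Q4_K_M.gguf",  # placeholder
    n_ctx=131072,      # README states 128k context
    n_gpu_layers=-1,   # offload all 88 layers if VRAM allows
)

# DeepSeek-R1 distills typically emit their reasoning in <think>...</think>
# before the final answer, so expect that block in the generated text.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write the opening scene of a slow-burn horror story."}],
    max_tokens=1024,
    temperature=0.8,
)
print(out["choices"][0]["message"]["content"])
```

The same GGUF file should also load in the front ends listed in the tags (LM Studio, SillyTavern via a llama.cpp backend, Backyard), provided the context size is set to fit available memory.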