Update README.md
README.md CHANGED
@@ -57,7 +57,7 @@ snapshot_download(
     allow_patterns = ["*UD-IQ1_S*"], # Select quant type UD-IQ1_S for 1.58bit
 )
 ```
-
+5. Example with a Q4_0 K-quantized cache. **Note: `-no-cnv` disables auto conversation mode.**
 ```bash
 ./llama.cpp/llama-cli \
     --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
@@ -79,7 +79,7 @@ snapshot_download(
 Is there a scenario where 1 plus 1 wouldn't be 2? I can't think of any...
 ```
 
-
+6. If you have a GPU with 24 GB of VRAM (an RTX 4090, for example), you can offload multiple layers to it for faster processing. With multiple GPUs, you can likely offload more layers.
 ```bash
 ./llama.cpp/llama-cli \
     --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
@@ -91,7 +91,7 @@ snapshot_download(
     --seed 3407 \
     --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
 ```
-
+7. If you want to merge the split weights back into a single file, use this script:
 ```
 ./llama.cpp/llama-gguf-split --merge \
     DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
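For reference, the `allow_patterns` filter in the first hunk limits the download to the 1.58-bit UD-IQ1_S shards. A minimal shell sketch of the same filter, assuming the `huggingface-cli` tool from `huggingface_hub` and the `unsloth/DeepSeek-R1-GGUF` repo id (neither appears in this diff):

```bash
# Sketch: fetch only the UD-IQ1_S (1.58-bit) shards.
# The repo id and local dir are assumptions; the diff shows only the allow_patterns line.
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "*UD-IQ1_S*" \
    --local-dir DeepSeek-R1-GGUF
```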
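Step 5's command is truncated after `--model` in the diff. A sketch of how the full invocation might look; only `--cache-type-k q4_0` (the Q4_0 K cache) and `-no-cnv` are the point here. The other flag values are illustrative assumptions, and the "1+1" prompt is inferred from the sample output in the second hunk:

```bash
# Sketch of step 5: Q4_0-quantized K cache, auto conversation mode disabled.
# Thread count, context size, seed, and prompt are illustrative assumptions.
./llama.cpp/llama-cli \
    --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --threads 16 \
    --ctx-size 8192 \
    --seed 3407 \
    -no-cnv \
    --prompt "<|User|>What is 1+1?<|Assistant|>"
```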
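Layer offloading in step 6 is driven by llama.cpp's `--n-gpu-layers` flag. A sketch for a single 24 GB card; the layer count is an assumed starting point to tune against your VRAM:

```bash
# Sketch of step 6: offload part of the model to the GPU.
# --n-gpu-layers 7 is an assumed starting value for one 24 GB card; raise it if VRAM allows.
./llama.cpp/llama-cli \
    --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --n-gpu-layers 7 \
    --seed 3407 \
    --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>"
```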
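The step 7 merge command is cut off after the first shard in the diff. `llama-gguf-split --merge` takes the first split plus an output path and finds the sibling shards itself, so the completed call would look something like this; the output filename is an assumption:

```bash
# Sketch of step 7: stitch the splits back into one GGUF.
# Only the first shard is passed; the output name is an assumption.
./llama.cpp/llama-gguf-split --merge \
    DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    merged_file.gguf
```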