ChrisGoringe committed
Update README.md

README.md CHANGED
@@ -16,19 +16,17 @@ They were created using the [convert.py script](https://github.com/chrisgoringe/
 They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your
 models/unet directory.
 
-## Bigger numbers in the name = smaller model!
-
 ## Naming convention (mx for 'mixed')
 
-[original_model_name]
+[original_model_name]_mxN_N.gguf
 
-where
+where N_N is the actual average number of bits per parameter.
 ```
--
--
--
--
--
+- 9_6 might just fit on a 16GB card
+- 8_4 is a good balance for 16GB cards,
+- 7_4 is roughly the size of an 8 bit model,
+- 5_9 should work for 12 GB cards
+- 5_1 is mostly quantised to Q4_1
 ```
 ## How is this optimised?
 
@@ -59,7 +57,7 @@ The optimisation recipes are as follows (layers 0-18 are the double_block_layers
 ```python
 
 CONFIGURATIONS = {
-    "
+    "9_6" : {
         'casts': [
             {'layers': '0-10', 'castto': 'BF16'},
             {'layers': '11-14, 54', 'castto': 'Q8_0'},
@@ -67,7 +65,7 @@ CONFIGURATIONS = {
             {'layers': '37-38, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "8_4" : {
         'casts': [
             {'layers': '0-4, 10', 'castto': 'BF16'},
             {'layers': '5-9, 11-14', 'castto': 'Q8_0'},
@@ -75,7 +73,7 @@ CONFIGURATIONS = {
             {'layers': '36-40, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "7_4" : {
         'casts': [
             {'layers': '0-2', 'castto': 'BF16'},
             {'layers': '5, 7-12', 'castto': 'Q8_0'},
@@ -83,13 +81,13 @@ CONFIGURATIONS = {
             {'layers': '34-41, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_9" : {
         'casts': [
             {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
             {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_1" : {
         'casts': [
             {'layers': '0-56', 'castto': 'Q4_1'},
         ]
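
The 'layers' strings in these recipes use a compact range syntax (e.g. '0-25, 27-28, 44-54'). Below is a minimal sketch of how such a spec might be expanded into an explicit per-layer cast map; `parse_layers` and `cast_map` are illustrative names, not functions taken from convert.py.

```python
# Sketch: expand the 'layers' range strings used in the recipes above into
# an explicit layer -> quantisation map. Names are illustrative, not from
# convert.py.

def parse_layers(spec: str) -> list[int]:
    """Expand a spec like '0-25, 27-28, 44-54' into a list of layer indices."""
    indices: list[int] = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            indices.extend(range(int(lo), int(hi) + 1))
        else:
            indices.append(int(part))
    return indices

def cast_map(recipe: dict) -> dict[int, str]:
    """Map every layer index covered by a recipe to its target format."""
    mapping: dict[int, str] = {}
    for cast in recipe['casts']:
        for layer in parse_layers(cast['layers']):
            mapping[layer] = cast['castto']
    return mapping

# The '5_9' recipe from the diff: layers 0-56 split between Q5_1 and Q4_1.
recipe_5_9 = {'casts': [
    {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
    {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
]}

mapping = cast_map(recipe_5_9)
assert sorted(mapping) == list(range(57))  # all 57 layers (0-56) are covered
print(mapping[26], mapping[27])            # -> Q4_1 Q5_1
```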
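
The N_N in the filename is described above as the measured average bits per parameter. As a rough cross-check, it can be estimated from a recipe using nominal GGUF storage costs (BF16 = 16 bits; Q8_0 ≈ 8.5, Q5_1 = 6 and Q4_1 = 5 bits per weight once per-block scales are counted), under the simplifying assumption that all 57 layers hold equal parameter counts. Real layers differ in size, and tensors outside these layers are cast separately, so the published averages won't match this estimate exactly.

```python
# Rough estimate of a recipe's average bits per parameter, assuming
# (unrealistically) that all 57 layers hold equal parameter counts.
# Nominal GGUF costs per weight, including per-block scale/min overhead.
BITS = {'BF16': 16.0, 'Q8_0': 8.5, 'Q5_1': 6.0, 'Q4_1': 5.0}

def count_layers(spec: str) -> int:
    """Count the layer indices in a spec like '11-14, 54'."""
    total = 0
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

def estimated_avg_bits(recipe: dict, n_layers: int = 57) -> float:
    """Equal-layer-weight estimate of bits per parameter for a recipe."""
    bits = sum(BITS[cast['castto']] * count_layers(cast['layers'])
               for cast in recipe['casts'])
    return bits / n_layers

# The all-Q4_1 '5_1' recipe comes out at a nominal 5.0 bits per weight,
# slightly below the measured 5_1 in the filename.
recipe_5_1 = {'casts': [{'layers': '0-56', 'castto': 'Q4_1'}]}
print(estimated_avg_bits(recipe_5_1))  # -> 5.0
```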