ChrisGoringe committed
Update README.md

README.md CHANGED
@@ -16,19 +16,17 @@ They were created using the [convert.py script](https://github.com/chrisgoringe/
 They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your
 models/unet directory.
 
-## Bigger numbers in the name = smaller model!
-
 ## Naming convention (mx for 'mixed')
 
-[original_model_name]
+[original_model_name]_mxN_N.gguf
 
-where
+where N_N is the actual average number of bits per parameter.
 ```
--
--
--
--
--
+- 9_6 might just fit on a 16GB card
+- 8_4 is a good balance for 16GB cards,
+- 7_4 is roughly the size of an 8 bit model,
+- 5_9 should work for 12 GB cards
+- 5_1 is mostly quantised to Q4_1
 ```
 ## How is this optimised?
 
@@ -59,7 +57,7 @@ The optimisation recipes are as follows (layers 0-18 are the double_block_layers
 ```python
 
 CONFIGURATIONS = {
-    "
+    "9_6" : {
         'casts': [
             {'layers': '0-10', 'castto': 'BF16'},
             {'layers': '11-14, 54', 'castto': 'Q8_0'},
@@ -67,7 +65,7 @@ CONFIGURATIONS = {
             {'layers': '37-38, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "8_4" : {
         'casts': [
             {'layers': '0-4, 10', 'castto': 'BF16'},
             {'layers': '5-9, 11-14', 'castto': 'Q8_0'},
@@ -75,7 +73,7 @@ CONFIGURATIONS = {
             {'layers': '36-40, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "7_4" : {
         'casts': [
             {'layers': '0-2', 'castto': 'BF16'},
             {'layers': '5, 7-12', 'castto': 'Q8_0'},
@@ -83,13 +81,13 @@ CONFIGURATIONS = {
             {'layers': '34-41, 56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_9" : {
         'casts': [
             {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
             {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
         ]
     },
-    "
+    "5_1" : {
         'casts': [
             {'layers': '0-56', 'castto': 'Q4_1'},
         ]
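
The 'layers' strings in these recipes use a compact range syntax (e.g. '0-25, 27-28, 44-54'). Below is a minimal sketch of how such a spec might be expanded into an explicit per-layer cast map; `parse_layers` and `cast_map` are illustrative names, not functions taken from convert.py.

```python
# Sketch: expand the 'layers' range strings used in the recipes above into
# an explicit layer -> quantisation map. Names are illustrative, not from
# convert.py.

def parse_layers(spec: str) -> list[int]:
    """Expand a spec like '0-25, 27-28, 44-54' into a list of layer indices."""
    indices: list[int] = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            indices.extend(range(int(lo), int(hi) + 1))
        else:
            indices.append(int(part))
    return indices

def cast_map(recipe: dict) -> dict[int, str]:
    """Map every layer index covered by a recipe to its target format."""
    mapping: dict[int, str] = {}
    for cast in recipe['casts']:
        for layer in parse_layers(cast['layers']):
            mapping[layer] = cast['castto']
    return mapping

# The '5_9' recipe from the diff: layers 0-56 split between Q5_1 and Q4_1.
recipe_5_9 = {'casts': [
    {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
    {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
]}

mapping = cast_map(recipe_5_9)
assert sorted(mapping) == list(range(57))  # all 57 layers (0-56) are covered
print(mapping[26], mapping[27])            # -> Q4_1 Q5_1
```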
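
The N_N in the filename is described above as the measured average bits per parameter. As a rough cross-check, it can be estimated from a recipe using nominal GGUF storage costs (BF16 = 16 bits; Q8_0 ≈ 8.5, Q5_1 = 6 and Q4_1 = 5 bits per weight once per-block scales are counted), under the simplifying assumption that all 57 layers hold equal parameter counts. Real layers differ in size, and tensors outside these layers are cast separately, so the published averages won't match this estimate exactly.

```python
# Rough estimate of a recipe's average bits per parameter, assuming
# (unrealistically) that all 57 layers hold equal parameter counts.
# Nominal GGUF costs per weight, including per-block scale/min overhead.
BITS = {'BF16': 16.0, 'Q8_0': 8.5, 'Q5_1': 6.0, 'Q4_1': 5.0}

def count_layers(spec: str) -> int:
    """Count the layer indices in a spec like '11-14, 54'."""
    total = 0
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total

def estimated_avg_bits(recipe: dict, n_layers: int = 57) -> float:
    """Equal-layer-weight estimate of bits per parameter for a recipe."""
    bits = sum(BITS[cast['castto']] * count_layers(cast['layers'])
               for cast in recipe['casts'])
    return bits / n_layers

# The all-Q4_1 '5_1' recipe comes out at a nominal 5.0 bits per weight,
# slightly below the measured 5_1 in the filename.
recipe_5_1 = {'casts': [{'layers': '0-56', 'castto': 'Q4_1'}]}
print(estimated_avg_bits(recipe_5_1))  # -> 5.0
```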