ChrisGoringe committed (verified)
Commit 4565d23 · 1 Parent(s): d348f71

Update README.md

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -16,19 +16,17 @@ They were created using the [convert.py script](https://github.com/chrisgoringe/
 They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your
 models/unet directory.
 
-## Bigger numbers in the name = smaller model!
-
 ## Naming convention (mx for 'mixed')
 
-[original_model_name]_mxNN_N.gguf
+[original_model_name]_mxN_N.gguf
 
-where NN_N is the approximate *reduction* in VRAM usage compared the full 16 bit version.
+where N_N is the actual average number of bits per parameter.
 ```
-- 9_0 might just fit on a 16GB card
-- 10_6 is a good balance for 16GB cards,
-- 12_0 is roughly the size of an 8 bit model,
-- 14_1 should work for 12 GB cards
-- 15_2 is fully quantised to Q4_1
+- 9_6 might just fit on a 16GB card
+- 8_4 is a good balance for 16GB cards,
+- 7_4 is roughly the size of an 8 bit model,
+- 5_9 should work for 12 GB cards
+- 5_1 is mostly quantised to Q4_1
 ```
 ## How is this optimised?
 
@@ -59,7 +57,7 @@ The optimisation recipes are as follows (layers 0-18 are the double_block_layers
 ```python
 
 CONFIGURATIONS = {
-    "9_0" : {
+    "9_6" : {
         'casts': [
             {'layers': '0-10', 'castto': 'BF16'},
             {'layers': '11-14, 54', 'castto': 'Q8_0'},
@@ -67,7 +65,7 @@ CONFIGURATIONS = {
             {'layers': '37-38, 56', 'castto': 'Q4_1'},
         ]
     },
-    "10_6" : {
+    "8_4" : {
         'casts': [
             {'layers': '0-4, 10', 'castto': 'BF16'},
             {'layers': '5-9, 11-14', 'castto': 'Q8_0'},
@@ -75,7 +73,7 @@ CONFIGURATIONS = {
             {'layers': '36-40, 56', 'castto': 'Q4_1'},
         ]
     },
-    "12_0" : {
+    "7_4" : {
         'casts': [
             {'layers': '0-2', 'castto': 'BF16'},
             {'layers': '5, 7-12', 'castto': 'Q8_0'},
@@ -83,13 +81,13 @@ CONFIGURATIONS = {
             {'layers': '34-41, 56', 'castto': 'Q4_1'},
         ]
     },
-    "14_1" : {
+    "5_9" : {
         'casts': [
             {'layers': '0-25, 27-28, 44-54', 'castto': 'Q5_1'},
             {'layers': '26, 29-43, 55-56', 'castto': 'Q4_1'},
         ]
     },
-    "15_2" : {
+    "5_1" : {
         'casts': [
             {'layers': '0-56', 'castto': 'Q4_1'},
         ]
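The 'layers' values in these recipes are comma-separated mixes of single indices and inclusive ranges (e.g. '11-14, 54'). A minimal sketch of how such a string can be expanded, assuming the ranges are inclusive; the parse_layers helper is illustrative and not code from this repository:

```python
# Illustrative helper (not from this repo): expand a recipe 'layers' string
# such as '11-14, 54' into the explicit list [11, 12, 13, 14, 54].
def parse_layers(spec: str) -> list[int]:
    layers: list[int] = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            first, last = part.split('-')
            layers.extend(range(int(first), int(last) + 1))
        else:
            layers.append(int(part))
    return layers

assert parse_layers('11-14, 54') == [11, 12, 13, 14, 54]
assert parse_layers('0-56') == list(range(57))
```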
 
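Since the new names give the average number of bits per parameter, the recipes can be roughly cross-checked against them. The sketch below reuses the hypothetical parse_layers helper above and assumes the standard GGUF storage rates (8.5 bits/weight for Q8_0, 6 for Q5_1, 5 for Q4_1, 16 for BF16) plus, unrealistically, equal parameter counts in all 57 layers, so it only approximates the N_N in each name:

```python
# Rough cross-check of the naming scheme (assumption-laden sketch, not part
# of the conversion script). GGUF stores Q8_0 at 8.5 bits per weight
# (34 bytes per 32-weight block), Q5_1 at 6 and Q4_1 at 5; BF16 is 16.
BITS_PER_WEIGHT = {'BF16': 16.0, 'Q8_0': 8.5, 'Q5_1': 6.0, 'Q4_1': 5.0}

def estimate_average_bits(casts: list[dict], n_layers: int = 57) -> float:
    bits = {}
    for cast in casts:
        for layer in parse_layers(cast['layers']):
            bits[layer] = BITS_PER_WEIGHT[cast['castto']]
    # Layers a recipe does not mention are assumed to stay at 16 bit.
    return sum(bits.get(i, 16.0) for i in range(n_layers)) / n_layers

# The '5_1' recipe casts all 57 layers to Q4_1, so this returns exactly 5.0;
# the 0.1 in the name presumably comes from tensors left unquantised. Layers
# are not really equal-sized, so expect only approximate agreement elsewhere.
```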