ChrisGoringe committed
Commit 186c280 · verified · 1 Parent(s): e1b9a1b

Update README.md

Files changed (1): README.md +9 -16
README.md CHANGED
@@ -24,25 +24,18 @@ where N_N is the average number of bits per parameter.
 
 ## Good choices to start with
 ```
- - 9_2 is a good choice for 16 GB cards
- - 6_9 just fits on a 12 GB card
- - 5_9 is comfortable on 12 GB cards
+ - 3_8 might work on an 8 GB card
+ - 6_9 should be good for a 12 GB card
+ - 8_2 is a good choice for 16 GB cards if you want to add LoRAs etc
+ - 9_2 fits on a 16 GB card
 ```
 
 ## Speed?
 
- On an A40 (plenty of VRAM), everything except the model identical, the time taken to generate an image (30 steps, deis sampler) was:
+ On an A40 (plenty of VRAM), with everything except the model identical,
+ the time taken to generate an image (30 steps, deis sampler) was about 65% longer than for the full model.
 
- - 5_1 => 40.1s
- - 5_9 => 55.4s
- - 6_9 => 52.1s
- - 7_4 => 49.7s
- - 7_6 => 43.6s
- - 8_4 => 46.8s
- - 9_2 => 42.8s
- - 9_6 => 48.2s
-
- for comparison, the unquantised models take about 27s.
+ Quantised models will generally be slower because the weights have to be converted back into a native torch form when they are needed.
 
 ## How is this optimised?
 
@@ -63,5 +56,5 @@ The process for optimisation is as follows:
 
 - Tests on using bitsandbytes quantizations showed they did not perform as well as the equivalent sized GGUF quants
 - Different quantizations of different parts of a layer gave significantly worse results
- - Leaving bias in 16 bit made no relevant difference
- - Costs were evaluated for the original Flux.1-dev model. They are assumed to be essentially the same for finetunes
+ - Leaving bias in 16 bit made no relevant difference (the 'patched' models generally do)
+ - Costs were evaluated for the original Flux.1-dev model. They are probably essentially the same for finetunes
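The added note about weights being converted back into a native torch form at inference time is the source of the slowdown. Below is a minimal illustrative sketch of such on-the-fly dequantisation; it is not the implementation these models actually use, and the names (`QuantisedLinear`, `dequantise`) and the per-output-channel int8 scaling are assumptions made for the example (real GGUF quants use block-wise schemes).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dequantise(q_weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Convert stored integer weights back to a native torch dtype.

    Assumes int8 storage with one scale per output channel; this is a
    simplification of the block-wise schemes used by real GGUF quants.
    """
    return q_weight.to(torch.bfloat16) * scale

class QuantisedLinear(nn.Module):
    """Hypothetical linear layer that stores quantised weights.

    The dequantisation in forward() runs on every call, which is the
    overhead that makes quantised models slower than full-precision ones.
    """
    def __init__(self, q_weight: torch.Tensor, scale: torch.Tensor,
                 bias: torch.Tensor | None = None):
        super().__init__()
        self.register_buffer("q_weight", q_weight)   # int8, shape (out_features, in_features)
        self.register_buffer("scale", scale)         # bfloat16, shape (out_features, 1)
        self.bias = nn.Parameter(bias) if bias is not None else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = dequantise(self.q_weight, self.scale)  # back to bfloat16 on the fly
        return F.linear(x, weight, self.bias)

# Example usage with random data
if __name__ == "__main__":
    q = torch.randint(-128, 127, (64, 32), dtype=torch.int8)
    s = torch.rand(64, 1, dtype=torch.bfloat16) * 0.01
    layer = QuantisedLinear(q, s)
    out = layer(torch.randn(4, 32, dtype=torch.bfloat16))
    print(out.shape)  # torch.Size([4, 64])
```

The trade-off shown here is deliberate: the weights stay small in VRAM, and the cost is the extra conversion work on every forward pass.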