foreverpiano committed on
Commit
725712e
·
verified ·
1 Parent(s): 6cb07f5

Update README.md

Files changed (1): README.md +7 -0
README.md CHANGED
@@ -47,3 +47,10 @@ We provide some qualitative comparison between FastHunyuan 6 step inference v.s.
  | ![FastHunyuan 6 step](assets/distilled/3.gif) | ![Hunyuan 6 step](assets/undistilled/3.gif) |
  | ![FastHunyuan 6 step](assets/distilled/4.gif) | ![Hunyuan 6 step](assets/undistilled/4.gif) |
 
+ ## Memory requirements
+
+ We now support NF4 and LLM-INT8 quantized inference for FastHunyuan using BitsAndBytes. With NF4 quantization, inference runs on a single RTX 4090 GPU, requiring just 20 GB of VRAM.
+
+ For LoRA finetuning, the minimum hardware requirements are:
+ - 40 GB of GPU memory on each of 2 GPUs with LoRA
+ - 30 GB of GPU memory on each of 2 GPUs with CPU offload and LoRA.
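
The NF4 inference path described above can be sketched with the BitsAndBytes integration in `diffusers`. This is a sketch under assumptions, not the repo's official script: the checkpoint id is left as a placeholder, and the `HunyuanVideoPipeline` / `HunyuanVideoTransformer3DModel` classes assume a recent `diffusers` release with HunyuanVideo support.

```python
# Sketch only: assumes diffusers (with HunyuanVideo support), bitsandbytes,
# and a CUDA GPU are available. The checkpoint id is a placeholder.
import torch
from diffusers import (
    BitsAndBytesConfig,  # diffusers' wrapper around the BnB quantization config
    HunyuanVideoPipeline,
    HunyuanVideoTransformer3DModel,
)

model_id = "..."  # fill in the FastHunyuan checkpoint id (not stated here)

# NF4 4-bit quantization of the transformer weights is what brings
# single-GPU inference under the ~20 GB figure quoted above.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = HunyuanVideoTransformer3DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = HunyuanVideoPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # further trims peak VRAM on a single GPU
```

Swapping `bnb_4bit_quant_type="nf4"` for `load_in_8bit=True` gives the LLM-INT8 variant; it trades a larger memory footprint for less quantization error.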