|
--- |
|
pipeline_tag: text-to-video |
|
license: other |
|
license_name: tencent-hunyuan-community |
|
license_link: LICENSE |
|
--- |
|
|
|
<p align="center"> |
|
<img src="assets/logo.jpg" height=30> |
|
</p> |
|
|
|
# FastHunyuan Model Card |
|
|
|
## Model Details |
|
|
|
FastHunyuan is an accelerated [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) model. It can sample high-quality videos in only 6 diffusion steps, roughly an 8x speedup over the original HunyuanVideo, which requires 50 steps.
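As a sanity check, the speedup follows directly from the step counts, assuming sampling time scales roughly linearly with the number of diffusion steps:

```python
# Speedup from reducing denoising steps (assumes per-step cost is constant).
original_steps = 50   # original HunyuanVideo
distilled_steps = 6   # FastHunyuan

speedup = original_steps / distilled_steps
print(f"~{speedup:.1f}x speedup")  # prints ~8.3x speedup
```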
|
|
|
- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/) |
|
- **License**: tencent-hunyuan-community |
|
- **Distilled from**: [HunyuanVideo](https://huggingface.co/tencent/HunyuanVideo) |
|
- **Github Repository**: https://github.com/hao-ai-lab/FastVideo |
|
|
|
## Usage |
|
|
|
- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
|
- Alternatively, you can run inference with FastHunyuan using the official [HunyuanVideo repository](https://github.com/Tencent/HunyuanVideo) by **setting the shift to 17, the number of steps to 6, the resolution to 720x1280 with 125 frames, and the CFG scale to a value greater than 6**.

We find that a larger CFG scale generally leads to videos with faster motion.
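The recommended settings above can be collected into a small sketch. The dictionary keys and the `validate` helper below are hypothetical names for illustration, not actual flags from the HunyuanVideo repository:

```python
# Recommended FastHunyuan overrides for the official HunyuanVideo sampler.
# Key names are illustrative; map them onto the sampler's real arguments.
FASTHUNYUAN_SETTINGS = {
    "shift": 17,               # flow shift
    "num_inference_steps": 6,  # FastHunyuan is distilled for 6 steps
    "height": 720,
    "width": 1280,
    "num_frames": 125,
    "cfg_scale": 6.5,          # the card recommends a CFG scale greater than 6
}

def validate(settings: dict) -> None:
    """Hypothetical sanity check before launching a distilled-model run."""
    assert settings["num_inference_steps"] == 6, "FastHunyuan is distilled for 6 steps"
    assert settings["cfg_scale"] > 6, "use a CFG scale greater than 6"

validate(FASTHUNYUAN_SETTINGS)
```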
|
|
|
## Training details |
|
|
|
FastHunyuan is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
|
- Batch size: 16

- Resolution: 720x1280

- Number of frames: 125

- Training steps: 320

- GPUs: 32

- Learning rate: 1e-6

- Loss: Huber
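For reference, the Huber loss named above is quadratic near zero and linear in the tails, which makes the distillation objective robust to outlier targets. A minimal scalar sketch (the `delta` threshold is an assumption, and the training code may use a pseudo-Huber variant):

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) beyond."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r          # quadratic region: behaves like MSE
    return delta * (r - 0.5 * delta)  # linear region: behaves like MAE
```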
|
|
|
## Evaluation |
|
We provide a qualitative comparison between FastHunyuan with 6-step inference and the original HunyuanVideo with 6-step inference:
|
|
|
| FastHunyuan 6 step | Hunyuan 6 step | |
|
| --- | --- | |
|
|  |  | |
|
|  |  | |
|
|  |  | |
|
|  |  | |
|
|
|
## Memory requirements |
|
|
|
Please check our [GitHub repository](https://github.com/hao-ai-lab/FastVideo) for details.
|
|
|
For inference, FastHunyuan can run on a single RTX 4090. We now support NF4 and LLM-INT8 quantized inference for FastHunyuan using BitsAndBytes. With NF4 quantization, inference requires just 20 GB of VRAM.
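The memory saving from NF4 follows from the bit widths alone. This covers weights only; activations, the text encoder, and the VAE account for the rest of the ~20 GB:

```python
# NF4 packs each weight into 4 bits vs. 16 bits for bf16, so weight
# memory drops by roughly 4x (ignoring quantization metadata overhead).
bits_bf16 = 16
bits_nf4 = 4
reduction = bits_bf16 / bits_nf4
print(f"NF4 weight memory is ~1/{int(reduction)} of bf16")  # ~1/4
```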
|
|
|
For LoRA finetuning, the minimum hardware requirements are:

- 40 GB of GPU memory on each of 2 GPUs with LoRA

- 30 GB of GPU memory on each of 2 GPUs with LoRA and CPU offload
|
|