---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---

<p align="center">
  <img src="assets/logo.jpg" height=30>
</p>

# FastMochi Model Card

## Model Details

<div align="center">
<table style="margin-left: auto; margin-right: auto; border: none;">
  <tr>
    <td>
      <img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
    </td>
  </tr>
  <tr>
    <td style="text-align:center;">
      Get 8X diffusion boost for Mochi with FastVideo
    </td>
  </tr>
</table>
</div>

FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It samples high-quality videos in 8 diffusion steps, roughly an 8X speedup over the original Mochi, which uses 64 steps.

- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo

## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README.
- You can also run FastMochi using the official [Mochi repository](https://github.com/genmoai/mochi) with the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).

<details>
<summary>Code</summary>

```python
import os

from genmo.lib.utils import save_video
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Read prompts from prompt.txt, one per line, skipping blank lines.
with open("prompt.txt", "r") as f:
    prompts = [line.strip() for line in f if line.strip()]

# Build the multi-GPU pipeline from the downloaded weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)

# Generate one video per prompt using 8 sampling steps.
for i, prompt in enumerate(prompts):
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```

</details>

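To give some intuition for the `sigma_schedule` argument in the script above, here is a minimal sketch of a linear-then-quadratic noise schedule. It is a hypothetical stand-in, not the `linear_quadratic_schedule` shipped with the Mochi repository. The function name `sketch_linear_quadratic_schedule` and the reading of the three arguments (number of steps, noise threshold, number of linear steps) are assumptions inferred from the call `linear_quadratic_schedule(8, 0.1, 6)`.

```python
def sketch_linear_quadratic_schedule(num_steps, threshold_noise, linear_steps):
    """Return num_steps + 1 sigma values decreasing from 1.0 to 0.0.

    Illustrative only: the argument meanings are assumptions, not the
    documented behavior of the library function used in the script.
    """
    if not 0 < linear_steps < num_steps:
        raise ValueError("linear_steps must be between 1 and num_steps - 1")

    quadratic_steps = num_steps - linear_steps
    # Linear phase: remove `threshold_noise` of the noise over `linear_steps` steps.
    slope = threshold_noise / linear_steps
    # Quadratic phase: keep the same initial slope and pick the curvature
    # so that all remaining noise is removed exactly at the final step.
    curvature = (1.0 - threshold_noise - slope * quadratic_steps) / quadratic_steps**2

    removed = []
    for i in range(num_steps + 1):
        if i <= linear_steps:
            removed.append(slope * i)
        else:
            t = i - linear_steps
            removed.append(threshold_noise + slope * t + curvature * t**2)

    # Sigma is the fraction of noise still present at each step.
    return [1.0 - x for x in removed]


sigmas = sketch_linear_quadratic_schedule(8, 0.1, 6)
for step, sigma in enumerate(sigmas):
    print(step, round(sigma, 4))
```

Under these assumptions, the first six steps remove only 10% of the noise in small, equal increments, and the last two steps remove the rest.
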
## Training details

FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:

- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Training steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber

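For readers unfamiliar with the Huber loss listed above, the following is a minimal scalar sketch of the standard definition: quadratic for small errors, linear for large ones. The threshold `delta=1.0` and the helper names `huber_loss` and `mean_huber_loss` are assumptions for illustration; the card does not state which threshold or implementation was used in training.

```python
def huber_loss(prediction, target, delta=1.0):
    """Huber loss for one scalar pair; delta=1.0 is an assumed threshold."""
    error = abs(prediction - target)
    if error <= delta:
        # Small errors are penalized quadratically, like mean squared error.
        return 0.5 * error**2
    # Large errors are penalized linearly, which limits the effect of outliers.
    return delta * (error - 0.5 * delta)


def mean_huber_loss(predictions, targets, delta=1.0):
    """Average the scalar Huber loss over paired predictions and targets."""
    pairs = list(zip(predictions, targets))
    return sum(huber_loss(p, t, delta) for p, t in pairs) / len(pairs)


print(huber_loss(0.5, 0.0))  # quadratic regime
print(huber_loss(3.0, 0.0))  # linear regime
```
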
## Evaluation

We provide qualitative comparisons between FastMochi and the original Mochi, both sampled with 8 inference steps:

| FastMochi 8 steps | Mochi 8 steps |
| --- | --- |
|  |  |
|  |  |
|  |  |
|  |  |