---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---
<p align="center">
<img src="assets/logo.jpg" height=30>
</p>
# FastMochi Model Card
## Model Details
FastMochi is an accelerated version of [Mochi](https://huggingface.co/genmo/mochi-1-preview). It samples high-quality videos in 8 diffusion steps, roughly an 8X speedup over the original Mochi, which uses 64 steps.
- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **Github Repository**: https://github.com/hao-ai-lab/FastVideo
## Usage
- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in the README.
- You can also run FastMochi with the official [Mochi repository](https://github.com/genmoai/mochi) using the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).
<details>
<summary>Code</summary>
```python
import os

from genmo.lib.utils import save_video
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Read prompts line by line from prompt.txt.
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU pipeline with the FastMochi DiT weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)

for i, prompt in enumerate(prompts):
    # 8-step sampling with a constant CFG scale of 1.5.
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```
</details>
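Since this card lists `diffusers` as the library, the distilled checkpoint can in principle also be loaded through `MochiPipeline`. The sketch below is illustrative only: the repository id `FastVideo/FastMochi-diffusers` is an assumption (check the [FastVideo org](https://huggingface.co/FastVideo) for a diffusers-format checkpoint), and for best results the scheduler's sigma schedule should match the linear-quadratic schedule used in the script above.

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# NOTE: the repo id below is an assumption; substitute the actual
# diffusers-format FastMochi checkpoint published by FastVideo.
pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # reduce peak VRAM usage
pipe.enable_vae_tiling()         # decode long videos in tiles

frames = pipe(
    prompt="A hand with delicate fingers picks up a bright yellow lemon.",
    num_inference_steps=8,  # the distilled 8-step setting
    guidance_scale=1.5,     # matches cfg_schedule=[1.5] * 8 above
    height=480,
    width=848,
    num_frames=163,
).frames[0]
export_to_video(frames, "fastmochi.mp4", fps=30)
```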
## Training details
FastMochi is consistency-distilled from Mochi on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Train steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber (sketched below)
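The exact distillation objective is defined in the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository; as a reference point, here is a minimal sketch of the pseudo-Huber loss commonly used in consistency distillation (the constant `c` is illustrative, not the value used in training):

```python
import torch

def pseudo_huber_loss(pred: torch.Tensor, target: torch.Tensor, c: float = 1e-3) -> torch.Tensor:
    # Quadratic near zero, linear for large residuals: more robust to
    # outlier pixels than plain MSE during consistency distillation.
    diff = pred - target
    return (torch.sqrt(diff * diff + c * c) - c).mean()
```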
## Evaluation
We provide a qualitative comparison between FastMochi with 8-step inference and the original Mochi, also with 8-step inference: