---
language:
 - "en"
tags:
 - video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---

<p align="center">
  <img src="assets/logo.jpg"  height=30>
</p>

# FastMochi Model Card

## Model Details

<div align="center">
<table style="margin-left: auto; margin-right: auto; border: none;">
  <tr>
    <td>
      <img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
    </td>
  </tr>
  <tr>
    <td style="text-align:center;">
      Get an 8X diffusion speedup for Mochi with FastVideo
    </td>
  </tr>
</table>
  </div>

FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It samples high-quality videos in just 8 diffusion steps, giving roughly an 8X speedup over the original Mochi, which uses 64 steps.

- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**:  Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **Github Repository**: https://github.com/hao-ai-lab/FastVideo

## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
- Alternatively, run FastMochi with the official [Mochi repository](https://github.com/genmoai/mochi) using the script below and the [compatible weights](https://huggingface.co/FastVideo/FastMochi).

<details>
  <summary>Code</summary>

```python
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)
from genmo.lib.utils import save_video
import os

# Read prompts line by line from prompt.txt.
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU pipeline with the distilled FastMochi weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)
for i, prompt in enumerate(prompts):
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,  # the distilled model only needs 8 steps
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```

</details>
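
Since the card is tagged for the `diffusers` library, the snippet below sketches what loading FastMochi through `diffusers.MochiPipeline` could look like. This is a minimal sketch under assumptions: it presumes a diffusers-format checkpoint is available (the repo id `FastVideo/FastMochi-diffusers` is a placeholder, not confirmed by this card) and that the distilled weights work with the stock Mochi pipeline at 8 inference steps.

```python
# Hedged sketch: assumes a diffusers-format FastMochi checkpoint exists;
# the repo id below is a placeholder, not confirmed by this card.
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers",  # placeholder repo id (assumption)
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # fit on a single GPU at the cost of speed
pipe.enable_vae_tiling()         # reduce VAE memory for 480x848 frames

frames = pipe(
    prompt="A hand with delicate fingers picks up a bright yellow lemon.",
    height=480,
    width=848,
    num_frames=163,
    num_inference_steps=8,  # distilled model is tuned for 8 steps
    guidance_scale=1.5,     # matches the cfg_schedule in the script above
).frames[0]
export_to_video(frames, "fastmochi.mp4", fps=30)
```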


## Training details

FastMochi is consistency-distilled from Mochi on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters (a minimal sketch of the objective follows the list):
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Training steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber
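
To make the recipe above concrete, the sketch below outlines one generic consistency-distillation step with a Huber loss. It is illustrative only: the names (`student`, `ema_student`, `ode_solver_step`) are hypothetical stand-ins, and the actual FastVideo training loop (EMA handling, noise schedule, CFG treatment) may differ.

```python
# Hedged sketch of a consistency-distillation step with Huber loss.
# All names here (student, ema_student, ode_solver_step) are hypothetical,
# not the FastVideo implementation.
import torch
import torch.nn.functional as F

def consistency_distillation_step(student, ema_student, ode_solver_step,
                                  latents, text_emb, sigmas, optimizer):
    # Pick two adjacent noise levels on the discretized schedule.
    i = torch.randint(0, len(sigmas) - 1, (1,)).item()
    sigma_hi, sigma_lo = sigmas[i], sigmas[i + 1]

    noise = torch.randn_like(latents)
    noisy_hi = latents + sigma_hi * noise

    # Student prediction from the higher-noise point.
    pred_hi = student(noisy_hi, sigma_hi, text_emb)

    with torch.no_grad():
        # One ODE solver step toward lower noise, then query the EMA
        # (target) network: its output defines the consistency target.
        noisy_lo = ode_solver_step(noisy_hi, sigma_hi, sigma_lo, text_emb)
        target = ema_student(noisy_lo, sigma_lo, text_emb)

    # Huber loss between the two predictions, as listed above.
    loss = F.huber_loss(pred_hi, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```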

## Evaluation
We provide qualitative comparisons between FastMochi and the original Mochi, both with 8-step inference:


| FastMochi 8 steps | Mochi 8 steps |
| --- | --- |
| ![FastMochi 8 step](assets/distilled/1.gif) | ![Mochi 8 step](assets/undistilled/1.gif) |
| ![FastMochi 8 step](assets/distilled/2.gif) | ![Mochi 8 step](assets/undistilled/2.gif) |
| ![FastMochi 8 step](assets/distilled/3.gif) | ![Mochi 8 step](assets/undistilled/3.gif) |
| ![FastMochi 8 step](assets/distilled/4.gif) | ![Mochi 8 step](assets/undistilled/4.gif) |