PY007 committed · Commit e4d823f · verified · 1 Parent(s): 54272b5

Upload folder using huggingface_hub

Files changed (2):
  1. README.md +96 -0
  2. assets/logo.jpg +0 -0
README.md ADDED
---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---

<p align="center">
  <img src="assets/logo.jpg" height=30>
</p>

# FastMochi Model Card

## Model Details

FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It can sample high-quality videos in 8 diffusion steps, giving roughly an 8X speedup over the original Mochi, which uses 64 steps.

- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **GitHub Repository**: https://github.com/hao-ai-lab/FastVideo

## Usage

- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
- You can also run FastMochi through the official [Mochi repository](https://github.com/genmoai/mochi) using the script below together with these [compatible weights](https://huggingface.co/FastVideo/FastMochi).

<details>
<summary>Code</summary>

```python
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)
from genmo.lib.utils import save_video
import os

# Read prompts line by line from prompt.txt.
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU Mochi pipeline with the FastMochi DiT weights.
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)
for i, prompt in enumerate(prompts):
    # 8 inference steps with a constant CFG scale of 1.5 and a
    # linear-quadratic sigma schedule.
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```

</details>
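
Since this card lists `library_name: diffusers`, the distilled weights should also be loadable through the `MochiPipeline` in diffusers. The sketch below is a minimal, unofficial example: the repository id passed to `from_pretrained` is an assumption (use whichever diffusers-format FastMochi checkpoint the FastVideo README points to), while the 8 steps and CFG scale of 1.5 mirror the script above.

```python
import torch
from diffusers import MochiPipeline
from diffusers.utils import export_to_video

# Assumed diffusers-format checkpoint id; replace it with the path given in the
# FastVideo README if it differs.
pipe = MochiPipeline.from_pretrained(
    "FastVideo/FastMochi-diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # optional: lowers peak GPU memory
pipe.enable_vae_tiling()         # optional: keeps VAE decoding memory-friendly

# 8 inference steps with guidance_scale=1.5, matching the distilled sampling
# configuration used in the script above.
frames = pipe(
    prompt="A hand with delicate fingers picks up a bright yellow lemon.",
    height=480,
    width=848,
    num_frames=163,
    num_inference_steps=8,
    guidance_scale=1.5,
).frames[0]

export_to_video(frames, "fastmochi_output.mp4", fps=30)
```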

## Training details

FastMochi is consistency-distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters (a schematic sketch of the training loss is shown below the list):
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Train steps: 128
- GPUs: 16
- Learning rate: 1e-6
- Loss: Huber
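
Consistency distillation trains the student to match a target prediction computed further along the teacher's sampling trajectory, and the Huber loss above penalizes the difference between the two. The snippet below is only a schematic illustration of that loss term, not the FastVideo training code; the tensor shapes and the `delta` value are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the student's prediction and the consistency target that the
# frozen teacher trajectory would provide one solver step further along.
# Shapes (batch, channels, frames, height, width) are purely illustrative.
student_pred = torch.randn(2, 12, 28, 60, 106, requires_grad=True)
teacher_target = torch.randn(2, 12, 28, 60, 106)

# Huber loss between the two ("Loss: Huber" above); delta is an illustrative
# choice, not a documented FastMochi hyperparameter.
loss = F.huber_loss(student_pred, teacher_target, delta=1.0)
loss.backward()
```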

## Evaluation

We provide a qualitative comparison between FastMochi and the original Mochi, both with 8-step inference:
assets/logo.jpg ADDED