onlybetheone committed
Commit 2d29482 · verified · 1 Parent(s): 7c82504

Upload README.md

Files changed (1): README.md (+95, −25)
README.md CHANGED
@@ -6,7 +6,6 @@ pipeline_tag: text-to-audio
 tags:
 - music_generation
 ---
-
 [//]: # (# InspireMusic)
 <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
@@ -69,7 +68,7 @@ tags:
 [//]: # ( <img alt="Discussion posts" src="https://img.shields.io/github/discussions/FunAudioLLM/InspireMusic?labelColor=%20%239b8afb&color=%20%237a5af8"></a>)
 </p>
 
-InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.
+InspireMusic is a fundamental AIGC toolkit and family of models for music, song, and audio generation, built with PyTorch.
 
 ![GitHub Repo stars](https://img.shields.io/github/stars/FunAudioLLM/InspireMusic) Please support our community project 💖 by starring it on GitHub 加⭐支持 🙏
 
@@ -79,23 +78,23 @@ InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio g
 **InspireMusic** focuses on music generation, song generation, and audio generation.
 - A unified framework for music/song/audio generation, controllable with text prompts, music genres, music structures, etc.
 - Supports text-to-music, music continuation, audio super-resolution, and audio reconstruction tasks with high audio quality, at sampling rates of 24kHz and 48kHz.
-- Support long audio generation.
-- Convenient fine-tuning and inference. Support mixed precision training (FP16, FP32). Provide convenient fine-tuning and inference scripts and strategies, allowing users to easily their music generation models.
+- Supports long-form audio generation in multiple output audio formats, i.e., wav, flac, mp3, m4a.
+- Convenient fine-tuning and inference: supports mixed-precision training (FP16, FP32) and provides fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.
 
 <a name="What's News"></a>
 ## What's New 🔥
 
-- 2025/01: Open-source [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation.
+- 2025/01: Open-sourced the [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), and [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation. Models are available on both ModelScope and HuggingFace.
 - 2024/12: Support for generating 48kHz audio with super-resolution flow matching.
 - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
 - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.
 
 ## Introduction
 > [!Note]
-> This repo contains the algorithm infrastructure and some simple examples.
+> This repo contains the algorithm infrastructure and some simple examples. Currently, only English text prompts are supported.
 
 > [!Tip]
-> To explore the performance, please refer to [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source InspireMusic models and HuggingFace Space soon.
+> To explore the performance, please refer to the [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source better and larger models and a demo space soon.
 
 InspireMusic is a unified music, song, and audio generation framework that couples audio tokenization and detokenization with a large autoregressive transformer. The original motivation of this toolkit is to empower common users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Featuring a unified framework, InspireMusic incorporates an autoregressive Transformer with conditional flow matching (CFM) and neural audio tokenizers, allowing controllable generation of music, songs, and audio with both textual and structural music conditioning. Currently, the toolkit supports text-to-music generation, with text-to-song and text-to-audio generation planned for the future.
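To make the dataflow concrete: text conditioning goes through autoregressive audio-token prediction, flow-matching refinement, and waveform detokenization. Below is a toy, runnable sketch of that flow; every class here is a dummy stand-in, not the actual inspiremusic API.

```python
# Toy sketch of the pipeline described above; every component is a
# dummy stand-in for the real model, NOT the inspiremusic API.
import numpy as np

class ARTransformer:                              # large autoregressive LM (stand-in)
    def generate(self, text_ids, n_tokens=64):
        rng = np.random.default_rng(sum(text_ids))
        return rng.integers(0, 4096, n_tokens)    # discrete audio tokens

class FlowMatcher:                                # conditional flow matching (stand-in)
    def refine(self, tokens):
        return tokens / 4096.0                    # tokens -> continuous latents

class AudioDetokenizer:                           # neural audio detokenizer (stand-in)
    def decode(self, latents, sr=24000):
        t = np.linspace(0.0, 1.0, sr)
        return np.sin(2 * np.pi * 220.0 * t) * latents.mean()  # dummy waveform

def generate(prompt: str) -> np.ndarray:
    text_ids = [ord(c) for c in prompt]           # trivial "text tokenizer"
    tokens = ARTransformer().generate(text_ids)   # autoregressive token prediction
    latents = FlowMatcher().refine(tokens)        # CFM refinement
    return AudioDetokenizer().decode(latents)     # waveform out

print(generate("relaxing jazz").shape)            # -> (24000,)
```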
@@ -112,7 +111,7 @@ git submodule update --init --recursive
 ```
 
 ### Install
-InspireMusic requires Python 3.8, PyTorch 2.1.0. To install InspireMusic, you can run one of the following:
+InspireMusic requires Python 3.8 and PyTorch 2.0.1. To install InspireMusic, you can run one of the following:
 
 - Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
 - Create Conda env:
@@ -126,6 +125,7 @@ pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --tru
 # install flash attention to speed up training
 pip install flash-attn --no-build-isolation
 ```
+Currently, only CUDA 11.x is supported.
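You can sanity-check the stated requirements with a few lines of plain Python (nothing InspireMusic-specific):

```python
# Verify the requirements above: Python 3.8, PyTorch 2.0.1, CUDA 11.x.
import sys
import torch

print("python:", sys.version.split()[0])            # expect 3.8.x
print("torch:", torch.__version__)                  # expect 2.0.1
print("built with CUDA:", torch.version.cuda)       # expect an 11.x version
print("GPU available:", torch.cuda.is_available())
```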
 
 - Install within the package:
 ```sh
@@ -150,6 +150,81 @@ sudo apt-get install ffmpeg
 sudo yum install ffmpeg
 ```
 
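To confirm ffmpeg actually landed on your PATH, a generic check:

```python
# Confirm the ffmpeg dependency above is installed and on PATH.
import shutil

print("ffmpeg:", shutil.which("ffmpeg") or "not found")
```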
+### Quick Start
+
+Here is a quick inference example for music generation.
+```sh
+cd InspireMusic
+mkdir -p pretrained_models
+
+# download the model from ModelScope or HuggingFace
+# ModelScope
+git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic-1.5B-Long
+# HuggingFace
+git clone https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic-1.5B-Long
+
+cd examples/music_generation
+# run a quick inference example
+bash infer_1.5b_long.sh
+```
+
+Here is a quick-start script that runs the music generation recipe end to end, including the data preparation pipeline, model training, and inference.
+```sh
+cd InspireMusic/examples/music_generation/
+bash run.sh
+```
+
+### One-line Inference
+#### Text-to-music Task
+
+A one-line shell command for the text-to-music task.
+```sh
+cd examples/music_generation
+# with flow matching
+# use a one-line command for a quick try
+python -m inspiremusic.cli.inference
+
+# customize the configuration, as in the following one-line command
+python -m inspiremusic.cli.inference --task text-to-music -m "InspireMusic-1.5B-Long" -g 0 -t "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance." -c intro -s 0.0 -e 30.0 -r "exp/inspiremusic" -o output -f wav
+
+# without flow matching
+python -m inspiremusic.cli.inference --task text-to-music -g 0 -t "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance." --fast True
+```
+
+Alternatively, you can run inference with just a few lines of Python code.
+```python
+from inspiremusic.cli.inference import InspireMusicUnified
+from inspiremusic.cli.inference import set_env_variables
+
+if __name__ == "__main__":
+    set_env_variables()
+    # load the unified interface with a downloaded model
+    model = InspireMusicUnified(model_name="InspireMusic-1.5B-Long")
+    # task name, then the text prompt
+    model.inference("text-to-music", "Experience soothing and sensual instrumental jazz with a touch of Bossa Nova, perfect for a relaxing restaurant or spa ambiance.")
+```
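To generate several pieces in one process, the same unified interface can be looped over prompts; a minimal sketch reusing only the calls shown above, with an illustrative prompt list:

```python
from inspiremusic.cli.inference import InspireMusicUnified, set_env_variables

# Illustrative prompts; replace with your own.
PROMPTS = [
    "Uplifting acoustic folk with warm guitar strumming.",
    "Slow, late-night blues with expressive electric guitar.",
]

if __name__ == "__main__":
    set_env_variables()
    # load the model once and reuse it across prompts
    model = InspireMusicUnified(model_name="InspireMusic-1.5B-Long")
    for prompt in PROMPTS:
        model.inference("text-to-music", prompt)
```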
+
+#### Music Continuation Task
+
+A one-line shell command for the music continuation task.
+```sh
+cd examples/music_generation
+# with flow matching
+python -m inspiremusic.cli.inference --task continuation -g 0 -a audio_prompt.wav
+# without flow matching
+python -m inspiremusic.cli.inference --task continuation -g 0 -a audio_prompt.wav --fast True
+```
+
+Alternatively, you can run inference with just a few lines of Python code.
+```python
+from inspiremusic.cli.inference import InspireMusicUnified
+from inspiremusic.cli.inference import set_env_variables
+
+if __name__ == "__main__":
+    set_env_variables()
+    model = InspireMusicUnified(model_name="InspireMusic-1.5B-Long")
+    # use only an audio prompt
+    model.inference("continuation", None, "audio_prompt.wav")
+    # use both a text prompt and an audio prompt
+    model.inference("continuation", "Continue to generate jazz music.", "audio_prompt.wav")
+```
+
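The README does not specify length or format requirements for `audio_prompt.wav`; assuming an ordinary wav file works, you can cut a short prompt from an existing recording with torchaudio:

```python
# Cut a short audio prompt for the continuation task.
# The 5-second length is an assumption, not a documented requirement.
import torchaudio

waveform, sr = torchaudio.load("my_song.wav")        # any source recording
torchaudio.save("audio_prompt.wav", waveform[:, : 5 * sr], sr)
```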
 ## Models
 ### Download Model
 
@@ -167,28 +242,23 @@ git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_mo
 Currently, we open-source music generation models supporting 24kHz mono and 48kHz stereo audio.
 The table below presents links to the ModelScope and HuggingFace model hubs. More models will be available soon.
 
-| Model name | Model Links | Remarks |
-|------------|-------------|---------|
-| InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono |
-| InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz |
-| InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono |
-| InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz |
-| InspireMusic-1.5B-Long | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, support long audio |
-| InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
-| InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
+| Model name | Model Links | Remarks |
+|------------|-------------|---------|
+| InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono, 30s |
+| InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz, 30s |
+| InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono, 30s |
+| InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz, 30s |
+| InspireMusic-1.5B-Long ⭐ | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, supports long-form music generation |
+| InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
+| InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
+| Wavtokenizer[<sup>[1]</sup>](https://openreview.net/forum?id=yBlVlS2Fd9) (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=wavtokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/wavtokenizer) | An extremely low-bitrate audio tokenizer for music with a single codebook, operating on 24kHz audio |
+| Music_tokenizer (75Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec<sup>[2]</sup>, operating on 24kHz audio |
+| Music_tokenizer (150Hz) | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/file/view/master?fileName=music_tokenizer%252Fmodel.pt) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long/tree/main/music_tokenizer) | A music tokenizer based on HifiCodec, operating on 48kHz audio |
 
 ## Basic Usage
 
 At the moment, InspireMusic contains the training and inference code for [music generation](https://github.com/FunAudioLLM/InspireMusic/tree/main/examples/music_generation). More tasks such as song generation and audio generation will be supported in the future.
 
-### Quick Start
-
-Here is a quick start running script to do music generation task including data preparation pipeline, model training, inference.
-``` sh
-cd InspireMusic/examples/music_generation/
-bash run.sh
-```
-
 ### Training
 
 Here is an example of training the LLM model, with FP16 training supported.
 