---
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- text-to-image
- stable-diffusion
- diffusers
inference: true
language:
- en
pipeline_tag: text-to-image
---
|
|
|
# Stable Diffusion 3.5 Large BF16 |
|
![3.5 Large Demo Image](sd3.5_large_demo.png) |
|
|
|
## Model |
|
|
|
![MMDiT](mmdit.png) |
|
|
|
|
|
[Stable Diffusion 3.5 Large](https://stability.ai/news/introducing-stable-diffusion-3-5) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved image quality, typography, complex prompt understanding, and resource efficiency. |
|
|
|
Please note: This model is released under the [Stability Community License](https://stability.ai/community-license-agreement). Visit [Stability AI](https://stability.ai/license) to learn more, or [contact us](https://stability.ai/enterprise) for commercial licensing details. |
|
|
|
|
|
### Model Description |
|
|
|
- **Developed by:** Stability AI |
|
- **Model type:** MMDiT text-to-image generative model |
|
- **Model Description:** This model generates images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders and QK normalization to improve training stability. |
|
|
|
### License |
|
|
|
- **Community License:** Free for research, non-commercial, and commercial use for organizations or individuals with less than $1M in total annual revenue. More details can be found in the [Community License Agreement](https://stability.ai/community-license-agreement). Read more at https://stability.ai/license. |
|
- **For individuals and organizations with annual revenue above $1M**: please [contact us](https://stability.ai/enterprise) to get an Enterprise License. |
|
|
|
### Model Sources |
|
|
|
For local or self-hosted use, we recommend [ComfyUI](https://github.com/comfyanonymous/ComfyUI) for node-based UI inference, and either [diffusers](https://github.com/huggingface/diffusers) or the [GitHub repository](https://github.com/Stability-AI/sd3.5) for programmatic use. |
|
|
|
- **ComfyUI:** [GitHub](https://github.com/comfyanonymous/ComfyUI), [Example Workflow](https://comfyanonymous.github.io/ComfyUI_examples/sd3/) |
|
- **Hugging Face Space:** [Space](https://huggingface.co/spaces/stabilityai/stable-diffusion-3.5-large) |
|
- **Diffusers**: [See below](#using-with-diffusers). |
|
- **GitHub**: [GitHub](https://github.com/Stability-AI/sd3.5). |
|
|
|
- **API Endpoints:** |
|
    - [Stability AI API](https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post) |
    - [Replicate](https://replicate.com/stability-ai/stable-diffusion-3.5-large) |
    - [Deepinfra](https://deepinfra.com/stabilityai/sd3.5) |
|
|
|
|
|
### Implementation Details |
|
|
|
- **QK Normalization:** Normalizes the attention queries and keys (QK normalization) to improve training stability. |
|
|
|
- **Text Encoders:** |
|
    - CLIPs: [OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), context length 77 tokens |
    - T5: [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl), context length 77/256 tokens at different stages of training |
|
|
|
- **Training Data and Strategy:** |
|
|
|
This model was trained on a wide variety of data, including synthetic data and filtered publicly available data. |
|
|
|
For more technical details on the original MMDiT architecture, please refer to the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper). |
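To make the QK normalization item above more concrete, here is a minimal pure-Python sketch of the idea. It assumes an RMS-style normalization applied to each query and key vector before the scaled dot product, and it omits the learnable scale. The function names `rms_normalize` and `attention_logit` are illustrative and are not taken from the model's code.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Rescale vec so its root-mean-square is about 1 (learnable gain omitted)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(query, key, qk_norm=True):
    """Scaled dot-product logit for a single query/key pair."""
    if qk_norm:
        # Assumption: both vectors are normalized independently before the dot product.
        query = rms_normalize(query)
        key = rms_normalize(key)
    dot = sum(q * k for q, k in zip(query, key))
    return dot / math.sqrt(len(query))


query = [0.5, -1.0, 2.0, 0.25]
key = [1.5, 0.5, -0.5, 1.0]
big_query = [1000.0 * x for x in query]  # same direction, much larger magnitude

# Without normalization the logit grows with the magnitude of the query.
plain_small = attention_logit(query, key, qk_norm=False)
plain_big = attention_logit(big_query, key, qk_norm=False)

# With normalization the logit depends only on direction, so it stays bounded.
normed_small = attention_logit(query, key)
normed_big = attention_logit(big_query, key)

print(plain_small, plain_big)
print(normed_small, normed_big)
```

In this sketch the normalized logit can never exceed the square root of the vector dimension in magnitude, which illustrates why bounding attention logits this way can help keep training stable.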
|
|
|
|
|
### Model Performance |
|
|
|
See the [blog post](https://stability.ai/news/introducing-stable-diffusion-3-5) for our comparative study of prompt adherence and aesthetic quality. |
|
|
|
## Using with Diffusers |
|
Upgrade to the latest version of the [🧨 diffusers library](https://github.com/huggingface/diffusers):

```
pip install -U diffusers
```
|
|
|
Then run:

```py
import torch
from diffusers import StableDiffusion3Pipeline

# Load the pipeline in bfloat16 and move it to the GPU.
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate one image from the prompt and save it to disk.
image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")
```
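The call above can be extended with a few optional arguments. The sketch below is one possible wrapper, not an official recipe. The helpers `build_call_kwargs` and `generate` are hypothetical names introduced here for illustration. `negative_prompt` and `max_sequence_length` are arguments of the pipeline call in diffusers. Two things are our assumptions rather than statements from this card: that diffusers rejects `max_sequence_length` values above 512, and that `enable_model_cpu_offload()` (which needs the `accelerate` package) is a suitable way to reduce GPU memory use for this model.

```python
def build_call_kwargs(prompt, negative_prompt="", num_inference_steps=28,
                      guidance_scale=3.5, max_sequence_length=256):
    """Collect and validate keyword arguments for the pipeline call (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    # Assumption: the T5 token limit accepted by the pipeline is at most 512.
    if not 1 <= max_sequence_length <= 512:
        raise ValueError("max_sequence_length must be between 1 and 512")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "max_sequence_length": max_sequence_length,
    }


def generate(prompt, out_path="output.png", low_vram=False, **overrides):
    """Load the pipeline, generate one image, and save it (hypothetical helper)."""
    # Imported lazily so the helper above can be used without a GPU setup.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
    )
    if low_vram:
        # Optional: keep submodules on the CPU until needed (requires accelerate).
        pipe.enable_model_cpu_offload()
    else:
        pipe = pipe.to("cuda")

    image = pipe(**build_call_kwargs(prompt, **overrides)).images[0]
    image.save(out_path)
    return out_path
```

For example, `generate("A capybara holding a sign that reads Hello World", "capybara.png", max_sequence_length=512)` would allow a longer prompt to reach the T5 encoder before truncation, at the cost of some extra compute.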
|
|
|
### Contact |
|
|
|
Please report any issues with the model or contact us: |
|
|
|
* Safety issues: [email protected] |
|
* Security issues: [email protected] |
|
* Privacy issues: [email protected] |
|
* License and general: https://stability.ai/license |
|
* Enterprise license: https://stability.ai/enterprise |
|
|
|
|