---
tags:
- text-to-image
- stable-diffusion
---

# Control-LoRA Model Card

## Introduction

What's better than ControlNets for SDXL? ControlNets... but more efficient.

By adding low-rank parameter-efficient fine-tuning (PEFT) to ControlNet, we introduce ***Control-LoRAs***.

Integrating the strengths of both ControlNet and PEFT, this approach offers a more efficient and compact way to bring model control to a wider variety of consumer GPUs.

For each model below, you'll find `Rank 256` files (reducing the `~4.7GB` ControlNets to `~738MB`) and experimental, ultra-pruned `Rank 128` files (reducing to `~377MB`).
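
To give a rough sense of why a lower rank means a smaller file, here is a minimal sketch that counts parameters for a single weight matrix, assuming each adapted weight is stored as two low-rank factors. The layer size and function names are hypothetical and chosen only for illustration; they are not taken from the Control-LoRA files.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def low_rank_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a factorization B @ A, with B of shape
    (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size, used only to illustrate the scaling.
d_out, d_in = 1280, 1280
full = full_param_count(d_out, d_in)
for rank in (256, 128):
    low = low_rank_param_count(d_out, d_in, rank)
    print(f"rank {rank}: {low} of {full} parameters")
```

Under this assumption, the factor size scales linearly with rank, which is in line with the `Rank 128` files being roughly half the size of the `Rank 256` files.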

### MiDaS and ClipDrop Depth

![depth](samples/depth-sample.jpeg)

Depth estimation is an image processing technique that estimates the distance of objects in a scene, producing a depth map that highlights variations in proximity.

In the example above, we compare the depth results of MiDaS `dpt_beit_large_512` with ClipDrop Depth for portraits, and their subsequent use in the Depth Control-LoRA.

The Control-LoRA uses a grayscale depth map for guided generation and has been trained on a diverse range of image concepts and aspect ratios.
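
As an illustration of what "grayscale depth map" can mean in practice, the sketch below min-max normalizes raw depth values into the 0 to 255 range. It is an assumption about preprocessing, not the documented pipeline: the function name is hypothetical, and whether near objects should be bright or dark depends on the depth estimator and is not specified here.

```python
def depth_to_grayscale(depth):
    """Min-max normalize a 2D list of raw depth values to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant map carries no depth variation; return all black.
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in depth]


# Toy 2x2 "depth map" with arbitrary units.
raw = [[0.0, 4.0], [10.0, 2.0]]
gray = depth_to_grayscale(raw)
```

The smallest raw value maps to 0 and the largest to 255, so the output uses the full grayscale range regardless of the estimator's units.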

### Canny Edge

![canny](samples/canny-sample.jpeg)

Canny edge detection is an image processing technique that identifies abrupt changes in intensity to highlight edges in an image.

This Control-LoRA uses the edges from an image to guide the generation.
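
The idea of "abrupt changes in intensity" can be sketched with a plain gradient-magnitude threshold. Note that this is a deliberately simplified stand-in, not the full Canny algorithm: it omits Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name and threshold value are hypothetical.

```python
import math


def gradient_edges(image, threshold=100.0):
    """Mark pixels whose intensity gradient magnitude meets `threshold`.

    `image` is a 2D list of grayscale values; the result has white (255)
    edges on a black (0) background.
    """
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            if math.hypot(gx, gy) >= threshold:
                edges[y][x] = 255
    return edges


# A 4x4 image with a vertical step from black to white.
step = [[0, 0, 255, 255] for _ in range(4)]
edge_map = gradient_edges(step)
```

Only the columns on either side of the step are marked, while the flat regions stay black. A real pipeline would typically use a library implementation of Canny instead.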

### Photograph and Sketch Colorizer

![photograph colorizer](samples/colorizer-sample.jpeg)

These two Control-LoRAs can be used to colorize images.

The first is designed to colorize black and white photographs.

The second is designed to colorize sketches supplied as white-on-black images (either hand-drawn, or created with a `SoftEdge_PIDI` model).
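
Hand-drawn sketches are often dark strokes on a light background, so they may need inverting before they match the white-on-black input described above. A minimal sketch of that step, assuming 8-bit grayscale values in a 2D list (the function name is hypothetical):

```python
def to_white_on_black(sketch):
    """Invert an 8-bit grayscale sketch so dark strokes become light."""
    return [[255 - v for v in row] for row in sketch]


# White paper (255) with a black stroke (0) and two gray pixels.
dark_on_light = [[255, 0], [200, 55]]
inverted = to_white_on_black(dark_on_light)
```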

### Revision

![revision](thumbnails/stability-clora-revision-thumbnail.jpeg)

Revision is a novel approach to prompting SDXL with images.

It uses pooled CLIP embeddings to produce images conceptually similar to the input. It can be used either in addition to text prompts or in place of them.

Revision also includes a blending function for combining multiple image or text concepts, as either positive or negative prompts.
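
This model card does not say how the blending is computed. One plausible reading is a weighted average of the pooled embeddings, sketched below purely as an assumption. The function name, the weights, and the toy three-dimensional vectors are all hypothetical; real pooled CLIP embeddings are much larger, and negative prompts are not modeled here.

```python
def blend_embeddings(embeddings, weights):
    """Weighted average of equal-length embedding vectors."""
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(embeddings[0])
    return [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]


# Toy stand-ins for two pooled image embeddings.
image_a = [1.0, 0.0, 2.0]
image_b = [0.0, 1.0, 2.0]
blended = blend_embeddings([image_a, image_b], [3.0, 1.0])
```

With these weights the result sits three times closer to `image_a` than to `image_b`, which is the intuition behind mixing concepts at different strengths.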