---
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
datasets:
- yashvoladoddi37/kanjienglish
language:
- en
- ja
library_name: diffusers
pipeline_tag: text-to-image
---

# Kanji Diffusion v1-4 Model Card

Kanji Diffusion is a latent text-to-image diffusion model that hallucinates Kanji characters for any English prompt.

## Fine-tuned Model Details

- **Developed by:** Yashpreet Voladoddi
- **Model type:** Diffusion-based text-to-image generation model, fine-tuned from Stable Diffusion v1.4 with LoRA.

### Colab

To run the pipeline and see how my model generates Kanji characters, run the code below on Colab. Use a T4 GPU runtime; without a GPU, each image takes a long time to generate.

You will need a Hugging Face access token for the login step.

```python
import os
from google.colab import drive

# Mount Google Drive and work from it so outputs persist across sessions.
drive.mount('/content/drive')
os.chdir("/content/drive/MyDrive")

!pip install diffusers
!git clone https://github.com/huggingface/diffusers
# Prompts for your Hugging Face access token.
!huggingface-cli login

from diffusers import StableDiffusionPipeline
import torch

torch.cuda.empty_cache()

model_path = "yashvoladoddi37/kanji-diffusion-v1-4"

# Load the base model in half precision, then apply the fine-tuned attention weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")  # move the pipeline to the GPU once, after the weights are loaded

prompt = "A Kanji meaning baby robot"
image = pipe(prompt).images[0]
image.save("baby-robot-kanji-v1-4.png")
```
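
To try several concepts in one go, the prompt and file-name pattern from the cell above can be wrapped in small helpers. This is a minimal sketch, not part of the released code: `kanji_prompt`, `output_name`, and `generate_batch` are hypothetical names, and the "A Kanji meaning ..." template is assumed from the example prompts in this card rather than from the dataset captions.

```python
import re


def kanji_prompt(concept: str) -> str:
    """Wrap a concept in the prompt template used in this card's examples."""
    return f"A Kanji meaning {concept.strip()}"


def output_name(concept: str, suffix: str = "kanji-v1-4") -> str:
    """Turn a concept into a file name such as 'baby-robot-kanji-v1-4.png'."""
    slug = re.sub(r"[^a-z0-9]+", "-", concept.lower()).strip("-")
    return f"{slug}-{suffix}.png"


def generate_batch(pipe, concepts, seed=42):
    """Generate and save one image per concept with a fixed seed.

    `pipe` is assumed to be the StableDiffusionPipeline from the Colab cell,
    already on the GPU with the fine-tuned weights loaded.
    """
    import torch  # imported lazily so the helpers above work without a GPU

    saved = []
    for concept in concepts:
        # Re-seed for every prompt so each image is reproducible on its own.
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipe(kanji_prompt(concept), generator=generator).images[0]
        path = output_name(concept)
        image.save(path)
        saved.append(path)
    return saved


# Example (needs the pipeline from the Colab cell):
# generate_batch(pipe, ["baby robot", "Elon Musk"])
```

Fixing the seed makes it easier to compare outputs across prompts or across checkpoints, since the starting noise is the same for every image.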

### Limitations

## Training

**Training Data:** the `yashvoladoddi37/kanjienglish` dataset, using its `image` and `text` columns.

**Hardware:** NVIDIA GTX 1650 (4 GB VRAM, 8 GB system RAM) and a T4 GPU on Colab

**Training Script:**

```python
!accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="yashvoladoddi37/kanjienglish" \
  --image_column="image" \
  --caption_column="text" \
  --resolution=512 \
  --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=1 \
  --checkpointing_steps=500 \
  --learning_rate=1e-04 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="kanji-diffusion-v1-4" \
  --validation_prompt="A kanji meaning Elon Musk" \
  --push_to_hub
```
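
As a rough guide to what these flags imply, the sketch below estimates how many optimizer steps one run takes and at which steps a checkpoint is written. It assumes a single GPU, no gradient accumulation unless specified, and the usual relationship between dataset size, batch size, and epochs. The function name and the dataset size are hypothetical; this card does not state the real number of training examples.

```python
import math


def training_schedule(num_examples, train_batch_size=1,
                      gradient_accumulation_steps=1,
                      num_train_epochs=1, checkpointing_steps=500):
    """Estimate total optimizer steps and the steps at which checkpoints are saved."""
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    total_steps = updates_per_epoch * num_train_epochs
    # A checkpoint is written every `checkpointing_steps` optimizer steps.
    checkpoints = list(range(checkpointing_steps, total_steps + 1, checkpointing_steps))
    return total_steps, checkpoints


# Hypothetical dataset size, for illustration only.
HYPOTHETICAL_NUM_EXAMPLES = 6000
total, checkpoints = training_schedule(HYPOTHETICAL_NUM_EXAMPLES)
print(total, len(checkpoints), checkpoints[:3])  # 6000 12 [500, 1000, 1500]
```

With `--train_batch_size=1` and `--num_train_epochs=1`, the step count equals the number of training examples, so the checkpoint interval directly controls how many intermediate checkpoints end up in the output directory.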