CLIPtion is a fast and small captioning extension to OpenAI CLIP ViT-L/14. You already have ViT-L loaded when using Stable Diffusion, SDXL, SD3, FLUX, etc., and with just an extra 100MB of memory you can include caption/prompt generation in your workflows!
I made this for fun, and I'm sure bigger dedicated caption models and VLMs will give you more accurate captioning. But this guy is tiny, fast, reuses what you already have loaded, and has options to give better CLIP alignment, so give it a try if you like!
Big thanks to Ben Egan, SilentAntagonist, Alex Redden, XWAVE, and Jacky-hate, whose synthetic caption datasets I included in the training.
Use this model in ComfyUI with the comfy-cliption extension!
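To give a rough feel for what "better CLIP alignment" can mean, here is a minimal sketch of one common approach: generate several candidate captions and keep the one whose text embedding is closest to the image embedding. This is only an illustration under stated assumptions, not necessarily how CLIPtion's options work. The function names and the toy 3-dimensional vectors are hypothetical; real CLIP ViT-L/14 embeddings would come from the loaded model.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pick_best_caption(image_embedding, candidates):
    """Return the caption whose text embedding best matches the image.

    candidates: list of (caption, text_embedding) pairs.
    Hypothetical helper for illustration; not part of any real API.
    """
    best_caption, _ = max(
        candidates,
        key=lambda pair: cosine_similarity(image_embedding, pair[1]),
    )
    return best_caption

# Toy stand-ins for embeddings (assumption: real ones come from CLIP).
image_embedding = [1.0, 0.0, 0.0]
candidates = [
    ("a dog on a beach", [0.9, 0.1, 0.0]),
    ("a city at night", [0.0, 1.0, 0.0]),
    ("abstract shapes", [0.5, 0.5, 0.5]),
]

best = pick_best_caption(image_embedding, candidates)
print(best)
```

Ranking more candidates costs more generation time, so the number of candidates is a trade-off between speed and alignment.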