SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

This repository contains the Sparse Autoencoders (SAEs) trained in our work for the U-Net blocks up.1.1 and up.1.2.

After cloning our GitHub repo, you can use them as follows:

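# The Sae class comes from the cloned SAeUron GitHub repo
from SAE.sae import Sae

device = "cuda"
# Hookpoint that selects which block's SAE to load
hookpoint = "unet.up_blocks.1.attentions.1"

# Fetch the SAE for this hookpoint from the Hub and place it on the device
sae = Sae.load_from_hub("bcywinski/SAeUron", hookpoint=hookpoint, device=device)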
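
To load the SAEs for both blocks, it may help to build the hookpoint strings from the short block names. The sketch below is a minimal helper; `block_to_hookpoint` is a hypothetical name, not part of the repo. It assumes that a short name `up.i.j` corresponds to the hookpoint `unet.up_blocks.i.attentions.j`. That pattern is inferred from the single example above, so check it against the hookpoints actually stored in the repository before relying on it.

```python
import re


def block_to_hookpoint(block: str) -> str:
    """Map a short block name such as 'up.1.1' to a hookpoint string.

    ASSUMPTION: 'up.i.j' corresponds to 'unet.up_blocks.i.attentions.j'.
    This is inferred from the one hookpoint shown in this README.
    """
    match = re.fullmatch(r"up\.(\d+)\.(\d+)", block)
    if match is None:
        raise ValueError(f"Unrecognized block name: {block!r}")
    up_idx, attn_idx = match.groups()
    return f"unet.up_blocks.{up_idx}.attentions.{attn_idx}"


# Hookpoints for the two blocks mentioned in this README
hookpoints = [block_to_hookpoint(b) for b in ("up.1.1", "up.1.2")]
print(hookpoints)

# With the GitHub repo cloned, each SAE could then be loaded with the
# call shown above (left commented out so this sketch runs without the repo):
# from SAE.sae import Sae
# saes = {
#     hp: Sae.load_from_hub("bcywinski/SAeUron", hookpoint=hp, device="cuda")
#     for hp in hookpoints
# }
```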

📚 BibTeX

@article{cywinski2025saeuron,
  title={SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders},
  author={Cywi{\'n}ski, Bartosz and Deja, Kamil},
  journal={arXiv preprint arXiv:2501.18052},
  year={2025}
}