Goodfire/Llama-3.3-70B-Instruct-SAE-l50

goodfire-llama-3.3-70b-instruct-sae-l50

mechanistic interpretability

sparse autoencoder

namgoodfire committed 29 days ago

Commit 562febb · verified · 1 Parent(s): 348ec1b

Update README.md

Files changed (1)

README.md +11 -1

@@ -22,12 +22,22 @@ By open-sourcing SAEs for leading open models, especially large-scale
 models like Llama 3.3 70B, we aim to accelerate progress in interpretability research.
 Our initial work with these SAEs has revealed promising applications in model steering,
-enhancing jailbreaking safeguards, and interpretable classification methods (docs.goodfire.ai).
+enhancing jailbreaking safeguards, and interpretable classification methods.
 We look forward to seeing how the research community builds upon these
 foundations and uncovers new applications.
 #### Feature labels
+To explore the feature labels check out the [Goodfire Ember SDK](https://www.goodfire.ai/blog/announcing-goodfire-ember/).
+The SDK provides an intuitive interface for interacting with these
+features, allowing you to investigate how Llama processes information
+and even steer its behavior. Get started with feature
+exploration at [docs.goodfire.ai](https://docs.goodfire.ai) or install directly via:
+```
+pip install goodfire
+```
 ## How to use
 ```python