checkpoint
- README.md +8 -0
- layers.23/cfg.json +1 -0
- layers.23/sae.safetensors +3 -0
- layers.29/cfg.json +1 -0
- layers.29/sae.safetensors +3 -0
README.md
CHANGED
@@ -1,3 +1,11 @@
 ---
 license: mit
 ---
+
+Data: c4 and codeparrot, mixed about 1:1 by sample but 1:4 by token. Heavily biased toward code (Python, Go, Java, JavaScript, C, C++).
+The first run trained for slightly less than 1 epoch before crashing. The checkpoint was then reloaded and trained for 1000 steps, then reloaded again and trained for a whole epoch. The dataloader does not shuffle. This checkpoint is not suitable for ablation studies, because the number of times the SAE sees each sample was not carefully controlled.
+
+Params:
+- batch size: 64 * 2048 * 8 = 1048576 tokens
+- lr: set automatically by the EAI sae codebase
+- auxk_alpha: 0.03
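The batch-size arithmetic in the README can be checked with a short sketch. The README gives only the three factors, not what each one stands for, so the variable names below are assumptions made for illustration.

```python
# Hypothetical names: the README lists the factors 64, 2048 and 8
# without saying what each one represents.
sequences_per_replica = 64   # assumed: sequences per micro-batch
tokens_per_sequence = 2048   # assumed: context length in tokens
num_replicas = 8             # assumed: devices or accumulation steps

tokens_per_step = sequences_per_replica * tokens_per_sequence * num_replicas

print(tokens_per_step)             # 1048576
print(tokens_per_step == 2 ** 20)  # True
```

Whatever the factors mean, their product is exactly 2^20 tokens per optimizer step, which matches the figure stated in the README.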
layers.23/cfg.json
ADDED
@@ -0,0 +1 @@
+{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}
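The config above sets `num_latents` to 0, so the dictionary size has to be derived from the other fields. The sketch below assumes that 0 means "not set" and that the width then falls back to `expansion_factor * d_in`. That fallback is an assumption about the training codebase rather than something the diff states, and `effective_num_latents` is a hypothetical helper.

```python
import json

# Config copied verbatim from layers.23/cfg.json in this commit.
cfg_text = (
    '{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, '
    '"k": 192, "signed": false, "d_in": 4096}'
)
cfg = json.loads(cfg_text)


def effective_num_latents(cfg: dict) -> int:
    """Return the SAE width.

    Assumption: num_latents == 0 means "unset", in which case the width
    is expansion_factor * d_in.
    """
    return cfg["num_latents"] or cfg["expansion_factor"] * cfg["d_in"]


num_latents = effective_num_latents(cfg)
print(num_latents)  # 131072

# Fraction of latents that a top-k SAE with k=192 keeps active per token.
print(cfg["k"] / num_latents)
```

Under that assumption each SAE has 131072 latents, of which `k` = 192 are active per token.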
layers.23/sae.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1e3e6ab2115389260c1c9e8381756ecf0a83628ac1c41ee282210ce8b80b926d
+size 4295508312
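The three added lines are a Git LFS pointer, not the weights themselves. As a rough sanity check, the sketch below parses the pointer and compares its `size` with an estimate of the tensor payload. The estimate rests on assumptions not stated in the commit: float32 parameters, one encoder and one decoder weight matrix of shape 131072 x 4096, and one bias vector for each. `parse_lfs_pointer` is a hypothetical helper.

```python
# Pointer text copied from layers.23/sae.safetensors in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:1e3e6ab2115389260c1c9e8381756ecf0a83628ac1c41ee282210ce8b80b926d
size 4295508312
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


ptr = parse_lfs_pointer(pointer_text)

# Assumed layout: float32, encoder + decoder weights, encoder + decoder biases.
d_in, num_latents, bytes_per_param = 4096, 131072, 4
weight_bytes = 2 * num_latents * d_in * bytes_per_param
bias_bytes = (num_latents + d_in) * bytes_per_param
estimate = weight_bytes + bias_bytes

print(estimate)                # 4295507968
print(ptr["size"] - estimate)  # 344
```

Under these assumptions the estimate falls 344 bytes short of the recorded size. A gap that small could plausibly be the safetensors header (an 8-byte length followed by JSON metadata), though the commit itself does not confirm the layout.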
layers.29/cfg.json
ADDED
@@ -0,0 +1 @@
+{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}
layers.29/sae.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c7895ce7aa1c4cbdedf12477250a1f195e1561eb1ece31e46e0491cb38fc5386
+size 4295508312