quintic committed on
Commit
e4b7718
·
1 Parent(s): 72dcfb1

checkpoint

README.md CHANGED
@@ -1,3 +1,11 @@
  ---
  license: mit
  ---
+
+ Data: c4 and codeparrot, mixed roughly 1:1 sample-wise but 1:4 token-wise. Significantly biased toward code (python, go, java, javascript, c, c++).
+ The first round trained for slightly less than one epoch before crashing. The checkpoint was loaded and trained for 1000 steps, then loaded again and trained for a whole epoch. The dataloader does not shuffle. This run is not suitable for ablations, since the number of times the SAE sees each sample is not carefully controlled.
+
+ Params:
+ - batch size 64 * 2048 * 8 = 1048576 tokens
+ - lr set automatically by the EAI sae codebase
+ - auxk_alpha 0.03
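The batch-size arithmetic above can be sanity-checked directly. A minimal sketch; the interpretation of the three factors (per-device batch, sequence length, device count) is my assumption — only the product is stated in the README:

```python
# Tokens processed per optimizer step, per the params above.
# The meaning of each factor is assumed, not stated in the README.
batch_sequences = 64   # sequences per device (assumed)
seq_len = 2048         # tokens per sequence (assumed)
num_devices = 8        # data-parallel workers (assumed)

tokens_per_step = batch_sequences * seq_len * num_devices
print(tokens_per_step)         # 1048576, as stated

# The 1000-step middle round therefore saw about 1.05e9 tokens.
print(tokens_per_step * 1000)  # 1048576000
```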
layers.23/cfg.json ADDED
@@ -0,0 +1 @@
+ {"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}
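The cfg.json above pins the SAE shape. A minimal sketch of reading it and deriving the latent count — assuming, as is my understanding of the EAI sae codebase's convention, that `num_latents == 0` means "derive from `d_in * expansion_factor`":

```python
import json

cfg = json.loads(
    '{"expansion_factor": 32, "normalize_decoder": true, '
    '"num_latents": 0, "k": 192, "signed": false, "d_in": 4096}'
)

# num_latents == 0 is taken to mean "derive from expansion_factor";
# this convention is an assumption, not stated in the diff itself.
num_latents = cfg["num_latents"] or cfg["d_in"] * cfg["expansion_factor"]
print(num_latents)  # 131072
print(cfg["k"])     # 192 active latents per token (TopK)
```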
layers.23/sae.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e3e6ab2115389260c1c9e8381756ecf0a83628ac1c41ee282210ce8b80b926d
+ size 4295508312
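The 4,295,508,312-byte LFS pointer above is consistent with an fp32 encoder/decoder pair at this shape. A back-of-the-envelope check, assuming the checkpoint holds an encoder weight, a decoder weight, and two bias vectors (the exact tensor layout is my assumption):

```python
d_in = 4096
num_latents = 4096 * 32  # 131072, per cfg.json
bytes_fp32 = 4

# Assumed tensor layout: encoder weight, decoder weight, two biases.
param_bytes = bytes_fp32 * (
    num_latents * d_in    # encoder weight
    + num_latents * d_in  # decoder weight
    + num_latents         # latent bias
    + d_in                # input bias
)
print(param_bytes)               # 4295507968

# The remaining ~344 bytes of the 4295508312-byte file would be the
# safetensors JSON header.
print(4295508312 - param_bytes)  # 344
```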
layers.29/cfg.json ADDED
@@ -0,0 +1 @@
+ {"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}
layers.29/sae.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7895ce7aa1c4cbdedf12477250a1f195e1561eb1ece31e46e0491cb38fc5386
+ size 4295508312