quintic/deepseek_coder_6.7b-sae-k-192-layer-23-29-2_ep

quintic committed on Aug 18, 2024

Commit e4b7718 · 1 Parent(s): 72dcfb1

checkpoint

Files changed (5)

README.md +8 -0
layers.23/cfg.json +1 -0
layers.23/sae.safetensors +3 -0
layers.29/cfg.json +1 -0
layers.29/sae.safetensors +3 -0

README.md CHANGED

@@ -1,3 +1,11 @@
 ---
 license: mit
 ---

+Data: c4 and codeparrot, mixed about 1:1 by sample count but 1:4 by token count. Heavily biased toward code (python, go, java, javascript, c, c++).
+The first round trained for slightly less than 1 epoch before crashing. The checkpoint was then loaded and trained for 1000 steps, then loaded again and trained for a whole epoch. The dataloader does not shuffle. This SAE is not suitable for ablation studies because the number of times it sees each sample was not carefully controlled.
+Params:
+- batch size: 64 * 2048 * 8 = 1048576 tokens
+- lr: set automatically by the EAI sae codebase
+- auxk_alpha: 0.03
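
The batch-size arithmetic in the README can be checked with a short sketch. The README does not say what the three factors mean; the variable names below are an assumed reading (sequences per step, tokens per sequence, and a device or accumulation multiplier), not something the commit confirms.

```python
# Factors exactly as listed in the README. Their meaning is an assumption:
# 64 sequences x 2048 tokens per sequence x 8 (devices or accumulation steps).
sequences_per_step = 64
tokens_per_sequence = 2048
multiplier = 8

tokens_per_step = sequences_per_step * tokens_per_sequence * multiplier
print(tokens_per_step)            # 1048576
print(tokens_per_step == 2**20)   # True

# The second training round ran for 1000 steps, so at this batch size it
# consumed this many tokens:
second_round_tokens = 1000 * tokens_per_step
print(second_round_tokens)        # 1048576000
```

At this batch size each optimizer step covers exactly 2^20 tokens, which makes step counts easy to convert into token counts when estimating how often a sample was seen.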

layers.23/cfg.json ADDED

@@ -0,0 +1 @@
+{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}
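
To make the config easier to read, here is a minimal sketch that derives the dictionary size from it. It assumes that `num_latents: 0` means "fall back to `d_in * expansion_factor`"; the helper `effective_num_latents` is a hypothetical name used for illustration, not a function from the training codebase.

```python
import json

# The config line from this commit, copied verbatim.
cfg_text = (
    '{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, '
    '"k": 192, "signed": false, "d_in": 4096}'
)
cfg = json.loads(cfg_text)


def effective_num_latents(cfg: dict) -> int:
    """Hypothetical helper: assume num_latents == 0 means 'derive from expansion_factor'."""
    return cfg["num_latents"] or cfg["d_in"] * cfg["expansion_factor"]


num_latents = effective_num_latents(cfg)
active_fraction = cfg["k"] / num_latents

print(num_latents)      # 131072
print(active_fraction)  # 0.00146484375
```

Under that assumption, each token activates 192 of 131072 latents, or roughly 0.15% of the dictionary.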

layers.23/sae.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1e3e6ab2115389260c1c9e8381756ecf0a83628ac1c41ee282210ce8b80b926d
+size 4295508312
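
As a sanity check, the LFS size can be compared against the parameter count implied by the config. This is only a sketch under two assumptions that the commit does not state: the file holds an encoder weight and bias plus a decoder weight and bias, and all tensors are stored as float32.

```python
d_in = 4096
num_latents = 32 * d_in       # 131072, assuming num_latents falls back to expansion_factor * d_in
bytes_per_param = 4           # assumption: float32 storage

# Assumed tensor layout (not confirmed by the commit).
encoder_weight = num_latents * d_in
encoder_bias = num_latents
decoder_weight = num_latents * d_in
decoder_bias = d_in

param_count = encoder_weight + encoder_bias + decoder_weight + decoder_bias
tensor_bytes = param_count * bytes_per_param

lfs_size = 4295508312         # "size" field from the LFS pointer
remainder = lfs_size - tensor_bytes

print(param_count)   # 1073876992
print(tensor_bytes)  # 4295507968
print(remainder)     # 344
```

The 344-byte remainder is small enough to be the file's metadata header, so the reported size is consistent with these assumptions, though it does not prove them.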

layers.29/cfg.json ADDED

@@ -0,0 +1 @@
+{"expansion_factor": 32, "normalize_decoder": true, "num_latents": 0, "k": 192, "signed": false, "d_in": 4096}

layers.29/sae.safetensors ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c7895ce7aa1c4cbdedf12477250a1f195e1561eb1ece31e46e0491cb38fc5386
+size 4295508312