Blackroot committed (verified)
Commit dc8dae9 · 1 parent: 4348346

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -1,4 +1,4 @@
- From scratch pretraining on English only: no synthetic data, no code, 3 epochs of 1 GB of data for the ~125M param model.
+ From scratch pretraining on English only: no synthetic data, no code, 3 epochs of 1 GB of data for the ~135M param model.
 
  Test network using [Differential Transformer (Attention)](https://arxiv.org/abs/2410.05258). Other than some alterations to the attention, such as 16 heads instead of 9 and using differential attn, this is the same setup as https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct.
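
For readers unfamiliar with the mechanism, below is a minimal PyTorch sketch of differential attention as described in the paper linked above: each head computes two softmax attention maps and subtracts one from the other, weighted by a learnable scalar lambda, which lets the model cancel common-mode attention noise. All hyperparameters here (d_model=576 for a SmolLM2-135M-style width, 16 heads, lambda_init=0.8) are illustrative assumptions, not this repo's actual configuration, and the paper's per-head normalization is omitted for brevity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttention(nn.Module):
    """Minimal sketch of differential attention (arXiv:2410.05258).
    Dimensions and lambda_init are assumptions for illustration,
    not this repo's actual config. The paper's per-head RMSNorm
    (scaled by 1 - lambda_init) is omitted to keep this short."""

    def __init__(self, d_model: int = 576, n_heads: int = 16, lambda_init: float = 0.8):
        super().__init__()
        assert d_model % (2 * n_heads) == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads // 2  # Q/K are split into two halves per head
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        # Learnable reparameterization of the subtraction weight lambda.
        self.lambda_q1 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_init = lambda_init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        # Shape to (batch, heads, 2, time, head_dim): index 0/1 along dim 2
        # selects the two attention maps each head will compute.
        q = self.q_proj(x).view(b, t, self.n_heads, 2, self.head_dim).permute(0, 2, 3, 1, 4)
        k = self.k_proj(x).view(b, t, self.n_heads, 2, self.head_dim).permute(0, 2, 3, 1, 4)
        v = self.v_proj(x).view(b, t, self.n_heads, 2 * self.head_dim).transpose(1, 2)

        # Standard causal masking, applied to both attention maps at once.
        mask = torch.triu(torch.full((t, t), float("-inf"), device=x.device), diagonal=1)
        scores = q @ k.transpose(-1, -2) / math.sqrt(self.head_dim) + mask
        attn = F.softmax(scores, dim=-1)  # (b, heads, 2, t, t)

        # lambda = exp(lq1 . lk1) - exp(lq2 . lk2) + lambda_init, per the paper.
        lam = (torch.exp(self.lambda_q1 @ self.lambda_k1)
               - torch.exp(self.lambda_q2 @ self.lambda_k2)
               + self.lambda_init)
        # Core idea: subtract the second map from the first to cancel noise.
        diff_attn = attn[:, :, 0] - lam * attn[:, :, 1]
        out = (diff_attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

# Usage: y = DiffAttention()(torch.randn(2, 8, 576))  # (batch, seq, d_model)
```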