---
datasets:
- roneneldan/TinyStories
---
Some very small and very simple models: 29,960,200 parameters.

Configuration: `"dim": 256, "dim_head": 32, "headcount": 8, "ff_mult": 4, "vocab_size": 50304, "num_layers": 4`
This configuration is nonstandard for TinyStories: it uses the full GPT-2 vocabulary size
(which bloats the embedding layers) and a SwiGLU activation function
(which doubles the width of one of the feedforward layers).
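
For illustration, a minimal sketch of a SwiGLU feedforward block with these hyperparameters (`dim=256`, `ff_mult=4`) could look like the following. This is plain PyTorch written to match the description above, not the repo's exact implementation; see the network definitions at https://github.com/SQCU/attn_demo for the real code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Sketch of a SwiGLU feedforward block (dim=256, ff_mult=4)."""
    def __init__(self, dim: int = 256, ff_mult: int = 4):
        super().__init__()
        hidden = dim * ff_mult  # 1024
        # the input projection is doubled so its output can be split into a
        # value half and a gate half -- the "doubled width" mentioned above
        self.proj_in = nn.Linear(dim, hidden * 2, bias=False)
        self.proj_out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(value * F.silu(gate))

# quick shape check: batch of 2 sequences, length 16, model width 256
x = torch.randn(2, 16, 256)
print(SwiGLUFeedForward()(x).shape)  # torch.Size([2, 16, 256])
```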
Source for training, inference, dataset preparation, and network definitions is available at
https://github.com/SQCU/attn_demo

Training logs
(unprocessed! unfiltered! it's a bunch of log prints of train and validation loss!)
and the training loader source for each run are included with the demo models.