---
datasets:
- roneneldan/TinyStories
---
|
Some very small, very simple language models trained on TinyStories.
|
|
|
29,960,200 parameters, with the following configuration:
|
|
|
"dim":256,"dim_head":32,"headcount":8,"ff_mult":4, |
|
"vocab_size":50304, "num_layers":4. |
|
|
|
This configuration is nonstandard for TinyStories models: it keeps the full GPT-2 vocabulary size of 50,304 tokens (bloating the embedding layers) and uses a SwiGLU activation function (which doubles the width of one of the feedforward layers), as sketched below.
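
For illustration, a minimal SwiGLU feedforward sketch in PyTorch; this is not necessarily how the repo implements it, but it shows where the extra width comes from: the gate projection duplicates the first feedforward layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Minimal SwiGLU feedforward sketch (illustrative, not the repo's module)."""

    def __init__(self, dim: int, ff_mult: int = 4):
        super().__init__()
        hidden = ff_mult * dim
        self.proj_in = nn.Linear(dim, hidden, bias=False)
        self.gate = nn.Linear(dim, hidden, bias=False)   # the "doubled" width
        self.proj_out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # swish(gate(x)) gates the main projection elementwise
        return self.proj_out(F.silu(self.gate(x)) * self.proj_in(x))
```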
|
|
|
|
|
Source for training, inference, dataset preparation, and the network definitions is available at

https://github.com/SQCU/attn_demo
|
|
|
|
|
Training logs (unprocessed! unfiltered! just a bunch of log prints of train and validation loss!) and the training loader source for each run are included with the demo models.