Some very small and very simple models.

29,960,200 parameters.

"dim":256,"dim_head":32,"headcount":8,"ff_mult":4, "vocab_size":50304, "num_layers":4.

this configuration is nonstandard for tinystories models: it keeps the full gpt-2 vocabulary size (which bloats the embedding layers) and uses a swiglu activation function (which doubles the width of one of the feedforward layers).
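as a sanity check, here's a back-of-the-envelope parameter count from the config above. it's a sketch, assuming untied input/output embeddings and bias-free projections (neither is confirmed by this card); norm parameters and any biases in the actual model presumably make up the small remainder to the reported 29,960,200.

```python
# rough parameter count from the config above (assumptions noted in text)
dim, heads, dim_head, ff_mult, vocab, layers = 256, 8, 32, 4, 50304, 4

embed = vocab * dim                      # token embedding (the "bloat")
unembed = vocab * dim                    # output head, assumed untied
attn = 4 * dim * (heads * dim_head)      # q, k, v, o projections per layer
ffn = 3 * dim * (ff_mult * dim)          # gate + value + down projections (swiglu)
total = embed + unembed + layers * (attn + ffn)
print(f"{total:,}")                      # 29,949,952 -- within ~0.04% of 29,960,200
```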
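and a minimal sketch of the swiglu feedforward shape described above (hypothetical pytorch, not the actual module from attn_demo): the "doubling" comes from the up-projection producing both a gate and a value in a single matmul.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """illustrative swiglu ffn; names and structure are assumptions."""
    def __init__(self, dim: int = 256, ff_mult: int = 4):
        super().__init__()
        hidden = dim * ff_mult
        # the "doubled" layer: dim -> 2 * hidden, split into gate and value
        self.proj_in = nn.Linear(dim, hidden * 2, bias=False)
        self.proj_out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.proj_in(x).chunk(2, dim=-1)
        return self.proj_out(F.silu(gate) * value)
```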

source for training, inference, dataset preparation, and network definitions is available at https://github.com/SQCU/attn_demo

training logs (unprocessed! unfiltered! just raw prints of train and validation loss!) and training loader source for each run are included with the demo models.

dataset used to train SQCU/pgptlformer-tinystories: tinystories.