---
datasets:
- roneneldan/TinyStories
---
|
Some very small, very simple language models trained on TinyStories.
|
|
|
29,960,200 parameters, with the following configuration:
|
|
|
"dim":256,"dim_head":32,"headcount":8,"ff_mult":4, |
|
"vocab_size":50304, "num_layers":4. |
|
|
|
This configuration is nonstandard for TinyStories models: it keeps the full GPT-2 vocabulary size of 50,304 tokens (bloating the embedding layers) and uses a SwiGLU activation function (which doubles the width of one of the feedforward layers), as sketched below.
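
For illustration, a minimal SwiGLU feedforward sketch in PyTorch; this is not necessarily how the repo implements it, but it shows where the extra width comes from: the gate projection duplicates the first feedforward layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Minimal SwiGLU feedforward sketch (illustrative, not the repo's module)."""

    def __init__(self, dim: int, ff_mult: int = 4):
        super().__init__()
        hidden = ff_mult * dim
        self.proj_in = nn.Linear(dim, hidden, bias=False)
        self.gate = nn.Linear(dim, hidden, bias=False)   # the "doubled" width
        self.proj_out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # swish(gate(x)) gates the main projection elementwise
        return self.proj_out(F.silu(self.gate(x)) * self.proj_in(x))
```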
|
|
|
|
|
Source for training, inference, dataset preparation, and the network definitions is available at

https://github.com/SQCU/attn_demo
|
|
|
|
|
Training logs (unprocessed! unfiltered! just a bunch of log prints of train and validation loss!) and the training loader source for each run are included with the demo models.