Commit History

compiled models train faster, so you can train more of them in a short experiment and reach better convergence.
921107d
verified

SQCU committed on

89,301,000 parameter attention_ii, z_lossed model trained for 6250 steps at batchsize:4*32, device_batchsize:32
8a69386
verified

SQCU committed on

sling the illustrious and mysterious "attention_II" models. also some layerwise rmsnorm, qkprojection rmsnorm models, one twice as large as the other.
1f45909
verified

SQCU committed on

Upload 8 files
6d543db
verified

SQCU committed on

Update README.md
87045f5
verified

SQCU committed on

Create README.md
fd3ca39
verified

SQCU committed on

initial commit
5e8f667
verified

SQCU committed on