89,301,000 parameter attention_ii, z_lossed model trained for 6250 steps at batchsize:4*32, device_batchsize:32
8a69386 verified