pgptlformer-tinystories / re-pqt-rmsXrms-ATTNII-697f0113-bb05-480b-b6dc-42a97de0de3e
SQCU
Sling the illustrious and mysterious "attention_II" models, plus some layerwise-RMSNorm and QK-projection-RMSNorm models, one twice as large as the other.
1f45909