pgptlformer-tinystories / re-pqt-rmsXrms-2x-42e14b65-2277-45ae-a68c-822eb66be09a
SQCU
sling the illustrious and mysterious "attention_II" models. also some models with layerwise rmsnorm and qk-projection rmsnorm, one twice as large as the other.
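The repo's own model code isn't shown here, so the following is only a generic sketch of the two normalizations the commit message names, as they are commonly defined. RMSNorm rescales a vector by the reciprocal of its root-mean-square with no mean subtraction. "qk-projection rmsnorm" is assumed here to mean applying RMSNorm to the query and key vectors after their projections, which bounds the attention logits. The helper names `rms_norm`, `matvec` and `qk_normed_scores` are hypothetical, and the learned gain is left at 1 for simplicity.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def rms_norm(x: Vector, gain: Optional[Vector] = None, eps: float = 1e-6) -> Vector:
    """Divide x by its root-mean-square (no mean subtraction), then apply a per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain if gain is not None else [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product standing in for a linear projection."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def qk_normed_scores(xs: List[Vector], wq: Matrix, wk: Matrix) -> Matrix:
    """Hypothetical helper: attention logits with RMSNorm applied to q and k after projection."""
    qs = [rms_norm(matvec(wq, x)) for x in xs]
    ks = [rms_norm(matvec(wk, x)) for x in xs]
    d = len(qs[0])
    # Each normalized vector has squared length just under d, so by Cauchy-Schwarz
    # every scaled dot product stays within [-sqrt(d), sqrt(d)].
    return [[sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks] for q in qs]
```

Because the normalized queries and keys have a fixed scale, the logits cannot grow with the raw projection weights. That is a common motivation for QK-norm, though whether it was the motivation in these runs is not stated.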
1f45909 verified