pgptlformer-tinystories / dyn_qkrmsnorm_ii-7a038ecd-be98-46cb-abe8-e0f013fd7eed.txt
sling the illustrious and mysterious "attention_II" models. also some layerwise rmsnorm, qkprojection rmsnorm models, one twice as large as the other.
1f45909 verified
474 kB