fla-hub/rwkv7-2.9B-world

Text Generation


ZhangRC committed 5 days ago

Commit df87ec2 · verified · 1 parent: 1da204a

Update README.md

Files changed (1)

README.md +10 -0


@@ -65,6 +65,16 @@ model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-2.9B-world', trust_r
 tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-2.9B-world', trust_remote_code=True)
 ```
+### Training Data
+This model was trained on the World v3 corpus, for a total of 3.119 trillion tokens.
+#### Training Hyperparameters
+- **Training regime:** bfloat16 precision; learning rate 4e-4 to 1e-5 with "delayed" cosine decay; weight decay 0.1; batch size increased midway through training
+- **Final Loss:** 1.8745
+- **Token Count:** 3.119 trillion
 ## FAQ
 Q: safetensors metadata is none.
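
Note on the added "Training regime" line: the README does not define what "delayed" cosine decay means. The sketch below is one plausible reading, not the authors' actual schedule. It assumes the learning rate is held at the peak value (4e-4) for an initial fraction of training and then follows a standard half-cosine down to the final value (1e-5). The names `delayed_cosine_lr`, `total_steps`, and `delay_frac` are hypothetical, as is the default delay of 10%. Weight decay and the batch-size increase are not modeled.

```python
import math

PEAK_LR = 4e-4   # from the README: initial learning rate
FINAL_LR = 1e-5  # from the README: final learning rate


def delayed_cosine_lr(step, total_steps, delay_frac=0.1):
    """Hypothetical 'delayed' cosine schedule (an assumption, not the real one).

    Holds PEAK_LR for the first `delay_frac` of training, then decays
    to FINAL_LR along a half-cosine over the remaining steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= delay_frac < 1.0:
        raise ValueError("delay_frac must be in [0, 1)")

    delay_steps = int(delay_frac * total_steps)
    if step <= delay_steps:
        # Delay phase: no decay yet.
        return PEAK_LR

    # Fraction of the decay phase completed, clamped to 1.0 past the end.
    progress = min(1.0, (step - delay_steps) / (total_steps - delay_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    # Print the learning rate at a few points of a 1000-step toy run.
    for s in (0, 100, 550, 1000):
        print(s, delayed_cosine_lr(s, total_steps=1000))
```

With `delay_frac=0` this reduces to an ordinary cosine decay from 4e-4 to 1e-5, which makes it easy to compare the two shapes if the real schedule is ever documented.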