# e25a7a5a-e741-4089-a8be-57476f44fd43
This model is a fine-tuned version of [EleutherAI/pythia-14m](https://huggingface.co/EleutherAI/pythia-14m) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 5.0439
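A minimal inference sketch, not part of the original card: it assumes the adapter is published at `lesso03/e25a7a5a-e741-4089-a8be-57476f44fd43` and was trained with PEFT on top of the base model listed above.

```python
# Sketch only (untested): load the PEFT adapter on top of the base model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-14m")
model = PeftModel.from_pretrained(base, "lesso03/e25a7a5a-e741-4089-a8be-57476f44fd43")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```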
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.000203
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: 8-bit AdamW via bitsandbytes (`adamw_bnb_8bit`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- training_steps: 500
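The training script itself is not included with this card; the sketch below is one plausible mapping of the hyperparameters above onto a `transformers` `TrainingArguments` object. `output_dir` is a placeholder, not taken from the card.

```python
# Sketch only: reconstructs the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",            # placeholder, not from the card
    learning_rate=0.000203,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",          # bitsandbytes 8-bit AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```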
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0002 | 1    | 6.0929          |
| 12.0253       | 0.0085 | 50   | 5.9166          |
| 11.1547       | 0.0169 | 100  | 5.4564          |
| 10.769        | 0.0254 | 150  | 5.3306          |
| 10.3339       | 0.0338 | 200  | 5.1668          |
| 10.4068       | 0.0423 | 250  | 5.2065          |
| 10.6251       | 0.0507 | 300  | 5.1835          |
| 10.2472       | 0.0592 | 350  | 5.0995          |
| 10.2065       | 0.0676 | 400  | 5.0638          |
| 10.1109       | 0.0761 | 450  | 5.0482          |
| 10.1466       | 0.0845 | 500  | 5.0439          |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1
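To reproduce this environment, one could pin the versions above. A hedged install sketch; the CUDA 12.4 wheel index is assumed from the `+cu124` tag:

```bash
pip install peft==0.13.2 transformers==4.46.0 datasets==3.0.1 tokenizers==0.20.1
pip install torch==2.5.0 --index-url https://download.pytorch.org/whl/cu124
```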