usgpt-ft

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 11.9321

Model description

More information needed

Intended uses & limitations

More information needed
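
The framework versions below list PEFT, so the released weights are presumably a parameter-efficient adapter on top of the base model rather than a full checkpoint. A minimal loading sketch, assuming the adapter is hosted as musr/usgpt-ft and pairs with the base model named above; everything else is the standard transformers/peft pattern, not something this card specifies:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct-1M"  # base model named at the top of this card
adapter_id = "musr/usgpt-ft"             # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Build a chat-formatted prompt and generate a short reply.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```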

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 50
  • mixed_precision_training: Native AMP
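
As a point of reference, these settings map onto transformers.TrainingArguments roughly as sketched below; the output_dir value is a placeholder and fp16=True is an assumption, since the card only says "Native AMP":

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="usgpt-ft",          # placeholder, not stated on the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    seed=42,
    optim="paged_adamw_8bit",       # OptimizerNames.PAGED_ADAMW_8BIT
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=50,
    fp16=True,                      # assumed; the card only says "Native AMP"
)
```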

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 11.8582       | 1.0     | 4    | 11.9321         |
| 11.857        | 2.0     | 8    | 11.9321         |
| 11.8625       | 3.0     | 12   | 11.9321         |
| 11.8615       | 4.0     | 16   | 11.9321         |
| 11.8576       | 5.0     | 20   | 11.9321         |
| 11.8603       | 6.0     | 24   | 11.9321         |
| 11.8671       | 7.0     | 28   | 11.9321         |
| 11.8587       | 8.0     | 32   | 11.9321         |
| 11.8577       | 9.0     | 36   | 11.9321         |
| 11.8578       | 10.0    | 40   | 11.9321         |
| 11.866        | 11.0    | 44   | 11.9321         |
| 11.8586       | 12.0    | 48   | 11.9321         |
| 11.8643       | 13.0    | 52   | 11.9321         |
| 11.8563       | 14.0    | 56   | 11.9321         |
| 11.8659       | 15.0    | 60   | 11.9321         |
| 11.8603       | 16.0    | 64   | 11.9321         |
| 11.8641       | 17.0    | 68   | 11.9321         |
| 11.8656       | 18.0    | 72   | 11.9321         |
| 11.8591       | 19.0    | 76   | 11.9321         |
| 11.8576       | 20.0    | 80   | 11.9321         |
| 11.8685       | 21.0    | 84   | 11.9321         |
| 11.8668       | 22.0    | 88   | 11.9321         |
| 11.8662       | 23.0    | 92   | 11.9321         |
| 11.8658       | 24.0    | 96   | 11.9321         |
| 11.869        | 25.0    | 100  | 11.9321         |
| 11.8656       | 26.0    | 104  | 11.9321         |
| 11.8581       | 27.0    | 108  | 11.9321         |
| 11.8575       | 28.0    | 112  | 11.9321         |
| 11.8587       | 29.0    | 116  | 11.9321         |
| 11.8571       | 30.0    | 120  | 11.9321         |
| 11.8612       | 31.0    | 124  | 11.9321         |
| 11.8662       | 32.0    | 128  | 11.9321         |
| 11.8636       | 33.0    | 132  | 11.9321         |
| 11.8593       | 34.0    | 136  | 11.9321         |
| 11.8571       | 35.0    | 140  | 11.9321         |
| 11.8644       | 36.0    | 144  | 11.9321         |
| 11.8594       | 37.0    | 148  | 11.9321         |
| 11.8645       | 37.6154 | 150  | 11.9321         |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0