usgpt-ft

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct-1M on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 11.9321

Model description

More information needed

Intended uses & limitations

More information needed
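
The framework versions below list PEFT, so the released weights are presumably a parameter-efficient adapter on top of the base model rather than a full checkpoint. A minimal loading sketch, assuming the adapter is hosted as musr/usgpt-ft and pairs with the base model named above; everything else is the standard transformers/peft pattern, not something this card specifies:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct-1M"  # base model named at the top of this card
adapter_id = "musr/usgpt-ft"             # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# Build a chat-formatted prompt and generate a short reply.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```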

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: PAGED_ADAMW_8BIT with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • num_epochs: 50
  • mixed_precision_training: Native AMP
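
As a point of reference, these settings map onto transformers.TrainingArguments roughly as sketched below; the output_dir value is a placeholder and fp16=True is an assumption, since the card only says "Native AMP":

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="usgpt-ft",          # placeholder, not stated on the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # effective train batch size: 4 * 4 = 16
    seed=42,
    optim="paged_adamw_8bit",       # OptimizerNames.PAGED_ADAMW_8BIT
    lr_scheduler_type="linear",
    warmup_steps=2,
    num_train_epochs=50,
    fp16=True,                      # assumed; the card only says "Native AMP"
)
```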

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 11.8582       | 1.0     | 4    | 11.9321         |
| 11.857        | 2.0     | 8    | 11.9321         |
| 11.8625       | 3.0     | 12   | 11.9321         |
| 11.8615       | 4.0     | 16   | 11.9321         |
| 11.8576       | 5.0     | 20   | 11.9321         |
| 11.8603       | 6.0     | 24   | 11.9321         |
| 11.8671       | 7.0     | 28   | 11.9321         |
| 11.8587       | 8.0     | 32   | 11.9321         |
| 11.8577       | 9.0     | 36   | 11.9321         |
| 11.8578       | 10.0    | 40   | 11.9321         |
| 11.866        | 11.0    | 44   | 11.9321         |
| 11.8586       | 12.0    | 48   | 11.9321         |
| 11.8643       | 13.0    | 52   | 11.9321         |
| 11.8563       | 14.0    | 56   | 11.9321         |
| 11.8659       | 15.0    | 60   | 11.9321         |
| 11.8603       | 16.0    | 64   | 11.9321         |
| 11.8641       | 17.0    | 68   | 11.9321         |
| 11.8656       | 18.0    | 72   | 11.9321         |
| 11.8591       | 19.0    | 76   | 11.9321         |
| 11.8576       | 20.0    | 80   | 11.9321         |
| 11.8685       | 21.0    | 84   | 11.9321         |
| 11.8668       | 22.0    | 88   | 11.9321         |
| 11.8662       | 23.0    | 92   | 11.9321         |
| 11.8658       | 24.0    | 96   | 11.9321         |
| 11.869        | 25.0    | 100  | 11.9321         |
| 11.8656       | 26.0    | 104  | 11.9321         |
| 11.8581       | 27.0    | 108  | 11.9321         |
| 11.8575       | 28.0    | 112  | 11.9321         |
| 11.8587       | 29.0    | 116  | 11.9321         |
| 11.8571       | 30.0    | 120  | 11.9321         |
| 11.8612       | 31.0    | 124  | 11.9321         |
| 11.8662       | 32.0    | 128  | 11.9321         |
| 11.8636       | 33.0    | 132  | 11.9321         |
| 11.8593       | 34.0    | 136  | 11.9321         |
| 11.8571       | 35.0    | 140  | 11.9321         |
| 11.8644       | 36.0    | 144  | 11.9321         |
| 11.8594       | 37.0    | 148  | 11.9321         |
| 11.8645       | 37.6154 | 150  | 11.9321         |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0