mc4-xglm

This model is a fine-tuned version of bowphs/xglm-163M; the training dataset is not specified in this card. It achieves the following results on the evaluation set (see the perplexity sketch after the list):

  • Loss: 4.5428
  • Accuracy: 0.3037
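
Assuming the reported loss is the mean per-token cross-entropy in nats (the Hugging Face Trainer's default for causal language modeling), it converts to perplexity by exponentiation; a minimal sketch:

```python
import math

# Assumption: the eval loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
eval_loss = 4.5428
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 93.95
```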

Model description

More information needed

Intended uses & limitations

More information needed
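
As general guidance only (not taken from this card): the checkpoint is a causal language model in the XGLM family, so it can be loaded with the standard transformers causal-LM classes. The repository id below is a placeholder and should be replaced with this model's actual Hub id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mc4-xglm"  # placeholder: substitute the actual Hub repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```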

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 100000
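
A minimal sketch of a transformers TrainingArguments configuration matching the values above; the output directory and anything not listed (e.g. evaluation and logging cadence) are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mc4-xglm",          # assumed output path
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    max_steps=100_000,
)
```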

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| No log        | 1e-05  | 1      | 12.4156         | 0.0013   |
| No log        | 2e-05  | 2      | 12.3235         | 0.0060   |
| No log        | 4e-05  | 4      | 12.1857         | 0.0306   |
| No log        | 8e-05  | 8      | 11.9996         | 0.0309   |
| No log        | 0.0002 | 16     | 11.7765         | 0.0309   |
| No log        | 0.0003 | 32     | 11.4468         | 0.0308   |
| No log        | 0.0006 | 64     | 10.8073         | 0.0309   |
| No log        | 0.0013 | 128    | 9.9102          | 0.0309   |
| No log        | 0.0026 | 256    | 9.4987          | 0.0309   |
| No log        | 0.0051 | 512    | 9.3858          | 0.0349   |
| No log        | 0.0102 | 1024   | 8.9732          | 0.0479   |
| 18.122        | 0.02   | 2000   | 8.0735          | 0.0620   |
| 18.122        | 0.0205 | 2048   | 8.0382          | 0.0638   |
| 15.3151       | 0.04   | 4000   | 7.3037          | 0.0817   |
| 15.3151       | 0.0410 | 4096   | 7.2835          | 0.0829   |
| 14.3107       | 0.06   | 6000   | 6.9496          | 0.0976   |
| 13.7359       | 0.08   | 8000   | 6.7178          | 0.1090   |
| 13.7359       | 0.0819 | 8192   | 6.6886          | 0.1104   |
| 13.2885       | 0.1    | 10000  | 6.5164          | 0.1215   |
| 12.9361       | 0.12   | 12000  | 6.3166          | 0.1336   |
| 12.5939       | 0.14   | 14000  | 6.1048          | 0.1493   |
| 12.253        | 0.16   | 16000  | 5.8658          | 0.1724   |
| 12.253        | 0.1638 | 16384  | 5.8180          | 0.1776   |
| 11.8906       | 0.18   | 18000  | 5.6544          | 0.1970   |
| 11.5166       | 0.2    | 20000  | 5.4889          | 0.2206   |
| 11.2881       | 0.22   | 22000  | 5.3584          | 0.2359   |
| 11.0193       | 0.24   | 24000  | 5.2659          | 0.2464   |
| 10.8682       | 0.26   | 26000  | 5.1792          | 0.2534   |
| 10.7094       | 0.28   | 28000  | 5.1471          | 0.2582   |
| 10.5669       | 0.3    | 30000  | 5.0662          | 0.2634   |
| 10.5185       | 0.32   | 32000  | 5.0210          | 0.2669   |
| 10.5185       | 0.3277 | 32768  | 5.0017          | 0.2681   |
| 10.3391       | 0.34   | 34000  | 4.9683          | 0.2703   |
| 10.2732       | 0.36   | 36000  | 4.9500          | 0.2732   |
| 10.1715       | 0.38   | 38000  | 4.8979          | 0.2752   |
| 10.0964       | 0.4    | 40000  | 4.8797          | 0.2774   |
| 10.1037       | 0.42   | 42000  | 4.8533          | 0.2796   |
| 10.0059       | 0.44   | 44000  | 4.8232          | 0.2818   |
| 9.9808        | 0.46   | 46000  | 4.8019          | 0.2835   |
| 9.926         | 0.48   | 48000  | 4.7786          | 0.2847   |
| 9.9082        | 0.5    | 50000  | 4.7522          | 0.2863   |
| 9.9196        | 0.52   | 52000  | 4.7326          | 0.2884   |
| 9.7817        | 0.54   | 54000  | 4.7199          | 0.2891   |
| 9.7752        | 0.56   | 56000  | 4.7071          | 0.2906   |
| 9.7804        | 0.58   | 58000  | 4.7027          | 0.2917   |
| 9.7381        | 0.6    | 60000  | 4.6763          | 0.2928   |
| 9.7158        | 0.62   | 62000  | 4.6655          | 0.2936   |
| 9.6877        | 0.64   | 64000  | 4.6539          | 0.2945   |
| 9.6877        | 0.6554 | 65536  | 4.6395          | 0.2953   |
| 9.6462        | 0.66   | 66000  | 4.6372          | 0.2956   |
| 9.6217        | 0.68   | 68000  | 4.6275          | 0.2966   |
| 9.603         | 0.7    | 70000  | 4.6255          | 0.2974   |
| 9.5765        | 0.72   | 72000  | 4.6124          | 0.2984   |
| 9.527         | 0.74   | 74000  | 4.6008          | 0.2991   |
| 9.5341        | 0.76   | 76000  | 4.5958          | 0.2993   |
| 9.5222        | 0.78   | 78000  | 4.5945          | 0.2999   |
| 9.4908        | 0.8    | 80000  | 4.5835          | 0.3005   |
| 9.5762        | 0.82   | 82000  | 4.5711          | 0.3012   |
| 9.5471        | 0.84   | 84000  | 4.5669          | 0.3015   |
| 9.4683        | 0.86   | 86000  | 4.5638          | 0.3020   |
| 9.4868        | 0.88   | 88000  | 4.5552          | 0.3024   |
| 9.4403        | 0.9    | 90000  | 4.5550          | 0.3028   |
| 9.4591        | 0.92   | 92000  | 4.5494          | 0.3031   |
| 9.4549        | 0.94   | 94000  | 4.5476          | 0.3033   |
| 9.426         | 0.96   | 96000  | 4.5445          | 0.3034   |
| 9.4267        | 0.98   | 98000  | 4.5457          | 0.3036   |
| 9.4388        | 1.0    | 100000 | 4.5428          | 0.3037   |
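
The Accuracy column is presumably shifted next-token prediction accuracy over non-ignored label positions, as in the standard causal-LM evaluation recipe; a minimal sketch of that computation (function name and shapes are illustrative, not taken from the training code):

```python
import torch

def next_token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of positions where the argmax prediction matches the next token.

    logits: (batch, seq_len, vocab_size); labels: (batch, seq_len),
    with -100 marking positions to ignore (e.g. padding).
    """
    preds = logits[:, :-1, :].argmax(dim=-1)  # position t predicts token t+1
    targets = labels[:, 1:]
    mask = targets != -100
    correct = (preds == targets) & mask
    return correct.sum().item() / mask.sum().item()
```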

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0