# mc4-xglm
This model is a fine-tuned version of bowphs/xglm-163M on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 4.5428
- Accuracy: 0.3037
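For quick qualitative checks, the checkpoint can be loaded like any other causal language model in Transformers. A minimal sketch, assuming the checkpoint is published under the hypothetical repository id `your-username/mc4-xglm` (substitute the actual hub path of this model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id; replace with the actual hub path of this checkpoint.
model_id = "your-username/mc4-xglm"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Greedy decoding of a short continuation as a smoke test.
inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```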
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- training_steps: 100000
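The hyperparameters above map directly onto `TrainingArguments`. A minimal sketch, assuming the standard `Trainer` setup (model, datasets, and data collator omitted; `output_dir` is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mc4-xglm",              # placeholder output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,      # 16 x 2 = effective train batch size of 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=100_000,
)
```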
### Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
No log | 1e-05 | 1 | 12.4156 | 0.0013 |
No log | 2e-05 | 2 | 12.3235 | 0.0060 |
No log | 4e-05 | 4 | 12.1857 | 0.0306 |
No log | 8e-05 | 8 | 11.9996 | 0.0309 |
No log | 0.0002 | 16 | 11.7765 | 0.0309 |
No log | 0.0003 | 32 | 11.4468 | 0.0308 |
No log | 0.0006 | 64 | 10.8073 | 0.0309 |
No log | 0.0013 | 128 | 9.9102 | 0.0309 |
No log | 0.0026 | 256 | 9.4987 | 0.0309 |
No log | 0.0051 | 512 | 9.3858 | 0.0349 |
No log | 0.0102 | 1024 | 8.9732 | 0.0479 |
18.122 | 0.02 | 2000 | 8.0735 | 0.0620 |
18.122 | 0.0205 | 2048 | 8.0382 | 0.0638 |
15.3151 | 0.04 | 4000 | 7.3037 | 0.0817 |
15.3151 | 0.0410 | 4096 | 7.2835 | 0.0829 |
14.3107 | 0.06 | 6000 | 6.9496 | 0.0976 |
13.7359 | 0.08 | 8000 | 6.7178 | 0.1090 |
13.7359 | 0.0819 | 8192 | 6.6886 | 0.1104 |
13.2885 | 0.1 | 10000 | 6.5164 | 0.1215 |
12.9361 | 0.12 | 12000 | 6.3166 | 0.1336 |
12.5939 | 0.14 | 14000 | 6.1048 | 0.1493 |
12.253 | 0.16 | 16000 | 5.8658 | 0.1724 |
12.253 | 0.1638 | 16384 | 5.8180 | 0.1776 |
11.8906 | 0.18 | 18000 | 5.6544 | 0.1970 |
11.5166 | 0.2 | 20000 | 5.4889 | 0.2206 |
11.2881 | 0.22 | 22000 | 5.3584 | 0.2359 |
11.0193 | 0.24 | 24000 | 5.2659 | 0.2464 |
10.8682 | 0.26 | 26000 | 5.1792 | 0.2534 |
10.7094 | 0.28 | 28000 | 5.1471 | 0.2582 |
10.5669 | 0.3 | 30000 | 5.0662 | 0.2634 |
10.5185 | 0.32 | 32000 | 5.0210 | 0.2669 |
10.5185 | 0.3277 | 32768 | 5.0017 | 0.2681 |
10.3391 | 0.34 | 34000 | 4.9683 | 0.2703 |
10.2732 | 0.36 | 36000 | 4.9500 | 0.2732 |
10.1715 | 0.38 | 38000 | 4.8979 | 0.2752 |
10.0964 | 0.4 | 40000 | 4.8797 | 0.2774 |
10.1037 | 0.42 | 42000 | 4.8533 | 0.2796 |
10.0059 | 0.44 | 44000 | 4.8232 | 0.2818 |
9.9808 | 0.46 | 46000 | 4.8019 | 0.2835 |
9.926 | 0.48 | 48000 | 4.7786 | 0.2847 |
9.9082 | 0.5 | 50000 | 4.7522 | 0.2863 |
9.9196 | 0.52 | 52000 | 4.7326 | 0.2884 |
9.7817 | 0.54 | 54000 | 4.7199 | 0.2891 |
9.7752 | 0.56 | 56000 | 4.7071 | 0.2906 |
9.7804 | 0.58 | 58000 | 4.7027 | 0.2917 |
9.7381 | 0.6 | 60000 | 4.6763 | 0.2928 |
9.7158 | 0.62 | 62000 | 4.6655 | 0.2936 |
9.6877 | 0.64 | 64000 | 4.6539 | 0.2945 |
9.6877 | 0.6554 | 65536 | 4.6395 | 0.2953 |
9.6462 | 0.66 | 66000 | 4.6372 | 0.2956 |
9.6217 | 0.68 | 68000 | 4.6275 | 0.2966 |
9.603 | 0.7 | 70000 | 4.6255 | 0.2974 |
9.5765 | 0.72 | 72000 | 4.6124 | 0.2984 |
9.527 | 0.74 | 74000 | 4.6008 | 0.2991 |
9.5341 | 0.76 | 76000 | 4.5958 | 0.2993 |
9.5222 | 0.78 | 78000 | 4.5945 | 0.2999 |
9.4908 | 0.8 | 80000 | 4.5835 | 0.3005 |
9.5762 | 0.82 | 82000 | 4.5711 | 0.3012 |
9.5471 | 0.84 | 84000 | 4.5669 | 0.3015 |
9.4683 | 0.86 | 86000 | 4.5638 | 0.3020 |
9.4868 | 0.88 | 88000 | 4.5552 | 0.3024 |
9.4403 | 0.9 | 90000 | 4.5550 | 0.3028 |
9.4591 | 0.92 | 92000 | 4.5494 | 0.3031 |
9.4549 | 0.94 | 94000 | 4.5476 | 0.3033 |
9.426 | 0.96 | 96000 | 4.5445 | 0.3034 |
9.4267 | 0.98 | 98000 | 4.5457 | 0.3036 |
9.4388 | 1.0 | 100000 | 4.5428 | 0.3037 |
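Assuming the reported validation loss is the mean token-level cross-entropy (the usual convention for causal language modelling with `Trainer`), it can be converted to perplexity by exponentiation:

```python
import math

# Final validation loss from the table above.
final_loss = 4.5428

# Perplexity = exp(cross-entropy); roughly 94 under the stated assumption.
print(f"Validation perplexity: {math.exp(final_loss):.2f}")
```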
### Framework versions
- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0