mc4-xglm

This model is a fine-tuned version of bowphs/xglm-163M; the training dataset is not specified in this card. It achieves the following results on the evaluation set (see the perplexity sketch after the list):

  • Loss: 4.5428
  • Accuracy: 0.3037
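
Assuming the reported loss is the mean per-token cross-entropy in nats (the Hugging Face Trainer's default for causal language modeling), it converts to perplexity by exponentiation; a minimal sketch:

```python
import math

# Assumption: the eval loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
eval_loss = 4.5428
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 93.95
```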

Model description

More information needed

Intended uses & limitations

More information needed
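
As general guidance only (not taken from this card): the checkpoint is a causal language model in the XGLM family, so it can be loaded with the standard transformers causal-LM classes. The repository id below is a placeholder and should be replaced with this model's actual Hub id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mc4-xglm"  # placeholder: substitute the actual Hub repo id for this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```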

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 100000
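
A minimal sketch of a transformers TrainingArguments configuration matching the values above; the output directory and anything not listed (e.g. evaluation and logging cadence) are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mc4-xglm",          # assumed output path
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 16 * 2 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    max_steps=100_000,
)
```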

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Accuracy |
|:-------------:|:------:|:------:|:---------------:|:--------:|
| No log        | 1e-05  | 1      | 12.4156         | 0.0013   |
| No log        | 2e-05  | 2      | 12.3235         | 0.0060   |
| No log        | 4e-05  | 4      | 12.1857         | 0.0306   |
| No log        | 8e-05  | 8      | 11.9996         | 0.0309   |
| No log        | 0.0002 | 16     | 11.7765         | 0.0309   |
| No log        | 0.0003 | 32     | 11.4468         | 0.0308   |
| No log        | 0.0006 | 64     | 10.8073         | 0.0309   |
| No log        | 0.0013 | 128    | 9.9102          | 0.0309   |
| No log        | 0.0026 | 256    | 9.4987          | 0.0309   |
| No log        | 0.0051 | 512    | 9.3858          | 0.0349   |
| No log        | 0.0102 | 1024   | 8.9732          | 0.0479   |
| 18.122        | 0.02   | 2000   | 8.0735          | 0.0620   |
| 18.122        | 0.0205 | 2048   | 8.0382          | 0.0638   |
| 15.3151       | 0.04   | 4000   | 7.3037          | 0.0817   |
| 15.3151       | 0.0410 | 4096   | 7.2835          | 0.0829   |
| 14.3107       | 0.06   | 6000   | 6.9496          | 0.0976   |
| 13.7359       | 0.08   | 8000   | 6.7178          | 0.1090   |
| 13.7359       | 0.0819 | 8192   | 6.6886          | 0.1104   |
| 13.2885       | 0.1    | 10000  | 6.5164          | 0.1215   |
| 12.9361       | 0.12   | 12000  | 6.3166          | 0.1336   |
| 12.5939       | 0.14   | 14000  | 6.1048          | 0.1493   |
| 12.253        | 0.16   | 16000  | 5.8658          | 0.1724   |
| 12.253        | 0.1638 | 16384  | 5.8180          | 0.1776   |
| 11.8906       | 0.18   | 18000  | 5.6544          | 0.1970   |
| 11.5166       | 0.2    | 20000  | 5.4889          | 0.2206   |
| 11.2881       | 0.22   | 22000  | 5.3584          | 0.2359   |
| 11.0193       | 0.24   | 24000  | 5.2659          | 0.2464   |
| 10.8682       | 0.26   | 26000  | 5.1792          | 0.2534   |
| 10.7094       | 0.28   | 28000  | 5.1471          | 0.2582   |
| 10.5669       | 0.3    | 30000  | 5.0662          | 0.2634   |
| 10.5185       | 0.32   | 32000  | 5.0210          | 0.2669   |
| 10.5185       | 0.3277 | 32768  | 5.0017          | 0.2681   |
| 10.3391       | 0.34   | 34000  | 4.9683          | 0.2703   |
| 10.2732       | 0.36   | 36000  | 4.9500          | 0.2732   |
| 10.1715       | 0.38   | 38000  | 4.8979          | 0.2752   |
| 10.0964       | 0.4    | 40000  | 4.8797          | 0.2774   |
| 10.1037       | 0.42   | 42000  | 4.8533          | 0.2796   |
| 10.0059       | 0.44   | 44000  | 4.8232          | 0.2818   |
| 9.9808        | 0.46   | 46000  | 4.8019          | 0.2835   |
| 9.926         | 0.48   | 48000  | 4.7786          | 0.2847   |
| 9.9082        | 0.5    | 50000  | 4.7522          | 0.2863   |
| 9.9196        | 0.52   | 52000  | 4.7326          | 0.2884   |
| 9.7817        | 0.54   | 54000  | 4.7199          | 0.2891   |
| 9.7752        | 0.56   | 56000  | 4.7071          | 0.2906   |
| 9.7804        | 0.58   | 58000  | 4.7027          | 0.2917   |
| 9.7381        | 0.6    | 60000  | 4.6763          | 0.2928   |
| 9.7158        | 0.62   | 62000  | 4.6655          | 0.2936   |
| 9.6877        | 0.64   | 64000  | 4.6539          | 0.2945   |
| 9.6877        | 0.6554 | 65536  | 4.6395          | 0.2953   |
| 9.6462        | 0.66   | 66000  | 4.6372          | 0.2956   |
| 9.6217        | 0.68   | 68000  | 4.6275          | 0.2966   |
| 9.603         | 0.7    | 70000  | 4.6255          | 0.2974   |
| 9.5765        | 0.72   | 72000  | 4.6124          | 0.2984   |
| 9.527         | 0.74   | 74000  | 4.6008          | 0.2991   |
| 9.5341        | 0.76   | 76000  | 4.5958          | 0.2993   |
| 9.5222        | 0.78   | 78000  | 4.5945          | 0.2999   |
| 9.4908        | 0.8    | 80000  | 4.5835          | 0.3005   |
| 9.5762        | 0.82   | 82000  | 4.5711          | 0.3012   |
| 9.5471        | 0.84   | 84000  | 4.5669          | 0.3015   |
| 9.4683        | 0.86   | 86000  | 4.5638          | 0.3020   |
| 9.4868        | 0.88   | 88000  | 4.5552          | 0.3024   |
| 9.4403        | 0.9    | 90000  | 4.5550          | 0.3028   |
| 9.4591        | 0.92   | 92000  | 4.5494          | 0.3031   |
| 9.4549        | 0.94   | 94000  | 4.5476          | 0.3033   |
| 9.426         | 0.96   | 96000  | 4.5445          | 0.3034   |
| 9.4267        | 0.98   | 98000  | 4.5457          | 0.3036   |
| 9.4388        | 1.0    | 100000 | 4.5428          | 0.3037   |
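
The Accuracy column is presumably shifted next-token prediction accuracy over non-ignored label positions, as in the standard causal-LM evaluation recipe; a minimal sketch of that computation (function name and shapes are illustrative, not taken from the training code):

```python
import torch

def next_token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of positions where the argmax prediction matches the next token.

    logits: (batch, seq_len, vocab_size); labels: (batch, seq_len),
    with -100 marking positions to ignore (e.g. padding).
    """
    preds = logits[:, :-1, :].argmax(dim=-1)  # position t predicts token t+1
    targets = labels[:, 1:]
    mask = targets != -100
    correct = (preds == targets) & mask
    return correct.sum().item() / mask.sum().item()
```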

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0