Terjman-Large-v2.2-bs-16-lr-0.001-ep-2-wp-0.1-gacc-8-gnm-1.0-mx-512-v2.2

This model is a fine-tuned version of atlasia/Terjman-Large-v1.2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7957
  • BLEU: 22.6721
  • chrF: 42.5712
  • TER: 83.0017
  • Gen Len: 9.5671

Model description

More information needed. (From the checkpoint itself: roughly 239M parameters, stored as BF16 safetensors.)

Intended uses & limitations

More information needed
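
Although usage details are not yet documented, the sketch below shows minimal inference with the `transformers` translation pipeline, assuming the checkpoint follows the standard seq2seq API of its Terjman lineage. The repository id is taken from this card's listing, and the example input is a placeholder.

```python
# A minimal inference sketch, assuming a standard seq2seq translation head.
from transformers import pipeline

model_id = (
    "BounharAbdelaziz/"
    "Terjman-Large-v2.2-bs-16-lr-0.001-ep-2-wp-0.1-gacc-8-gnm-1.0-mx-512-v2.2"
)
translator = pipeline("translation", model=model_id)

# max_length mirrors the mx-512 suffix in the model name.
result = translator("Hello, how are you today?", max_length=512)
print(result[0]["translation_text"])
```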

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
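
As a hedged sketch, these settings map onto `Seq2SeqTrainingArguments` roughly as follows. `output_dir`, `bf16`, and `predict_with_generate` are assumptions not stated above, and `max_grad_norm=1.0` is inferred from the `gnm-1.0` suffix in the model name.

```python
# A reproduction sketch of the listed hyperparameters (Transformers 4.47 API).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="terjman-large-v2.2",   # assumed, not documented
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,     # effective train batch size: 16 * 8 = 128
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_grad_norm=1.0,                 # inferred from the gnm-1.0 name suffix
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,                         # assumed from the BF16 tensor type
    predict_with_generate=True,        # assumed, needed for BLEU/chrF/TER eval
)
```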

Training results

| Training Loss | Epoch  | Step | Validation Loss | BLEU    | chrF    | TER      | Gen Len |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|:--------:|:-------:|
| 17.8653       | 0.0361 | 100  | 3.1820          | 18.704  | 39.8197 | 86.1582  | 9.5329  |
| 14.0786       | 0.0723 | 200  | 2.8809          | 19.9939 | 41.1843 | 83.2822  | 10.6624 |
| 13.5627       | 0.1084 | 300  | 2.8958          | 20.3899 | 41.2135 | 83.4647  | 14.3576 |
| 13.2604       | 0.1446 | 400  | 2.9575          | 20.0224 | 40.5769 | 83.3912  | 11.0141 |
| 13.4752       | 0.1807 | 500  | 3.0164          | 21.1674 | 40.736  | 83.1323  | 9.4341  |
| 13.7315       | 0.2169 | 600  | 3.1340          | 19.0237 | 38.3741 | 85.8849  | 9.3506  |
| 13.2278       | 0.2530 | 700  | 3.1809          | 18.6908 | 38.2692 | 87.7134  | 9.4847  |
| 12.8315       | 0.2892 | 800  | 3.1363          | 19.2517 | 38.9324 | 86.5747  | 9.6235  |
| 12.2487       | 0.3253 | 900  | 3.1947          | 19.0902 | 39.3336 | 86.9771  | 9.4212  |
| 11.7382       | 0.3615 | 1000 | 3.1702          | 19.3707 | 38.5027 | 85.2427  | 9.66    |
| 11.2323       | 0.3976 | 1100 | 3.3132          | 18.521  | 37.7618 | 87.9382  | 11.7424 |
| 10.9276       | 0.4338 | 1200 | 3.1966          | 18.8874 | 39.1014 | 87.5438  | 9.6294  |
| 10.5869       | 0.4699 | 1300 | 3.1740          | 19.6059 | 39.1199 | 87.5527  | 9.4188  |
| 10.2916       | 0.5061 | 1400 | 3.1164          | 19.5868 | 39.2855 | 87.526   | 9.4376  |
| 10.1707       | 0.5422 | 1500 | 3.1302          | 19.085  | 39.0883 | 86.2273  | 9.4918  |
| 9.8533        | 0.5783 | 1600 | 3.1082          | 20.11   | 39.5753 | 112.3991 | 10.3153 |
| 9.5411        | 0.6145 | 1700 | 3.1062          | 19.3325 | 38.7649 | 87.5866  | 9.5035  |
| 9.4625        | 0.6506 | 1800 | 3.1880          | 20.2229 | 39.9274 | 85.6174  | 9.6541  |
| 9.1187        | 0.6868 | 1900 | 3.1313          | 20.1971 | 40.0696 | 86.2361  | 9.6553  |
| 8.8392        | 0.7229 | 2000 | 3.1698          | 19.553  | 39.8981 | 86.4413  | 9.5694  |
| 8.7918        | 0.7591 | 2100 | 3.0803          | 19.0912 | 38.8958 | 87.1128  | 9.5129  |
| 8.5146        | 0.7952 | 2200 | 3.0822          | 20.3744 | 39.3395 | 85.0449  | 9.5706  |
| 8.2272        | 0.8314 | 2300 | 3.0339          | 19.777  | 39.2184 | 86.9402  | 9.5988  |
| 8.1697        | 0.8675 | 2400 | 3.0921          | 20.8023 | 40.7538 | 85.5226  | 10.2047 |
| 7.8999        | 0.9037 | 2500 | 3.0422          | 20.6033 | 40.249  | 85.6087  | 9.5729  |
| 7.7308        | 0.9398 | 2600 | 2.9828          | 20.6965 | 40.3011 | 85.6234  | 9.7353  |
| 7.6002        | 0.9760 | 2700 | 2.9992          | 19.4691 | 40.2133 | 85.8088  | 9.8     |
| 7.0018        | 1.0119 | 2800 | 3.0102          | 21.3728 | 40.6846 | 84.2833  | 10.7976 |
| 6.742         | 1.0481 | 2900 | 3.0044          | 21.8755 | 40.8221 | 82.0813  | 10.0094 |
| 6.6259        | 1.0842 | 3000 | 2.9708          | 22.208  | 41.8398 | 82.5152  | 10.0176 |
| 6.6459        | 1.1204 | 3100 | 3.0201          | 21.2223 | 41.0674 | 85.1103  | 9.5882  |
| 6.5113        | 1.1565 | 3200 | 2.9392          | 21.7497 | 41.5689 | 93.3569  | 10.2729 |
| 6.3487        | 1.1927 | 3300 | 2.9307          | 21.1469 | 40.5262 | 84.0993  | 9.6176  |
| 6.2901        | 1.2288 | 3400 | 2.9405          | 20.7624 | 40.0    | 87.6364  | 9.7929  |
| 6.1662        | 1.2650 | 3500 | 2.9363          | 21.4503 | 40.9326 | 82.4769  | 9.5341  |
| 6.0311        | 1.3011 | 3600 | 2.9149          | 22.0276 | 41.5518 | 86.9014  | 10.1224 |
| 6.0328        | 1.3372 | 3700 | 2.9179          | 21.9494 | 41.346  | 83.581   | 9.5918  |
| 5.9334        | 1.3734 | 3800 | 2.9188          | 21.1516 | 41.2172 | 84.8387  | 9.5647  |
| 5.8305        | 1.4095 | 3900 | 2.8524          | 21.7836 | 41.5055 | 87.0746  | 9.7153  |
| 5.6441        | 1.4457 | 4000 | 2.8770          | 21.9137 | 41.5065 | 84.0794  | 9.5812  |
| 5.7037        | 1.4818 | 4100 | 2.8587          | 21.5262 | 41.6439 | 84.0237  | 9.5918  |
| 5.695         | 1.5180 | 4200 | 2.8527          | 21.5869 | 41.4011 | 84.2932  | 9.9082  |
| 5.461         | 1.5541 | 4300 | 2.8279          | 21.7846 | 41.9322 | 82.9172  | 10.3282 |
| 5.4613        | 1.5903 | 4400 | 2.8400          | 22.0272 | 41.7772 | 83.4485  | 9.7412  |
| 5.3932        | 1.6264 | 4500 | 2.8329          | 22.0863 | 41.9314 | 82.8744  | 9.5365  |
| 5.3149        | 1.6626 | 4600 | 2.8271          | 21.9729 | 41.9086 | 83.3855  | 9.5824  |
| 5.4402        | 1.6987 | 4700 | 2.8141          | 22.6734 | 42.5347 | 81.9532  | 9.5741  |
| 5.2943        | 1.7349 | 4800 | 2.8076          | 22.2604 | 42.0449 | 83.2422  | 9.5294  |
| 5.3244        | 1.7710 | 4900 | 2.8045          | 22.3166 | 42.1774 | 83.1354  | 9.7812  |
| 5.26          | 1.8072 | 5000 | 2.8099          | 22.2745 | 42.1042 | 83.958   | 9.8424  |
| 5.15          | 1.8433 | 5100 | 2.7981          | 22.2965 | 42.1157 | 82.8677  | 9.7471  |
| 5.2851        | 1.8795 | 5200 | 2.7977          | 22.6805 | 42.4116 | 84.6005  | 9.96    |
| 5.1283        | 1.9156 | 5300 | 2.7976          | 22.7956 | 42.5387 | 84.5206  | 9.9835  |
| 5.1913        | 1.9517 | 5400 | 2.7964          | 22.4717 | 42.3181 | 83.1369  | 9.5718  |
| 5.1884        | 1.9879 | 5500 | 2.7957          | 22.6721 | 42.5712 | 83.0017  | 9.5671  |
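
BLEU, chrF, and TER in the table are standard machine translation metrics. A minimal sketch of computing them with the `evaluate` library follows; the predictions and references are placeholders, not this model's evaluation data.

```python
# A minimal metric-computation sketch using the `evaluate` library.
import evaluate

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
ter = evaluate.load("ter")

predictions = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one reference list per prediction

print("BLEU:", bleu.compute(predictions=predictions, references=references)["score"])
print("chrF:", chrf.compute(predictions=predictions, references=references)["score"])
print("TER: ", ter.compute(predictions=predictions, references=references)["score"])
```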

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.21.0