Terjman-Large-v2.2-bs-16-lr-0.001-ep-2-wp-0.1-gacc-8-gnm-1.0-mx-512-v2.2
This model is a fine-tuned version of atlasia/Terjman-Large-v1.2 on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.7957
- BLEU: 22.6721
- chrF: 42.5712
- TER: 83.0017
- Gen Len: 9.5671
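A minimal usage sketch is shown below. It assumes the model exposes the standard transformers seq2seq interface inherited from its Helsinki-NLP Marian base model (English-to-Arabic/Darija translation direction); the input sentence is illustrative only.

```python
# Minimal inference sketch (assumes the standard transformers translation
# pipeline works for this Marian-based checkpoint, as it does for the base model).
from transformers import pipeline

model_id = "BounharAbdelaziz/Terjman-Large-v2.2-bs-16-lr-0.001-ep-2-wp-0.1-gacc-8-gnm-1.0-mx-512-v2.2"

translator = pipeline("translation", model=model_id)

# English input; the base model's translation direction is English -> Arabic.
result = translator("How are you doing today?", max_length=512)
print(result[0]["translation_text"])
```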
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
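A hedged sketch of how these values map onto Seq2SeqTrainingArguments in transformers 4.47.x. The output directory is hypothetical, dataset and metric wiring are omitted (they are not documented here), and the max_grad_norm and generation_max_length values are inferred from the "gnm-1.0" and "mx-512" suffixes in the model name.

```python
# Sketch of the training configuration implied by the hyperparameters above
# (transformers Seq2SeqTrainingArguments; dataset and trainer setup omitted).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="terjman-large-v2.2",   # hypothetical output directory
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,     # effective batch size: 16 * 8 = 128
    num_train_epochs=2,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    max_grad_norm=1.0,                 # presumably the "gnm-1.0" suffix in the model name
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    predict_with_generate=True,
    generation_max_length=512,         # presumably the "mx-512" suffix in the model name
)
```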
Training results
Training Loss | Epoch | Step | Validation Loss | BLEU | chrF | TER | Gen Len |
---|---|---|---|---|---|---|---|
17.8653 | 0.0361 | 100 | 3.1820 | 18.704 | 39.8197 | 86.1582 | 9.5329 |
14.0786 | 0.0723 | 200 | 2.8809 | 19.9939 | 41.1843 | 83.2822 | 10.6624 |
13.5627 | 0.1084 | 300 | 2.8958 | 20.3899 | 41.2135 | 83.4647 | 14.3576 |
13.2604 | 0.1446 | 400 | 2.9575 | 20.0224 | 40.5769 | 83.3912 | 11.0141 |
13.4752 | 0.1807 | 500 | 3.0164 | 21.1674 | 40.736 | 83.1323 | 9.4341 |
13.7315 | 0.2169 | 600 | 3.1340 | 19.0237 | 38.3741 | 85.8849 | 9.3506 |
13.2278 | 0.2530 | 700 | 3.1809 | 18.6908 | 38.2692 | 87.7134 | 9.4847 |
12.8315 | 0.2892 | 800 | 3.1363 | 19.2517 | 38.9324 | 86.5747 | 9.6235 |
12.2487 | 0.3253 | 900 | 3.1947 | 19.0902 | 39.3336 | 86.9771 | 9.4212 |
11.7382 | 0.3615 | 1000 | 3.1702 | 19.3707 | 38.5027 | 85.2427 | 9.66 |
11.2323 | 0.3976 | 1100 | 3.3132 | 18.521 | 37.7618 | 87.9382 | 11.7424 |
10.9276 | 0.4338 | 1200 | 3.1966 | 18.8874 | 39.1014 | 87.5438 | 9.6294 |
10.5869 | 0.4699 | 1300 | 3.1740 | 19.6059 | 39.1199 | 87.5527 | 9.4188 |
10.2916 | 0.5061 | 1400 | 3.1164 | 19.5868 | 39.2855 | 87.526 | 9.4376 |
10.1707 | 0.5422 | 1500 | 3.1302 | 19.085 | 39.0883 | 86.2273 | 9.4918 |
9.8533 | 0.5783 | 1600 | 3.1082 | 20.11 | 39.5753 | 112.3991 | 10.3153 |
9.5411 | 0.6145 | 1700 | 3.1062 | 19.3325 | 38.7649 | 87.5866 | 9.5035 |
9.4625 | 0.6506 | 1800 | 3.1880 | 20.2229 | 39.9274 | 85.6174 | 9.6541 |
9.1187 | 0.6868 | 1900 | 3.1313 | 20.1971 | 40.0696 | 86.2361 | 9.6553 |
8.8392 | 0.7229 | 2000 | 3.1698 | 19.553 | 39.8981 | 86.4413 | 9.5694 |
8.7918 | 0.7591 | 2100 | 3.0803 | 19.0912 | 38.8958 | 87.1128 | 9.5129 |
8.5146 | 0.7952 | 2200 | 3.0822 | 20.3744 | 39.3395 | 85.0449 | 9.5706 |
8.2272 | 0.8314 | 2300 | 3.0339 | 19.777 | 39.2184 | 86.9402 | 9.5988 |
8.1697 | 0.8675 | 2400 | 3.0921 | 20.8023 | 40.7538 | 85.5226 | 10.2047 |
7.8999 | 0.9037 | 2500 | 3.0422 | 20.6033 | 40.249 | 85.6087 | 9.5729 |
7.7308 | 0.9398 | 2600 | 2.9828 | 20.6965 | 40.3011 | 85.6234 | 9.7353 |
7.6002 | 0.9760 | 2700 | 2.9992 | 19.4691 | 40.2133 | 85.8088 | 9.8 |
7.0018 | 1.0119 | 2800 | 3.0102 | 21.3728 | 40.6846 | 84.2833 | 10.7976 |
6.742 | 1.0481 | 2900 | 3.0044 | 21.8755 | 40.8221 | 82.0813 | 10.0094 |
6.6259 | 1.0842 | 3000 | 2.9708 | 22.208 | 41.8398 | 82.5152 | 10.0176 |
6.6459 | 1.1204 | 3100 | 3.0201 | 21.2223 | 41.0674 | 85.1103 | 9.5882 |
6.5113 | 1.1565 | 3200 | 2.9392 | 21.7497 | 41.5689 | 93.3569 | 10.2729 |
6.3487 | 1.1927 | 3300 | 2.9307 | 21.1469 | 40.5262 | 84.0993 | 9.6176 |
6.2901 | 1.2288 | 3400 | 2.9405 | 20.7624 | 40.0 | 87.6364 | 9.7929 |
6.1662 | 1.2650 | 3500 | 2.9363 | 21.4503 | 40.9326 | 82.4769 | 9.5341 |
6.0311 | 1.3011 | 3600 | 2.9149 | 22.0276 | 41.5518 | 86.9014 | 10.1224 |
6.0328 | 1.3372 | 3700 | 2.9179 | 21.9494 | 41.346 | 83.581 | 9.5918 |
5.9334 | 1.3734 | 3800 | 2.9188 | 21.1516 | 41.2172 | 84.8387 | 9.5647 |
5.8305 | 1.4095 | 3900 | 2.8524 | 21.7836 | 41.5055 | 87.0746 | 9.7153 |
5.6441 | 1.4457 | 4000 | 2.8770 | 21.9137 | 41.5065 | 84.0794 | 9.5812 |
5.7037 | 1.4818 | 4100 | 2.8587 | 21.5262 | 41.6439 | 84.0237 | 9.5918 |
5.695 | 1.5180 | 4200 | 2.8527 | 21.5869 | 41.4011 | 84.2932 | 9.9082 |
5.461 | 1.5541 | 4300 | 2.8279 | 21.7846 | 41.9322 | 82.9172 | 10.3282 |
5.4613 | 1.5903 | 4400 | 2.8400 | 22.0272 | 41.7772 | 83.4485 | 9.7412 |
5.3932 | 1.6264 | 4500 | 2.8329 | 22.0863 | 41.9314 | 82.8744 | 9.5365 |
5.3149 | 1.6626 | 4600 | 2.8271 | 21.9729 | 41.9086 | 83.3855 | 9.5824 |
5.4402 | 1.6987 | 4700 | 2.8141 | 22.6734 | 42.5347 | 81.9532 | 9.5741 |
5.2943 | 1.7349 | 4800 | 2.8076 | 22.2604 | 42.0449 | 83.2422 | 9.5294 |
5.3244 | 1.7710 | 4900 | 2.8045 | 22.3166 | 42.1774 | 83.1354 | 9.7812 |
5.26 | 1.8072 | 5000 | 2.8099 | 22.2745 | 42.1042 | 83.958 | 9.8424 |
5.15 | 1.8433 | 5100 | 2.7981 | 22.2965 | 42.1157 | 82.8677 | 9.7471 |
5.2851 | 1.8795 | 5200 | 2.7977 | 22.6805 | 42.4116 | 84.6005 | 9.96 |
5.1283 | 1.9156 | 5300 | 2.7976 | 22.7956 | 42.5387 | 84.5206 | 9.9835 |
5.1913 | 1.9517 | 5400 | 2.7964 | 22.4717 | 42.3181 | 83.1369 | 9.5718 |
5.1884 | 1.9879 | 5500 | 2.7957 | 22.6721 | 42.5712 | 83.0017 | 9.5671 |
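The BLEU, chrF, and TER columns correspond to the sacrebleu implementations exposed through the evaluate library. A minimal sketch of how such scores are computed is shown below; the prediction and reference strings are placeholders, not data from this evaluation set.

```python
# Sketch of computing the reported metrics with the `evaluate` library
# (sacrebleu-backed BLEU, chrF, and TER); the strings are illustrative only.
import evaluate

predictions = ["model output sentence"]      # hypothetical model outputs
references = [["gold reference sentence"]]   # hypothetical gold translations

bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
ter = evaluate.load("ter")

print("BLEU:", bleu.compute(predictions=predictions, references=references)["score"])
print("chrF:", chrf.compute(predictions=predictions, references=references)["score"])
print("TER :", ter.compute(predictions=predictions, references=references)["score"])
```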
Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0
Model tree for BounharAbdelaziz/Terjman-Large-v2.2-bs-16-lr-0.001-ep-2-wp-0.1-gacc-8-gnm-1.0-mx-512-v2.2
- Base model: Helsinki-NLP/opus-mt-tc-big-en-ar
- Finetuned: atlasia/Terjman-Large-v1.2