mt5-small-finetuned-Drishtants-summaries
This model is a fine-tuned version of google/mt5-small. The training dataset is not specified. It achieves the following results on the evaluation set, which match the final epoch (40) in the training results table:
- Loss: 1.8276
- Rouge1: 0.3953
- Rouge2: 0.2206
- RougeL: 0.3789
- RougeLsum: 0.3822
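For readers unfamiliar with the ROUGE figures above, the following is a minimal sketch of what a ROUGE-1 F1 score measures: clipped unigram overlap between a generated summary and a reference. The card does not say which ROUGE implementation produced the reported numbers, so the function name `rouge1_f1` and the plain whitespace tokenization are assumptions for illustration only. Production implementations typically add their own tokenization and optional stemming, so scores from this sketch will not reproduce the values listed.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper, not the card's scorer).

    Tokens are lowercased and split on whitespace; no stemming is applied.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)
```

ROUGE-2 applies the same idea to bigrams, and RougeL replaces n-gram overlap with the longest common subsequence.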
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 10
- eval_batch_size: 10
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 40
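As a rough illustration of the `linear` scheduler with the learning rate listed above, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (none are listed in the card) and 520 total steps (the final value in the Step column of the training results). The function name `linear_lr` is hypothetical, and this is a sketch of the usual linear-decay rule rather than the exact code used in training.

```python
def linear_lr(
    step: int,
    total_steps: int = 520,   # assumption: final Step value in the results table
    base_lr: float = 5.6e-5,  # learning_rate from the hyperparameter list
    warmup_steps: int = 0,    # assumption: no warmup is listed in the card
) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 130, 260, 390, 520):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 5.6e-05, is roughly halved by step 260 (epoch 20), and reaches zero at step 520.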
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
---|---|---|---|---|---|---|---|
24.1138 | 1.0 | 13 | 15.3479 | 0.0044 | 0.0 | 0.0043 | 0.0044 |
19.7323 | 2.0 | 26 | 13.7879 | 0.0044 | 0.0 | 0.0043 | 0.0044 |
18.329 | 3.0 | 39 | 11.7699 | 0.0042 | 0.0 | 0.0039 | 0.0042 |
15.8092 | 4.0 | 52 | 12.9758 | 0.0067 | 0.0 | 0.0064 | 0.0067 |
13.8072 | 5.0 | 65 | 8.1803 | 0.0048 | 0.0 | 0.0048 | 0.0048 |
11.9323 | 6.0 | 78 | 6.4151 | 0.0048 | 0.0 | 0.0048 | 0.0048 |
10.8486 | 7.0 | 91 | 5.3122 | 0.0067 | 0.0 | 0.0067 | 0.0067 |
10.2067 | 8.0 | 104 | 5.1497 | 0.0098 | 0.0 | 0.0097 | 0.0096 |
9.4972 | 9.0 | 117 | 4.9039 | 0.0136 | 0.0 | 0.0135 | 0.0132 |
8.4609 | 10.0 | 130 | 3.9617 | 0.0272 | 0.0013 | 0.0273 | 0.0269 |
7.2721 | 11.0 | 143 | 3.4252 | 0.0526 | 0.0093 | 0.0522 | 0.0492 |
5.943 | 12.0 | 156 | 3.1756 | 0.0746 | 0.0170 | 0.0640 | 0.0658 |
5.5122 | 13.0 | 169 | 2.9797 | 0.0649 | 0.0121 | 0.0610 | 0.0573 |
5.1628 | 14.0 | 182 | 2.8133 | 0.0818 | 0.0215 | 0.0738 | 0.0733 |
4.9023 | 15.0 | 195 | 2.6725 | 0.0798 | 0.0262 | 0.0767 | 0.0765 |
4.4493 | 16.0 | 208 | 2.5408 | 0.0924 | 0.0348 | 0.0881 | 0.0891 |
4.3145 | 17.0 | 221 | 2.4332 | 0.0914 | 0.0361 | 0.0796 | 0.0800 |
3.978 | 18.0 | 234 | 2.3434 | 0.0952 | 0.0422 | 0.0835 | 0.0843 |
3.9377 | 19.0 | 247 | 2.2749 | 0.1289 | 0.0617 | 0.1138 | 0.1137 |
3.6415 | 20.0 | 260 | 2.2123 | 0.1701 | 0.0698 | 0.1471 | 0.1451 |
3.4801 | 21.0 | 273 | 2.1490 | 0.1682 | 0.0758 | 0.1497 | 0.1480 |
3.5114 | 22.0 | 286 | 2.0997 | 0.1885 | 0.0858 | 0.1658 | 0.1662 |
3.3784 | 23.0 | 299 | 2.0567 | 0.1971 | 0.0931 | 0.1730 | 0.1729 |
3.2501 | 24.0 | 312 | 2.0291 | 0.1969 | 0.0952 | 0.1752 | 0.1753 |
3.208 | 25.0 | 325 | 2.0057 | 0.1959 | 0.0883 | 0.1746 | 0.1753 |
3.0992 | 26.0 | 338 | 1.9769 | 0.1984 | 0.0961 | 0.1759 | 0.1762 |
2.9069 | 27.0 | 351 | 1.9474 | 0.1938 | 0.0975 | 0.1734 | 0.1734 |
3.0772 | 28.0 | 364 | 1.9259 | 0.1897 | 0.0978 | 0.1714 | 0.1710 |
2.8778 | 29.0 | 377 | 1.9098 | 0.1766 | 0.0934 | 0.1584 | 0.1582 |
2.8723 | 30.0 | 390 | 1.8937 | 0.1752 | 0.0860 | 0.1551 | 0.1551 |
2.8102 | 31.0 | 403 | 1.8786 | 0.1808 | 0.0889 | 0.1610 | 0.1603 |
2.8453 | 32.0 | 416 | 1.8660 | 0.1971 | 0.0919 | 0.1745 | 0.1752 |
2.925 | 33.0 | 429 | 1.8544 | 0.2724 | 0.1441 | 0.2562 | 0.2564 |
2.8222 | 34.0 | 442 | 1.8468 | 0.3749 | 0.2099 | 0.3583 | 0.3592 |
2.7711 | 35.0 | 455 | 1.8414 | 0.3950 | 0.2216 | 0.3742 | 0.3785 |
2.8176 | 36.0 | 468 | 1.8367 | 0.3953 | 0.2206 | 0.3789 | 0.3822 |
2.7044 | 37.0 | 481 | 1.8321 | 0.3947 | 0.2201 | 0.3781 | 0.3817 |
2.7696 | 38.0 | 494 | 1.8295 | 0.3953 | 0.2206 | 0.3789 | 0.3822 |
2.6015 | 39.0 | 507 | 1.8281 | 0.3953 | 0.2206 | 0.3789 | 0.3822 |
2.6849 | 40.0 | 520 | 1.8276 | 0.3953 | 0.2206 | 0.3789 | 0.3822 |
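The Step column in the table advances by 13 per epoch, which together with `train_batch_size: 10` bounds the size of the training set. The sketch below works through that arithmetic. It assumes a single device, no gradient accumulation, and that the final partial batch is kept; none of these are stated in the card, and the helper name `dataset_size_bounds` is hypothetical.

```python
import math


def dataset_size_bounds(steps_per_epoch: int, batch_size: int) -> tuple[int, int]:
    """Smallest and largest n with ceil(n / batch_size) == steps_per_epoch."""
    low = (steps_per_epoch - 1) * batch_size + 1
    high = steps_per_epoch * batch_size
    return low, high


steps_per_epoch = 13   # Step increment per epoch in the results table
train_batch_size = 10  # from the hyperparameter list
num_epochs = 40        # from the hyperparameter list

low, high = dataset_size_bounds(steps_per_epoch, train_batch_size)
total_steps = steps_per_epoch * num_epochs

# Sanity check: every size in the range really yields 13 steps per epoch.
assert all(math.ceil(n / train_batch_size) == steps_per_epoch
           for n in range(low, high + 1))

print(f"training examples: between {low} and {high}")
print(f"total optimizer steps: {total_steps}")
```

If those assumptions hold, the model was trained on roughly 121 to 130 examples, so the reported ROUGE scores should be read as results on a very small dataset.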
Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
Model tree for ak2603/mt5-small-finetuned-Drishtants-summaries
Base model
google/mt5-small