Ministral-8B-Instruct-2410-PsyCourse-fold3

This model is a fine-tuned version of mistralai/Ministral-8B-Instruct-2410 on the course-train-fold1 dataset. It achieves the following results on the evaluation set:

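Loss: 0.0309 (matches the validation loss at step 1300 in the training results below, the lowest value in that table)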

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
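
The hyperparameters above imply an effective batch size and a learning-rate curve that can be worked out by hand. The sketch below is illustrative only: it assumes a single training device and roughly 3250 total optimizer steps (an estimate from the results table, about 650 steps per epoch over 5 epochs), and it uses the standard linear-warmup-then-cosine-decay formula rather than the trainer's own scheduler code. `NUM_DEVICES`, `TOTAL_STEPS`, and the function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_RATIO = 0.1

# Assumption: a single training device, so no extra multiplier.
NUM_DEVICES = 1

# Assumption: total optimizer steps, estimated from the results table
# (about 650 steps per epoch over 5 epochs). The real run may differ slightly.
TOTAL_STEPS = 3250


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 100, 325, 1300, 3250):
        print(step, lr_at(step))
```

Under these assumptions the effective batch size is 1 × 16 × 1 = 16, which agrees with total_train_batch_size above.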

Training results

Training Loss	Epoch	Step	Validation Loss
0.2581	0.0770	50	0.2414
0.0852	0.1539	100	0.0696
0.0612	0.2309	150	0.0584
0.0579	0.3078	200	0.0537
0.0436	0.3848	250	0.0433
0.0395	0.4617	300	0.0470
0.0436	0.5387	350	0.0454
0.0487	0.6156	400	0.0436
0.0302	0.6926	450	0.0377
0.0301	0.7695	500	0.0377
0.0422	0.8465	550	0.0353
0.0352	0.9234	600	0.0341
0.0327	1.0004	650	0.0346
0.0328	1.0773	700	0.0361
0.0278	1.1543	750	0.0347
0.0277	1.2312	800	0.0336
0.0278	1.3082	850	0.0347
0.0208	1.3851	900	0.0341
0.037	1.4621	950	0.0345
0.0335	1.5391	1000	0.0357
0.0305	1.6160	1050	0.0322
0.0337	1.6930	1100	0.0377
0.0221	1.7699	1150	0.0325
0.0192	1.8469	1200	0.0378
0.0282	1.9238	1250	0.0325
0.0216	2.0008	1300	0.0309
0.0172	2.0777	1350	0.0312
0.0238	2.1547	1400	0.0342
0.0118	2.2316	1450	0.0379
0.02	2.3086	1500	0.0349
0.0162	2.3855	1550	0.0389
0.0138	2.4625	1600	0.0367
0.0193	2.5394	1650	0.0348
0.0208	2.6164	1700	0.0356
0.0228	2.6933	1750	0.0326
0.0195	2.7703	1800	0.0323
0.0219	2.8472	1850	0.0317
0.0169	2.9242	1900	0.0329
0.0235	3.0012	1950	0.0340
0.0092	3.0781	2000	0.0377
0.0107	3.1551	2050	0.0413
0.0093	3.2320	2100	0.0398
0.0076	3.3090	2150	0.0406
0.0115	3.3859	2200	0.0380
0.0065	3.4629	2250	0.0371
0.0115	3.5398	2300	0.0394
0.006	3.6168	2350	0.0399
0.0119	3.6937	2400	0.0366
0.0068	3.7707	2450	0.0387
0.0079	3.8476	2500	0.0394
0.0092	3.9246	2550	0.0405
0.0088	4.0015	2600	0.0393
0.0017	4.0785	2650	0.0415
0.0076	4.1554	2700	0.0446
0.0017	4.2324	2750	0.0453
0.0027	4.3093	2800	0.0469
0.003	4.3863	2850	0.0485
0.0047	4.4633	2900	0.0493
0.0021	4.5402	2950	0.0484
0.0031	4.6172	3000	0.0485
0.0036	4.6941	3050	0.0488
0.0028	4.7711	3100	0.0488
0.0031	4.8480	3150	0.0487
0.0035	4.9250	3200	0.0487
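
In the table above, training loss keeps falling through epoch 5 while validation loss bottoms out near epoch 2 and then climbs, which suggests overfitting in the later epochs. A minimal sketch of picking the checkpoint with the lowest validation loss, using a handful of rows copied from the table (the `ROWS` subset and the `best_checkpoint` helper are illustrative, not part of the training code):

```python
# A few rows copied from the table above: (step, epoch, validation loss).
ROWS = [
    (650, 1.0004, 0.0346),
    (1300, 2.0008, 0.0309),
    (1850, 2.8472, 0.0317),
    (1950, 3.0012, 0.0340),
    (2600, 4.0015, 0.0393),
    (3200, 4.9250, 0.0487),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    step, epoch, val_loss = best_checkpoint(ROWS)
    print(f"best step={step} epoch={epoch} validation loss={val_loss}")
```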

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
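
To reproduce the environment, it can help to compare installed packages against the versions listed above. The following is a small sketch using only the standard library; the mapping from the display names above to PyPI package names (for example Pytorch to `torch`) is an assumption, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above, keyed by the usual PyPI package names
# (the display-name-to-package-name mapping is an assumption).
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(expected=EXPECTED):
    """Map each package whose installed version differs to (wanted, installed).

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: expected {wanted}, found {installed}")
```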
