Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: EleutherAI/pythia-14m
bf16: auto
chat_template: llama3
dataloader_num_workers: 6
dataset_prepared_path: null
datasets:
- data_files:
  - 802c9640ea62abdd_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/802c9640ea62abdd_train_data.json
  type:
    field_instruction: instruction
    field_output: output
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping:
  metric: eval_loss
  mode: min
  patience: 3
eval_max_new_tokens: 128
eval_steps: 200
eval_table_size: null
evals_per_epoch: null
flash_attention: true
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: false
group_by_length: true
hub_model_id: error577/69b7eea5-e651-4261-bbe7-dff9a84f0402
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0005
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.3
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps:
micro_batch_size: 8
mlflow_experiment_name: /tmp/802c9640ea62abdd_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 50
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 200
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.02
wandb_entity: null
wandb_mode: online
wandb_name: 8ea0398e-a0bd-403d-bf23-7a1713ec2e02
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 8ea0398e-a0bd-403d-bf23-7a1713ec2e02
warmup_steps: 100
weight_decay: 0.01
xformers_attention: null
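
The `datasets` entry above uses Axolotl's custom prompt format: `field_instruction` and `field_output` name the JSON keys, `format: '{instruction}'` renders the prompt as the bare instruction text, and with `train_on_inputs: false` the loss is computed only on the output tokens. A minimal sketch of what one record in the training file might look like; the field values are hypothetical:

```python
import json

# Hypothetical record matching the schema declared in the config
# (field_instruction -> "instruction", field_output -> "output").
record = {
    "instruction": "Summarize the following sentence in five words.",
    "output": "Short summaries keep readers engaged.",
}

# With format '{instruction}' and an empty system_prompt, the rendered
# prompt is just the instruction; the output is the completion target.
prompt = "{instruction}".format(instruction=record["instruction"])
print(prompt)

# Written as a JSON array here purely for illustration (ds_type: json).
with open("train_data.json", "w") as f:
    json.dump([record], f, indent=2)
```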

69b7eea5-e651-4261-bbe7-dff9a84f0402

This model is a fine-tuned version of EleutherAI/pythia-14m; the training dataset is not documented in this card. It achieves the following results on the evaluation set:

  • Loss: 3.8237
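
The card does not include a usage snippet; below is a minimal inference sketch that loads the base model and attaches this LoRA adapter with PEFT and Transformers (versions listed under Framework versions), assuming the adapter is pulled from the `hub_model_id` above:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "EleutherAI/pythia-14m"
adapter_id = "error577/69b7eea5-e651-4261-bbe7-dff9a84f0402"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("Write a one-sentence summary of LoRA.", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```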

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50
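
The total train batch size of 64 is simply `micro_batch_size * gradient_accumulation_steps` on a single device. For reference, a rough sketch of how the LoRA settings from the config would be expressed as a PEFT `LoraConfig`; `target_modules` is an assumption, since `lora_target_linear: true` makes Axolotl pick the linear layers automatically for the base architecture:

```python
from peft import LoraConfig

# Effective batch size implied by the hyperparameters above (single GPU assumed).
total_train_batch_size = 8 * 8  # micro_batch_size * gradient_accumulation_steps = 64

# Approximate LoRA setup from the config; the target_modules list is illustrative
# (typical linear layer names for the GPT-NeoX/Pythia architecture).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.3,
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    task_type="CAUSAL_LM",
)
```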

Training results

Training Loss Epoch Step Validation Loss
39.1249 0.0026 1 10.0968
71.4111 0.5282 200 9.3248
44.3177 1.0565 400 8.1473
49.1185 1.5847 600 6.9981
50.9228 2.1129 800 6.6570
40.8153 2.6411 1000 6.8431
42.4526 3.1694 1200 7.7932
39.7586 3.6976 1400 6.2292
48.8855 4.2258 1600 5.4457
44.0622 4.7540 1800 4.9771
45.8481 5.2823 2000 6.3697
47.8421 5.8105 2200 5.8377
43.9395 6.3387 2400 5.1223
85.549 6.8670 2600 6.1504
47.3447 7.3952 2800 5.7667
43.8816 7.9234 3000 5.3430
40.3727 8.4516 3200 6.1097
41.1325 8.9799 3400 5.1159
40.9136 9.5081 3600 4.5801
41.1352 10.0363 3800 6.2688
39.6023 10.5645 4000 5.1909
80.3153 11.0928 4200 6.4605
45.0038 11.6210 4400 4.5793
41.6005 12.1492 4600 4.9905
45.3766 12.6775 4800 5.0839
47.2055 13.2057 5000 4.8364
39.6311 13.7339 5200 4.4988
38.6004 14.2621 5400 4.3441
40.9808 14.7904 5600 4.4398
37.3181 15.3186 5800 5.1047
36.6136 15.8468 6000 5.4884
37.8118 16.3750 6200 4.3552
36.0667 16.9033 6400 4.3316
37.5132 17.4315 6600 4.8862
50.9856 17.9597 6800 5.5664
36.4944 18.4879 7000 4.2171
45.6295 19.0162 7200 4.1826
42.4406 19.5444 7400 4.0599
26.7199 20.0726 7600 4.1067
54.9829 20.6009 7800 4.8624
39.5695 21.1291 8000 4.0729
37.4214 21.6573 8200 3.9378
36.9187 22.1855 8400 4.3148
35.4205 22.7138 8600 4.1648
35.7372 23.2420 8800 4.0165
35.1088 23.7702 9000 3.8706
39.2135 24.2984 9200 4.0032
37.4056 24.8267 9400 3.9010
34.5682 25.3549 9600 3.8448
36.7296 25.8831 9800 4.0820
42.8511 26.4114 10000 3.9215
40.7808 26.9396 10200 3.8902
15.1421 27.4678 10400 3.7762
34.2062 27.9960 10600 3.8415
36.2686 28.5243 10800 3.8351
36.9976 29.0525 11000 3.7793
35.2614 29.5807 11200 3.8270
34.4366 30.1089 11400 3.7929
34.2567 30.6372 11600 3.8164
40.2896 31.1654 11800 3.7706
37.7728 31.6936 12000 3.7791
35.5756 32.2219 12200 3.7962
34.3571 32.7501 12400 3.8736
42.4826 33.2783 12600 3.7960
40.7814 33.8065 12800 3.8017
14.1216 34.3348 13000 3.7857
11.2619 34.8630 13200 3.7641
37.2205 35.3912 13400 3.8345
35.3169 35.9194 13600 3.8081
34.3745 36.4477 13800 3.7031
34.5547 36.9759 14000 3.7369
34.5162 37.5041 14200 3.6995
37.7118 38.0324 14400 3.7207
39.0695 38.5606 14600 3.7335
34.4469 39.0888 14800 3.7482
34.4357 39.6170 15000 3.7970
43.1633 40.1453 15200 3.7945
42.9024 40.6735 15400 3.7269
14.7952 41.2017 15600 3.8083
10.8769 41.7299 15800 3.7764
35.1502 42.2582 16000 3.8323
35.9114 42.7864 16200 3.7109
34.8312 43.3146 16400 3.7495
34.5818 43.8429 16600 3.7187
33.9313 44.3711 16800 3.7662
34.665 44.8993 17000 3.8026
37.6367 45.4275 17200 3.6587
38.0785 45.9558 17400 3.6727
33.453 46.4840 17600 3.7711
40.0688 47.0122 17800 3.7068
40.0877 47.5404 18000 3.8389
10.4014 48.0687 18200 3.7921
10.9314 48.5969 18400 3.8024
33.8793 49.1251 18600 3.7171
34.2436 49.6534 18800 3.8237

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1