Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: EleutherAI/pythia-14m
bf16: auto
chat_template: llama3
dataloader_num_workers: 6
dataset_prepared_path: null
datasets:
- data_files:
  - 802c9640ea62abdd_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/802c9640ea62abdd_train_data.json
  type:
    field_instruction: instruction
    field_output: output
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping:
  metric: eval_loss
  mode: min
  patience: 3
eval_max_new_tokens: 128
eval_steps: 200
eval_table_size: null
evals_per_epoch: null
flash_attention: true
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 8
gradient_checkpointing: false
group_by_length: true
hub_model_id: error577/69b7eea5-e651-4261-bbe7-dff9a84f0402
hub_repo: null
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0005
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.3
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 8
lora_target_linear: true
lr_scheduler: cosine
max_grad_norm: 1.0
max_steps:
micro_batch_size: 8
mlflow_experiment_name: /tmp/802c9640ea62abdd_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 50
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 200
sequence_len: 512
special_tokens:
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.02
wandb_entity: null
wandb_mode: online
wandb_name: 8ea0398e-a0bd-403d-bf23-7a1713ec2e02
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: 8ea0398e-a0bd-403d-bf23-7a1713ec2e02
warmup_steps: 100
weight_decay: 0.01
xformers_attention: null
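
The `datasets` entry above uses Axolotl's custom prompt format: `field_instruction` and `field_output` name the JSON keys, `format: '{instruction}'` renders the prompt as the bare instruction text, and with `train_on_inputs: false` the loss is computed only on the output tokens. A minimal sketch of what one record in the training file might look like; the field values are hypothetical:

```python
import json

# Hypothetical record matching the schema declared in the config
# (field_instruction -> "instruction", field_output -> "output").
record = {
    "instruction": "Summarize the following sentence in five words.",
    "output": "Short summaries keep readers engaged.",
}

# With format '{instruction}' and an empty system_prompt, the rendered
# prompt is just the instruction; the output is the completion target.
prompt = "{instruction}".format(instruction=record["instruction"])
print(prompt)

# Written as a JSON array here purely for illustration (ds_type: json).
with open("train_data.json", "w") as f:
    json.dump([record], f, indent=2)
```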

69b7eea5-e651-4261-bbe7-dff9a84f0402

This model is a fine-tuned version of EleutherAI/pythia-14m; the training dataset is not documented in this card. It achieves the following results on the evaluation set:

  • Loss: 3.8237
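
The card does not include a usage snippet; below is a minimal inference sketch that loads the base model and attaches this LoRA adapter with PEFT and Transformers (versions listed under Framework versions), assuming the adapter is pulled from the `hub_model_id` above:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "EleutherAI/pythia-14m"
adapter_id = "error577/69b7eea5-e651-4261-bbe7-dff9a84f0402"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("Write a one-sentence summary of LoRA.", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```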

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: 8-bit AdamW (adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 50
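
The total train batch size of 64 is simply `micro_batch_size * gradient_accumulation_steps` on a single device. For reference, a rough sketch of how the LoRA settings from the config would be expressed as a PEFT `LoraConfig`; `target_modules` is an assumption, since `lora_target_linear: true` makes Axolotl pick the linear layers automatically for the base architecture:

```python
from peft import LoraConfig

# Effective batch size implied by the hyperparameters above (single GPU assumed).
total_train_batch_size = 8 * 8  # micro_batch_size * gradient_accumulation_steps = 64

# Approximate LoRA setup from the config; the target_modules list is illustrative
# (typical linear layer names for the GPT-NeoX/Pythia architecture).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.3,
    target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
    task_type="CAUSAL_LM",
)
```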

Training results

Training Loss Epoch Step Validation Loss
39.1249 0.0026 1 10.0968
71.4111 0.5282 200 9.3248
44.3177 1.0565 400 8.1473
49.1185 1.5847 600 6.9981
50.9228 2.1129 800 6.6570
40.8153 2.6411 1000 6.8431
42.4526 3.1694 1200 7.7932
39.7586 3.6976 1400 6.2292
48.8855 4.2258 1600 5.4457
44.0622 4.7540 1800 4.9771
45.8481 5.2823 2000 6.3697
47.8421 5.8105 2200 5.8377
43.9395 6.3387 2400 5.1223
85.549 6.8670 2600 6.1504
47.3447 7.3952 2800 5.7667
43.8816 7.9234 3000 5.3430
40.3727 8.4516 3200 6.1097
41.1325 8.9799 3400 5.1159
40.9136 9.5081 3600 4.5801
41.1352 10.0363 3800 6.2688
39.6023 10.5645 4000 5.1909
80.3153 11.0928 4200 6.4605
45.0038 11.6210 4400 4.5793
41.6005 12.1492 4600 4.9905
45.3766 12.6775 4800 5.0839
47.2055 13.2057 5000 4.8364
39.6311 13.7339 5200 4.4988
38.6004 14.2621 5400 4.3441
40.9808 14.7904 5600 4.4398
37.3181 15.3186 5800 5.1047
36.6136 15.8468 6000 5.4884
37.8118 16.3750 6200 4.3552
36.0667 16.9033 6400 4.3316
37.5132 17.4315 6600 4.8862
50.9856 17.9597 6800 5.5664
36.4944 18.4879 7000 4.2171
45.6295 19.0162 7200 4.1826
42.4406 19.5444 7400 4.0599
26.7199 20.0726 7600 4.1067
54.9829 20.6009 7800 4.8624
39.5695 21.1291 8000 4.0729
37.4214 21.6573 8200 3.9378
36.9187 22.1855 8400 4.3148
35.4205 22.7138 8600 4.1648
35.7372 23.2420 8800 4.0165
35.1088 23.7702 9000 3.8706
39.2135 24.2984 9200 4.0032
37.4056 24.8267 9400 3.9010
34.5682 25.3549 9600 3.8448
36.7296 25.8831 9800 4.0820
42.8511 26.4114 10000 3.9215
40.7808 26.9396 10200 3.8902
15.1421 27.4678 10400 3.7762
34.2062 27.9960 10600 3.8415
36.2686 28.5243 10800 3.8351
36.9976 29.0525 11000 3.7793
35.2614 29.5807 11200 3.8270
34.4366 30.1089 11400 3.7929
34.2567 30.6372 11600 3.8164
40.2896 31.1654 11800 3.7706
37.7728 31.6936 12000 3.7791
35.5756 32.2219 12200 3.7962
34.3571 32.7501 12400 3.8736
42.4826 33.2783 12600 3.7960
40.7814 33.8065 12800 3.8017
14.1216 34.3348 13000 3.7857
11.2619 34.8630 13200 3.7641
37.2205 35.3912 13400 3.8345
35.3169 35.9194 13600 3.8081
34.3745 36.4477 13800 3.7031
34.5547 36.9759 14000 3.7369
34.5162 37.5041 14200 3.6995
37.7118 38.0324 14400 3.7207
39.0695 38.5606 14600 3.7335
34.4469 39.0888 14800 3.7482
34.4357 39.6170 15000 3.7970
43.1633 40.1453 15200 3.7945
42.9024 40.6735 15400 3.7269
14.7952 41.2017 15600 3.8083
10.8769 41.7299 15800 3.7764
35.1502 42.2582 16000 3.8323
35.9114 42.7864 16200 3.7109
34.8312 43.3146 16400 3.7495
34.5818 43.8429 16600 3.7187
33.9313 44.3711 16800 3.7662
34.665 44.8993 17000 3.8026
37.6367 45.4275 17200 3.6587
38.0785 45.9558 17400 3.6727
33.453 46.4840 17600 3.7711
40.0688 47.0122 17800 3.7068
40.0877 47.5404 18000 3.8389
10.4014 48.0687 18200 3.7921
10.9314 48.5969 18400 3.8024
33.8793 49.1251 18600 3.7171
34.2436 49.6534 18800 3.8237

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1