error577 committed
Commit c7c00db · verified · 1 Parent(s): c440cd8

End of training

Files changed (2):
  1. README.md +16 -10
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -46,7 +46,7 @@ flash_attention: false
  fp16: null
  fsdp: null
  fsdp_config: null
- gradient_accumulation_steps: 8
+ gradient_accumulation_steps: 16
  gradient_checkpointing: false
  group_by_length: false
  hub_model_id: error577/58b9523a-8576-4309-80c7-060f2d6bf699
@@ -69,7 +69,7 @@ max_steps: 20
  micro_batch_size: 1
  mlflow_experiment_name: /tmp/45fb2d361254b178_train_data.json
  model_type: AutoModelForCausalLM
- num_epochs: 1
+ num_epochs: 4
  optimizer: adamw_bnb_8bit
  output_dir: miner_id_24
  pad_to_sequence_len: true
@@ -104,7 +104,7 @@ xformers_attention: null
 
  This model is a fine-tuned version of [NousResearch/CodeLlama-7b-hf](https://huggingface.co/NousResearch/CodeLlama-7b-hf) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.3416
+ - Loss: 0.3307
 
  ## Model description
 
@@ -127,8 +127,8 @@ The following hyperparameters were used during training:
  - train_batch_size: 1
  - eval_batch_size: 1
  - seed: 42
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 8
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 16
  - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
@@ -138,11 +138,17 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 18.8184 | 0.0007 | 1 | 2.6565 |
- | 20.2215 | 0.0037 | 5 | 2.5815 |
- | 11.6753 | 0.0074 | 10 | 1.3255 |
- | 2.0407 | 0.0111 | 15 | 0.4048 |
- | 3.0463 | 0.0148 | 20 | 0.3416 |
+ | 43.4533 | 0.0015 | 1 | 2.6565 |
+ | 39.2663 | 0.0030 | 2 | 2.6556 |
+ | 39.5743 | 0.0059 | 4 | 2.6245 |
+ | 37.8648 | 0.0089 | 6 | 2.4578 |
+ | 36.6476 | 0.0119 | 8 | 1.9450 |
+ | 23.4278 | 0.0148 | 10 | 1.1998 |
+ | 11.9862 | 0.0178 | 12 | 0.5601 |
+ | 4.9234 | 0.0208 | 14 | 0.3981 |
+ | 9.4431 | 0.0237 | 16 | 0.3685 |
+ | 1.5801 | 0.0267 | 18 | 0.3404 |
+ | 5.968 | 0.0297 | 20 | 0.3307 |
 
 
  ### Framework versions
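
In short, this commit doubles gradient_accumulation_steps from 8 to 16 (with micro_batch_size: 1, the effective train batch size therefore goes from 8 to 16) and raises num_epochs from 1 to 4; since max_steps: 20 still caps the run, both versions stop after 20 optimizer steps (epoch ≈ 0.03). Below is a minimal sketch of how the optimizer and schedule listed above could be constructed, assuming bitsandbytes' AdamW8bit for `adamw_bnb_8bit` and transformers' `get_cosine_schedule_with_warmup` for the cosine schedule; the model and learning rate are placeholders, not values taken from this diff:

```python
# Minimal sketch (not the repository's training script): the batching arithmetic,
# 8-bit AdamW optimizer, and cosine-with-warmup schedule described in the README.
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

micro_batch_size = 1              # per-device batch size from the config
gradient_accumulation_steps = 16  # raised from 8 in this commit
effective_batch_size = micro_batch_size * gradient_accumulation_steps  # -> 16

model = torch.nn.Linear(8, 8)     # stand-in; the real run fine-tunes CodeLlama-7b

optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,                      # placeholder; the learning_rate is not shown in this diff
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,          # lr_scheduler_warmup_steps
    num_training_steps=20,        # max_steps caps the run at 20 optimizer steps
)
```
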
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:413040f162c50bd4af4c0b57c3e1139fb02cccacb669ca62ba98ff3b19d1586f
+ oid sha256:eee251f7b2c48be0a3de26c754740786f52e89751f4f69fbf6dbf5752a97f0f4
  size 40138058
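
The updated adapter_model.bin keeps the same 40 MB size, which is consistent with a PEFT/LoRA adapter rather than full model weights. A minimal usage sketch, assuming this repo follows the standard PEFT adapter layout (an assumption, not shown in the diff):

```python
# Minimal sketch: attach the committed adapter to the base model named in the README.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/CodeLlama-7b-hf"
adapter_id = "error577/58b9523a-8576-4309-80c7-060f2d6bf699"  # hub_model_id from the config

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)  # loads the adapter weights from this commit
model.eval()
```
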