End of training
- README.md +16 -10
- adapter_model.bin +1 -1
README.md CHANGED

```diff
@@ -46,7 +46,7 @@ flash_attention: false
 fp16: null
 fsdp: null
 fsdp_config: null
-gradient_accumulation_steps:
+gradient_accumulation_steps: 16
 gradient_checkpointing: false
 group_by_length: false
 hub_model_id: error577/58b9523a-8576-4309-80c7-060f2d6bf699
@@ -69,7 +69,7 @@ max_steps: 20
 micro_batch_size: 1
 mlflow_experiment_name: /tmp/45fb2d361254b178_train_data.json
 model_type: AutoModelForCausalLM
-num_epochs:
+num_epochs: 4
 optimizer: adamw_bnb_8bit
 output_dir: miner_id_24
 pad_to_sequence_len: true
@@ -104,7 +104,7 @@ xformers_attention: null
 
 This model is a fine-tuned version of [NousResearch/CodeLlama-7b-hf](https://huggingface.co/NousResearch/CodeLlama-7b-hf) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.3307
 
 ## Model description
 
@@ -127,8 +127,8 @@ The following hyperparameters were used during training:
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
-- gradient_accumulation_steps:
-- total_train_batch_size:
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
 - optimizer: Use OptimizerNames.ADAMW_BNB with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
@@ -138,11 +138,17 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-
-
-
-
-
+| 43.4533       | 0.0015 | 1    | 2.6565          |
+| 39.2663       | 0.0030 | 2    | 2.6556          |
+| 39.5743       | 0.0059 | 4    | 2.6245          |
+| 37.8648       | 0.0089 | 6    | 2.4578          |
+| 36.6476       | 0.0119 | 8    | 1.9450          |
+| 23.4278       | 0.0148 | 10   | 1.1998          |
+| 11.9862       | 0.0178 | 12   | 0.5601          |
+| 4.9234        | 0.0208 | 14   | 0.3981          |
+| 9.4431        | 0.0237 | 16   | 0.3685          |
+| 1.5801        | 0.0267 | 18   | 0.3404          |
+| 5.968         | 0.0297 | 20   | 0.3307          |
 
 
 ### Framework versions
```
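Taken together, the filled-in values pin down the effective batch size and schedule: micro_batch_size × gradient_accumulation_steps = 1 × 16 = 16, which matches the total_train_batch_size reported above, and the cosine schedule warms up for 10 of the run's 20 optimizer steps (max_steps: 20). The sketch below shows a roughly equivalent optimizer/scheduler setup with bitsandbytes and transformers; it is not the training harness that produced this model, and the learning rate is a placeholder because it does not appear in this excerpt.

```python
import torch.nn as nn
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

# Effective batch size implied by the config in the diff above.
micro_batch_size = 1
gradient_accumulation_steps = 16
total_train_batch_size = micro_batch_size * gradient_accumulation_steps  # = 16

# Stand-in module so the snippet runs on its own; the actual run trains
# adapter weights on top of CodeLlama-7b.
model = nn.Linear(8, 8)

# 8-bit AdamW with the betas/epsilon listed in the hyperparameters; the
# learning rate is a placeholder, not taken from the README.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4, betas=(0.9, 0.999), eps=1e-8)

# Cosine decay with 10 warmup steps over the run's 20 optimizer steps.
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=20)
```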
adapter_model.bin CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:eee251f7b2c48be0a3de26c754740786f52e89751f4f69fbf6dbf5752a97f0f4
 size 40138058
```
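The updated adapter_model.bin is stored as a Git LFS pointer (the plain-text stanza above), and its ~40 MB size indicates adapter weights rather than a full CodeLlama-7b checkpoint. A minimal sketch of using it, assuming the repo named by hub_model_id hosts a PEFT-compatible adapter:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/CodeLlama-7b-hf"
# hub_model_id from the config above; assumed to serve adapter_model.bin.
adapter_id = "error577/58b9523a-8576-4309-80c7-060f2d6bf699"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # applies the adapter on top of the base weights

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```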