End of training

Files changed (5)

README.md CHANGED

@@ -47,7 +47,7 @@ eval_max_new_tokens: 128
 eval_steps: 50
 eval_table_size: null
 evals_per_epoch: null
-flash_attention: false
+flash_attention: true
 fp16: false
 fsdp: null
 fsdp_config: null
@@ -115,7 +115,7 @@ xformers_attention: null
 This model is a fine-tuned version of [migtissera/Tess-v2.5-Phi-3-medium-128k-14B](https://huggingface.co/migtissera/Tess-v2.5-Phi-3-medium-128k-14B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0860
+- Loss: 0.0855
 ## Model description
@@ -149,12 +149,12 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| No log        | 0.0002 | 1    | 0.8770          |
-| 0.3826        | 0.0114 | 50   | 0.1017          |
-| 0.275         | 0.0229 | 100  | 0.0823          |
-| 0.2988        | 0.0343 | 150  | 0.1188          |
-| 0.2608        | 0.0457 | 200  | 0.0651          |
-| 0.212         | 0.0572 | 250  | 0.0860          |
+| No log        | 0.0002 | 1    | 0.8763          |
+| 0.3649        | 0.0114 | 50   | 0.1069          |
+| 0.3225        | 0.0229 | 100  | 0.0854          |
+| 0.2397        | 0.0343 | 150  | 0.0699          |
+| 0.2604        | 0.0457 | 200  | 0.0639          |
+| 0.1548        | 0.0572 | 250  | 0.0855          |
 ### Framework versions
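
To make the change in the README results table easier to read, the sketch below computes the per-step difference in validation loss between the previous run and this one. The numbers are copied from the two tables in this diff. The variable names are illustrative only. The diff alone does not show that the differences are caused by the `flash_attention` switch rather than by run-to-run variation.

```python
# Validation loss per evaluation step, copied from the README tables in this diff.
before = {1: 0.8770, 50: 0.1017, 100: 0.0823, 150: 0.1188, 200: 0.0651, 250: 0.0860}
after = {1: 0.8763, 50: 0.1069, 100: 0.0854, 150: 0.0699, 200: 0.0639, 250: 0.0855}

# Change in validation loss at each step (negative means the new run is lower).
delta = {step: round(after[step] - before[step], 4) for step in before}

for step, change in delta.items():
    print(f"step {step:>3}: {before[step]:.4f} -> {after[step]:.4f} ({change:+.4f})")

# Step at which the new run reaches its lowest validation loss.
best_step = min(after, key=after.get)
print(f"lowest validation loss in the new run: {after[best_step]:.4f} at step {best_step}")
```

In both runs the lowest validation loss is logged at step 200, not at the final step 250 that the README reports as the headline loss.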

adapter_config.json CHANGED

@@ -20,9 +20,9 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "qkv_proj",
-    "down_proj",
     "o_proj",
+    "down_proj",
+    "qkv_proj",
     "gate_up_proj"
   ],
   "task_type": "CAUSAL_LM",

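The `adapter_config.json` hunk above only reorders the entries of `target_modules`. A minimal check, with both lists copied from the diff and hypothetical variable names, confirms that the same four modules are targeted before and after. It assumes that the order of `target_modules` is not significant for how the adapter is applied, which this diff does not itself confirm.

```python
# target_modules before and after this commit, copied from the diff.
old_targets = ["qkv_proj", "down_proj", "o_proj", "gate_up_proj"]
new_targets = ["o_proj", "down_proj", "qkv_proj", "gate_up_proj"]

# Compare as sets to ignore ordering, and as lists to detect reordering.
same_modules = set(old_targets) == set(new_targets)
order_changed = old_targets != new_targets

print(f"same modules: {same_modules}, order changed: {order_changed}")
```
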
adapter_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:630869cadff03125917e33a5248a12e4989cd4f0ad37eccaa428f0b8035335d9
+oid sha256:dd38428b4bd026e5819ea60f1163fd767c9da28ec19852e3b1857d4d963a2754
 size 445760970

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4d9bf86f014a05dc67164438701b9dbf80ba0c37e3b9f2060c6abef0ab7c1641
+oid sha256:d9433966f7a757fab46a74e0770742134b969279fd289c9cef84a841b09e5f1b
 size 445688440

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fe8e44f29f7af913bf292c53b5d6728d1a582d30e6e936e7aded8c2fcec71489
+oid sha256:68d8d840fb8858e3bd7f466aded7dbd9ced6a79dd57555c86a327c72822c75c1
 size 6776
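
The three binary files above are stored as Git LFS pointers, so the diff shows only a change of `oid` with an identical `size`. The sketch below parses the two `training_args.bin` pointers copied from this diff and compares their fields. `parse_lfs_pointer` is a hypothetical helper written for illustration and is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into a {key: value} dict, one entry per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    return fields


# Old and new pointer contents for training_args.bin, copied from the diff.
old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:fe8e44f29f7af913bf292c53b5d6728d1a582d30e6e936e7aded8c2fcec71489
size 6776
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:68d8d840fb8858e3bd7f466aded7dbd9ced6a79dd57555c86a327c72822c75c1
size 6776
""")

# Same byte size, different content hash: the file was rewritten with new contents.
size_unchanged = old_pointer["size"] == new_pointer["size"]
content_changed = old_pointer["oid"] != new_pointer["oid"]
print(f"size unchanged: {size_unchanged}, content changed: {content_changed}")
```

An unchanged size with a new hash is consistent with retrained weights of the same shape, although the pointer by itself only shows that the bytes differ.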