Sao10K committed (verified)
Commit b7ba185
1 Parent(s): c59bb13

Update README.md

Files changed (1)
  1. README.md +10 -17
README.md CHANGED
@@ -10,29 +10,23 @@ model-index:
  license: llama3.3
  ---

- ![Freya](https://huggingface.co/Sao10K/14B-Qwen2.5-Freya-x1/resolve/main/sad.png)
+ ![yeah](https://huggingface.co/Sao10K/70B-L3.3-Freya-x1/resolve/main/Yeah.jpg)


  # 70B-L3.3-mhnnn-x1

- I decided to mess around with training methods again, considering the re-emergence of methods like multi-step training. Some people began doing it again, and so, why not? Inspired by AshhLimaRP's methodology, but done my way.
+ I quite liked it after messing around. Same data composition as Freya, applied differently.

- Freya-S1
- - LoRA trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
- - Cleaned the text and literature as best as I could; still, it may have had issues here and there.
-
- Freya-S2
- - The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that.
- - Reduced LoRA rank because it's mainly instruct, and other details I won't get into.
+ Has occasional brainfarts, which are fixed with a regen; that's the price for more creative outputs.

  Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
  ```
- Prompt Format: ChatML
- Temperature: 1+ # I don't know, man.
+ Prompt Format: Llama-3-Instruct
+ Temperature: 1.1
  min_p: 0.05
  ```

- Training time in total was ~10 hours on an 8xH100 node, sponsored by the Government of Singapore or something. Thanks for the national service allowance, MHA.
+ Training time in total was ~14 hours on an 8xH100 node. Shout out to SCDF for not sponsoring this run; my funds are running dry from doing random things.

  https://sao10k.carrd.co/ for contact.

@@ -47,7 +41,6 @@ adapter: lora # 16-bit
  lora_r: 64
  lora_alpha: 64
  lora_dropout: 0.2
- lora_fan_in_fan_out:
  peft_use_rslora: true
  lora_target_linear: true

@@ -62,7 +55,7 @@ datasets:
  # S2 - Instruct
  - path: datasets/10k-amoral-full-fixed-sys.json
    type: chat_template
-   chat_template: chatml
+   chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
@@ -70,7 +63,7 @@ datasets:
    train_on_eos: turn
  - path: datasets/44k-hespera-smartshuffle.json
    type: chat_template
-   chat_template: chatml
+   chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
@@ -78,7 +71,7 @@ datasets:
    train_on_eos: turn
  - path: datasets/5k_rpg_adventure_instruct-sys.json
    type: chat_template
-   chat_template: chatml
+   chat_template: llama3
    roles_to_train: ["gpt"]
    field_messages: conversations
    message_field_role: from
@@ -128,7 +121,7 @@ max_grad_norm: 10.0
  gc_steps: 10

  # Misc
- deepspeed: ./deepspeed_configs/zero2.json
+ deepspeed: ./deepspeed_configs/zero3_bf16.json
  ```

  </details><br>
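
For anyone wiring the recommended settings from the updated card into a script, here is a minimal sketch using Hugging Face transformers. The repo id `Sao10K/70B-L3.3-mhnnn-x1` is an assumption based on the heading, and the tokenizer is assumed to ship the Llama-3-Instruct chat template; `min_p` sampling needs a reasonably recent transformers release.

```python
# Minimal sketch: apply the card's recommended sampler settings with transformers.
# Assumptions: repo id "Sao10K/70B-L3.3-mhnnn-x1" and a bundled Llama-3-Instruct
# chat template; adjust dtype/device_map for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/70B-L3.3-mhnnn-x1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a creative roleplay assistant."},
    {"role": "user", "content": "Describe a rainy night in a cyberpunk city."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Recommended settings from the card: Temperature 1.1, min_p 0.05.
output = model.generate(
    inputs,
    do_sample=True,
    temperature=1.1,
    min_p=0.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```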
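
The LoRA hyperparameters in the axolotl block of the diff (r 64, alpha 64, dropout 0.2, rank-stabilized LoRA, all linear layers targeted) map roughly to the following PEFT configuration. This is an approximate translation for reference, not the training setup itself; in particular, `target_modules="all-linear"` is used here as a stand-in for axolotl's `lora_target_linear: true`.

```python
# Approximate PEFT equivalent of the LoRA block shown in the diff above.
# A sketch of the hyperparameters, not a reproduction of the training run.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                         # lora_r: 64
    lora_alpha=64,                # lora_alpha: 64
    lora_dropout=0.2,             # lora_dropout: 0.2
    use_rslora=True,              # peft_use_rslora: true (rank-stabilized LoRA)
    target_modules="all-linear",  # stand-in for lora_target_linear: true
    task_type="CAUSAL_LM",
)
```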
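
The dataset entries all use axolotl's `chat_template` loader with `field_messages: conversations` and `message_field_role: from`, i.e. ShareGPT-style records. A single record would look roughly like the sketch below; the content key (`value` here) is an assumption, since the corresponding field name is not visible in this diff.

```python
# Sketch of one ShareGPT-style record matching the dataset settings in the diff.
# "conversations" and "from" come from the config; "value" as the content key is
# an assumption (the message_field_content line is not shown in the hunks above).
example_record = {
    "conversations": [
        {"from": "system", "value": "You are a dungeon master running a grim adventure."},
        {"from": "human", "value": "I push open the crypt door."},
        {"from": "gpt", "value": "The hinges scream, and stale air rolls out over you."},
    ]
}
# roles_to_train: ["gpt"] means only the assistant ("gpt") turns contribute to the loss.
```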
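
The run also switches DeepSpeed configs from ZeRO-2 to a bf16 ZeRO-3 setup. The actual `zero3_bf16.json` referenced in the config is not reproduced in the diff; the sketch below only illustrates the general shape such a file takes (bf16 enabled, ZeRO stage 3, batch sizes deferred to the trainer via `"auto"`), and the real file may differ.

```python
# Rough shape of a ZeRO-3 bf16 DeepSpeed config, written out as zero3_bf16.json.
# Illustrative only; the file referenced in the axolotl config above may differ.
import json

zero3_bf16 = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("deepspeed_configs/zero3_bf16.json", "w") as f:
    json.dump(zero3_bf16, f, indent=2)
```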