Update README.md
README.md
@@ -10,29 +10,23 @@ model-index:
 license: llama3.3
 ---
 
-![
 
 
 # 70B-L3.3-mhnnn-x1
 
-I
 
-
-- LoRA Trained on ~1.1GB of literature and raw text over Qwen 2.5's base model.
-- Cleaned text and literature as best as I could, still, may have had issues here and there.
-
-Freya-S2
-- The first LoRA was applied over Qwen 2.5 Instruct, then I trained on top of that.
-- Reduced LoRA rank because it's mainly instruct and other details I won't get into.
 
 Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
 ```
-Prompt Format:
-Temperature: 1
 min_p: 0.05
 ```
 
-Training time in total was ~
 
 https://sao10k.carrd.co/ for contact.
 
@@ -47,7 +41,6 @@ adapter: lora # 16-bit
 lora_r: 64
 lora_alpha: 64
 lora_dropout: 0.2
-lora_fan_in_fan_out:
 peft_use_rslora: true
 lora_target_linear: true
 
@@ -62,7 +55,7 @@ datasets:
 # S2 - Instruct
 - path: datasets/10k-amoral-full-fixed-sys.json
 type: chat_template
-chat_template:
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
@@ -70,7 +63,7 @@ datasets:
 train_on_eos: turn
 - path: datasets/44k-hespera-smartshuffle.json
 type: chat_template
-chat_template:
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
@@ -78,7 +71,7 @@ datasets:
 train_on_eos: turn
 - path: datasets/5k_rpg_adventure_instruct-sys.json
 type: chat_template
-chat_template:
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
@@ -128,7 +121,7 @@ max_grad_norm: 10.0
 gc_steps: 10
 
 # Misc
-deepspeed: ./deepspeed_configs/
 ```
 
 </details><br>
@@ -10,29 +10,23 @@ model-index:
 license: llama3.3
 ---
 
+![yeah](https://huggingface.co/Sao10K/70B-L3.3-Freya-x1/resolve/main/Yeah.jpg)
 
 
 # 70B-L3.3-mhnnn-x1
 
+I quite liked it, after messing around. Same data composition as Freya, applied differently.
 
+Has occasional brainfarts, which are fixed with a regen; the price for more creative outputs.
 
 Recommended Model Settings | *Look, I just use these, they work fine enough. I don't even know how DRY or other meme samplers work. Your system prompt matters more anyway.*
 ```
+Prompt Format: Llama-3-Instruct
+Temperature: 1.1
 min_p: 0.05
 ```
 
+Training time in total was ~14 hours on an 8xH100 node, shout out to SCDF for not sponsoring this run. My funds are dry doing random things.
 
 https://sao10k.carrd.co/ for contact.
 
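The added settings name min_p without explanation, so here is what that sampler actually does: after temperature scaling, it keeps only tokens whose probability is at least `min_p` times the top token's probability. A minimal sketch in plain Python (function and variable names are mine, not from any inference library):

```python
import math

def min_p_filter(logits, min_p=0.05, temperature=1.1):
    """Temperature-scale logits, softmax them, then drop every token whose
    probability falls below min_p * (probability of the most likely token)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                  # for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    cutoff = min_p * max(probs)                      # floor relative to the top token
    kept = {i: p for i, p in enumerate(probs) if p >= cutoff}
    norm = sum(kept.values())                        # renormalize the survivors
    return {i: p / norm for i, p in kept.items()}

print(min_p_filter([5.0, 4.0, -2.0]))  # the -2.0 logit falls below the floor and is dropped
```

Because the floor scales with the top token's probability, the cutoff tightens when the model is confident and relaxes when the distribution is flat, which pairs reasonably with a temperature slightly above 1.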
@@ -47,7 +41,6 @@ adapter: lora # 16-bit
 lora_r: 64
 lora_alpha: 64
 lora_dropout: 0.2
 peft_use_rslora: true
 lora_target_linear: true
 
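A note on `peft_use_rslora: true` in the fragment above: rank-stabilized LoRA changes the adapter scaling factor from alpha / r to alpha / sqrt(r), which matters at r = 64. A quick check with the config's numbers (my arithmetic, not axolotl or PEFT code):

```python
import math

# Adapter scaling under plain LoRA vs. rank-stabilized LoRA (rsLoRA),
# using the lora_r / lora_alpha values from the config fragment above.
lora_r, lora_alpha = 64, 64

standard_scale = lora_alpha / lora_r             # alpha / r       -> 1.0
rslora_scale = lora_alpha / math.sqrt(lora_r)    # alpha / sqrt(r) -> 8.0

print(standard_scale, rslora_scale)  # 1.0 8.0
```

So with identical r and alpha, the rsLoRA run applies the adapter update with 8x the effective scale of a plain LoRA run; keep that in mind if you try to reproduce this config without the flag.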
@@ -62,7 +55,7 @@ datasets:
 # S2 - Instruct
 - path: datasets/10k-amoral-full-fixed-sys.json
 type: chat_template
+chat_template: llama3
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
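For readers unfamiliar with this axolotl dataset block: `field_messages: conversations` and `message_field_role: from` point at a ShareGPT-style layout, and `roles_to_train: ["gpt"]` means only the model turns contribute to the loss. The record below is a made-up example of that shape (the keys match the config; the content is mine):

```python
import json

# Hypothetical ShareGPT-style record matching the config keys above:
# messages live under "conversations" and the role is stored in "from".
record = {
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "human", "value": "Say hi."},
        {"from": "gpt", "value": "Hi."},
    ]
}

# Only the "gpt" turns are trained on (roles_to_train: ["gpt"]).
trainable_turns = [m for m in record["conversations"] if m["from"] == "gpt"]
print(json.dumps(trainable_turns))
```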
@@ -70,7 +63,7 @@ datasets:
 train_on_eos: turn
 - path: datasets/44k-hespera-smartshuffle.json
 type: chat_template
+chat_template: llama3
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
@@ -78,7 +71,7 @@ datasets:
 train_on_eos: turn
 - path: datasets/5k_rpg_adventure_instruct-sys.json
 type: chat_template
+chat_template: llama3
 roles_to_train: ["gpt"]
 field_messages: conversations
 message_field_role: from
@@ -128,7 +121,7 @@ max_grad_norm: 10.0
 gc_steps: 10
 
 # Misc
+deepspeed: ./deepspeed_configs/zero3_bf16.json
 ```
 
 </details><br>
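The referenced `zero3_bf16.json` is one of the stock files axolotl ships under `deepspeed_configs/`; the exact file is not reproduced here. As a rough illustration only (these are standard DeepSpeed config keys, but check the shipped file for the real values), a ZeRO-3 + bf16 config looks something like:

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true
  },
  "bf16": {
    "enabled": true
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

ZeRO stage 3 shards parameters, gradients, and optimizer states across the 8 GPUs of the node, which is what makes a 70B LoRA run fit in the first place.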