Update README.md
README.md CHANGED
@@ -39,16 +39,15 @@ This model is a fine-tuned version of [Qwen/Qwen2.5-1.5B-Instruct](https://huggi
It has been trained using [TRL](https://github.com/huggingface/trl).

-question = "Mia can decorate 2 dozen Easter eggs per hour. Her little brother Billy can only decorate 10 eggs per hour. They need to decorate 170 eggs for the Easter egg hunt. If they work together, how long will it take them to decorate all the eggs?"
-generator = pipeline("text-generation", model="justinj92/Qwen2.5-1.5B-Thinking", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
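For reference, the removed quick-start snippet calls `pipeline` without a visible import or opening fence (those lines are not legible in this hunk). A self-contained version of what was removed would look roughly like the sketch below, assuming a recent `transformers` release whose text-generation pipeline accepts chat-style message lists:

```python
# Self-contained form of the removed quick-start snippet (reference only).
from transformers import pipeline

question = (
    "Mia can decorate 2 dozen Easter eggs per hour. Her little brother Billy can only "
    "decorate 10 eggs per hour. They need to decorate 170 eggs for the Easter egg hunt. "
    "If they work together, how long will it take them to decorate all the eggs?"
)

# device="cuda" assumes a GPU is available; use device="cpu" otherwise.
generator = pipeline("text-generation", model="justinj92/Qwen2.5-1.5B-Thinking", device="cuda")

# Chat-style input relies on the pipeline applying the model's chat template.
output = generator(
    [{"role": "user", "content": question}],
    max_new_tokens=128,
    return_full_text=False,
)[0]
print(output["generated_text"])
```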
+### Usage Recommendations
+
+**We recommend adhering to the following configuration when using the model, including for benchmarking, to achieve the expected performance:**
+
+1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
+2. **For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."** (see the example below)
+3. When evaluating model performance, run multiple tests and average the results.
+4. The model has not been tuned for domains other than maths.
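Putting the recommendations above into practice with the same `pipeline` API: the sketch below samples at the recommended temperature of 0.6, appends the step-by-step/`\boxed{}` directive to the prompt, runs several generations, and keeps the most common boxed answer. The number of runs, `top_p`, `max_new_tokens`, and the regex-based answer extraction are illustrative assumptions, not settings taken from the card.

```python
import re
from collections import Counter

from transformers import pipeline

generator = pipeline("text-generation", model="justinj92/Qwen2.5-1.5B-Thinking", device="cuda")

question = (
    "Mia can decorate 2 dozen Easter eggs per hour. Her little brother Billy can only "
    "decorate 10 eggs per hour. They need to decorate 170 eggs for the Easter egg hunt. "
    "If they work together, how long will it take them to decorate all the eggs?"
)
# Recommendation 2: ask for step-by-step reasoning and a boxed final answer.
prompt = question + " Please reason step by step, and put your final answer within \\boxed{}."

answers = []
for _ in range(4):  # Recommendation 3: sample several times instead of trusting one run.
    text = generator(
        [{"role": "user", "content": prompt}],
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,  # Recommendation 1: stay in the 0.5-0.7 range.
        top_p=0.95,
        return_full_text=False,
    )[0]["generated_text"]
    match = re.search(r"\\boxed\{([^}]*)\}", text)
    if match:
        answers.append(match.group(1).strip())

# Majority vote over the extracted answers (one simple way to aggregate runs).
print(Counter(answers).most_common(1)[0][0] if answers else "no boxed answer found")
```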
## Evals
| Model | GSM8k 0-Shot | GSM8k Few-Shot |
@@ -57,17 +56,13 @@ print(output["generated_text"])
| Qwen2.5-1.5B-Thinking | 14.4 | 63.31 |
## Training procedure
<img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
<img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1024" height="800"/>
-Trained on 1xH100 96GB via Azure Cloud.
-GRPO'd on Maths related problems due to GPU Credit constraints.
+Trained on 1xH100 96GB via Azure Cloud (East US2).

This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
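The card names TRL as the training library and GRPO as the method, but does not include the training script. For orientation only, here is a minimal sketch of how a GRPO run can be wired up with TRL's `GRPOTrainer`; the dataset, reward function, and hyperparameters are illustrative placeholders rather than the configuration actually used for this model.

```python
# Hypothetical GRPO setup with TRL's GRPOTrainer (assumes a recent trl release).
# Dataset, reward function, and hyperparameters are placeholders, not the
# author's actual training configuration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column; GSM8K questions are used
# here purely as an example of maths-style prompts.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def boxed_reward(completions, **kwargs):
    # Toy reward: favour completions that produce a \boxed{...} final answer.
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

training_args = GRPOConfig(output_dir="qwen2.5-1.5b-grpo", per_device_train_batch_size=4)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=boxed_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```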