---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
model_name: Qwen2.5-1.5B-Thinking
tags:
- generated_from_trainer
- trl
- grpo
license: apache-2.0
datasets:
- microsoft/orca-math-word-problems-200k
model-index:
- name: Qwen2.5-1.5B-Thinking
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8k (Grade School Math 8K)
      type: openai/gsm8k
    metrics:
    - name: GSM8k (0-Shot)
      type: accuracy
      value: 14.4
    - name: GSM8k (Few-Shot)
      type: accuracy
      value: 63.31
co2_eq_emissions:
  emissions: 7100
  source: "https://mlco2.github.io/impact#compute"
  training_type: "GRPO"
  geographical_location: "East US2"
  hardware_used: "1 x H100 96GB"
---
# Model Card for Qwen2.5-1.5B-Thinking
An improved version of this model is available at [Qwen2.5-1.5B-Thinking-v1.1](https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking-v1.1).

This model is a fine-tune of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), trained with [TRL](https://github.com/huggingface/trl) on the [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) dataset.
## Evals
| Model | GSM8K 0-shot (% accuracy) | GSM8K few-shot (% accuracy) |
|------------------------------------------|------------------|-------------------|
| Mistral-7B-v0.1 | 10 | 41 |
| Qwen2.5-1.5B-Thinking | 14.4 | 63.31 |
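The exact prompt, decoding settings, and answer normalization behind these scores are not documented on this card, so the snippet below is only a rough sketch of how the 0-shot number might be reproduced; the subsample size, regex extraction, and exact-string comparison are all assumptions.

```python
import re
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline("text-generation", model="justinj92/Qwen2.5-1.5B-Thinking", device_map="auto")
gsm8k = load_dataset("openai/gsm8k", "main", split="test")

def extract_boxed(text: str):
    """Pull the answer out of the last \\boxed{...} in the generation."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

n, correct = 200, 0  # small subsample for a quick check; assumption, not the official setup
for ex in gsm8k.select(range(n)):
    messages = [{
        "role": "user",
        "content": ex["question"]
        + "\nPlease reason step by step, and put your final answer within \\boxed{}.",
    }]
    reply = pipe(messages, max_new_tokens=1024, do_sample=True, temperature=0.6)[0]["generated_text"][-1]["content"]
    gold = ex["answer"].split("####")[-1].strip()  # GSM8K references end with "#### <answer>"
    correct += extract_boxed(reply) == gold
print(f"GSM8K 0-shot accuracy on {n} samples: {correct / n:.1%}")
```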
## Training procedure
<img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
<img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1200" height="900"/>
Trained on a single H100 96GB GPU on Azure (East US2).
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
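The training script itself is not included on this card, but TRL's `GRPOTrainer` covers the setup. Below is a minimal, hypothetical sketch of what GRPO training on Orca Math could look like; the reward function (matching the model's `\boxed{}` answer against the reference) and all hyperparameters are assumptions, not the actual recipe used here.

```python
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Orca Math provides "question"/"answer" columns; GRPOTrainer expects a
# "prompt" column and forwards the remaining columns to the reward function.
dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train")
dataset = dataset.rename_column("question", "prompt")

def boxed_answer_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 when the \\boxed{...} answer appears in the reference."""
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        rewards.append(1.0 if match and match.group(1).strip() in ref else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-Thinking",
    num_generations=8,           # completions sampled per prompt for the group baseline
    max_completion_length=1024,  # room for the step-by-step reasoning trace
    bf16=True,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=boxed_answer_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```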
### Usage Recommendations
**We recommend the following settings when using the model, including for benchmarking, to achieve the expected performance:**
1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
2. **For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."** (see the example below)
3. When evaluating model performance, run multiple tests and average the results.
4. The model was tuned only for mathematics; performance in other domains is not improved.
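A minimal inference sketch that follows these recommendations (temperature 0.6, the `\boxed{}` directive); the example question and any generation settings beyond those stated above are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinj92/Qwen2.5-1.5B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold half "
    "as many clips in May. How many clips did Natalia sell altogether? "
    "Please reason step by step, and put your final answer within \\boxed{}."
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Temperature 0.6 per the recommendations above; max_new_tokens is an assumption.
output = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```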
### Framework versions
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- Pytorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citations
Cite GRPO as:
```bibtex
@article{zhihong2024deepseekmath,
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
year = 2024,
eprint = {arXiv:2402.03300},
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
``` |