---
base_model: Qwen/Qwen2.5-1.5B-Instruct
library_name: transformers
model_name: Qwen2.5-1.5B-Thinking
tags:
- generated_from_trainer
- trl
- grpo
license: apache-2.0
datasets:
- microsoft/orca-math-word-problems-200k
model-index:
- name: Qwen2.5-1.5B-Thinking
  results:
  - task:
      type: text-generation
    dataset:
      name: GSM8k (Grade School Math 8K)
      type: openai/gsm8k
    metrics:
    - name: GSM8k (0-Shot)
      type: accuracy
      value: 14.4
    - name: GSM8k (Few-Shot)
      type: accuracy
      value: 63.31
co2_eq_emissions:
  emissions: 7100
  source: "https://mlco2.github.io/impact#compute"
  training_type: "GRPO"
  geographical_location: "East US2"
  hardware_used: "1 x H100 96GB"
---
# Model Card for Qwen2.5-1.5B-Thinking
An improved version of this model is available at [Qwen2.5-1.5B-Thinking-v1.1](https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking-v1.1).

This model is a fine-tune of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), trained with [TRL](https://github.com/huggingface/trl) on the [microsoft/orca-math-word-problems-200k](https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) dataset.
## Evals
| Model | GSM8K 0-shot (% accuracy) | GSM8K few-shot (% accuracy) |
|------------------------------------------|------------------|-------------------|
| Mistral-7B-v0.1 | 10 | 41 |
| Qwen2.5-1.5B-Thinking | 14.4 | 63.31 |
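The exact prompt, decoding settings, and answer normalization behind these scores are not documented on this card, so the snippet below is only a rough sketch of how the 0-shot number might be reproduced; the subsample size, regex extraction, and exact-string comparison are all assumptions.

```python
import re
from datasets import load_dataset
from transformers import pipeline

pipe = pipeline("text-generation", model="justinj92/Qwen2.5-1.5B-Thinking", device_map="auto")
gsm8k = load_dataset("openai/gsm8k", "main", split="test")

def extract_boxed(text: str):
    """Pull the answer out of the last \\boxed{...} in the generation."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

n, correct = 200, 0  # small subsample for a quick check; assumption, not the official setup
for ex in gsm8k.select(range(n)):
    messages = [{
        "role": "user",
        "content": ex["question"]
        + "\nPlease reason step by step, and put your final answer within \\boxed{}.",
    }]
    reply = pipe(messages, max_new_tokens=1024, do_sample=True, temperature=0.6)[0]["generated_text"][-1]["content"]
    gold = ex["answer"].split("####")[-1].strip()  # GSM8K references end with "#### <answer>"
    correct += extract_boxed(reply) == gold
print(f"GSM8K 0-shot accuracy on {n} samples: {correct / n:.1%}")
```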
## Training procedure
<img src="https://raw.githubusercontent.com/wandb/wandb/fc186783c86c33980e5c73f13363c13b2c5508b1/assets/logo-dark.svg" alt="Weights & Biases Logged" width="150" height="24"/>
<img src="https://huggingface.co/justinj92/Qwen2.5-1.5B-Thinking/resolve/main/w%26b_qwen_r1.png" width="1200" height="900"/>
Trained on a single H100 96GB GPU on Azure (East US2).
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
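The training script itself is not included on this card, but TRL's `GRPOTrainer` covers the setup. Below is a minimal, hypothetical sketch of what GRPO training on Orca Math could look like; the reward function (matching the model's `\boxed{}` answer against the reference) and all hyperparameters are assumptions, not the actual recipe used here.

```python
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Orca Math provides "question"/"answer" columns; GRPOTrainer expects a
# "prompt" column and forwards the remaining columns to the reward function.
dataset = load_dataset("microsoft/orca-math-word-problems-200k", split="train")
dataset = dataset.rename_column("question", "prompt")

def boxed_answer_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 when the \\boxed{...} answer appears in the reference."""
    rewards = []
    for completion, ref in zip(completions, answer):
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        rewards.append(1.0 if match and match.group(1).strip() in ref else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-Thinking",
    num_generations=8,           # completions sampled per prompt for the group baseline
    max_completion_length=1024,  # room for the step-by-step reasoning trace
    bf16=True,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=boxed_answer_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```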
### Usage Recommendations
**We recommend the following settings when using the model, including for benchmarking, to achieve the expected performance:**
1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetition or incoherent output.
2. **For mathematical problems, include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."** (see the example below)
3. When evaluating model performance, run multiple tests and average the results.
4. The model was tuned only for mathematics; performance in other domains is not improved.
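A minimal inference sketch that follows these recommendations (temperature 0.6, the `\boxed{}` directive); the example question and any generation settings beyond those stated above are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "justinj92/Qwen2.5-1.5B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold half "
    "as many clips in May. How many clips did Natalia sell altogether? "
    "Please reason step by step, and put your final answer within \\boxed{}."
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Temperature 0.6 per the recommendations above; max_new_tokens is an assumption.
output = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```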
### Framework versions
- TRL: 0.15.0.dev0
- Transformers: 4.49.0.dev0
- Pytorch: 2.5.1
- Datasets: 3.2.0
- Tokenizers: 0.21.0
## Citations
Cite GRPO as:
```bibtex
@article{zhihong2024deepseekmath,
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
year = 2024,
eprint = {arXiv:2402.03300},
}
```
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
``` |