---
language:
- en
library_name: transformers
tags:
- llm
- llama
- llama-3
- grpo
- reinforcement-learning
- mathematical-reasoning
- logic
- deepseek
license: apache-2.0
datasets:
- openai/gsm8k
model_name: LLaMA-3.2-3B-GRPO-GSM325
base_model: meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---
# **LLaMA-3.2-3B-GRPO-GSM325**
🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2 3B**, trained with **GRPO (Group Relative Policy Optimization)** following **DeepSeek R1's open-source recipe**. The fine-tune improves the base **LLaMA-3.2-3B** at **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it toward **o1-style advanced reasoning**.
🔥 Trained **entirely on a free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)
🚀 **With more resources and extended training, this model could be pushed even further!**
## **Model Details**
- **Base Model**: LLaMA 3.2 3B
- **Fine-tuning Method**: GRPO with structured reward signals (see the sketch after this list)
- **Dataset**: 325 curated questions from GSM8K (math reasoning)
- **Format Adherence**: XML-based structured reasoning
- **Notable Improvements**:
  - **Mathematical accuracy** ✔
  - **Logical consistency** ✔
  - **Structured XML-format responses** ✔
  - **o1-style step-by-step reasoning** ✔
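The GRPO recipe pairs a small format reward (did the completion follow the XML structure?) with a larger correctness reward (did the extracted answer match the GSM8K label?). Below is a minimal sketch of such a setup, assuming TRL's `GRPOTrainer` and the common `<reasoning>`/`<answer>` tag scheme from open R1-style notebooks; the tags, reward weights, and hyperparameters are illustrative assumptions, and the linked notebook remains the authoritative reference.

```python
# Minimal GRPO sketch with TRL; values are illustrative, not the exact
# settings from the linked training notebook.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>"""

def to_prompt(example):
    # GSM8K answers end in "#### <final answer>"; keep only the final answer.
    return {
        "prompt": SYSTEM_PROMPT + "\n\n" + example["question"],
        "answer": example["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").select(range(325)).map(to_prompt)

def format_reward(completions, **kwargs):
    """Small reward for following the XML reasoning/answer structure."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [0.5 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def correctness_reward(completions, answer, **kwargs):
    """Larger reward when the extracted answer matches the GSM8K label."""
    rewards = []
    for completion, gold in zip(completions, answer):
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
        rewards.append(2.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="llama-3.2-3B-GRPO-GSM325",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    num_generations=4,          # completions sampled per prompt for the group baseline
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-3B",
    reward_funcs=[format_reward, correctness_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Sampling several completions per prompt (`num_generations`) lets GRPO compute advantages against the group mean instead of training a separate value model, which is what keeps the method light enough for a single T4.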
---
## **Usage**
### **Example Input & Output**
#### **Input (User Query)**
```text
If 2x + 5 = 10, solve for x.
```
#### **Output (Model Response)**
```text
To solve for x, we need to isolate x on one side of the equation. This can be done by subtracting 5 from both sides of the equation.
2x + 5 - 5 = 10 - 5,
2x = 5,
2x / 2 = 5 / 2,
x = 2.5
```
---
## **Installation & Inference**
### **Hugging Face Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rauhan/llama-3.2-3B-GRPO-GSM325"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in the checkpoint's native dtype and place layers automatically
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```
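For a quick test, the snippet below generates a completion. It assumes the tokenizer ships a chat template (true for the Llama 3.2 Instruct tokenizers, an assumption for this checkpoint); if it does not, pass the question to `tokenizer(...)` as a plain string instead.

```python
# Assumes `tokenizer` and `model` from the snippet above, plus a chat template.
messages = [{"role": "user", "content": "If 2x + 5 = 10, solve for x."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```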
### **Using vLLM for Fast Inference**
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Rauhan/llama-3.2-3B-GRPO-GSM325")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["\nA store sells apples...\n"], sampling_params)
print(outputs[0].outputs[0].text)  # generate() returns RequestOutput objects
```
## **Limitations & Future Work**
🚧 **Limitations**:
- Limited by the **small dataset size (325 questions)**
- Trained on **a single free Google Colab Tesla T4 GPU**
- **Long-form reasoning** may require further fine-tuning
🚀 **Future Improvements**:
- Training on **a larger dataset** (more GSM8K questions + other logical reasoning datasets)
- Extending fine-tuning using **DeepSeek R1’s full training pipeline**
- Further **quantization** for faster, more memory-efficient inference (see the sketch below)
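In the meantime, the published checkpoint can already be loaded in 4-bit with bitsandbytes. This is a hedged sketch, not an official quantized release; the `BitsAndBytesConfig` settings shown are common defaults, not tuned values.

```python
# Hypothetical 4-bit loading via bitsandbytes; no quantized weights have been
# published for this checkpoint, this just shows one way to reduce memory use.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 also works on older GPUs like the T4
)

model = AutoModelForCausalLM.from_pretrained(
    "Rauhan/llama-3.2-3B-GRPO-GSM325",
    quantization_config=bnb_config,
    device_map="auto",
)
```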
---
## **License & Citation**
This model is released under the **Apache 2.0 license**. If you use this model in your research, please cite:
```bibtex
@misc{llama-3.2-3B-GRPO-GSM325,
  author    = {Rauhan},
  title     = {LLaMA-3.2-3B-GRPO-GSM325},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Rauhan/llama-3.2-3B-GRPO-GSM325}
}
```
---
🚀 **This model demonstrates how even small models can achieve great results with the right fine-tuning techniques!** 🚀
---
## **About the Author**
🔗 **Portfolio & Contact Information**:
- 🌍 Website: [rauhanahmed.org](https://rauhanahmed.org)
- 🏢 GitHub: [github.com/rauhanAhmed](https://github.com/rauhanAhmed)
- 💼 LinkedIn: [linkedin.com/in/rauhan-ahmed](https://www.linkedin.com/in/rauhan-ahmed)
- 🐦 Twitter (X): [x.com/ahmed_rauh46040](https://x.com/ahmed_rauh46040)
- 📧 Email: [rauhaan.siddiqui@gmail.com](mailto:rauhaan.siddiqui@gmail.com)
Feel free to reach out for collaborations, AI research, or any inquiries! 🚀