---
language:
- en
library_name: transformers
tags:
- llm
- llama
- llama-3
- grpo
- reinforcement-learning
- mathematical-reasoning
- logic
- deepseek
license: apache-2.0
datasets:
- openai/gsm8k
model_name: LLaMA-3.2-3B-GRPO-GSM325
base_model: meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---
# **LLaMA-3.2-3B-GRPO-GSM325**
🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2 3B**, trained with **GRPO (Group Relative Policy Optimization)** following **DeepSeek R1's open-source recipe**. The fine-tune improves the base **LLaMA-3.2-3B** at **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it toward **o1-style advanced reasoning**.
🔥 Trained **entirely on a free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)
🚀 **With more resources and extended training, this model could be pushed even further!**
## **Model Details**
- **Base Model**: LLaMA 3.2 3B
- **Fine-tuning Method**: GRPO with structured reward signals (see the sketch after this list)
- **Dataset**: 325 curated questions from GSM8K (math reasoning)
- **Format Adherence**: XML-based structured reasoning
- **Notable Improvements**:
  - **Mathematical accuracy** ✔
  - **Logical consistency** ✔
  - **Structured XML-format responses** ✔
  - **o1-style step-by-step reasoning** ✔
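The GRPO recipe pairs a small format reward (did the completion follow the XML structure?) with a larger correctness reward (did the extracted answer match the GSM8K label?). Below is a minimal sketch of such a setup, assuming TRL's `GRPOTrainer` and the common `<reasoning>`/`<answer>` tag scheme from open R1-style notebooks; the tags, reward weights, and hyperparameters are illustrative assumptions, and the linked notebook remains the authoritative reference.

```python
# Minimal GRPO sketch with TRL; values are illustrative, not the exact
# settings from the linked training notebook.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

SYSTEM_PROMPT = """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>"""

def to_prompt(example):
    # GSM8K answers end in "#### <final answer>"; keep only the final answer.
    return {
        "prompt": SYSTEM_PROMPT + "\n\n" + example["question"],
        "answer": example["answer"].split("####")[-1].strip(),
    }

dataset = load_dataset("openai/gsm8k", "main", split="train").select(range(325)).map(to_prompt)

def format_reward(completions, **kwargs):
    """Small reward for following the XML reasoning/answer structure."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [0.5 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

def correctness_reward(completions, answer, **kwargs):
    """Larger reward when the extracted answer matches the GSM8K label."""
    rewards = []
    for completion, gold in zip(completions, answer):
        match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
        rewards.append(2.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="llama-3.2-3B-GRPO-GSM325",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    num_generations=4,          # completions sampled per prompt for the group baseline
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-3B",
    reward_funcs=[format_reward, correctness_reward],
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

Sampling several completions per prompt (`num_generations`) lets GRPO compute advantages against the group mean instead of training a separate value model, which is what keeps the method light enough for a single T4.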
---
## **Usage**
### **Example Input & Output**
#### **Input (User Query)**
```text
If 2x + 5 = 10, solve for x.
```
#### **Output (Model Response)**
```text
To solve for x, we need to isolate x on one side of the equation. This can be done by subtracting 5 from both sides of the equation.
2x + 5 - 5 = 10 - 5,
2x = 5,
2x / 2 = 5 / 2,
x = 2.5
```
---
## **Installation & Inference**
### **Hugging Face Transformers**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rauhan/llama-3.2-3B-GRPO-GSM325"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load in the checkpoint's native dtype and place layers automatically
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```
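For a quick test, the snippet below generates a completion. It assumes the tokenizer ships a chat template (true for the Llama 3.2 Instruct tokenizers, an assumption for this checkpoint); if it does not, pass the question to `tokenizer(...)` as a plain string instead.

```python
# Assumes `tokenizer` and `model` from the snippet above, plus a chat template.
messages = [{"role": "user", "content": "If 2x + 5 = 10, solve for x."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```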
### **Using vLLM for Fast Inference**
```python
from vllm import LLM, SamplingParams

llm = LLM(model="Rauhan/llama-3.2-3B-GRPO-GSM325")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["\nA store sells apples...\n"], sampling_params)
print(outputs[0].outputs[0].text)  # generate() returns RequestOutput objects
```
## **Limitations & Future Work**
🚧 **Limitations**:
- Limited by the **small dataset size (325 questions)**
- Trained on **a single free Google Colab Tesla T4 GPU**
- **Long-form reasoning** may require further fine-tuning
🚀 **Future Improvements**:
- Training on **a larger dataset** (more GSM8K questions + other logical reasoning datasets)
- Extending fine-tuning using **DeepSeek R1’s full training pipeline**
- Further **quantization** for faster, more memory-efficient inference (see the sketch below)
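In the meantime, the published checkpoint can already be loaded in 4-bit with bitsandbytes. This is a hedged sketch, not an official quantized release; the `BitsAndBytesConfig` settings shown are common defaults, not tuned values.

```python
# Hypothetical 4-bit loading via bitsandbytes; no quantized weights have been
# published for this checkpoint, this just shows one way to reduce memory use.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # float16 also works on older GPUs like the T4
)

model = AutoModelForCausalLM.from_pretrained(
    "Rauhan/llama-3.2-3B-GRPO-GSM325",
    quantization_config=bnb_config,
    device_map="auto",
)
```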
---
## **License & Citation**
This model is released under the **Apache 2.0 license**. If you use this model in your research, please cite:
```bibtex
@misc{llama-3.2-3B-GRPO-GSM325,
  author    = {Rauhan},
  title     = {LLaMA-3.2-3B-GRPO-GSM325},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Rauhan/llama-3.2-3B-GRPO-GSM325}
}
```
---
🚀 **This model demonstrates how even small models can achieve great results with the right fine-tuning techniques!** 🚀
---
## **About the Author**
🔗 **Portfolio & Contact Information**:
- 🌍 Website: [rauhanahmed.org](https://rauhanahmed.org)
- 🏢 GitHub: [github.com/rauhanAhmed](https://github.com/rauhanAhmed)
- 💼 LinkedIn: [linkedin.com/in/rauhan-ahmed](https://www.linkedin.com/in/rauhan-ahmed)
- 🐦 Twitter (X): [x.com/ahmed_rauh46040](https://x.com/ahmed_rauh46040)
- 📧 Email: [rauhaan.siddiqui@gmail.com](mailto:rauhaan.siddiqui@gmail.com)
Feel free to reach out for collaborations, AI research, or any inquiries! 🚀