---
language:
- en
library_name: transformers
tags:
- llm
- llama
- llama-3
- grpo
- reinforcement-learning
- mathematical-reasoning
- logic
- deepseek
license: apache-2.0
datasets:
- openai/gsm8k
model_name: LLaMA-3.2-3B-GRPO-GSM325
base_model: meta-llama/Llama-3.2-3B
pipeline_tag: text-generation
---

# **LLaMA-3.2-3B-GRPO-GSM325**

🚀 **LLaMA-3.2-3B-GRPO-GSM325** is a fine-tuned version of **LLaMA 3.2 3B**, trained with **GRPO (Group Relative Policy Optimization)** following **DeepSeek R1's open-source recipe**. This model significantly improves on the base **LLaMA-3.2-3B** in **mathematical problem-solving, logical reasoning, and structured response generation**, pushing it toward **OpenAI o1-style advanced reasoning**.

🔥 Trained **entirely on a free Google Colab Tesla T4 GPU**: [Training Notebook](https://colab.research.google.com/drive/1o95CT5DV2zZXjScDHxKfRJBaNGv3ULpj?usp=sharing)

🚀 **With more resources and extended training, this model could be pushed even further!**

## **Model Details**

- **Base Model**: LLaMA 3.2 3B
- **Fine-Tuning Method**: GRPO with structured rewards
- **Dataset**: 325 curated questions from GSM8K (math reasoning)
- **Format Adherence**: XML-based structured reasoning
- **Notable Improvements**:
  - **Mathematical accuracy** ✔
  - **Logical consistency** ✔
  - **Structured XML-format responses** ✔
  - **o1-style step-by-step reasoning** ✔

---

## **Usage**

### **Example Input & Output**

#### **Input (User Query)**

```text
If 2x + 5 = 10, solve for x.
```

#### **Output (Model Response)**

```text
To solve for x, we need to isolate x on one side of the equation. This can be done by subtracting 5 from both sides of the equation.
2x + 5 - 5 = 10 - 5
2x = 5
2x / 2 = 5 / 2
x = 2.5
```

---

## **Installation & Inference**

### **Hugging Face Transformers**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Rauhan/llama-3.2-3B-GRPO-GSM325"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

### **Using vLLM for Fast Inference**

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Rauhan/llama-3.2-3B-GRPO-GSM325")
sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["\nA store sells apples...\n"], sampling_params)
print(outputs[0].outputs[0].text)  # generated text for the first prompt
```
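### **Quick Generation Example (Transformers)**

As a quick end-to-end check, the sketch below generates a completion with the `model` and `tokenizer` loaded in the Transformers snippet above. The prompt wording and sampling settings are illustrative, not the exact ones used during training.

```python
import torch

# Illustrative prompt; the exact prompt format used during training may differ.
prompt = "If 2x + 5 = 10, solve for x."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True))
```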
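## **How the Structured Rewards Might Look**

The exact reward functions live in the training notebook linked above. As a rough illustration of the "GRPO with structured rewards" idea, GRPO recipes for GSM8K commonly combine a format reward (does the completion follow the expected XML layout?) with a correctness reward (does the extracted answer match the gold answer?). The tag names and weights below are assumptions for illustration, not the notebook's exact values.

```python
import re

# Assumed XML layout: <reasoning>...</reasoning> <answer>...</answer>.
# The actual tags and weights used in training may differ (see the notebook).
FORMAT_RE = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    """Small reward for following the XML reasoning/answer structure."""
    return 0.5 if FORMAT_RE.search(completion) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    """Larger reward when the extracted answer matches the GSM8K gold answer."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 2.0 if match.group(1).strip() == gold_answer.strip() else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    return format_reward(completion) + correctness_reward(completion, gold_answer)
```

In GRPO, several completions are sampled per prompt and each one's advantage is computed relative to the group's mean reward, so even sparse rewards like these provide a usable learning signal without a separate value model.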
## **Limitations & Future Work**

🚧 **Limitations**:

- Limited by the **small dataset size (325 questions)**
- Trained on **a single free Google Colab Tesla T4 GPU**
- Some **long-form reasoning may need further fine-tuning**

🚀 **Future Improvements**:

- Training on **a larger dataset** (more GSM8K questions plus other logical-reasoning datasets)
- Extending fine-tuning with **DeepSeek R1's full training pipeline**
- Further **quantization for faster, more memory-efficient inference**

---

## **License & Citation**

This model is released under the **Apache 2.0 License**. If you use this model in your research, please cite:

```
@misc{llama-3.2-3B-GRPO-GSM325,
  author    = {Rauhan},
  title     = {LLaMA-3.2-3B-GRPO-GSM325},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Rauhan/llama-3.2-3B-GRPO-GSM325}
}
```

---

🚀 **This model demonstrates how even small models can achieve strong results with the right fine-tuning techniques!** 🚀

---

## **About the Author**

🔗 **Portfolio & Contact Information**:

- 🌍 Website: [rauhanahmed.org](https://rauhanahmed.org)
- 🏢 GitHub: [github.com/rauhanAhmed](https://github.com/rauhanAhmed)
- 💼 LinkedIn: [linkedin.com/in/rauhan-ahmed](https://www.linkedin.com/in/rauhan-ahmed)
- 🐦 Twitter (X): [x.com/ahmed_rauh46040](https://x.com/ahmed_rauh46040)
- 📧 Email: [rauhaan.siddiqui@gmail.com](mailto:rauhaan.siddiqui@gmail.com)

Feel free to reach out for collaborations, AI research, or any other inquiries! 🚀