Model Card for Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit

This is an early experiment in training reasoning models with TRL's GRPOTrainer, via the Unsloth library. It is not intended for real use, but it should work well enough for simple prompt tests and easy mathematics questions.
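As a rough illustration of what that setup looks like, here is a minimal GRPO sketch using TRL's GRPOTrainer directly. The toy prompts, reward function, and hyperparameters are placeholders for illustration only, not this model's actual training configuration (and the real run used Unsloth on top of TRL):

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompts standing in for the real GSM8K questions (illustrative only)
train_dataset = Dataset.from_dict({"prompt": [
    "What is the smallest prime number greater than 50?",
    "If a train travels 60 km in 1.5 hours, what is its average speed in km/h?",
]})

def format_reward(completions, **kwargs):
    # Reward completions that follow the <reasoning>/<answer> format
    return [1.0 if ("<reasoning>" in c and "<answer>" in c) else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="grpo-sketch",          # placeholder output directory
    per_device_train_batch_size=2,
    num_generations=2,                 # completions sampled per prompt
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    reward_funcs=[format_reward],
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()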

(You can run the inference code in the "How to Get Started with the Model" section below on a free Colab/Kaggle basic GPU account for testing.)

NOTE: If you are interested in reasoning models and research in this area, I maintain an up-to-date resource list here: https://github.com/benjaminzwhite/reasoning-models

  • Example query: "What is the smallest prime number greater than 50 ?"

  • Example response: "<reasoning>\nTo find the smallest prime number greater than 50, we can start checking from 51 onwards for primality. A prime number is a number that has no divisors other than 1 and itself. We check each number to see if it's divisible by any number other than 1 and itself.\n</reasoning>\n<answer>\n53\n</answer>"

How to Get Started with the Model

To use this with standard HuggingFace code, I recommend starting with the snippet below (based 95% on the default code shown on the base model page: https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "benjaminzwhite/Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit"

# model loading
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# system prompt used during training
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# your query goes here
user_prompt = "What is the smallest prime number greater than 50 ?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_prompt}
]

# default Qwen2.5 code from this point ...
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

# sample answer obtained for this query, to show the expected format
# (note that the answer, 53, is correct here)
"""
"<reasoning>\nTo find the smallest prime number greater than 50, we can start checking from 51 onwards for primality. A prime number is a number that has no divisors other than 1 and itself. We check each number to see if it's divisible by any number other than 1 and itself.\n</reasoning>\n<answer>\n53\n</answer>"
"""

Training Details

Training Data

Trained on the GSM8K mathematics dataset (grade-school math word problems).
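For reference, GSM8K is available on the Hugging Face Hub and can be loaded as shown below; the exact split and preprocessing used for this training run are not detailed here.

from datasets import load_dataset

# "main" is the standard GSM8K configuration; each example has a question and a worked answer
gsm8k = load_dataset("openai/gsm8k", "main")
print(gsm8k["train"][0]["question"])
print(gsm8k["train"][0]["answer"])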
