---
library_name: transformers
tags:
- unsloth
- reasoning
- mathematics
- math
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-3B-Instruct
---

# Model Card for Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit

This is an early experiment in training reasoning models with TRL's `GRPOTrainer` and the Unsloth library. It is not intended for real use, but it should work reasonably well for simple prompt tests and easy mathematics questions.
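
For readers new to GRPO (Group Relative Policy Optimization): the trainer samples a group of completions for each prompt, scores each completion with reward functions, and normalizes every reward against its own group's mean and standard deviation, so no separate value model is needed. The sketch below illustrates only that normalization step in plain Python. It is not the `GRPOTrainer` implementation, and the reward values are made up for illustration rather than taken from this model's training run.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps); eps avoids
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 completions of the same prompt
# (e.g. 2.0 = correct answer, 0.5 = correct format only, 0.0 = neither).
rewards = [2.0, 0.5, 0.0, 2.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that score above their group average get a positive advantage and are reinforced; those below get a negative one.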

You can run the code below for testing on a free Colab or Kaggle GPU instance.

NOTE: If you are interested in reasoning models and research in this area, I maintain an up-to-date resource list here: [https://github.com/benjaminzwhite/reasoning-models](https://github.com/benjaminzwhite/reasoning-models)

- Example query: `"What is the smallest prime number greater than 50 ?"`

- Example response: `"<reasoning>\nTo find the smallest prime number greater than 50, we can start checking from 51 onwards for primality. A prime number is a number that has no divisors other than 1 and itself. We check each number to see if it's divisible by any number other than 1 and itself.\n</reasoning>\n<answer>\n53\n</answer>"`
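
Because the model is trained to wrap its final answer in `<answer>` tags, callers usually need to pull that value out of the raw text. A minimal sketch using only the standard-library `re` module follows; the helper name `extract_answer` is hypothetical (it is not part of this repository), and it returns `None` when the model does not follow the format. The example string is the response above, shortened.

```python
import re
from typing import Optional

# Matches the <reasoning>...</reasoning> <answer>...</answer> layout requested
# by the training system prompt; DOTALL lets ".*?" span newlines.
FORMAT_RE = re.compile(
    r"<reasoning>\s*(.*?)\s*</reasoning>\s*<answer>\s*(.*?)\s*</answer>", re.DOTALL
)


def extract_answer(response: str) -> Optional[str]:
    """Return the text inside <answer>...</answer>, or None if the format is not followed."""
    match = FORMAT_RE.search(response)
    return match.group(2) if match else None


example = (
    "<reasoning>\nTo find the smallest prime number greater than 50, we can start "
    "checking from 51 onwards for primality.\n</reasoning>\n<answer>\n53\n</answer>"
)
print(extract_answer(example))  # 53
```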


## How to Get Started with the Model

To use this model with standard Hugging Face `transformers` code, I recommend starting with the following code, which is 95% based on the default example shown on the base model page: [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "benjaminzwhite/Qwen2.5-3B-Instruct_GSM8K-GRPO_16bit"

# model loading
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# system prompt used during training
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

# your query goes here
user_prompt = "What is the smallest prime number greater than 50 ?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": user_prompt},
]

# default Qwen2.5 code from this point ...
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# keep only the newly generated tokens (drop the prompt tokens)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(response)

# sample answer obtained to my query, to show the expected format
# (note that the answer, 53, is correct here):
# "<reasoning>\nTo find the smallest prime number greater than 50, we can start checking from 51 onwards for primality. A prime number is a number that has no divisors other than 1 and itself. We check each number to see if it's divisible by any number other than 1 and itself.\n</reasoning>\n<answer>\n53\n</answer>"
```
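
The sample answer is easy to check independently: 51 = 3 × 17 and 52 is even, so 53 is the first prime after 50. The small trial-division sketch below (function names are illustrative only) confirms this in plain Python.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if no integer d with d * d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def smallest_prime_greater_than(n: int) -> int:
    """Check n + 1, n + 2, ... until a prime is found."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(smallest_prime_greater_than(50))  # 53
```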

## Training Details

### Training Data

Trained on the GSM8K mathematics dataset ([openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k)).
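
Each GSM8K record has a `question` field and an `answer` field, and the answer text ends with the final numeric result after a `####` marker. This card does not describe the exact preprocessing or reward functions used, so the following is only an assumed sketch modeled on a common GRPO/GSM8K setup. It extracts the reference number and compares it with the model's `<answer>` text, which is the kind of check a correctness reward could use. The helper names and the 2.0 reward value are hypothetical, and the record is hand-written in the GSM8K style rather than taken from the dataset. Loading the real data would typically go through `datasets.load_dataset("openai/gsm8k", "main")`.

```python
import re
from typing import Optional


def extract_hash_answer(answer_text: str) -> Optional[str]:
    """Return the final answer after GSM8K's '####' marker, or None if absent."""
    if "####" not in answer_text:
        return None
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return answer_text.split("####")[-1].strip().replace(",", "")


def extract_tagged_answer(response: str) -> Optional[str]:
    """Return the text between <answer> and </answer> in a model response."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", response, re.DOTALL)
    return match.group(1) if match else None


def correctness_reward(response: str, reference: str, reward: float = 2.0) -> float:
    """Hypothetical reward: full reward only if the tagged answer matches the reference."""
    predicted = extract_tagged_answer(response)
    return reward if predicted is not None and predicted == reference else 0.0


# Hand-written example in the GSM8K style (not an actual dataset record).
record = {
    "question": "Pens cost $3 each. How many dollars do 4 pens cost?",
    "answer": "4 pens cost 4 * 3 = 12 dollars.\n#### 12",
}
reference = extract_hash_answer(record["answer"])
good = "<reasoning>\n4 * 3 = 12\n</reasoning>\n<answer>\n12\n</answer>"
bad = "<reasoning>\n4 + 3 = 7\n</reasoning>\n<answer>\n7\n</answer>"
print(reference, correctness_reward(good, reference), correctness_reward(bad, reference))  # 12 2.0 0.0
```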