Nagi-ovo committed
Commit 6cdfc78 · verified · 1 Parent(s): 43fcc94

Update README.md

Files changed (1):
  1. README.md +19 -5
README.md CHANGED
@@ -17,12 +17,26 @@ pipeline_tag: text-generation
  library_name: peft
  ---

- # Uploaded model
- 
- - **Developed by:** Nagi-ovo
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
- 
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+ This model was trained with reinforcement learning on the GSM8K dataset, learning to generate reasoning chains and formatted outputs despite the dataset lacking intermediate reasoning steps. A reward function guides training, prioritizing answer correctness and adherence to an XML output format.
+
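The reward functions themselves are not published in this card. As a rough sketch of the design described above, a correctness reward combined with a format reward along the lines below would fit; the XML tag names, reward magnitudes, and function names are assumptions, not the author's exact code.

```python
import re

# Assumed target format: the reasoning chain and final answer wrapped in XML tags.
XML_PATTERN = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def extract_answer(completion: str) -> str:
    """Pull the text inside <answer>...</answer>, or return an empty string."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else ""

def correctness_reward(completion: str, gold_answer: str) -> float:
    """Larger reward when the extracted answer matches the GSM8K gold answer."""
    return 2.0 if extract_answer(completion) == gold_answer.strip() else 0.0

def format_reward(completion: str) -> float:
    """Smaller reward for adhering to the expected XML structure."""
    return 0.5 if XML_PATTERN.search(completion) else 0.0
```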
+ **Training Details:**
+
+ * Dataset: GSM8K
+ * Algorithm: GRPO (see the training sketch below)
+ * Hardware: Single NVIDIA GeForce RTX 3090 Ti
+ * Training Duration: 250 epochs, ~48 minutes
+
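The training script is not included here; the sketch below shows how the details above could map onto TRL's `GRPOTrainer`, reusing the reward helpers from the earlier snippet. The 200-token completion limit reflects the limitation discussed further down; the remaining hyperparameters and LoRA settings are placeholders, not the author's actual configuration.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####"; GRPOTrainer expects a "prompt" column,
# and extra columns (here "gold") are forwarded to the reward functions.
def to_prompt(example):
    return {
        "prompt": example["question"],
        "gold": example["answer"].split("####")[-1].strip(),
    }

train_dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_prompt)

# TRL calls reward functions with batched completions plus dataset columns as kwargs.
def correctness_batch(completions, gold, **kwargs):
    return [correctness_reward(c, g) for c, g in zip(completions, gold)]

def format_batch(completions, **kwargs):
    return [format_reward(c) for c in completions]

args = GRPOConfig(
    output_dir="qwen2.5-7b-gsm8k-grpo",
    per_device_train_batch_size=8,   # must be divisible by num_generations
    num_generations=8,               # completions sampled per prompt for the group baseline
    max_prompt_length=256,
    max_completion_length=200,       # the output length limit noted under Limitations
    max_steps=250,                   # placeholder; the card reports "250 epochs, ~48 minutes"
    learning_rate=5e-6,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit",   # base checkpoint from the old card
    reward_funcs=[format_batch, correctness_batch],
    args=args,
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```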
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/r8Fz5cQtx38wcoZLDKQ_0.png)
+
+ **Limitations:**
+
+ The output length limit (200) restricts the model's ability to generate complex reasoning chains, and makes it difficult to observe output length growing over the course of training.
+
+ **Example:**
+
+ Which one is bigger? 9.11 or 9.8?
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/gbfcQXMLOn-n_CsbSVpy7.png)
+
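For reference, a minimal way to reproduce the example locally is to attach this LoRA adapter to the 4-bit base model and ask the same question. The adapter repository id below is a placeholder, and the XML system prompt is an assumption about how the model was prompted during training.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit"
adapter_id = "path/to/this-adapter"  # replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system",
     "content": "Respond in the format <reasoning>...</reasoning><answer>...</answer>."},
    {"role": "user", "content": "Which one is bigger? 9.11 or 9.8?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=200)  # matches the training-time limit
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```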
+ This qwen2.5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)