Nagi-ovo committed
Commit 6cdfc78 · verified · 1 Parent(s): 43fcc94

Update README.md

Files changed (1):
  1. README.md +19 -5
README.md CHANGED
@@ -17,12 +17,26 @@ pipeline_tag: text-generation
  library_name: peft
  ---

- # Uploaded model
- 
- - **Developed by:** Nagi-ovo
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
- 
- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
+ This model was trained with reinforcement learning on the GSM8K dataset, learning to generate reasoning chains and formatted outputs despite the dataset lacking intermediate reasoning steps. A reward function guides training, prioritizing answer correctness and adherence to an XML output format.
+
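The reward functions themselves are not published in this card. As a rough sketch of the design described above, a correctness reward combined with a format reward along the lines below would fit; the XML tag names, reward magnitudes, and function names are assumptions, not the author's exact code.

```python
import re

# Assumed target format: the reasoning chain and final answer wrapped in XML tags.
XML_PATTERN = re.compile(r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>", re.DOTALL)

def extract_answer(completion: str) -> str:
    """Pull the text inside <answer>...</answer>, or return an empty string."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return match.group(1).strip() if match else ""

def correctness_reward(completion: str, gold_answer: str) -> float:
    """Larger reward when the extracted answer matches the GSM8K gold answer."""
    return 2.0 if extract_answer(completion) == gold_answer.strip() else 0.0

def format_reward(completion: str) -> float:
    """Smaller reward for adhering to the expected XML structure."""
    return 0.5 if XML_PATTERN.search(completion) else 0.0
```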
+ **Training Details:**
+
+ * Dataset: GSM8K
+ * Algorithm: GRPO (see the training sketch below)
+ * Hardware: Single NVIDIA GeForce RTX 3090 Ti
+ * Training Duration: 250 epochs, ~48 minutes
+
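The training script is not included here; the sketch below shows how the details above could map onto TRL's `GRPOTrainer`, reusing the reward helpers from the earlier snippet. The 200-token completion limit reflects the limitation discussed further down; the remaining hyperparameters and LoRA settings are placeholders, not the author's actual configuration.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####"; GRPOTrainer expects a "prompt" column,
# and extra columns (here "gold") are forwarded to the reward functions.
def to_prompt(example):
    return {
        "prompt": example["question"],
        "gold": example["answer"].split("####")[-1].strip(),
    }

train_dataset = load_dataset("openai/gsm8k", "main", split="train").map(to_prompt)

# TRL calls reward functions with batched completions plus dataset columns as kwargs.
def correctness_batch(completions, gold, **kwargs):
    return [correctness_reward(c, g) for c, g in zip(completions, gold)]

def format_batch(completions, **kwargs):
    return [format_reward(c) for c in completions]

args = GRPOConfig(
    output_dir="qwen2.5-7b-gsm8k-grpo",
    per_device_train_batch_size=8,   # must be divisible by num_generations
    num_generations=8,               # completions sampled per prompt for the group baseline
    max_prompt_length=256,
    max_completion_length=200,       # the output length limit noted under Limitations
    max_steps=250,                   # placeholder; the card reports "250 epochs, ~48 minutes"
    learning_rate=5e-6,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit",   # base checkpoint from the old card
    reward_funcs=[format_batch, correctness_batch],
    args=args,
    train_dataset=train_dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```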
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/r8Fz5cQtx38wcoZLDKQ_0.png)
+
+ **Limitations:**
+
+ The output length limit (200) restricts the model's ability to generate complex reasoning chains, and makes it difficult to observe output length growing over the course of training.
+
+ **Example:**
+
+ Which one is bigger? 9.11 or 9.8?
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b36c0a26893eb6a6e63da3/gbfcQXMLOn-n_CsbSVpy7.png)
+
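For reference, a minimal way to reproduce the example locally is to attach this LoRA adapter to the 4-bit base model and ask the same question. The adapter repository id below is a placeholder, and the XML system prompt is an assumption about how the model was prompted during training.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit"
adapter_id = "path/to/this-adapter"  # replace with this repository's id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system",
     "content": "Respond in the format <reasoning>...</reasoning><answer>...</answer>."},
    {"role": "user", "content": "Which one is bigger? 9.11 or 9.8?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=200)  # matches the training-time limit
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```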
+ This qwen2.5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)