euclaise committed (verified) · Commit 0b86149 · Parent(s): febd59e

Update README.md

Files changed (1): README.md +8 -1
README.md CHANGED
@@ -91,5 +91,12 @@ Keeping this in mind:
 
  I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).
 
- I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (w/ partial label-masking), and once with SFT.
+ I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (w/ partial label-masking for CoT), and once with SFT.
 
+ Here are some benchmark results, computed using the LM Evaluation Harness with vllm:
+
+ | Model          | GSM8K (strict, 5-shot) | AGIEval (Nous subset, 0-shot) | ARC-C | BBH |
+ |:--------------:|-----------------------:|------------------------------:|------:|----:|
+ | SFT            | 23.81%                 |                               |       |     |
+ | Masked Thought | 20.24%                 | 23.80%                        |       |     |
+ | **ReMask**     | **24.03%**             | 24.71%                        |       |     |
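As a rough illustration of the partial label-masking mentioned for Masked Thought, the sketch below randomly excludes a fraction of the CoT-span labels from the loss (PyTorch's `ignore_index` convention). The function name, span arguments, and masking probability are illustrative assumptions, not the actual training code:

```python
import torch

def mask_cot_labels(labels, cot_start, cot_end, mask_prob=0.4, ignore_index=-100):
    """Randomly drop a fraction of CoT-token labels from the loss.

    Tokens whose label is set to `ignore_index` are skipped by
    torch.nn.CrossEntropyLoss; `mask_prob` here is an assumed value.
    """
    labels = labels.clone()
    # Positions belonging to the chain-of-thought span.
    span = torch.arange(cot_start, cot_end)
    # Bernoulli mask: which CoT positions to exclude from the loss.
    drop = torch.rand(span.numel()) < mask_prob
    labels[span[drop]] = ignore_index
    return labels
```

Labels outside the CoT span are left untouched, so the answer tokens always contribute to the loss.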