Update README.md
I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).

I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (with partial label-masking for CoT), and once with SFT.
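To make the "partial label-masking" idea concrete, here is a minimal sketch of how a Masked Thought-style masking step can work: a fraction of the chain-of-thought label positions is replaced with the ignore index so those tokens contribute no loss. This is an illustration only, not the actual training code; `mask_prob` and the CoT span bounds are made-up parameters.

```python
import random

IGNORE_INDEX = -100  # label value that cross-entropy implementations conventionally skip

def mask_cot_labels(labels, cot_start, cot_end, mask_prob=0.4, seed=0):
    """Randomly replace a fraction of chain-of-thought label positions with
    IGNORE_INDEX so those tokens are excluded from the loss. Positions
    outside [cot_start, cot_end) are left unchanged."""
    rng = random.Random(seed)
    masked = list(labels)
    for i in range(cot_start, cot_end):
        if rng.random() < mask_prob:
            masked[i] = IGNORE_INDEX
    return masked

# Toy example: label ids 0..9, with the CoT spanning positions 2..8
labels = list(range(10))
masked = mask_cot_labels(labels, cot_start=2, cot_end=8)
print(masked)
```

Only positions inside the CoT span can be masked; prompt and answer labels pass through untouched.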
Here are some benchmark results, computed using the LM Evaluation Harness with vLLM:
| Model          | GSM8K (strict, 5-shot) | AGIEval (Nous subset, 0-shot) | ARC-C | BBH |
|:--------------:|-----------------------:|------------------------------:|------:|----:|
| SFT            | 23.81%                 |                               |       |     |
| Masked Thought | 20.24%                 | 23.80%                        |       |     |
| **ReMask**     | **24.03%**             | 24.71%                        |       |     |
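Scores like those above can be produced with an LM Evaluation Harness invocation along these lines (a sketch, not the exact command used here; the model path and batch size are placeholders):

```shell
# 5-shot GSM8K with the vLLM backend of lm-evaluation-harness
lm_eval --model vllm \
    --model_args pretrained=path/to/your-model,dtype=auto \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size auto
```

Swap `--tasks` and `--num_fewshot` per row of the table (e.g. the AGIEval subset at 0-shot).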