Update README.md
I trained StableLM-3B-4e1t repeatedly on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with 1000 examples from [reddit-instruct-curated](https://huggingface.co/datasets/euclaise/reddit-instruct-curated) and 1000 examples from [oasst2-curated](https://huggingface.co/datasets/sablo/oasst2_curated).

I trained once with ReMask (ReMask-CoT for CoT examples), once with Masked Thought (with partial label-masking for CoT), and once with SFT.
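To make the "partial label-masking" idea concrete, here is a minimal sketch of how a Masked Thought-style masking step can work: a fraction of the chain-of-thought label positions is replaced with the ignore index so those tokens contribute no loss. This is an illustration only, not the actual training code; `mask_prob` and the CoT span bounds are made-up parameters.

```python
import random

IGNORE_INDEX = -100  # label value that cross-entropy implementations conventionally skip

def mask_cot_labels(labels, cot_start, cot_end, mask_prob=0.4, seed=0):
    """Randomly replace a fraction of chain-of-thought label positions with
    IGNORE_INDEX so those tokens are excluded from the loss. Positions
    outside [cot_start, cot_end) are left unchanged."""
    rng = random.Random(seed)
    masked = list(labels)
    for i in range(cot_start, cot_end):
        if rng.random() < mask_prob:
            masked[i] = IGNORE_INDEX
    return masked

# Toy example: label ids 0..9, with the CoT spanning positions 2..8
labels = list(range(10))
masked = mask_cot_labels(labels, cot_start=2, cot_end=8)
print(masked)
```

Only positions inside the CoT span can be masked; prompt and answer labels pass through untouched.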
Here are some benchmark results, computed using the LM Evaluation Harness with vLLM:
| Model          | GSM8K (strict, 5-shot) | AGIEval (Nous subset, 0-shot) | ARC-C | BBH |
|:--------------:|-----------------------:|------------------------------:|------:|----:|
| SFT            | 23.81%                 |                               |       |     |
| Masked Thought | 20.24%                 | 23.80%                        |       |     |
| **ReMask**     | **24.03%**             | 24.71%                        |       |     |
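Scores like those above can be produced with an LM Evaluation Harness invocation along these lines (a sketch, not the exact command used here; the model path and batch size are placeholders):

```shell
# 5-shot GSM8K with the vLLM backend of lm-evaluation-harness
lm_eval --model vllm \
    --model_args pretrained=path/to/your-model,dtype=auto \
    --tasks gsm8k \
    --num_fewshot 5 \
    --batch_size auto
```

Swap `--tasks` and `--num_fewshot` per row of the table (e.g. the AGIEval subset at 0-shot).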