Update README.md
Browse files
README.md
CHANGED
@@ -12,17 +12,8 @@ model-index:
|
|
12 |
|
13 |
<br>
|
14 |
|
15 |
-
# Evaluations
|
16 |
-
|
17 |
-
| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|
18 |
-
|--------------|------------------|------------------|-------------|
|
19 |
-
| GPQA | 41.5% | 41.6% | 40.2% |
|
20 |
-
| MMLU | 81.6% | - | 82.0% |
|
21 |
-
| MATH | 64.2% | 69.4% | 70.2% |
|
22 |
-
| MMLU-Pro | 65.6% | 65.0% | - |
|
23 |
-
| HumanEval | | 88.1% | 87.2% |
|
24 |
-
| DROP (F1 Score) | | 83.1% | 79.7% |
|
25 |
|
|
|
26 |
|
27 |
Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
|
28 |
|
@@ -36,6 +27,16 @@ The model is trained to first think step-by-step, and contemplate on its answers
|
|
36 |
# Important Note:
|
37 |
In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags) should be carried forward. Otherwise the model will see out of distribution input data and will fail.
|
38 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
39 |
|
40 |
# Prompt Format
|
41 |
|
|
|
12 |
|
13 |
<br>
|
14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
15 |
|
16 |
+
# Introduction
|
17 |
|
18 |
Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
|
19 |
|
|
|
27 |
# Important Note:
|
28 |
In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags) should be carried forward. Otherwise the model will see out of distribution input data and will fail.
|
29 |
|
30 |
+
# Evaluations
|
31 |
+
|
32 |
+
| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|
33 |
+
|--------------|------------------|------------------|-------------|
|
34 |
+
| GPQA | 41.5% | 41.6% | 40.2% |
|
35 |
+
| MMLU | 81.6% | - | 82.0% |
|
36 |
+
| MATH | 64.2% | 69.4% | 70.2% |
|
37 |
+
| MMLU-Pro | 65.6% | 65.0% | - |
|
38 |
+
| HumanEval | | 88.1% | 87.2% |
|
39 |
+
| DROP (F1 Score) | | 83.1% | 79.7% |
|
40 |
|
41 |
# Prompt Format
|
42 |
|