migtissera commited on
Commit
f81c056
·
verified ·
1 Parent(s): 0523050

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +11 -10
README.md CHANGED
@@ -12,17 +12,8 @@ model-index:
12
 
13
  <br>
14
 
15
- # Evaluations
16
-
17
- | | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
18
- |--------------|------------------|------------------|-------------|
19
- | GPQA | 41.5% | 41.6% | 40.2% |
20
- | MMLU | 81.6% | - | 82.0% |
21
- | MATH | 64.2% | 69.4% | 70.2% |
22
- | MMLU-Pro | 65.6% | 65.0% | - |
23
- | HumanEval | | 88.1% | 87.2% |
24
- | DROP (F1 Score) | | 83.1% | 79.7% |
25
 
 
26
 
27
  Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
28
 
@@ -36,6 +27,16 @@ The model is trained to first think step-by-step, and contemplate on its answers
36
  # Important Note:
37
  In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags) should be carried forward. Otherwise the model will see out of distribution input data and will fail.
38
 
 
 
 
 
 
 
 
 
 
 
39
 
40
  # Prompt Format
41
 
 
12
 
13
  <br>
14
 
 
 
 
 
 
 
 
 
 
 
15
 
16
+ # Introduction
17
 
18
  Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
19
 
 
27
  # Important Note:
28
  In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags) should be carried forward. Otherwise the model will see out of distribution input data and will fail.
29
 
30
+ # Evaluations
31
+
32
+ | | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
33
+ |--------------|------------------|------------------|-------------|
34
+ | GPQA | 41.5% | 41.6% | 40.2% |
35
+ | MMLU | 81.6% | - | 82.0% |
36
+ | MATH | 64.2% | 69.4% | 70.2% |
37
+ | MMLU-Pro | 65.6% | 65.0% | - |
38
+ | HumanEval | | 88.1% | 87.2% |
39
+ | DROP (F1 Score) | | 83.1% | 79.7% |
40
 
41
  # Prompt Format
42