Update README.md
README.md CHANGED
@@ -94,24 +94,16 @@ The website [https://swallow-llm.github.io/](https://swallow-llm.github.io/) pro
-|Model|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMTAvg|
+| Model | coding | extraction | humanities | math | reasoning | roleplay | stem | writing | JMT Avg |
 |---|---|---|---|---|---|---|---|---|---|
-| KARAKURI LM 70B Chat v0.1 | 0.2804 | 0.5862 | 0.624 | 0.2934 | 0.4183 | 0.553 | 0.4859 | 0.5964 | 0.4797 |
-| Swallow-70b-instruct-v0.1 | 0.303 | 0.55 | 0.565 | 0.3483 | 0.305 | 0.542 | 0.4916 | 0.463 | 0.446 |
-| Llama 3 70B Instruct | 0.5969 | 0.841 | 0.712 | 0.4481 | 0.4884 | 0.7117 | 0.651 | 0.69 | 0.6424 |
+| Qwen2-72B-Instruct | 0.5699 | 0.7858 | 0.8222 | 0.5096 | 0.7032 | 0.7963 | 0.7728 | 0.8223 | 0.7228 |
+| Qwen2.5-72B-Instruct | 0.7060 | 0.7866 | 0.8122 | 0.6968 | 0.6536 | 0.8301 | 0.8060 | 0.7841 | 0.7594 |
+| Llama 3 70B Instruct | 0.5969 | 0.8410 | 0.7120 | 0.4481 | 0.4884 | 0.7117 | 0.6510 | 0.6900 | 0.6424 |
 | Llama 3.1 70B Instruct | 0.5252 | 0.7846 | 0.7086 | 0.5063 | 0.6979 | 0.6888 | 0.6402 | 0.6653 | 0.6521 |
 | Llama 3 Youko 70B Instruct | 0.6632 | 0.8387 | 0.8108 | 0.4655 | 0.7013 | 0.7778 | 0.7544 | 0.7662 | 0.7222 |
-| Llama
-| Llama 3
-| Llama 3 Swallow 70B Instruct | 0.
-| Llama 3.1 Swallow 70B Instruct | 0.5676 | 0.7859 | 0.749 | 0.5437 | 0.6383 | 0.687 | 0.6121 | 0.654 | 0.6547 |
-| Qwen2-72B-Instruct | 0.5699 | 0.7858 | 0.8222 | 0.5096 | 0.7032 | 0.7963 | 0.7728 | 0.8223 | 0.7228 |
-| Qwen2.5-72B-Instruct | 0.706 | 0.7866 | 0.8122 | 0.6968 | 0.6536 | 0.8301 | 0.806 | 0.7841 | 0.7594 |
-| Mixtral-8x22B-Instruct-v0.1 | 0.5061 | 0.7454 | 0.5978 | 0.4772 | 0.476 | 0.542 | 0.4679 | 0.6244 | 0.5546 |
-| Llama 3.1 405B Instruct (deepinfra API) | 0.6464 | 0.8218 | 0.715 | 0.5313 | 0.6447 | 0.716 | 0.6737 | 0.677 | 0.6782 |
+| Llama 3 heron brain 70B v0.3 | 0.3762 | 0.7892 | 0.7274 | 0.5589 | 0.5070 | 0.6662 | 0.6880 | 0.6996 | 0.6266 |
+| Llama 3 Swallow 70B Instruct | 0.5269 | 0.7250 | 0.5690 | 0.4669 | 0.6121 | 0.6238 | 0.5533 | 0.5698 | 0.5809 |
+| Llama 3.1 Swallow 70B Instruct | 0.5676 | 0.7859 | 0.7490 | 0.5437 | 0.6383 | 0.6870 | 0.6121 | 0.6540 | 0.6547 |
 | GPT-3.5 (gpt-3.5-turbo-0125) | 0.6851 | 0.7641 | 0.7414 | 0.5522 | 0.5128 | 0.7104 | 0.6266 | 0.7361 | 0.6661 |
-| GPT-4o (gpt-4o-2024-05-13) | 0.7296 | 0.
+| GPT-4o (gpt-4o-2024-05-13) | 0.7296 | 0.8540 | 0.8646 | 0.6641 | 0.6661 | 0.8274 | 0.8184 | 0.8085 | 0.7791 |

 ## Evaluation Benchmarks
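The JMT Avg column looks like the unweighted mean of the eight per-category scores; the diff itself does not state the averaging rule, so treat that as an inference from the numbers. A minimal sketch that re-derives a few of the reported averages from the updated table:

```python
# Sanity-check: JMT Avg as the unweighted mean of the eight category
# scores (coding, extraction, humanities, math, reasoning, roleplay,
# stem, writing). The averaging rule is inferred, not stated in the diff.
rows = {
    "Qwen2.5-72B-Instruct":           ([0.7060, 0.7866, 0.8122, 0.6968, 0.6536, 0.8301, 0.8060, 0.7841], 0.7594),
    "Llama 3.1 Swallow 70B Instruct": ([0.5676, 0.7859, 0.7490, 0.5437, 0.6383, 0.6870, 0.6121, 0.6540], 0.6547),
    "GPT-4o (gpt-4o-2024-05-13)":     ([0.7296, 0.8540, 0.8646, 0.6641, 0.6661, 0.8274, 0.8184, 0.8085], 0.7791),
}

for model, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    # Compare with a tolerance of half a unit in the fourth decimal place,
    # since the table rounds to four decimals.
    ok = abs(mean - reported) < 5e-5
    print(f"{model}: mean={mean:.4f}, reported={reported:.4f}, match={ok}")
```

For these rows the plain mean reproduces the reported JMT Avg to four decimal places, which supports reading the column as a simple average over the eight categories.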