Update README.md
README.md CHANGED
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e

| Model Name | Release Date | Release Note | Reference |
|------------|--------------|--------------|-----------|
-| Krutrim-2-Base
-| Krutrim-2-Instruct
+| Krutrim-2-Base | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base) |
+| Krutrim-2-Instruct | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct) |

## Data Freshness
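The two rows added above point to Hugging Face model repos. As a quick orientation for readers, here is a minimal usage sketch, assuming the repos load through the standard `transformers` causal-LM API and that the instruct model ships a chat template; the prompt, dtype, and generation settings are illustrative and not taken from this diff.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id taken from the Reference link in the table above; loading details
# (dtype, chat template, generation settings) are assumptions, not confirmed by this diff.
model_id = "krutrim-ai-labs/Krutrim-2-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Assumes the instruct repo provides a chat template (typical for DPO-tuned instruct models).
messages = [{"role": "user", "content": "Namaste! Introduce yourself in Hindi."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The base repo linked in the first row would presumably load the same way, without the chat-template step.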
@@ -91,15 +91,19 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e

### Indic Benchmarks

-| IndicSentiment (0-shot)
-| IndicCOPA (0-shot)
-| IndicXParaphrase (0-shot)
-| IndicXNLI (
-| FloresIN (1-shot
+| Benchmark | Metric | Krutrim-1 7B | MN-12B-Instruct | Krutrim-2 12B | llama-3.1-8B | llama-3.3-70B | Gemini-1.5 Flash | GPT-4o |
+|-----------|--------|--------------|-----------------|---------------|--------------|---------------|------------------|--------|
+| IndicSentiment (0-shot) | Accuracy | 0.65 | 0.70 | 0.95 | 0.05 | 0.96 | 0.99 | 0.98 |
+| IndicCOPA (0-shot) | Accuracy | 0.51 | 0.58 | 0.80 | 0.48 | 0.83 | 0.88 | 0.91 |
+| IndicXParaphrase (0-shot) | Accuracy | 0.67 | 0.74 | 0.88 | 0.75 | 0.87 | 0.89 | TBD |
+| IndicXNLI (0-shot) | Accuracy | 0.47 | 0.54 | 0.55 | 0.00 | TBD | TBD | 0.67? |
+| IndicQA (0-shot) | Bert Score | 0.90 | 0.90 | 0.91 | TBD | TBD | TBD | TBD |
+| CrossSumIN (1-shot) | chrF++ | 0.04 | 0.17 | 0.21 | 0.21 | 0.26 | 0.24 | TBD |
+| FloresIN Translation xx-en (1-shot) | chrF++ | 0.54 | 0.50 | 0.58 | 0.54 | 0.60 | 0.62 | 0.63 |
+| FloresIN Translation en-xx (1-shot) | chrF++ | 0.41 | 0.34 | 0.48 | 0.37 | 0.46 | 0.47 | 0.48 |
+| IN22 Translation xx-en (0-shot) | chrF++ | 0.50 | 0.48 | 0.57 | 0.49 | 0.58 | TBD | 0.54? |
+| IN22 Translation en-xx (0-shot) | chrF++ | 0.36 | 0.33 | 0.45 | 0.32 | 0.42 | TBD | 0.43? |

### BharatBench
The existing Indic benchmarks are not natively in Indian languages; rather, they are translations of existing English benchmarks, so they do not sufficiently capture the linguistic nuances of Indian languages or aspects of Indian culture. To address this, Krutrim released BharatBench, a natively Indic benchmark that encompasses the linguistic and cultural diversity of the Indic region, ensuring that evaluations are relevant and representative of real-world use cases in India.
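Several of the rows added in the table above (CrossSumIN, FloresIN, IN22) report chrF++, and the values appear to be on a 0-1 scale. As a rough illustration of the metric itself, not of the exact evaluation harness behind these numbers, the sketch below computes chrF++ with `sacrebleu`, whose `CHRF` metric with `word_order=2` corresponds to chrF++; the hypothesis/reference pair is made up.

```python
# Illustrative chrF++ computation with sacrebleu; not the evaluation pipeline
# used to produce the table above.
from sacrebleu.metrics import CHRF

chrf_pp = CHRF(word_order=2)  # word_order=2 adds word uni/bi-grams on top of character n-grams (chrF++)

hypotheses = ["yah ek achha din hai"]          # hypothetical system output
references = [["aaj ka din bahut achha hai"]]  # one reference stream, one sentence per hypothesis

score = chrf_pp.corpus_score(hypotheses, references)
print(score.score)        # sacrebleu reports chrF++ on a 0-100 scale
print(score.score / 100)  # rescaled to 0-1, matching how the table appears to report it (assumption)
```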