krutrim-admin committed 43cacce (verified)
1 parent: e05bfd4

Update README.md

Files changed (1)
  1. README.md +15 -11
README.md CHANGED
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 
 | Model Name | Release Date |Release Note | Reference|
 |------------|-------------|-------------|-------------|
-| Krutrim-2-Base-0131 | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
-| Krutrim-2-Instruct-0131 | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|
+| Krutrim-2-Base | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
+| Krutrim-2-Instruct | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|
 
 
 ## Data Freshness
@@ -91,15 +91,19 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to e
 
 ### Indic Benchmarks
 
-| Dataset | Mistral-Nemo-Instruct-2407 | Krutrim-1 | Krutrim-2-Instruct-0131 |
-|-----------------------------------------|----------------------------|--------------------|-------------|
-| IndicSentiment (0-shot) | 70% | 65% | 95% |
-| IndicCOPA (0-shot) | 58% | 51% | 80% |
-| IndicXParaphrase (0-shot) | 74% | 67% | 88% |
-| IndicXNLI (3-shot) | 52% | 17% | 58% |
-| CrossSumIN (1-shot) (chrf++) | 17% | 4% | 21% |
-| FloresIN (1-shot, xx-en) (chrf++) | 50% | 54% | 58% |
-| FloresIN (1-shot, en-xx) (chrf++) | 34% | 41% | 46% |
+| Benchmark | Metric | Krutrim-1 7B | MN-12B-Instruct | Krutrim-2 12B | llama-3.1-8B | llama-3.3-70B | Gemini-1.5 Flash | GPT-4o |
+|--------------------------------------------|------------|--------------|----------------|--------------|--------------|--------------|----------------|--------|
+| IndicSentiment (0-shot) | Accuracy | 0.65 | 0.70 | 0.95 | 0.05 | 0.96 | 0.99 | 0.98 |
+| IndicCOPA (0-shot) | Accuracy | 0.51 | 0.58 | 0.80 | 0.48 | 0.83 | 0.88 | 0.91 |
+| IndicXParaphrase (0-shot) | Accuracy | 0.67 | 0.74 | 0.88 | 0.75 | 0.87 | 0.89 | TBD |
+| IndicXNLI (0-shot) | Accuracy | 0.47 | 0.54 | 0.55 | 0.00 | TBD | TBD | 0.67? |
+| IndicQA (0-shot) | Bert Score | 0.90 | 0.90 | 0.91 | TBD | TBD | TBD | TBD |
+| CrossSumIN (1-shot) | chrF++ | 0.04 | 0.17 | 0.21 | 0.21 | 0.26 | 0.24 | TBD |
+| FloresIN Translation xx-en (1-shot) | chrF++ | 0.54 | 0.50 | 0.58 | 0.54 | 0.60 | 0.62 | 0.63 |
+| FloresIN Translation en-xx (1-shot) | chrF++ | 0.41 | 0.34 | 0.48 | 0.37 | 0.46 | 0.47 | 0.48 |
+| IN22 Translation xx-en (0-shot) | chrF++ | 0.50 | 0.48 | 0.57 | 0.49 | 0.58 | TBD | 0.54? |
+| IN22 Translation en-xx (0-shot) | chrF++ | 0.36 | 0.33 | 0.45 | 0.32 | 0.42 | TBD | 0.43? |
+
 
 ### BharatBench
 The existing Indic benchmarks are not natively in Indian languages, rather, they are translations of existing En benchmarks. They do not sufficiently capture the linguistic nuances of Indian languages and aspects of Indian culture. Towards that Krutrim released BharatBench - a natively Indic benchmark that encompasses the linguistic and cultural diversity of the Indic region, ensuring that the evaluations are relevant and representative of real-world use cases in India.
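The updated Indic Benchmarks table reports chrF++ for CrossSumIN, FloresIN and IN22 on a 0-1 scale. As a rough illustration of what that metric measures, here is a minimal sentence-level sketch: it averages per-order F-beta scores (beta = 2) over character n-grams of orders 1-6 and word n-grams of orders 1-2. The whitespace handling, the plain whitespace tokenization and the per-order averaging are assumptions; real implementations (for example sacreBLEU's `CHRF` with `word_order=2`) differ in these details, the README does not state which implementation produced its numbers, and this sketch is not expected to reproduce them. The helper names `char_ngrams`, `word_ngrams`, `f_beta` and `chrf_pp` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams after stripping whitespace (assumption: many chrF
    # setups ignore spaces, but implementations differ).
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    # Word n-grams over a plain whitespace split (simplified: no punctuation handling).
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def f_beta(hyp: Counter, ref: Counter, beta: float) -> Optional[float]:
    # F-beta for a single n-gram order; None if either side has no n-grams of that order.
    n_hyp, n_ref = sum(hyp.values()), sum(ref.values())
    if n_hyp == 0 or n_ref == 0:
        return None
    matches = sum((hyp & ref).values())  # overlap, with counts clipped by min()
    if matches == 0:
        return 0.0
    prec, rec = matches / n_hyp, matches / n_ref
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)


def chrf_pp(hypothesis: str, reference: str, char_order: int = 6,
            word_order: int = 2, beta: float = 2.0) -> float:
    # Hypothetical sentence-level chrF++ on a 0-1 scale: mean of per-order
    # F-beta scores over character orders 1..char_order and word orders 1..word_order.
    scores: List[Optional[float]] = []
    for n in range(1, char_order + 1):
        scores.append(f_beta(char_ngrams(hypothesis, n), char_ngrams(reference, n), beta))
    for n in range(1, word_order + 1):
        scores.append(f_beta(word_ngrams(hypothesis, n), word_ngrams(reference, n), beta))
    valid = [s for s in scores if s is not None]  # skip orders with no n-grams
    return sum(valid) / len(valid) if valid else 0.0


print(round(chrf_pp("the cat sat on the mat", "the cat sat on a mat"), 3))
```

Identical strings score 1.0, and strings that share no characters or words score 0.0. Corpus-level scores, like those in a benchmark table, are usually computed by pooling n-gram counts over all segments rather than by averaging sentence-level scores.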