krutrim-admin committed · Commit e05bfd4 · verified · 1 parent: bbae435

Updated overview section

Files changed (1): README.md (+5, -5)
README.md CHANGED
@@ -26,9 +26,9 @@ widget:
  # Krutrim-2
 
  ## Model Overview
- Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is based on the Mistral-NeMo 12B architecture and has undergone continual pretraining with 500B tokens across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned on 1.5M data points covering a diverse range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following, creative writing, and role-playing.
+ Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is built on the Mistral-NeMo 12B architecture and trained across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned on diverse data covering a wide range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following and creative writing.
 
- After fine-tuning, the model underwent Direct Preference Optimization (DPO) with 300K data points to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.
+ After fine-tuning, the model underwent Direct Preference Optimization (DPO) to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.
 
  ## Key Features
  - 12B parameter dense transformer model leading to better generalization compared to Krutrim-1 7B;
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) with
 
  | Model Name | Release Date |Release Note | Reference|
  |------------|-------------|-------------|-------------|
- | Krutrim-2-Base-0131 | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base-0131)|
- | Krutrim-2-Instruct-0131 | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct-0131)|
+ | Krutrim-2-Base-0131 | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base)|
+ | Krutrim-2-Instruct-0131 | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct)|
 
 
  ## Data Freshness
@@ -126,7 +126,7 @@ To use the model, you can load it with `AutoModelForCausalLM` as follows:
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
 
- model_id = "path/to/Krutrim-2_model"
+ model_id = "krutrim-ai-labs/Krutrim-2-instruct"
 
  # Load model and tokenizer
  model = AutoModelForCausalLM.from_pretrained(model_id)
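
The last hunk only shows the `model_id` line changing, so here is a minimal end-to-end sketch built around the new hub id. It is not the README's full quick-start: the chat-template call, the `bfloat16` dtype, `device_map="auto"` (which additionally requires `accelerate`), and the generation settings are assumptions chosen for illustration.

```python
# Minimal sketch (assumptions noted above): load the instruct checkpoint by the
# hub id introduced in this commit and generate a single reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-2-instruct"  # new id from the diff above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; pick a dtype your hardware supports
    device_map="auto",           # requires accelerate; or move the model manually
)

# Assumes the instruct checkpoint ships a chat template.
messages = [{"role": "user", "content": "Give a one-line summary of Krutrim-2."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the checkpoint turns out not to ship a chat template, a plain `tokenizer(prompt, return_tensors="pt")` call is a reasonable fallback for building the inputs.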