Updated overview section
README.md CHANGED

@@ -26,9 +26,9 @@ widget:
 # Krutrim-2
 
 ## Model Overview
-Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is
+Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is built on the Mistral-NeMo 12B architecture and trained across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned on diverse data covering a wide range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following and creative writing.
 
-After fine-tuning, the model underwent Direct Preference Optimization (DPO)
+After fine-tuning, the model underwent Direct Preference Optimization (DPO) to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.
 
 ## Key Features
 - 12B parameter dense transformer model leading to better generalization compared to Krutrim-1 7B;
@@ -50,8 +50,8 @@ After fine-tuning, the model underwent Direct Preference Optimization (DPO) with
 
 | Model Name | Release Date | Release Note | Reference |
 |------------|--------------|--------------|-----------|
-| Krutrim-2-Base-0131 | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base
-| Krutrim-2-Instruct-0131 | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct
+| Krutrim-2-Base-0131 | 2024-01-31 | Continually Pre-trained on MN12B base | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-base) |
+| Krutrim-2-Instruct-0131 | 2024-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-2-instruct) |
 
 
 ## Data Freshness
@@ -126,7 +126,7 @@ To use the model, you can load it with `AutoModelForCausalLM` as follows:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
-model_id = "
+model_id = "krutrim-ai-labs/Krutrim-2-instruct"
 
 # Load model and tokenizer
 model = AutoModelForCausalLM.from_pretrained(model_id)