# OLMo-1B-0724 SFT
OLMo-1B-0724-hf finetuned for 5 epochs with a learning rate of 1e-5 on the Tulu 2 dataset (specifically this version). Training used a batch size of 1 with 128 gradient accumulation steps (effective batch size 128), with linear warmup over the first 3% of training followed by linear decay to 0.
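The schedule above can be sketched as a simple function; `total_steps` is illustrative here, since the card does not state the exact step count:

```python
def lr_at(step: int, total_steps: int,
          peak_lr: float = 1e-5, warmup_frac: float = 0.03) -> float:
    """Linear warmup over the first `warmup_frac` of training, then linear decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # ramp from ~0 up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # linear decay from peak_lr down to 0 over the remaining steps
    decay_steps = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / decay_steps)
```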
I've also released an 'instruct' version that has additionally gone through DPO training. That model is generally more performant (see the metrics below), so check it out!
Evals are as follows:
| Metric | OLMo-1B-0724-hf | OLMo-1B-0724-SFT-hf (this model!) | OLMo-1B-0724-Instruct-hf |
|---|---|---|---|
| MMLU 0-shot | 25.0 | 36.0 | 36.7 |
| GSM8k CoT 8-shot | 7.0 | 12.5 | 12.5 |
| BBH CoT 3-shot | 22.5 | 27.2 | 30.6 |
| HumanEval P@10 | 16.0 | 21.2 | 22.0 |
| AlpacaEval 1 | - | 41.5 | 50.9 |
| AlpacaEval 2 LC | - | 2.7 | 2.5 |
| Toxigen % Toxic (lower is better) | 80.3 | 59.7 | 14.1 |
| TruthfulQA % Info+True | 23.0 | 40.9 | 42.2 |
| IFEval Loose Acc | 20.5 | 26.1 | 24.2 |
| XSTest F1 | 67.6 | 81.9 | 79.8 |
| Average of above metrics (Toxigen inverted) | 25.2 | 33.0 | 38.7 |
Model training and evaluation were performed using Open Instruct, so check that out for more details on the evaluation setup.