[Figure: Needle in a Haystack evaluation heatmap]
Model Card for Model ID
This model is a merge of:
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 - 66%
- meta-llama/Meta-Llama-3-8B-Instruct - 16%
- DataGuard/pali-8B-v0.4.3 - 16%
The embedding, norm, and head layers are taken unchanged from DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1.
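The card does not say which merge method or tooling was used. As a rough illustration only, the sketch below assumes a plain linear (weighted-average) merge of parameters, with the listed percentages normalized because they sum to 98%. The function name `merge_state_dicts`, the short model keys, and the substring markers used to detect embedding, norm, and head parameters are all hypothetical, and treating every parameter whose name contains "norm" as passthrough is one possible reading of the card. Plain Python lists stand in for tensors so the example runs without any dependencies.

```python
from typing import Dict, List, Tuple

StateDict = Dict[str, List[float]]

# Percentages as listed in the card (hypothetical short keys).
# They sum to 0.98, so this sketch normalizes them (an assumption).
WEIGHTS = {"discoleo": 0.66, "llama3_instruct": 0.16, "pali": 0.16}

# Hypothetical name fragments for parameters copied unchanged from the base.
PASSTHROUGH_MARKERS: Tuple[str, ...] = ("embed_tokens", "norm", "lm_head")


def merge_state_dicts(
    models: Dict[str, StateDict],
    weights: Dict[str, float],
    base: str,
    passthrough: Tuple[str, ...] = PASSTHROUGH_MARKERS,
) -> StateDict:
    """Linearly average parameters, copying passthrough ones from `base`."""
    total = sum(weights.values())
    merged: StateDict = {}
    for name, base_values in models[base].items():
        if any(marker in name for marker in passthrough):
            # Embedding, norm, and head parameters: keep the base model's values.
            merged[name] = list(base_values)
            continue
        # All other parameters: element-wise normalized weighted average.
        merged[name] = [
            sum(weights[k] * models[k][name][i] for k in weights) / total
            for i in range(len(base_values))
        ]
    return merged


# Toy two-parameter "models" to show the behavior.
models = {
    "discoleo": {
        "model.embed_tokens.weight": [1.0, 2.0],
        "model.layers.0.mlp.up_proj.weight": [1.0, 1.0],
    },
    "llama3_instruct": {
        "model.embed_tokens.weight": [9.0, 9.0],
        "model.layers.0.mlp.up_proj.weight": [0.0, 0.0],
    },
    "pali": {
        "model.embed_tokens.weight": [9.0, 9.0],
        "model.layers.0.mlp.up_proj.weight": [0.0, 0.0],
    },
}
merged = merge_state_dicts(models, WEIGHTS, base="discoleo")
```

In the toy data, the embedding entry stays identical to the base model's, while the MLP entry becomes the normalized weighted average of the three inputs. A real merge would operate on full checkpoints and might use a different method entirely.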
Evaluation results
- judge_match on squad_answerable (self-reported): 0.639
- judge_match on context_has_answer (self-reported): 0.86
- judge_match on jail_break (self-reported): 0.099
- judge_match on harmless_prompt (self-reported): 0.926
- judge_match on harmful_prompt (self-reported): 0.689
- acc on truthfulqa (self-reported): 0.522
- exact_match on gsm8k (self-reported): 0.616
- acc on mmlu (self-reported): 0.634