---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
language:
- en
tags:
- cramped
- cramp
---
A modified GPT-2 model with ScaledSinusoidal position embeddings, no biases, embedding LayerNorm, and one shared MLP layer. With 94 million non-embedding parameters, it beats most similarly sized and slightly larger models (GPT-2-124M, Pythia-70M/160M, Cerebras-GPT-111M) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) benchmark suite, despite being trained on only 8 billion tokens of text from [SlimPajama](https://hf.co/datasets/cerebras/SlimPajama-627B).
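
As a rough illustration of the ScaledSinusoidal idea: a standard fixed sinusoidal position table, multiplied by a single learned scale before being added to the token embeddings. The sketch below is an assumption about the general technique, not this model's actual code; the class name, `max_len` default, and other details are illustrative.

```python
import math

import torch
import torch.nn as nn


class ScaledSinusoidalEmbedding(nn.Module):
    """Fixed sinusoidal position table scaled by one trainable scalar."""

    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        # The single learned parameter; the table itself stays frozen.
        self.scale = nn.Parameter(torch.ones(1))
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        table = torch.zeros(max_len, dim)  # dim assumed even
        table[:, 0::2] = torch.sin(position * div_term)
        table[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("table", table)  # buffer, not a trained weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings.
        return x + self.scale * self.table[: x.size(1)]
```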
You have to `pip install einops` before using this model!
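
A minimal loading sketch using the standard `transformers` Auto classes; the repo id below is a placeholder for this model's actual Hub id, and `trust_remote_code=True` is an assumption based on the custom architecture:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "user/model"  # placeholder; substitute this repository's Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# trust_remote_code=True assumed, since the GPT-2 architecture is modified.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```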
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6079949388160e14e4e2e499/6wEIgFWUcAdA7hSaXW0x5.png)
| Avg   | ARC   | HellaSwag | MMLU  | TruthfulQA |
| ----- | ----- | --------- | ----- | ---------- |
| 30.76 | 22.18 | 29.75     | 26.24 | 44.88      |