---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
language:
- en
tags:
- cramped
- cramp
---
A modified GPT-2 model with ScaledSinusoidal position embeddings, no biases, an embedding layernorm, and one shared MLP layer, totaling 94 million non-embedding parameters. Despite being trained on only 8 billion tokens of text from [SlimPajama](https://hf.co/datasets/cerebras/SlimPajama-627B), it beats most similarly sized and slightly larger models (GPT-2-124m, Pythia-70m/160m, Cerebras-111m) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) suite of benchmarks.
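For reference, a ScaledSinusoidal position embedding is the standard fixed sinusoidal position table multiplied by a single learnable scale. The sketch below illustrates the idea only; it is not the code this checkpoint uses, and the class name, `max_len`, and scale initialization are placeholders:

```python
import math
import torch
import torch.nn as nn

class ScaledSinusoidalEmbedding(nn.Module):
    """Illustrative sketch: fixed sinusoidal table times one learnable scalar."""

    def __init__(self, dim: int, max_len: int = 2048):
        super().__init__()
        # Standard sinusoidal table of shape (max_len, dim)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)
        # Single learnable scale applied to the fixed table
        self.scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        seq_len = input_ids.shape[1]
        return self.scale * self.pe[:seq_len]
```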
You have to `pip install einops` before using this model!
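A minimal loading and generation sketch follows. The repository id `crumb/cramped-94m-8btok` and the sampling settings are assumptions, and `trust_remote_code=True` is needed because the architecture is a custom, modified GPT-2 implementation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/cramped-94m-8btok"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```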
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6079949388160e14e4e2e499/6wEIgFWUcAdA7hSaXW0x5.png)
| Average | ARC | HellaSwag | MMLU | TruthfulQA |
| --- | --- | --- | --- | --- |
| 30.76 | 22.18 | 29.75 | 26.24 | 44.88 |