---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
---
<style>
img{
border-radius: 1rem;
}
@import url('https://fonts.googleapis.com/css2?family=Vollkorn:ital,wght@0,400..900;1,400..900&display=swap');
</style>
<div style="background-color: transparent; border-radius: .5rem; padding: 2rem; font-family: monospace; font-size: .85rem; text-align: justify;">
![palmer-004](https://huggingface.co/appvoid/palmer-004-original/resolve/main/palmer-004.jpeg)
#### palmer turbo
This model has a slightly different architecture and training style:
1. It went through continual pretraining in which the lm_head and embedding layers were tuned (a minimal sketch of this follows the list).
2. The base model was trained on 75k instruction/response pairs and merged.
3. The architecture is similar to the palmer series but with a smaller context size (8,192 tokens).
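As an illustration of step 1 only: the card does not say which trainer or repository was used, so the repo id and hyperparameters below are assumptions, not the authors' script. The sketch freezes the whole model and leaves just the token embeddings and LM head trainable.
```python
# Illustrative sketch of continual pretraining where only the lm_head and
# embedding layers are tuned. Repo id and learning rate are assumptions.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("appvoid/palmer-004")  # hypothetical repo id

# Freeze everything, then unfreeze only the token embeddings and the LM head.
for p in model.parameters():
    p.requires_grad = False
for p in model.get_input_embeddings().parameters():
    p.requires_grad = True
for p in model.get_output_embeddings().parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
# ...continue with a standard causal-LM training loop on the pretraining corpus.
```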
In short, palmer is now half the size and twice the speed with nearly the same overall performance, trading some winogrande accuracy for notable improvements on mmlu and arc challenge. As of Wed 17 Jul, it beats all models ≤ 0.5b on hellaswag.
As with all palmer models, it is biased to respond without requiring any specific prompt format; feel free to further fine-tune it for your specific use case. A minimal usage sketch follows.
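The snippet below loads the model with plain transformers; the repo id, prompt, and generation settings are illustrative assumptions rather than details from the card.
```python
# Minimal usage sketch (assumes a standard causal-LM checkpoint; the repo id
# below is an assumption, adjust it to the actual model repository).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "appvoid/palmer-004"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# No special prompt format is required; a plain question works.
prompt = "What is the highest mountain on Earth?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```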
#### benchmarks
These are zero-shot evaluations performed on current state-of-the-art language models.
| Model                            | MMLU       | ARC-C      | HellaSwag  | PIQA       | Winogrande | Average    |
|----------------------------------|------------|------------|------------|------------|------------|------------|
| smollm-360m                      | 0.2537     | **0.3626** | 0.5350     | 0.7116     | 0.5659     | 0.4858     |
| tinyllama                        | 0.2577     | 0.3029     | 0.5935     | 0.7329     | 0.5959     | 0.4966     |
| qwen2-0.5b                       | **0.4413** | 0.2892     | 0.4905     | 0.6931     | 0.5699     | 0.4968     |
| danube3-500m-chat (current sota) | 0.2554     | **0.3626** | 0.6072     | 0.7432     | 0.6140     | 0.5164     |
| palmer-004-turbo                 | 0.2736     | 0.3558     | **0.6179** | 0.7367     | 0.6117     | **0.5191** |
| palmer-004                       | 0.2661     | 0.3490     | 0.6173     | **0.7481** | **0.6417** | **0.5244** |
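The card does not specify which harness produced the numbers above. Zero-shot multiple-choice benchmarks of this kind are typically scored by comparing the model's log-likelihood of each candidate answer, roughly as in the sketch below; the repo id and example item are assumptions, and boundary tokenization is handled only approximately.
```python
# Illustrative zero-shot multiple-choice scoring (log-likelihood per choice),
# the general approach behind benchmarks like ARC-C and HellaSwag. Not the
# exact harness or settings behind the table above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "appvoid/palmer-004"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

question = "Which gas do plants absorb from the atmosphere?"  # made-up example item
choices = ["oxygen", "carbon dioxide", "nitrogen", "helium"]

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities of the completion tokens given the prompt."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # token t predicts token t+1
    token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()        # keep completion tokens only

scores = [completion_logprob(question, c) for c in choices]
print("prediction:", choices[scores.index(max(scores))])
```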
#### thanks to
- h2oai: performant base model provider
- teknium: openhermes dataset provider
- unsloth: training software and tooling
</div>