Update README.md
README.md (CHANGED)

@@ -27,10 +27,10 @@ tags:
 This model has a slightly different architecture and training style:

 1. The model was followed by continual pretraining (lm_head + embedding layers were tuned).
-2. Base model was
+2. Base model was pretrained on 75k instruction/response pairs and merged.
 3. Similar architecture to the palmer series, but with a smaller context size (8192).

-In short, palmer is now half the size, twice the speed and same overall performance with a
+In short, palmer is now half the size and twice the speed, with the same overall performance and a notable improvement on mmlu and arc challenge instead of winogrande.

 As with all palmer models, the model is biased to respond without any specific prompt; feel free to further fine-tune it for your specific use case.
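Point 1 above describes continual pretraining in which only the lm_head and embedding layers are tuned. A minimal PyTorch sketch of that freezing setup (the `TinyLM` module and its layer names are illustrative stand-ins, not the actual palmer architecture):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    # Minimal stand-in for a decoder-only LM: embedding -> body -> lm_head.
    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.body = nn.Linear(dim, dim)  # placeholder for the transformer blocks
        self.lm_head = nn.Linear(dim, vocab_size)

def freeze_all_but_embeddings_and_head(model):
    # Leave only embedding and lm_head parameters trainable,
    # mirroring the continual-pretraining setup described above.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith(("embed", "lm_head"))
    return [p for p in model.parameters() if p.requires_grad]

model = TinyLM()
trainable = freeze_all_but_embeddings_and_head(model)
optimizer = torch.optim.AdamW(trainable, lr=1e-5)  # optimizer sees only the tuned layers
```

On a real checkpoint the same loop would run over the loaded model's `named_parameters()`, with the prefixes adjusted to that architecture's actual layer names.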
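Point 2 says the instruction-pretrained weights were merged; the README does not specify the merge method, so here is a simple linear-interpolation sketch over plain Python lists (function and key names are hypothetical, not from the palmer repo):

```python
def merge_weights(sd_a, sd_b, alpha=0.5):
    # Interpolate two checkpoints' parameters element-wise:
    # alpha = 0.5 is a plain average of the two models.
    return {
        k: [alpha * a + (1 - alpha) * b for a, b in zip(sd_a[k], sd_b[k])]
        for k in sd_a
    }

base = {"lm_head.weight": [1.0, 3.0]}
tuned = {"lm_head.weight": [3.0, 5.0]}
merged = merge_weights(base, tuned)  # averages each weight of the two checkpoints
```

In practice the same interpolation would be applied to tensors in the two models' state dicts rather than Python lists.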