# mega-ar-525m-v0.07-ultraTBfw
Pretraining experiment:
- 525M params: hidden size 1536, feed-forward dim 3x hidden size, 8 layers
- context length 4096, MEGA EMA dim 32
- llama-3 tokenizer
## Model description
This model is a fine-tuned version of pszemraj/mega-ar-525m-v0.06-fw_longish on the BEE-spoke-data/UltraTextbooks-2.1-fw_mix dataset. It achieves the following results on the evaluation set:
- Loss: 1.9824
- Accuracy: 0.5874
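
A minimal loading-and-generation sketch, not taken from the model card itself: it assumes the custom MEGA architecture loads through `AutoModelForCausalLM` with `trust_remote_code=True` (matching the eval config below), and the prompt is an arbitrary example:

```python
# Sketch: load the model and generate a short continuation.
# Assumes `transformers` is installed and remote code is trusted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pszemraj/mega-ar-525m-v0.07-ultraTBfw"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "The printing press changed European society by"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```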
## Quick eval
Quick eval for `pszemraj/mega-ar-525m-v0.07-ultraTBfw`:

`hf (pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,trust_remote_code=True,dtype=float)`, gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
| Tasks          | Version | Filter | n-shot | Metric     |   Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|--------:|---|-------:|
| arc_easy       |       1 | none   |      0 | acc        |  0.4912 | ± | 0.0103 |
|                |         | none   |      0 | acc_norm   |  0.4356 | ± | 0.0102 |
| boolq          |       2 | none   |      0 | acc        |  0.6092 | ± | 0.0085 |
| lambada_openai |       1 | none   |      0 | perplexity | 49.3787 | ± | 2.0179 |
|                |         | none   |      0 | acc        |  0.3078 | ± | 0.0064 |
| openbookqa     |       1 | none   |      0 | acc        |  0.1900 | ± | 0.0176 |
|                |         | none   |      0 | acc_norm   |  0.3060 | ± | 0.0206 |
| piqa           |       1 | none   |      0 | acc        |  0.6480 | ± | 0.0111 |
|                |         | none   |      0 | acc_norm   |  0.6480 | ± | 0.0111 |
| winogrande     |       1 | none   |      0 | acc        |  0.5209 | ± | 0.0140 |
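
A reproduction sketch for the table above using lm-evaluation-harness (`pip install lm-eval`); the model args, zero-shot setting, and batch size are taken from the config line above, but the exact `simple_evaluate` signature may differ between harness versions:

```python
# Sketch: re-run the quick eval with lm-evaluation-harness.
# API details (simple_evaluate arguments) may vary by harness version.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=pszemraj/mega-ar-525m-v0.07-ultraTBfw,"
        "trust_remote_code=True,dtype=float"
    ),
    tasks=["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])  # per-task metrics, as summarized in the table
```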