license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
language:
- en
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- eo
- es
- et
- eu
- fa
- ff
- fi
- fr
- fy
- ga
- gd
- gl
- gn
- gu
- ha
- he
- hi
- hr
- ht
- hu
- hy
- id
- ig
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lg
- li
- ln
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- ns
- om
- or
- pa
- pl
- ps
- pt
- qu
- rm
- ro
- ru
- sa
- si
- sc
- sd
- sk
- sl
- so
- sq
- sr
- ss
- su
- sv
- sw
- ta
- te
- th
- tl
- tn
- tr
- ug
- uk
- ur
- uz
- vi
- wo
- xh
- yi
- yo
- zu
datasets: []
tags:
- litgpt
- litdata
tangled-llama-e-128k-v0.1
A pretrained language model based on the Llama architecture, with about 134.2M parameters. This model has been trained on 9.9B (9,889,496,064) tokens from more than ??? (???) dataset rows.
This model isn't designed for immediate use but rather for continued pretraining and fine-tuning on a downstream task. While it can handle a context length of up to 128K (131,072) tokens, it was pretrained with sequences of only 512 tokens.
The objective is to streamline the model's cognitive or reasoning core by eliminating redundant knowledge.
Pretrain
134,234,368 params 653.11 TFLOPS on 1x RTX 3090 24GB
Epoch 3 | iter 1755912 step 38172 | loss train: 2.350, val: 2.473 | iter time: 779.54 ms (step) remaining time: 0:00:08
Final evaluation | val loss: 2.471 | val ppl: 11.837
----------------------------------------
| Performance
| - Total tokens : 9,889,493,504
| - Training Time : 448691.01 s
| - Tok/sec : 5162.13 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used : 23.47 GB
----------------------------------------
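The summary figures in the log above can be cross-checked with a few lines of shell. This is a minimal sketch that makes two assumptions: perplexity is reported as exp of the mean per-token cross-entropy loss (the standard definition), and litgpt's `iter` counter counts micro-batches while `step` counts optimizer steps. The helper name `ppl_from_loss` is hypothetical.

```shell
#!/bin/sh
# Values copied from the training log above.
VAL_LOSS=2.471
ITERS=1755912
STEPS=38172
TRAIN_SECONDS=448691.01

# Hypothetical helper: perplexity = exp(mean per-token cross-entropy loss).
ppl_from_loss() {
  awk -v loss="$1" 'BEGIN { printf "%.2f\n", exp(loss) }'
}

echo "val ppl:              $(ppl_from_loss "$VAL_LOSS")"

# Assumption: "iter" counts micro-batches and "step" counts optimizer steps,
# so their ratio is the number of gradient-accumulation micro-batches per step.
echo "micro-batches / step: $((ITERS / STEPS)) (remainder $((ITERS % STEPS)))"

# Wall-clock training time converted from seconds to days.
awk -v s="$TRAIN_SECONDS" 'BEGIN { printf "training days:        %.2f\n", s / 86400 }'
```

This prints a perplexity of 11.83 rather than the logged 11.837, presumably because the log computes it from the unrounded loss. The iteration count divides evenly by the step count (46 with no remainder), which is consistent with 46 accumulated micro-batches per optimizer step. The training time works out to about 5.19 days.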
Pretrain Evaluation
lm-evaluation-harness
litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-quick/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-leaderboard/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'gsm8k,mathqa' --out_dir 'evaluate-math/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'mmlu,mmlu_pro' --out_dir 'evaluate-mmlu/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande' --out_dir 'evaluate-reasoning/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
litgpt evaluate --tasks 'wikitext,qasper' --out_dir 'evaluate-long/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
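The six evaluation commands above differ only in their task list and output directory, so one script can drive all of them. This is a minimal sketch, assuming a POSIX shell and a `litgpt` installation with evaluation support. `run_suite`, `CHECKPOINT`, and `DRY_RUN` are hypothetical names introduced here. By default the script performs a dry run that only prints each command.

```shell
#!/bin/sh
# Checkpoint and dry-run switch; both can be overridden from the environment.
CHECKPOINT="${CHECKPOINT:-out/pretrain/final/}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: $1 = output-dir suffix, $2 = comma-separated task list.
run_suite() {
  # Replace this function's positional parameters with the full command line.
  set -- litgpt evaluate --tasks "$2" --out_dir "evaluate-$1/" \
    --batch_size 4 --dtype bfloat16 "$CHECKPOINT"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # actually run the evaluation
  fi
}

run_suite quick 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge'
run_suite leaderboard 'leaderboard'
run_suite math 'gsm8k,mathqa'
run_suite mmlu 'mmlu,mmlu_pro'
run_suite reasoning 'arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande'
run_suite long 'wikitext,qasper'
```

Run it once with the defaults to review the generated commands, then run it with `DRY_RUN=0` to execute the suites in sequence.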