mtasic85 committed
Commit 425beac · 1 Parent(s): e7cbf52

updated readme

Files changed (1):
  1. README.md +88 -0
README.md CHANGED

---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
language: [
  'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
  'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
  'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
  'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
  'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
  'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
  'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets: [
  '',
  '',
  '',
  '',
  '',
  '',
]
tags:
  - litgpt
  - litdata
---

# tangled-llama-e-128k-v0.1

![logo](./misc/logo.png)

A pretrained language model based on the Llama architecture, with about **134.2M** parameters. It has been trained on **9.9B** (`9,889,496,064`) tokens from more than **???** (`???`) dataset rows.

This model **isn't** intended for immediate use; it is meant as a base for continued pretraining and finetuning on a downstream task. While it can handle a context length of up to **128K** (`131,072`) tokens, it was pretrained with sequences of **512** (`512`) tokens.

The objective is to retain a streamlined cognitive/reasoning core while eliminating redundant knowledge from the model.
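
Since the model card declares `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint can be exercised with a few lines of `transformers` code. A minimal sketch, assuming the Hugging Face repo id `mtasic85/tangled-llama-e-128k-v0.1` (a placeholder; substitute the actual repo id or a local checkpoint path):

```python
# Minimal text-generation sketch; the repo id below is an assumed placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = 'mtasic85/tangled-llama-e-128k-v0.1'  # placeholder, adjust as needed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer('The Llama architecture is', return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Keep the note above in mind: the base checkpoint is meant for continued pretraining or finetuning, so raw generations from it will be weak.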
36
+
37
+ [loss, val_loss](https://api.wandb.ai/links/mtasic85/strnx9rl)
38
+
39
+ [val_ppl](https://api.wandb.ai/links/mtasic85/ljwxf4am)
40
+
41
+ [epoch](https://api.wandb.ai/links/mtasic85/edyph869)
42
+
43
+ [learning_rate](https://api.wandb.ai/links/mtasic85/eswxyger)
44
+

## Pretrain

- 134,234,368 params
- 653.11 TFLOPS on 1x RTX 3090 24GB
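
As a quick sanity check, the parameter count above can be reproduced by summing the tensor sizes of the loaded model; a sketch, again assuming the placeholder repo id from the loading example:

```python
# Sketch: reproduce the reported parameter count from a loaded checkpoint.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('mtasic85/tangled-llama-e-128k-v0.1')  # placeholder repo id
print(f'{sum(p.numel() for p in model.parameters()):,}')  # expected: 134,234,368
```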

Final iteration and run summary from the training log:

```
Epoch 3 | iter 1755912 step 38172 | loss train: 2.350, val: 2.473 | iter time: 779.54 ms (step) remaining time: 0:00:08
Final evaluation | val loss: 2.471 | val ppl: 11.837

----------------------------------------
| Performance
| - Total tokens : 9,889,493,504
| - Training Time : 448691.01 s
| - Tok/sec : 5162.13 tok/s
| ----------------------------------------
| Memory Usage
| - Memory Used : 23.47 GB
----------------------------------------
```
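
The reported validation perplexity is consistent with the validation loss, since perplexity is the exponential of the cross-entropy loss; a quick check of the final numbers above:

```python
# Perplexity = exp(cross-entropy loss); check against the final evaluation line of the log.
import math

val_loss = 2.471
print(round(math.exp(val_loss), 3))  # ~11.834, matching the reported val ppl of 11.837 up to rounding
```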

## Pretrain Evaluation

### lm-evaluation-harness

The pretrained checkpoint was evaluated with `litgpt evaluate`, which wraps lm-evaluation-harness, across several task groups. Quick mixed suite:

```bash
litgpt evaluate --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' --out_dir 'evaluate-quick/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```
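
For reference, a similar quick suite can also be run through the harness's Python API on a checkpoint exported to Hugging Face format. A sketch, assuming `lm-eval` is installed and the converted checkpoint lives in `out/converted/` (a hypothetical path):

```python
# Sketch: run a few of the quick-suite tasks with lm-evaluation-harness directly.
import lm_eval

results = lm_eval.simple_evaluate(
    model='hf',
    model_args='pretrained=out/converted/,dtype=bfloat16',  # hypothetical converted-checkpoint path
    tasks=['hellaswag', 'arc_challenge', 'winogrande'],
    batch_size=4,
)
print(results['results'])
```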

Leaderboard task group:

```bash
litgpt evaluate --tasks 'leaderboard' --out_dir 'evaluate-leaderboard/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```

Math tasks:

```bash
litgpt evaluate --tasks 'gsm8k,mathqa' --out_dir 'evaluate-math/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```

MMLU variants:

```bash
litgpt evaluate --tasks 'mmlu,mmlu_pro' --out_dir 'evaluate-mmlu/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```

Reasoning and commonsense tasks:

```bash
litgpt evaluate --tasks 'arc_challenge,boolq,gpqa,hellaswag,openbookqa,piqa,truthfulqa_mc2,winogrande' --out_dir 'evaluate-reasoning/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```

Long-context / language-modeling tasks:

```bash
litgpt evaluate --tasks 'wikitext,qasper' --out_dir 'evaluate-long/' --batch_size 4 --dtype 'bfloat16' out/pretrain/final/
```