tangled-0.4-0.5b-base

Prepare the core pretraining datasets:

time python -B prepare_core_datasets.py
# block_size=4096, 4000 blocks per chunk (chunk_size=16,384,000 tokens)
Progress: 100%|████████| 235/235 [1:24:19<00:00, 21.53s/it]
Workers are finished.

# block_size=32768, 500 blocks per chunk (chunk_size=16,384,000 tokens)
Progress: 100%|████████| 235/235 [1:28:33<00:00, 22.61s/it]
Workers are finished.

i=0, block_size=4096, chunk_size=16384000, len(dataset)=2997148, len(dataset) * block_size=12276318208
Total number of tokens in the optimized dataset '../core-data-0-4096-4000' is 12276318208

i=1, block_size=32768, chunk_size=16384000, len(dataset)=374022, len(dataset) * block_size=12255952896
Total number of tokens in the optimized dataset '../core-data-1-32768-500' is 12255952896

real    172m58.892s
user    1150m0.879s
sys     11m49.907s
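
The totals reported above are simply the number of optimized blocks multiplied by the block size. A minimal sketch of how to re-check them, assuming the directories were written with litdata's streaming format (which the chunked "optimized dataset" layout suggests):

from litdata.streaming import StreamingDataset, TokensLoader

for i, (block_size, blocks_per_chunk) in enumerate([(4096, 4000), (32768, 500)]):
    data_dir = f"../core-data-{i}-{block_size}-{blocks_per_chunk}"
    dataset = StreamingDataset(
        input_dir=data_dir,
        item_loader=TokensLoader(block_size=block_size),  # one item = one fixed-size token block
    )
    # Every item is exactly block_size tokens, so total tokens = items * block_size.
    print(f"{data_dir}: {len(dataset)} blocks, {len(dataset) * block_size} tokens")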

Pretrain the core model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain-core-model.yaml
Seed set to 23
Time to instantiate model: 0.23 seconds.
Total parameters: 199,459,328
Verifying settings ...
Measured TFLOPs: 7069.07
Epoch 1 | iter 256 step 1 | loss train: 12.109, val: n/a | iter time: 378.87 ms (step) remaining time: 7 days, 8:11:36
Epoch 1 | iter 512 step 2 | loss train: 12.106, val: n/a | iter time: 346.80 ms (step) remaining time: 6 days, 12:20:04
Epoch 1 | iter 768 step 3 | loss train: 12.104, val: n/a | iter time: 348.23 ms (step) remaining time: 6 days, 5:58:13
Epoch 1 | iter 1024 step 4 | loss train: 12.093, val: n/a | iter time: 347.90 ms (step) remaining time: 6 days, 3:14:51
Epoch 1 | iter 1280 step 5 | loss train: 12.090, val: n/a | iter time: 348.40 ms (step) remaining time: 6 days, 1:15:13
Epoch 1 | iter 1536 step 6 | loss train: 12.070, val: n/a | iter time: 349.50 ms (step) remaining time: 5 days, 23:54:58
Epoch 1 | iter 1792 step 7 | loss train: 12.032, val: n/a | iter time: 348.92 ms (step) remaining time: 5 days, 22:57:52
Epoch 1 | iter 2048 step 8 | loss train: 12.009, val: n/a | iter time: 347.99 ms (step) remaining time: 5 days, 22:14:45
Epoch 1 | iter 2304 step 9 | loss train: 11.979, val: n/a | iter time: 348.45 ms (step) remaining time: 5 days, 21:40:34
Epoch 1 | iter 2560 step 10 | loss train: 11.940, val: n/a | iter time: 348.64 ms (step) remaining time: 5 days, 21:13:17
Epoch 1 | iter 2816 step 11 | loss train: 11.862, val: n/a | iter time: 347.73 ms (step) remaining time: 5 days, 20:50:38
# ...

Back up the wandb logs:

mv wandb wandb-pretrain-core

Chat with the pretrained core model:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core/final

Evaluate the core model on the lm-evaluation-harness 'leaderboard' tasks:

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core/leaderboard/' --batch_size 1 --dtype 'bfloat16' '../out/pretrain-core/final'
|                           Tasks                           |Version|Filter|n-shot|        Metric         |   |Value |   |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard                                                |    N/A|      |      |                       |   |      |   |      |
| - leaderboard_bbh                                         |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_bbh_boolean_expressions                    |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
|  - leaderboard_bbh_causal_judgement                       |      1|none  |     3|acc_norm               |↑  |0.5027|±  |0.0367|
|  - leaderboard_bbh_date_understanding                     |      1|none  |     3|acc_norm               |↑  |0.1960|±  |0.0252|
|  - leaderboard_bbh_disambiguation_qa                      |      1|none  |     3|acc_norm               |↑  |0.3800|±  |0.0308|
|  - leaderboard_bbh_formal_fallacies                       |      1|none  |     3|acc_norm               |↑  |0.4680|±  |0.0316|
|  - leaderboard_bbh_geometric_shapes                       |      1|none  |     3|acc_norm               |↑  |0.0920|±  |0.0183|
|  - leaderboard_bbh_hyperbaton                             |      1|none  |     3|acc_norm               |↑  |0.5160|±  |0.0317|
|  - leaderboard_bbh_logical_deduction_five_objects         |      1|none  |     3|acc_norm               |↑  |0.1960|±  |0.0252|
|  - leaderboard_bbh_logical_deduction_seven_objects        |      1|none  |     3|acc_norm               |↑  |0.1520|±  |0.0228|
|  - leaderboard_bbh_logical_deduction_three_objects        |      1|none  |     3|acc_norm               |↑  |0.3440|±  |0.0301|
|  - leaderboard_bbh_movie_recommendation                   |      1|none  |     3|acc_norm               |↑  |0.2200|±  |0.0263|
|  - leaderboard_bbh_navigate                               |      1|none  |     3|acc_norm               |↑  |0.4200|±  |0.0313|
|  - leaderboard_bbh_object_counting                        |      1|none  |     3|acc_norm               |↑  |0.0560|±  |0.0146|
|  - leaderboard_bbh_penguins_in_a_table                    |      1|none  |     3|acc_norm               |↑  |0.2055|±  |0.0336|
|  - leaderboard_bbh_reasoning_about_colored_objects        |      1|none  |     3|acc_norm               |↑  |0.1520|±  |0.0228|
|  - leaderboard_bbh_ruin_names                             |      1|none  |     3|acc_norm               |↑  |0.2320|±  |0.0268|
|  - leaderboard_bbh_salient_translation_error_detection    |      1|none  |     3|acc_norm               |↑  |0.2200|±  |0.0263|
|  - leaderboard_bbh_snarks                                 |      1|none  |     3|acc_norm               |↑  |0.5393|±  |0.0375|
|  - leaderboard_bbh_sports_understanding                   |      1|none  |     3|acc_norm               |↑  |0.4600|±  |0.0316|
|  - leaderboard_bbh_temporal_sequences                     |      1|none  |     3|acc_norm               |↑  |0.2000|±  |0.0253|
|  - leaderboard_bbh_tracking_shuffled_objects_five_objects |      1|none  |     3|acc_norm               |↑  |0.2160|±  |0.0261|
|  - leaderboard_bbh_tracking_shuffled_objects_seven_objects|      1|none  |     3|acc_norm               |↑  |0.1480|±  |0.0225|
|  - leaderboard_bbh_tracking_shuffled_objects_three_objects|      1|none  |     3|acc_norm               |↑  |0.3520|±  |0.0303|
|  - leaderboard_bbh_web_of_lies                            |      1|none  |     3|acc_norm               |↑  |0.4880|±  |0.0317|
| - leaderboard_gpqa                                        |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_gpqa_diamond                               |      1|none  |     0|acc_norm               |↑  |0.2778|±  |0.0319|
|  - leaderboard_gpqa_extended                              |      1|none  |     0|acc_norm               |↑  |0.2509|±  |0.0186|
|  - leaderboard_gpqa_main                                  |      1|none  |     0|acc_norm               |↑  |0.2545|±  |0.0206|
| - leaderboard_ifeval                                      |      3|none  |     0|inst_level_loose_acc   |↑  |0.2026|±  |   N/A|
|                                                           |       |none  |     0|inst_level_strict_acc  |↑  |0.1679|±  |   N/A|
|                                                           |       |none  |     0|prompt_level_loose_acc |↑  |0.0869|±  |0.0121|
|                                                           |       |none  |     0|prompt_level_strict_acc|↑  |0.0684|±  |0.0109|
| - leaderboard_math_hard                                   |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_math_algebra_hard                          |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_counting_and_prob_hard                |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_geometry_hard                         |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_intermediate_algebra_hard             |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_num_theory_hard                       |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_prealgebra_hard                       |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
|  - leaderboard_math_precalculus_hard                      |      2|none  |     4|exact_match            |↑  |0.0000|±  |     0|
| - leaderboard_mmlu_pro                                    |    0.1|none  |     5|acc                    |↑  |0.1123|±  |0.0029|
| - leaderboard_musr                                        |    N/A|      |      |                       |   |      |   |      |
|  - leaderboard_musr_murder_mysteries                      |      1|none  |     0|acc_norm               |↑  |0.5000|±  |0.0317|
|  - leaderboard_musr_object_placements                     |      1|none  |     0|acc_norm               |↑  |0.2188|±  |0.0259|
|  - leaderboard_musr_team_allocation                       |      1|none  |     0|acc_norm               |↑  |0.3000|±  |0.0290|

3840.17user 10.07system 1:04:53elapsed 98%CPU (0avgtext+0avgdata 5976236maxresident)k
96664inputs+3177472outputs (14major+2239507minor)pagefaults 0swaps
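
litgpt evaluate wraps EleutherAI's lm-evaluation-harness, so the same 'leaderboard' suite can also be run directly from Python. A rough sketch, assuming an HF-format export of the core checkpoint at the hypothetical path ../out/pretrain-core-hf:

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Hypothetical path to an HF-format export of the core checkpoint.
    model_args="pretrained=../out/pretrain-core-hf,dtype=bfloat16",
    tasks=["leaderboard"],
    batch_size=1,
)
print(results["results"])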

Convert the core checkpoint and merge it into the base model:

litgpt convert_pretrained_checkpoint ../out/pretrain-core/final ../out/pretrain-core-converted
litgpt convert_from_litgpt ../out/pretrain-core-converted/ ../out/pretrain-core-converted
cp ../evaluate/pretrain-core/leaderboard/pytorch_model.bin ../out/pretrain-core-converted
mergekit-yaml merge-core-into-base.yaml ../out/pretrain-base-converted --clone-tensors
litgpt convert_to_litgpt --model_name "Qwen2.5-0.5B" --dtype bfloat16 ../out/pretrain-base-converted/
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-base-converted/
cp -r ../out/pretrain-base-converted/ ../out/pretrain-base
rm ../out/pretrain-base/lit_model.pth ../out/pretrain-base/mergekit_config.yml ../out/pretrain-base/model_config.yaml
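
The merged checkpoint can also be loaded from Python through litgpt's LLM API for a quick smoke test (a minimal sketch; the prompt and generation settings below are placeholders):

from litgpt import LLM

# Load the merged, litgpt-format checkpoint produced above.
llm = LLM.load("../out/pretrain-base-converted/")
print(llm.generate("The quick brown fox", max_new_tokens=32))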