2023-10-25 18:28:08,806 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): BertModel(
      (embeddings): BertEmbeddings(
        (word_embeddings): Embedding(64001, 768)
        (position_embeddings): Embedding(512, 768)
        (token_type_embeddings): Embedding(2, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): BertEncoder(
        (layer): ModuleList(
          (0-11): 12 x BertLayer(
            (attention): BertAttention(
              (self): BertSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): BertSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): BertIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): BertOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
      (pooler): BertPooler(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (activation): Tanh()
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=768, out_features=17, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 MultiCorpus: 20847 train + 1123 dev + 3350 test sentences
 - NER_HIPE_2022 Corpus: 20847 train + 1123 dev + 3350 test sentences - /root/.flair/datasets/ner_hipe_2022/v2.1/newseye/de/with_doc_seperator
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Train: 20847 sentences
2023-10-25 18:28:08,807 (train_with_dev=False, train_with_test=False)
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Training Params:
2023-10-25 18:28:08,807  - learning_rate: "3e-05"
2023-10-25 18:28:08,807  - mini_batch_size: "8"
2023-10-25 18:28:08,807  - max_epochs: "10"
2023-10-25 18:28:08,807  - shuffle: "True"
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Plugins:
2023-10-25 18:28:08,807  - TensorboardLogger
2023-10-25 18:28:08,807  - LinearScheduler | warmup_fraction: '0.1'
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Final evaluation on model from best epoch (best-model.pt)
2023-10-25 18:28:08,807  - metric: "('micro avg', 'f1-score')"
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Computation:
2023-10-25 18:28:08,807  - compute on device: cuda:0
2023-10-25 18:28:08,807  - embedding storage: none
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Model training base path: "hmbench-newseye/de-dbmdz/bert-base-historic-multilingual-64k-td-cased-bs8-wsFalse-e10-lr3e-05-poolingfirst-layers-1-crfFalse-5"
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:28:08,807 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-25 18:28:22,840 epoch 1 - iter 260/2606 - loss 1.44118564 - time (sec): 14.03 - samples/sec: 2575.65 - lr: 0.000003 - momentum: 0.000000
2023-10-25 18:28:36,666 epoch 1 - iter 520/2606 - loss 0.92810801 - time (sec): 27.86 - samples/sec: 2637.50 - lr: 0.000006 - momentum: 0.000000
2023-10-25 18:28:51,042 epoch 1 - iter 780/2606 - loss 0.70417279 - time (sec): 42.23 - samples/sec: 2644.02 - lr: 0.000009 - momentum: 0.000000
2023-10-25 18:29:05,298 epoch 1 - iter 1040/2606 - loss 0.59393290 - time (sec): 56.49 - samples/sec: 2629.87 - lr: 0.000012 - momentum: 0.000000
2023-10-25 18:29:18,805 epoch 1 - iter 1300/2606 - loss 0.52883614 - time (sec): 70.00 - samples/sec: 2619.81 - lr: 0.000015 - momentum: 0.000000
2023-10-25 18:29:32,598 epoch 1 - iter 1560/2606 - loss 0.48018899 - time (sec): 83.79 - samples/sec: 2605.48 - lr: 0.000018 - momentum: 0.000000
2023-10-25 18:29:47,075 epoch 1 - iter 1820/2606 - loss 0.43654950 - time (sec): 98.27 - samples/sec: 2624.14 - lr: 0.000021 - momentum: 0.000000
2023-10-25 18:30:01,512 epoch 1 - iter 2080/2606 - loss 0.40049267 - time (sec): 112.70 - samples/sec: 2624.17 - lr: 0.000024 - momentum: 0.000000
2023-10-25 18:30:15,541 epoch 1 - iter 2340/2606 - loss 0.37673764 - time (sec): 126.73 - samples/sec: 2615.94 - lr: 0.000027 - momentum: 0.000000
2023-10-25 18:30:29,155 epoch 1 - iter 2600/2606 - loss 0.35942056 - time (sec): 140.35 - samples/sec: 2609.76 - lr: 0.000030 - momentum: 0.000000
2023-10-25 18:30:29,546 ----------------------------------------------------------------------------------------------------
2023-10-25 18:30:29,547 EPOCH 1 done: loss 0.3587 - lr: 0.000030
2023-10-25 18:30:33,264 DEV : loss 0.1093616634607315 - f1-score (micro avg)  0.2831
2023-10-25 18:30:33,289 saving best model
2023-10-25 18:30:33,764 ----------------------------------------------------------------------------------------------------
2023-10-25 18:30:47,858 epoch 2 - iter 260/2606 - loss 0.15973395 - time (sec): 14.09 - samples/sec: 2596.30 - lr: 0.000030 - momentum: 0.000000
2023-10-25 18:31:01,897 epoch 2 - iter 520/2606 - loss 0.15273016 - time (sec): 28.13 - samples/sec: 2587.41 - lr: 0.000029 - momentum: 0.000000
2023-10-25 18:31:16,107 epoch 2 - iter 780/2606 - loss 0.15091510 - time (sec): 42.34 - samples/sec: 2602.93 - lr: 0.000029 - momentum: 0.000000
2023-10-25 18:31:29,906 epoch 2 - iter 1040/2606 - loss 0.14924346 - time (sec): 56.14 - samples/sec: 2625.36 - lr: 0.000029 - momentum: 0.000000
2023-10-25 18:31:43,744 epoch 2 - iter 1300/2606 - loss 0.14800277 - time (sec): 69.98 - samples/sec: 2613.86 - lr: 0.000028 - momentum: 0.000000
2023-10-25 18:31:57,687 epoch 2 - iter 1560/2606 - loss 0.14609768 - time (sec): 83.92 - samples/sec: 2639.91 - lr: 0.000028 - momentum: 0.000000
2023-10-25 18:32:11,802 epoch 2 - iter 1820/2606 - loss 0.14641189 - time (sec): 98.04 - samples/sec: 2628.39 - lr: 0.000028 - momentum: 0.000000
2023-10-25 18:32:25,136 epoch 2 - iter 2080/2606 - loss 0.14722873 - time (sec): 111.37 - samples/sec: 2618.82 - lr: 0.000027 - momentum: 0.000000
2023-10-25 18:32:39,368 epoch 2 - iter 2340/2606 - loss 0.14675373 - time (sec): 125.60 - samples/sec: 2616.38 - lr: 0.000027 - momentum: 0.000000
2023-10-25 18:32:53,254 epoch 2 - iter 2600/2606 - loss 0.14529176 - time (sec): 139.49 - samples/sec: 2626.49 - lr: 0.000027 - momentum: 0.000000
2023-10-25 18:32:53,635 ----------------------------------------------------------------------------------------------------
2023-10-25 18:32:53,635 EPOCH 2 done: loss 0.1449 - lr: 0.000027
2023-10-25 18:33:00,495 DEV : loss 0.12418132275342941 - f1-score (micro avg)  0.345
2023-10-25 18:33:00,522 saving best model
2023-10-25 18:33:01,001 ----------------------------------------------------------------------------------------------------
2023-10-25 18:33:14,815 epoch 3 - iter 260/2606 - loss 0.12123734 - time (sec): 13.81 - samples/sec: 2636.88 - lr: 0.000026 - momentum: 0.000000
2023-10-25 18:33:29,151 epoch 3 - iter 520/2606 - loss 0.10035032 - time (sec): 28.15 - samples/sec: 2699.66 - lr: 0.000026 - momentum: 0.000000
2023-10-25 18:33:43,128 epoch 3 - iter 780/2606 - loss 0.09873947 - time (sec): 42.12 - samples/sec: 2677.14 - lr: 0.000026 - momentum: 0.000000
2023-10-25 18:33:56,862 epoch 3 - iter 1040/2606 - loss 0.09621422 - time (sec): 55.86 - samples/sec: 2651.72 - lr: 0.000025 - momentum: 0.000000
2023-10-25 18:34:10,890 epoch 3 - iter 1300/2606 - loss 0.09492393 - time (sec): 69.89 - samples/sec: 2650.00 - lr: 0.000025 - momentum: 0.000000
2023-10-25 18:34:24,483 epoch 3 - iter 1560/2606 - loss 0.09360765 - time (sec): 83.48 - samples/sec: 2628.90 - lr: 0.000025 - momentum: 0.000000
2023-10-25 18:34:38,353 epoch 3 - iter 1820/2606 - loss 0.09935380 - time (sec): 97.35 - samples/sec: 2625.24 - lr: 0.000024 - momentum: 0.000000
2023-10-25 18:34:52,835 epoch 3 - iter 2080/2606 - loss 0.09836373 - time (sec): 111.83 - samples/sec: 2637.51 - lr: 0.000024 - momentum: 0.000000
2023-10-25 18:35:06,260 epoch 3 - iter 2340/2606 - loss 0.09861426 - time (sec): 125.26 - samples/sec: 2624.73 - lr: 0.000024 - momentum: 0.000000
2023-10-25 18:35:20,465 epoch 3 - iter 2600/2606 - loss 0.09836569 - time (sec): 139.46 - samples/sec: 2629.47 - lr: 0.000023 - momentum: 0.000000
2023-10-25 18:35:20,763 ----------------------------------------------------------------------------------------------------
2023-10-25 18:35:20,763 EPOCH 3 done: loss 0.0983 - lr: 0.000023
2023-10-25 18:35:27,694 DEV : loss 0.3102573752403259 - f1-score (micro avg)  0.3591
2023-10-25 18:35:27,720 saving best model
2023-10-25 18:35:28,194 ----------------------------------------------------------------------------------------------------
2023-10-25 18:35:42,333 epoch 4 - iter 260/2606 - loss 0.05744413 - time (sec): 14.14 - samples/sec: 2584.08 - lr: 0.000023 - momentum: 0.000000
2023-10-25 18:35:57,025 epoch 4 - iter 520/2606 - loss 0.06369754 - time (sec): 28.83 - samples/sec: 2660.47 - lr: 0.000023 - momentum: 0.000000
2023-10-25 18:36:11,068 epoch 4 - iter 780/2606 - loss 0.06347063 - time (sec): 42.87 - samples/sec: 2642.25 - lr: 0.000022 - momentum: 0.000000
2023-10-25 18:36:25,141 epoch 4 - iter 1040/2606 - loss 0.06530696 - time (sec): 56.94 - samples/sec: 2663.60 - lr: 0.000022 - momentum: 0.000000
2023-10-25 18:36:38,908 epoch 4 - iter 1300/2606 - loss 0.06387663 - time (sec): 70.71 - samples/sec: 2682.17 - lr: 0.000022 - momentum: 0.000000
2023-10-25 18:36:52,388 epoch 4 - iter 1560/2606 - loss 0.06503052 - time (sec): 84.19 - samples/sec: 2665.31 - lr: 0.000021 - momentum: 0.000000
2023-10-25 18:37:06,164 epoch 4 - iter 1820/2606 - loss 0.06576948 - time (sec): 97.97 - samples/sec: 2673.09 - lr: 0.000021 - momentum: 0.000000
2023-10-25 18:37:20,037 epoch 4 - iter 2080/2606 - loss 0.06561542 - time (sec): 111.84 - samples/sec: 2667.10 - lr: 0.000021 - momentum: 0.000000
2023-10-25 18:37:33,397 epoch 4 - iter 2340/2606 - loss 0.06595358 - time (sec): 125.20 - samples/sec: 2660.84 - lr: 0.000020 - momentum: 0.000000
2023-10-25 18:37:46,603 epoch 4 - iter 2600/2606 - loss 0.06624139 - time (sec): 138.41 - samples/sec: 2650.34 - lr: 0.000020 - momentum: 0.000000
2023-10-25 18:37:46,868 ----------------------------------------------------------------------------------------------------
2023-10-25 18:37:46,868 EPOCH 4 done: loss 0.0662 - lr: 0.000020
2023-10-25 18:37:53,780 DEV : loss 0.27316081523895264 - f1-score (micro avg)  0.3408
2023-10-25 18:37:53,807 ----------------------------------------------------------------------------------------------------
2023-10-25 18:38:07,884 epoch 5 - iter 260/2606 - loss 0.05585596 - time (sec): 14.08 - samples/sec: 2570.69 - lr: 0.000020 - momentum: 0.000000
2023-10-25 18:38:21,643 epoch 5 - iter 520/2606 - loss 0.05209107 - time (sec): 27.84 - samples/sec: 2558.79 - lr: 0.000019 - momentum: 0.000000
2023-10-25 18:38:35,682 epoch 5 - iter 780/2606 - loss 0.04718666 - time (sec): 41.87 - samples/sec: 2606.01 - lr: 0.000019 - momentum: 0.000000
2023-10-25 18:38:49,695 epoch 5 - iter 1040/2606 - loss 0.04694978 - time (sec): 55.89 - samples/sec: 2636.11 - lr: 0.000019 - momentum: 0.000000
2023-10-25 18:39:03,937 epoch 5 - iter 1300/2606 - loss 0.04779807 - time (sec): 70.13 - samples/sec: 2627.53 - lr: 0.000018 - momentum: 0.000000
2023-10-25 18:39:17,799 epoch 5 - iter 1560/2606 - loss 0.04766551 - time (sec): 83.99 - samples/sec: 2633.45 - lr: 0.000018 - momentum: 0.000000
2023-10-25 18:39:31,276 epoch 5 - iter 1820/2606 - loss 0.04834322 - time (sec): 97.47 - samples/sec: 2613.56 - lr: 0.000018 - momentum: 0.000000
2023-10-25 18:39:45,187 epoch 5 - iter 2080/2606 - loss 0.04845369 - time (sec): 111.38 - samples/sec: 2622.77 - lr: 0.000017 - momentum: 0.000000
2023-10-25 18:39:58,437 epoch 5 - iter 2340/2606 - loss 0.04889454 - time (sec): 124.63 - samples/sec: 2636.67 - lr: 0.000017 - momentum: 0.000000
2023-10-25 18:40:12,471 epoch 5 - iter 2600/2606 - loss 0.04779773 - time (sec): 138.66 - samples/sec: 2643.59 - lr: 0.000017 - momentum: 0.000000
2023-10-25 18:40:12,773 ----------------------------------------------------------------------------------------------------
2023-10-25 18:40:12,773 EPOCH 5 done: loss 0.0478 - lr: 0.000017
2023-10-25 18:40:19,656 DEV : loss 0.3309582471847534 - f1-score (micro avg)  0.3394
2023-10-25 18:40:19,682 ----------------------------------------------------------------------------------------------------
2023-10-25 18:40:33,761 epoch 6 - iter 260/2606 - loss 0.03565927 - time (sec): 14.08 - samples/sec: 2701.06 - lr: 0.000016 - momentum: 0.000000
2023-10-25 18:40:47,631 epoch 6 - iter 520/2606 - loss 0.03669726 - time (sec): 27.95 - samples/sec: 2615.10 - lr: 0.000016 - momentum: 0.000000
2023-10-25 18:41:01,384 epoch 6 - iter 780/2606 - loss 0.03458977 - time (sec): 41.70 - samples/sec: 2601.88 - lr: 0.000016 - momentum: 0.000000
2023-10-25 18:41:15,444 epoch 6 - iter 1040/2606 - loss 0.03370625 - time (sec): 55.76 - samples/sec: 2630.87 - lr: 0.000015 - momentum: 0.000000
2023-10-25 18:41:29,289 epoch 6 - iter 1300/2606 - loss 0.03659547 - time (sec): 69.61 - samples/sec: 2615.42 - lr: 0.000015 - momentum: 0.000000
2023-10-25 18:41:43,555 epoch 6 - iter 1560/2606 - loss 0.03566172 - time (sec): 83.87 - samples/sec: 2625.18 - lr: 0.000015 - momentum: 0.000000
2023-10-25 18:41:56,926 epoch 6 - iter 1820/2606 - loss 0.03750852 - time (sec): 97.24 - samples/sec: 2619.62 - lr: 0.000014 - momentum: 0.000000
2023-10-25 18:42:10,835 epoch 6 - iter 2080/2606 - loss 0.03612369 - time (sec): 111.15 - samples/sec: 2623.28 - lr: 0.000014 - momentum: 0.000000
2023-10-25 18:42:25,243 epoch 6 - iter 2340/2606 - loss 0.03526350 - time (sec): 125.56 - samples/sec: 2621.48 - lr: 0.000014 - momentum: 0.000000
2023-10-25 18:42:39,920 epoch 6 - iter 2600/2606 - loss 0.03565724 - time (sec): 140.24 - samples/sec: 2613.12 - lr: 0.000013 - momentum: 0.000000
2023-10-25 18:42:40,268 ----------------------------------------------------------------------------------------------------
2023-10-25 18:42:40,268 EPOCH 6 done: loss 0.0357 - lr: 0.000013
2023-10-25 18:42:46,462 DEV : loss 0.3563600778579712 - f1-score (micro avg)  0.3609
2023-10-25 18:42:46,488 saving best model
2023-10-25 18:42:47,091 ----------------------------------------------------------------------------------------------------
2023-10-25 18:43:01,182 epoch 7 - iter 260/2606 - loss 0.03348942 - time (sec): 14.09 - samples/sec: 2564.82 - lr: 0.000013 - momentum: 0.000000
2023-10-25 18:43:16,395 epoch 7 - iter 520/2606 - loss 0.03261560 - time (sec): 29.30 - samples/sec: 2522.13 - lr: 0.000013 - momentum: 0.000000
2023-10-25 18:43:30,249 epoch 7 - iter 780/2606 - loss 0.03131634 - time (sec): 43.16 - samples/sec: 2570.67 - lr: 0.000012 - momentum: 0.000000
2023-10-25 18:43:44,668 epoch 7 - iter 1040/2606 - loss 0.02913817 - time (sec): 57.57 - samples/sec: 2596.73 - lr: 0.000012 - momentum: 0.000000
2023-10-25 18:43:58,255 epoch 7 - iter 1300/2606 - loss 0.02865369 - time (sec): 71.16 - samples/sec: 2606.97 - lr: 0.000012 - momentum: 0.000000
2023-10-25 18:44:12,566 epoch 7 - iter 1560/2606 - loss 0.03000096 - time (sec): 85.47 - samples/sec: 2621.12 - lr: 0.000011 - momentum: 0.000000
2023-10-25 18:44:26,829 epoch 7 - iter 1820/2606 - loss 0.02935599 - time (sec): 99.74 - samples/sec: 2624.84 - lr: 0.000011 - momentum: 0.000000
2023-10-25 18:44:41,001 epoch 7 - iter 2080/2606 - loss 0.02816366 - time (sec): 113.91 - samples/sec: 2619.08 - lr: 0.000011 - momentum: 0.000000
2023-10-25 18:44:54,742 epoch 7 - iter 2340/2606 - loss 0.02786870 - time (sec): 127.65 - samples/sec: 2602.86 - lr: 0.000010 - momentum: 0.000000
2023-10-25 18:45:08,369 epoch 7 - iter 2600/2606 - loss 0.02810153 - time (sec): 141.28 - samples/sec: 2595.47 - lr: 0.000010 - momentum: 0.000000
2023-10-25 18:45:08,655 ----------------------------------------------------------------------------------------------------
2023-10-25 18:45:08,655 EPOCH 7 done: loss 0.0281 - lr: 0.000010
2023-10-25 18:45:14,892 DEV : loss 0.35052964091300964 - f1-score (micro avg)  0.4187
2023-10-25 18:45:14,917 saving best model
2023-10-25 18:45:15,604 ----------------------------------------------------------------------------------------------------
2023-10-25 18:45:29,274 epoch 8 - iter 260/2606 - loss 0.01679626 - time (sec): 13.67 - samples/sec: 2649.74 - lr: 0.000010 - momentum: 0.000000
2023-10-25 18:45:43,015 epoch 8 - iter 520/2606 - loss 0.01590055 - time (sec): 27.41 - samples/sec: 2630.44 - lr: 0.000009 - momentum: 0.000000
2023-10-25 18:45:56,822 epoch 8 - iter 780/2606 - loss 0.01681384 - time (sec): 41.22 - samples/sec: 2642.29 - lr: 0.000009 - momentum: 0.000000
2023-10-25 18:46:10,352 epoch 8 - iter 1040/2606 - loss 0.01711388 - time (sec): 54.75 - samples/sec: 2613.36 - lr: 0.000009 - momentum: 0.000000
2023-10-25 18:46:24,915 epoch 8 - iter 1300/2606 - loss 0.01693945 - time (sec): 69.31 - samples/sec: 2609.75 - lr: 0.000008 - momentum: 0.000000
2023-10-25 18:46:38,629 epoch 8 - iter 1560/2606 - loss 0.01717473 - time (sec): 83.02 - samples/sec: 2589.08 - lr: 0.000008 - momentum: 0.000000
2023-10-25 18:46:52,741 epoch 8 - iter 1820/2606 - loss 0.01849902 - time (sec): 97.14 - samples/sec: 2602.39 - lr: 0.000008 - momentum: 0.000000
2023-10-25 18:47:06,886 epoch 8 - iter 2080/2606 - loss 0.01830031 - time (sec): 111.28 - samples/sec: 2604.13 - lr: 0.000007 - momentum: 0.000000
2023-10-25 18:47:21,244 epoch 8 - iter 2340/2606 - loss 0.01818663 - time (sec): 125.64 - samples/sec: 2619.72 - lr: 0.000007 - momentum: 0.000000
2023-10-25 18:47:34,988 epoch 8 - iter 2600/2606 - loss 0.01797792 - time (sec): 139.38 - samples/sec: 2629.60 - lr: 0.000007 - momentum: 0.000000
2023-10-25 18:47:35,287 ----------------------------------------------------------------------------------------------------
2023-10-25 18:47:35,288 EPOCH 8 done: loss 0.0180 - lr: 0.000007
2023-10-25 18:47:41,544 DEV : loss 0.4683326184749603 - f1-score (micro avg)  0.3644
2023-10-25 18:47:41,569 ----------------------------------------------------------------------------------------------------
2023-10-25 18:47:55,554 epoch 9 - iter 260/2606 - loss 0.01570520 - time (sec): 13.98 - samples/sec: 2653.48 - lr: 0.000006 - momentum: 0.000000
2023-10-25 18:48:09,087 epoch 9 - iter 520/2606 - loss 0.01431690 - time (sec): 27.52 - samples/sec: 2602.52 - lr: 0.000006 - momentum: 0.000000
2023-10-25 18:48:22,993 epoch 9 - iter 780/2606 - loss 0.01274596 - time (sec): 41.42 - samples/sec: 2647.47 - lr: 0.000006 - momentum: 0.000000
2023-10-25 18:48:36,937 epoch 9 - iter 1040/2606 - loss 0.01326027 - time (sec): 55.37 - samples/sec: 2628.55 - lr: 0.000005 - momentum: 0.000000
2023-10-25 18:48:50,570 epoch 9 - iter 1300/2606 - loss 0.01338527 - time (sec): 69.00 - samples/sec: 2615.07 - lr: 0.000005 - momentum: 0.000000
2023-10-25 18:49:04,248 epoch 9 - iter 1560/2606 - loss 0.01381768 - time (sec): 82.68 - samples/sec: 2627.59 - lr: 0.000005 - momentum: 0.000000
2023-10-25 18:49:17,867 epoch 9 - iter 1820/2606 - loss 0.01373566 - time (sec): 96.30 - samples/sec: 2617.12 - lr: 0.000004 - momentum: 0.000000
2023-10-25 18:49:32,820 epoch 9 - iter 2080/2606 - loss 0.01338600 - time (sec): 111.25 - samples/sec: 2617.51 - lr: 0.000004 - momentum: 0.000000
2023-10-25 18:49:46,961 epoch 9 - iter 2340/2606 - loss 0.01388701 - time (sec): 125.39 - samples/sec: 2638.49 - lr: 0.000004 - momentum: 0.000000
2023-10-25 18:50:00,960 epoch 9 - iter 2600/2606 - loss 0.01382361 - time (sec): 139.39 - samples/sec: 2626.74 - lr: 0.000003 - momentum: 0.000000
2023-10-25 18:50:01,379 ----------------------------------------------------------------------------------------------------
2023-10-25 18:50:01,379 EPOCH 9 done: loss 0.0138 - lr: 0.000003
2023-10-25 18:50:07,639 DEV : loss 0.4295385181903839 - f1-score (micro avg)  0.3935
2023-10-25 18:50:07,666 ----------------------------------------------------------------------------------------------------
2023-10-25 18:50:21,916 epoch 10 - iter 260/2606 - loss 0.00628169 - time (sec): 14.25 - samples/sec: 2601.73 - lr: 0.000003 - momentum: 0.000000
2023-10-25 18:50:37,396 epoch 10 - iter 520/2606 - loss 0.00790052 - time (sec): 29.73 - samples/sec: 2463.62 - lr: 0.000003 - momentum: 0.000000
2023-10-25 18:50:50,955 epoch 10 - iter 780/2606 - loss 0.00841466 - time (sec): 43.29 - samples/sec: 2529.08 - lr: 0.000002 - momentum: 0.000000
2023-10-25 18:51:04,859 epoch 10 - iter 1040/2606 - loss 0.00940638 - time (sec): 57.19 - samples/sec: 2537.96 - lr: 0.000002 - momentum: 0.000000
2023-10-25 18:51:18,666 epoch 10 - iter 1300/2606 - loss 0.00936794 - time (sec): 71.00 - samples/sec: 2553.71 - lr: 0.000002 - momentum: 0.000000
2023-10-25 18:51:32,144 epoch 10 - iter 1560/2606 - loss 0.00994615 - time (sec): 84.48 - samples/sec: 2543.68 - lr: 0.000001 - momentum: 0.000000
2023-10-25 18:51:45,883 epoch 10 - iter 1820/2606 - loss 0.00957951 - time (sec): 98.22 - samples/sec: 2559.69 - lr: 0.000001 - momentum: 0.000000
2023-10-25 18:51:59,867 epoch 10 - iter 2080/2606 - loss 0.00938588 - time (sec): 112.20 - samples/sec: 2564.23 - lr: 0.000001 - momentum: 0.000000
2023-10-25 18:52:14,080 epoch 10 - iter 2340/2606 - loss 0.00938487 - time (sec): 126.41 - samples/sec: 2593.01 - lr: 0.000000 - momentum: 0.000000
2023-10-25 18:52:28,218 epoch 10 - iter 2600/2606 - loss 0.00945105 - time (sec): 140.55 - samples/sec: 2604.22 - lr: 0.000000 - momentum: 0.000000
2023-10-25 18:52:28,653 ----------------------------------------------------------------------------------------------------
2023-10-25 18:52:28,653 EPOCH 10 done: loss 0.0094 - lr: 0.000000
2023-10-25 18:52:35,606 DEV : loss 0.47907230257987976 - f1-score (micro avg)  0.3898
2023-10-25 18:52:36,103 ----------------------------------------------------------------------------------------------------
2023-10-25 18:52:36,104 Loading model from best epoch ...
2023-10-25 18:52:37,712 SequenceTagger predicts: Dictionary with 17 tags: O, S-LOC, B-LOC, E-LOC, I-LOC, S-PER, B-PER, E-PER, I-PER, S-ORG, B-ORG, E-ORG, I-ORG, S-HumanProd, B-HumanProd, E-HumanProd, I-HumanProd
2023-10-25 18:52:47,559
Results:
- F-score (micro) 0.4885
- F-score (macro) 0.3364
- Accuracy 0.3266

By class:
              precision    recall  f1-score   support

         LOC     0.5243    0.5964    0.5580      1214
         PER     0.4407    0.4505    0.4455       808
         ORG     0.3605    0.3258    0.3423       353
   HumanProd     0.0000    0.0000    0.0000        15

   micro avg     0.4746    0.5033    0.4885      2390
   macro avg     0.3314    0.3432    0.3364      2390
weighted avg     0.4685    0.5033    0.4846      2390

2023-10-25 18:52:47,560 ----------------------------------------------------------------------------------------------------
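The summary F-scores in the final results table follow directly from the per-class numbers: the micro average pools true/false positives across all classes (so it is the harmonic mean of the reported micro precision and recall), while the macro average is the unweighted mean of the per-class F1 scores, which is why the zero-support HumanProd class drags it well below the micro score. A minimal sketch of that arithmetic, using the values reported above (the `f1` helper is ours, not part of Flair):

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Micro average: precision/recall are computed from pooled counts,
# so micro F1 is just their harmonic mean.
micro_f1 = f1(0.4746, 0.5033)
print(round(micro_f1, 4))  # 0.4885, matching "F-score (micro)"

# Macro average: unweighted mean of the per-class F1 scores
# (LOC, PER, ORG, HumanProd); the all-zero HumanProd row pulls it down.
class_f1 = [0.5580, 0.4455, 0.3423, 0.0000]
macro_f1 = sum(class_f1) / len(class_f1)
print(macro_f1)  # ≈ 0.3364, matching "F-score (macro)"
```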