2023-10-06 13:28:59,709 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,710 Model: "SequenceTagger(
  (embeddings): ByT5Embeddings(
    (model): T5EncoderModel(
      (shared): Embedding(384, 1472)
      (encoder): T5Stack(
        (embed_tokens): Embedding(384, 1472)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                  (relative_attention_bias): Embedding(32, 6)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=1472, out_features=384, bias=False)
                  (k): Linear(in_features=1472, out_features=384, bias=False)
                  (v): Linear(in_features=1472, out_features=384, bias=False)
                  (o): Linear(in_features=384, out_features=1472, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=1472, out_features=3584, bias=False)
                  (wi_1): Linear(in_features=1472, out_features=3584, bias=False)
                  (wo): Linear(in_features=3584, out_features=1472, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (locked_dropout): LockedDropout(p=0.5)
  (linear): Linear(in_features=1472, out_features=25, bias=True)
  (loss_function): CrossEntropyLoss()
)"
2023-10-06 13:28:59,710 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,710 MultiCorpus: 1214 train + 266 dev + 251 test sentences
 - NER_HIPE_2022 Corpus: 1214 train + 266 dev + 251 test sentences - /app/.flair/datasets/ner_hipe_2022/v2.1/ajmc/en/with_doc_seperator
2023-10-06 13:28:59,710 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,710 Train: 1214 sentences
2023-10-06 13:28:59,710 (train_with_dev=False, train_with_test=False)
2023-10-06 13:28:59,711 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,711 Training Params:
2023-10-06 13:28:59,711  - learning_rate: "0.00016"
2023-10-06 13:28:59,711  - mini_batch_size: "8"
2023-10-06 13:28:59,711  - max_epochs: "10"
2023-10-06 13:28:59,711  - shuffle: "True"
2023-10-06 13:28:59,711 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,711 Plugins:
2023-10-06 13:28:59,711  - TensorboardLogger
2023-10-06 13:28:59,711  - LinearScheduler | warmup_fraction: '0.1'
2023-10-06 13:28:59,711 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,711 Final evaluation on model from best epoch (best-model.pt)
2023-10-06 13:28:59,711  - metric: "('micro avg', 'f1-score')"
2023-10-06 13:28:59,711 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,711 Computation:
2023-10-06 13:28:59,711  - compute on device: cuda:0
2023-10-06 13:28:59,711  - embedding storage: none
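The LinearScheduler plugin listed above warms the learning rate up over the first 10% of training and then decays it linearly to zero. With 152 batches per epoch and 10 epochs (1520 steps in total), the lr column in the log follows this shape; a minimal sketch (the function name is ours, and Flair's own step accounting may differ by an off-by-one):

```python
def linear_schedule_lr(step: int,
                       peak_lr: float = 0.00016,
                       total_steps: int = 1520,      # 10 epochs x 152 batches
                       warmup_fraction: float = 0.1) -> float:
    """Linear warmup to peak_lr over the first warmup_fraction of steps,
    then linear decay to zero. A sketch of the scheduler shape seen in
    this log, not Flair's actual implementation."""
    warmup_steps = int(total_steps * warmup_fraction)  # 152
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```

This is consistent with the logged values, which rise to roughly 0.000158 around the end of epoch 1 and fall to roughly 0.000001 by the last batch of epoch 10.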
2023-10-06 13:28:59,711 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,711 Model training base path: "hmbench-ajmc/en-hmbyt5-preliminary/byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3"
2023-10-06 13:28:59,712 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,712 ----------------------------------------------------------------------------------------------------
2023-10-06 13:28:59,712 Logging anything other than scalars to TensorBoard is currently not supported.
2023-10-06 13:29:10,494 epoch 1 - iter 15/152 - loss 3.21425648 - time (sec): 10.78 - samples/sec: 273.72 - lr: 0.000015 - momentum: 0.000000
2023-10-06 13:29:21,501 epoch 1 - iter 30/152 - loss 3.20484590 - time (sec): 21.79 - samples/sec: 274.23 - lr: 0.000031 - momentum: 0.000000
2023-10-06 13:29:32,799 epoch 1 - iter 45/152 - loss 3.19395315 - time (sec): 33.09 - samples/sec: 274.10 - lr: 0.000046 - momentum: 0.000000
2023-10-06 13:29:43,694 epoch 1 - iter 60/152 - loss 3.17283582 - time (sec): 43.98 - samples/sec: 271.07 - lr: 0.000062 - momentum: 0.000000
2023-10-06 13:29:54,785 epoch 1 - iter 75/152 - loss 3.12967735 - time (sec): 55.07 - samples/sec: 271.70 - lr: 0.000078 - momentum: 0.000000
2023-10-06 13:30:05,843 epoch 1 - iter 90/152 - loss 3.06072708 - time (sec): 66.13 - samples/sec: 271.55 - lr: 0.000094 - momentum: 0.000000
2023-10-06 13:30:16,897 epoch 1 - iter 105/152 - loss 2.97604040 - time (sec): 77.18 - samples/sec: 272.45 - lr: 0.000109 - momentum: 0.000000
2023-10-06 13:30:28,567 epoch 1 - iter 120/152 - loss 2.87593312 - time (sec): 88.85 - samples/sec: 274.51 - lr: 0.000125 - momentum: 0.000000
2023-10-06 13:30:39,687 epoch 1 - iter 135/152 - loss 2.77817131 - time (sec): 99.97 - samples/sec: 274.68 - lr: 0.000141 - momentum: 0.000000
2023-10-06 13:30:50,828 epoch 1 - iter 150/152 - loss 2.67263821 - time (sec): 111.11 - samples/sec: 275.63 - lr: 0.000157 - momentum: 0.000000
2023-10-06 13:30:52,101 ----------------------------------------------------------------------------------------------------
2023-10-06 13:30:52,101 EPOCH 1 done: loss 2.6606 - lr: 0.000157
2023-10-06 13:30:59,775 DEV : loss 1.5522795915603638 - f1-score (micro avg) 0.0
2023-10-06 13:30:59,783 ----------------------------------------------------------------------------------------------------
2023-10-06 13:31:10,634 epoch 2 - iter 15/152 - loss 1.49398668 - time (sec): 10.85 - samples/sec: 290.89 - lr: 0.000158 - momentum: 0.000000
2023-10-06 13:31:21,375 epoch 2 - iter 30/152 - loss 1.35452531 - time (sec): 21.59 - samples/sec: 293.09 - lr: 0.000157 - momentum: 0.000000
2023-10-06 13:31:31,331 epoch 2 - iter 45/152 - loss 1.22766177 - time (sec): 31.55 - samples/sec: 286.33 - lr: 0.000155 - momentum: 0.000000
2023-10-06 13:31:41,784 epoch 2 - iter 60/152 - loss 1.14917370 - time (sec): 42.00 - samples/sec: 287.58 - lr: 0.000153 - momentum: 0.000000
2023-10-06 13:31:52,201 epoch 2 - iter 75/152 - loss 1.08950954 - time (sec): 52.42 - samples/sec: 288.42 - lr: 0.000151 - momentum: 0.000000
2023-10-06 13:32:02,548 epoch 2 - iter 90/152 - loss 1.03701634 - time (sec): 62.76 - samples/sec: 289.12 - lr: 0.000150 - momentum: 0.000000
2023-10-06 13:32:12,695 epoch 2 - iter 105/152 - loss 0.97540061 - time (sec): 72.91 - samples/sec: 290.16 - lr: 0.000148 - momentum: 0.000000
2023-10-06 13:32:22,848 epoch 2 - iter 120/152 - loss 0.93328702 - time (sec): 83.06 - samples/sec: 291.25 - lr: 0.000146 - momentum: 0.000000
2023-10-06 13:32:33,591 epoch 2 - iter 135/152 - loss 0.88336313 - time (sec): 93.81 - samples/sec: 292.38 - lr: 0.000144 - momentum: 0.000000
2023-10-06 13:32:44,448 epoch 2 - iter 150/152 - loss 0.84269462 - time (sec): 104.66 - samples/sec: 293.32 - lr: 0.000143 - momentum: 0.000000
2023-10-06 13:32:45,531 ----------------------------------------------------------------------------------------------------
2023-10-06 13:32:45,531 EPOCH 2 done: loss 0.8394 - lr: 0.000143
2023-10-06 13:32:52,495 DEV : loss 0.5257992148399353 - f1-score (micro avg) 0.0
2023-10-06 13:32:52,502 ----------------------------------------------------------------------------------------------------
2023-10-06 13:33:02,896 epoch 3 - iter 15/152 - loss 0.50916543 - time (sec): 10.39 - samples/sec: 281.37 - lr: 0.000141 - momentum: 0.000000
2023-10-06 13:33:12,817 epoch 3 - iter 30/152 - loss 0.46396694 - time (sec): 20.31 - samples/sec: 284.84 - lr: 0.000139 - momentum: 0.000000
2023-10-06 13:33:23,331 epoch 3 - iter 45/152 - loss 0.43409942 - time (sec): 30.83 - samples/sec: 288.05 - lr: 0.000137 - momentum: 0.000000
2023-10-06 13:33:34,179 epoch 3 - iter 60/152 - loss 0.42050300 - time (sec): 41.68 - samples/sec: 292.26 - lr: 0.000135 - momentum: 0.000000
2023-10-06 13:33:43,984 epoch 3 - iter 75/152 - loss 0.39094576 - time (sec): 51.48 - samples/sec: 292.64 - lr: 0.000134 - momentum: 0.000000
2023-10-06 13:33:54,706 epoch 3 - iter 90/152 - loss 0.38610791 - time (sec): 62.20 - samples/sec: 292.26 - lr: 0.000132 - momentum: 0.000000
2023-10-06 13:34:05,203 epoch 3 - iter 105/152 - loss 0.38386517 - time (sec): 72.70 - samples/sec: 293.47 - lr: 0.000130 - momentum: 0.000000
2023-10-06 13:34:15,417 epoch 3 - iter 120/152 - loss 0.37124899 - time (sec): 82.91 - samples/sec: 293.34 - lr: 0.000128 - momentum: 0.000000
2023-10-06 13:34:25,544 epoch 3 - iter 135/152 - loss 0.36219028 - time (sec): 93.04 - samples/sec: 292.19 - lr: 0.000127 - momentum: 0.000000
2023-10-06 13:34:36,871 epoch 3 - iter 150/152 - loss 0.35088417 - time (sec): 104.37 - samples/sec: 293.31 - lr: 0.000125 - momentum: 0.000000
2023-10-06 13:34:38,185 ----------------------------------------------------------------------------------------------------
2023-10-06 13:34:38,185 EPOCH 3 done: loss 0.3523 - lr: 0.000125
2023-10-06 13:34:45,514 DEV : loss 0.31032460927963257 - f1-score (micro avg) 0.509
2023-10-06 13:34:45,521 saving best model
2023-10-06 13:34:46,361 ----------------------------------------------------------------------------------------------------
2023-10-06 13:34:57,450 epoch 4 - iter 15/152 - loss 0.25902876 - time (sec): 11.09 - samples/sec: 282.03 - lr: 0.000123 - momentum: 0.000000
2023-10-06 13:35:07,842 epoch 4 - iter 30/152 - loss 0.26509458 - time (sec): 21.48 - samples/sec: 284.23 - lr: 0.000121 - momentum: 0.000000
2023-10-06 13:35:19,003 epoch 4 - iter 45/152 - loss 0.24911552 - time (sec): 32.64 - samples/sec: 289.49 - lr: 0.000119 - momentum: 0.000000
2023-10-06 13:35:30,501 epoch 4 - iter 60/152 - loss 0.24022459 - time (sec): 44.14 - samples/sec: 287.80 - lr: 0.000118 - momentum: 0.000000
2023-10-06 13:35:41,464 epoch 4 - iter 75/152 - loss 0.23388652 - time (sec): 55.10 - samples/sec: 287.75 - lr: 0.000116 - momentum: 0.000000
2023-10-06 13:35:52,636 epoch 4 - iter 90/152 - loss 0.22486772 - time (sec): 66.27 - samples/sec: 286.98 - lr: 0.000114 - momentum: 0.000000
2023-10-06 13:36:02,957 epoch 4 - iter 105/152 - loss 0.22428068 - time (sec): 76.59 - samples/sec: 284.47 - lr: 0.000112 - momentum: 0.000000
2023-10-06 13:36:13,967 epoch 4 - iter 120/152 - loss 0.22457153 - time (sec): 87.60 - samples/sec: 282.78 - lr: 0.000111 - momentum: 0.000000
2023-10-06 13:36:24,628 epoch 4 - iter 135/152 - loss 0.22084735 - time (sec): 98.27 - samples/sec: 281.42 - lr: 0.000109 - momentum: 0.000000
2023-10-06 13:36:35,657 epoch 4 - iter 150/152 - loss 0.21131239 - time (sec): 109.29 - samples/sec: 279.75 - lr: 0.000107 - momentum: 0.000000
2023-10-06 13:36:37,141 ----------------------------------------------------------------------------------------------------
2023-10-06 13:36:37,142 EPOCH 4 done: loss 0.2099 - lr: 0.000107
2023-10-06 13:36:45,080 DEV : loss 0.21370980143547058 - f1-score (micro avg) 0.6992
2023-10-06 13:36:45,087 saving best model
2023-10-06 13:36:49,457 ----------------------------------------------------------------------------------------------------
2023-10-06 13:37:00,462 epoch 5 - iter 15/152 - loss 0.10872738 - time (sec): 11.00 - samples/sec: 278.38 - lr: 0.000105 - momentum: 0.000000
2023-10-06 13:37:11,616 epoch 5 - iter 30/152 - loss 0.13777998 - time (sec): 22.16 - samples/sec: 272.46 - lr: 0.000104 - momentum: 0.000000
2023-10-06 13:37:23,301 epoch 5 - iter 45/152 - loss 0.13795163 - time (sec): 33.84 - samples/sec: 273.00 - lr: 0.000102 - momentum: 0.000000
2023-10-06 13:37:34,624 epoch 5 - iter 60/152 - loss 0.14670835 - time (sec): 45.17 - samples/sec: 274.23 - lr: 0.000100 - momentum: 0.000000
2023-10-06 13:37:45,829 epoch 5 - iter 75/152 - loss 0.14598383 - time (sec): 56.37 - samples/sec: 274.86 - lr: 0.000098 - momentum: 0.000000
2023-10-06 13:37:56,911 epoch 5 - iter 90/152 - loss 0.14474954 - time (sec): 67.45 - samples/sec: 273.60 - lr: 0.000097 - momentum: 0.000000
2023-10-06 13:38:07,518 epoch 5 - iter 105/152 - loss 0.13787093 - time (sec): 78.06 - samples/sec: 272.28 - lr: 0.000095 - momentum: 0.000000
2023-10-06 13:38:19,256 epoch 5 - iter 120/152 - loss 0.13669107 - time (sec): 89.80 - samples/sec: 273.92 - lr: 0.000093 - momentum: 0.000000
2023-10-06 13:38:30,190 epoch 5 - iter 135/152 - loss 0.13507134 - time (sec): 100.73 - samples/sec: 273.68 - lr: 0.000091 - momentum: 0.000000
2023-10-06 13:38:40,943 epoch 5 - iter 150/152 - loss 0.13751177 - time (sec): 111.48 - samples/sec: 273.99 - lr: 0.000090 - momentum: 0.000000
2023-10-06 13:38:42,525 ----------------------------------------------------------------------------------------------------
2023-10-06 13:38:42,525 EPOCH 5 done: loss 0.1365 - lr: 0.000090
2023-10-06 13:38:50,582 DEV : loss 0.16473478078842163 - f1-score (micro avg) 0.7526
2023-10-06 13:38:50,590 saving best model
2023-10-06 13:38:54,941 ----------------------------------------------------------------------------------------------------
2023-10-06 13:39:06,044 epoch 6 - iter 15/152 - loss 0.07998675 - time (sec): 11.10 - samples/sec: 266.91 - lr: 0.000088 - momentum: 0.000000
2023-10-06 13:39:16,956 epoch 6 - iter 30/152 - loss 0.10323402 - time (sec): 22.01 - samples/sec: 266.52 - lr: 0.000086 - momentum: 0.000000
2023-10-06 13:39:28,464 epoch 6 - iter 45/152 - loss 0.09838169 - time (sec): 33.52 - samples/sec: 270.37 - lr: 0.000084 - momentum: 0.000000
2023-10-06 13:39:39,961 epoch 6 - iter 60/152 - loss 0.10195980 - time (sec): 45.02 - samples/sec: 272.09 - lr: 0.000082 - momentum: 0.000000
2023-10-06 13:39:50,831 epoch 6 - iter 75/152 - loss 0.09938439 - time (sec): 55.89 - samples/sec: 271.76 - lr: 0.000081 - momentum: 0.000000
2023-10-06 13:40:01,928 epoch 6 - iter 90/152 - loss 0.09567239 - time (sec): 66.99 - samples/sec: 272.72 - lr: 0.000079 - momentum: 0.000000
2023-10-06 13:40:12,761 epoch 6 - iter 105/152 - loss 0.09691574 - time (sec): 77.82 - samples/sec: 272.48 - lr: 0.000077 - momentum: 0.000000
2023-10-06 13:40:23,673 epoch 6 - iter 120/152 - loss 0.10055910 - time (sec): 88.73 - samples/sec: 274.14 - lr: 0.000075 - momentum: 0.000000
2023-10-06 13:40:34,709 epoch 6 - iter 135/152 - loss 0.09736272 - time (sec): 99.77 - samples/sec: 275.33 - lr: 0.000074 - momentum: 0.000000
2023-10-06 13:40:45,679 epoch 6 - iter 150/152 - loss 0.09901317 - time (sec): 110.74 - samples/sec: 276.24 - lr: 0.000072 - momentum: 0.000000
2023-10-06 13:40:47,016 ----------------------------------------------------------------------------------------------------
2023-10-06 13:40:47,017 EPOCH 6 done: loss 0.0992 - lr: 0.000072
2023-10-06 13:40:54,720 DEV : loss 0.1492326259613037 - f1-score (micro avg) 0.8057
2023-10-06 13:40:54,727 saving best model
2023-10-06 13:40:59,096 ----------------------------------------------------------------------------------------------------
2023-10-06 13:41:10,225 epoch 7 - iter 15/152 - loss 0.12188078 - time (sec): 11.13 - samples/sec: 267.72 - lr: 0.000070 - momentum: 0.000000
2023-10-06 13:41:21,778 epoch 7 - iter 30/152 - loss 0.09942713 - time (sec): 22.68 - samples/sec: 274.86 - lr: 0.000068 - momentum: 0.000000
2023-10-06 13:41:32,790 epoch 7 - iter 45/152 - loss 0.08560929 - time (sec): 33.69 - samples/sec: 273.17 - lr: 0.000066 - momentum: 0.000000
2023-10-06 13:41:44,512 epoch 7 - iter 60/152 - loss 0.08360799 - time (sec): 45.41 - samples/sec: 274.45 - lr: 0.000065 - momentum: 0.000000
2023-10-06 13:41:55,800 epoch 7 - iter 75/152 - loss 0.08421525 - time (sec): 56.70 - samples/sec: 273.69 - lr: 0.000063 - momentum: 0.000000
2023-10-06 13:42:06,369 epoch 7 - iter 90/152 - loss 0.08088278 - time (sec): 67.27 - samples/sec: 271.32 - lr: 0.000061 - momentum: 0.000000
2023-10-06 13:42:17,134 epoch 7 - iter 105/152 - loss 0.07944829 - time (sec): 78.04 - samples/sec: 271.27 - lr: 0.000059 - momentum: 0.000000
2023-10-06 13:42:28,412 epoch 7 - iter 120/152 - loss 0.07758876 - time (sec): 89.31 - samples/sec: 272.72 - lr: 0.000058 - momentum: 0.000000
2023-10-06 13:42:39,751 epoch 7 - iter 135/152 - loss 0.07955479 - time (sec): 100.65 - samples/sec: 274.44 - lr: 0.000056 - momentum: 0.000000
2023-10-06 13:42:50,759 epoch 7 - iter 150/152 - loss 0.07621930 - time (sec): 111.66 - samples/sec: 274.34 - lr: 0.000054 - momentum: 0.000000
2023-10-06 13:42:52,088 ----------------------------------------------------------------------------------------------------
2023-10-06 13:42:52,089 EPOCH 7 done: loss 0.0759 - lr: 0.000054
2023-10-06 13:43:00,063 DEV : loss 0.1385459154844284 - f1-score (micro avg) 0.8122
2023-10-06 13:43:00,072 saving best model
2023-10-06 13:43:04,413 ----------------------------------------------------------------------------------------------------
2023-10-06 13:43:15,677 epoch 8 - iter 15/152 - loss 0.06065886 - time (sec): 11.26 - samples/sec: 278.99 - lr: 0.000052 - momentum: 0.000000
2023-10-06 13:43:27,188 epoch 8 - iter 30/152 - loss 0.06737653 - time (sec): 22.77 - samples/sec: 284.32 - lr: 0.000050 - momentum: 0.000000
2023-10-06 13:43:38,810 epoch 8 - iter 45/152 - loss 0.07427220 - time (sec): 34.40 - samples/sec: 285.36 - lr: 0.000049 - momentum: 0.000000
2023-10-06 13:43:49,965 epoch 8 - iter 60/152 - loss 0.07084343 - time (sec): 45.55 - samples/sec: 283.23 - lr: 0.000047 - momentum: 0.000000
2023-10-06 13:44:01,295 epoch 8 - iter 75/152 - loss 0.06703464 - time (sec): 56.88 - samples/sec: 282.26 - lr: 0.000045 - momentum: 0.000000
2023-10-06 13:44:12,436 epoch 8 - iter 90/152 - loss 0.06600633 - time (sec): 68.02 - samples/sec: 279.97 - lr: 0.000043 - momentum: 0.000000
2023-10-06 13:44:22,803 epoch 8 - iter 105/152 - loss 0.06429163 - time (sec): 78.39 - samples/sec: 277.35 - lr: 0.000042 - momentum: 0.000000
2023-10-06 13:44:33,910 epoch 8 - iter 120/152 - loss 0.06153810 - time (sec): 89.50 - samples/sec: 276.96 - lr: 0.000040 - momentum: 0.000000
2023-10-06 13:44:44,750 epoch 8 - iter 135/152 - loss 0.06062301 - time (sec): 100.34 - samples/sec: 276.23 - lr: 0.000038 - momentum: 0.000000
2023-10-06 13:44:55,362 epoch 8 - iter 150/152 - loss 0.05876614 - time (sec): 110.95 - samples/sec: 275.28 - lr: 0.000036 - momentum: 0.000000
2023-10-06 13:44:56,807 ----------------------------------------------------------------------------------------------------
2023-10-06 13:44:56,808 EPOCH 8 done: loss 0.0610 - lr: 0.000036
2023-10-06 13:45:04,644 DEV : loss 0.14109984040260315 - f1-score (micro avg) 0.8051
2023-10-06 13:45:04,651 ----------------------------------------------------------------------------------------------------
2023-10-06 13:45:15,861 epoch 9 - iter 15/152 - loss 0.04434870 - time (sec): 11.21 - samples/sec: 282.90 - lr: 0.000034 - momentum: 0.000000
2023-10-06 13:45:26,458 epoch 9 - iter 30/152 - loss 0.04558078 - time (sec): 21.81 - samples/sec: 273.88 - lr: 0.000033 - momentum: 0.000000
2023-10-06 13:45:37,145 epoch 9 - iter 45/152 - loss 0.04353997 - time (sec): 32.49 - samples/sec: 271.76 - lr: 0.000031 - momentum: 0.000000
2023-10-06 13:45:48,798 epoch 9 - iter 60/152 - loss 0.04632852 - time (sec): 44.14 - samples/sec: 276.93 - lr: 0.000029 - momentum: 0.000000
2023-10-06 13:45:59,680 epoch 9 - iter 75/152 - loss 0.05217307 - time (sec): 55.03 - samples/sec: 275.57 - lr: 0.000027 - momentum: 0.000000
2023-10-06 13:46:10,532 epoch 9 - iter 90/152 - loss 0.05106821 - time (sec): 65.88 - samples/sec: 275.46 - lr: 0.000026 - momentum: 0.000000
2023-10-06 13:46:21,616 epoch 9 - iter 105/152 - loss 0.05237197 - time (sec): 76.96 - samples/sec: 276.51 - lr: 0.000024 - momentum: 0.000000
2023-10-06 13:46:32,479 epoch 9 - iter 120/152 - loss 0.05305228 - time (sec): 87.83 - samples/sec: 276.45 - lr: 0.000022 - momentum: 0.000000
2023-10-06 13:46:44,286 epoch 9 - iter 135/152 - loss 0.05236682 - time (sec): 99.63 - samples/sec: 276.85 - lr: 0.000020 - momentum: 0.000000
2023-10-06 13:46:55,257 epoch 9 - iter 150/152 - loss 0.05298597 - time (sec): 110.60 - samples/sec: 276.45 - lr: 0.000019 - momentum: 0.000000
2023-10-06 13:46:56,744 ----------------------------------------------------------------------------------------------------
2023-10-06 13:46:56,745 EPOCH 9 done: loss 0.0524 - lr: 0.000019
2023-10-06 13:47:04,599 DEV : loss 0.1405918151140213 - f1-score (micro avg) 0.8126
2023-10-06 13:47:04,606 saving best model
2023-10-06 13:47:08,941 ----------------------------------------------------------------------------------------------------
2023-10-06 13:47:20,007 epoch 10 - iter 15/152 - loss 0.04176935 - time (sec): 11.06 - samples/sec: 270.43 - lr: 0.000017 - momentum: 0.000000
2023-10-06 13:47:31,423 epoch 10 - iter 30/152 - loss 0.05653610 - time (sec): 22.48 - samples/sec: 272.60 - lr: 0.000015 - momentum: 0.000000
2023-10-06 13:47:42,012 epoch 10 - iter 45/152 - loss 0.05116614 - time (sec): 33.07 - samples/sec: 270.28 - lr: 0.000013 - momentum: 0.000000
2023-10-06 13:47:52,484 epoch 10 - iter 60/152 - loss 0.04940568 - time (sec): 43.54 - samples/sec: 272.62 - lr: 0.000012 - momentum: 0.000000
2023-10-06 13:48:03,243 epoch 10 - iter 75/152 - loss 0.04696726 - time (sec): 54.30 - samples/sec: 276.85 - lr: 0.000010 - momentum: 0.000000
2023-10-06 13:48:13,785 epoch 10 - iter 90/152 - loss 0.04882995 - time (sec): 64.84 - samples/sec: 279.53 - lr: 0.000008 - momentum: 0.000000
2023-10-06 13:48:24,042 epoch 10 - iter 105/152 - loss 0.04750003 - time (sec): 75.10 - samples/sec: 280.54 - lr: 0.000006 - momentum: 0.000000
2023-10-06 13:48:34,658 epoch 10 - iter 120/152 - loss 0.04893707 - time (sec): 85.72 - samples/sec: 283.29 - lr: 0.000005 - momentum: 0.000000
2023-10-06 13:48:45,752 epoch 10 - iter 135/152 - loss 0.04854389 - time (sec): 96.81 - samples/sec: 285.48 - lr: 0.000003 - momentum: 0.000000
2023-10-06 13:48:55,808 epoch 10 - iter 150/152 - loss 0.04795580 - time (sec): 106.86 - samples/sec: 286.26 - lr: 0.000001 - momentum: 0.000000
2023-10-06 13:48:57,076 ----------------------------------------------------------------------------------------------------
2023-10-06 13:48:57,076 EPOCH 10 done: loss 0.0478 - lr: 0.000001
2023-10-06 13:49:04,195 DEV : loss 0.1406051069498062 - f1-score (micro avg) 0.8107
2023-10-06 13:49:05,053 ----------------------------------------------------------------------------------------------------
2023-10-06 13:49:05,055 Loading model from best epoch ...
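The reloaded best checkpoint can be used for tagging new text through Flair's standard API. A minimal sketch (the example sentence is ours; the path combines the training base path shown earlier with the `best-model.pt` file the trainer saved):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# best-model.pt is written under the model training base path logged above.
tagger = SequenceTagger.load(
    "hmbench-ajmc/en-hmbyt5-preliminary/"
    "byt5-small-historic-multilingual-span20-flax-bs8-wsFalse-e10-lr0.00016-poolingfirst-layers-1-crfFalse-3/"
    "best-model.pt"
)

# Hypothetical input; the tagger predicts spans such as pers, work, loc, date, scope, object.
sentence = Sentence("Aeschylus composed the Oresteia.")
tagger.predict(sentence)
for label in sentence.get_labels():
    print(label)
```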
2023-10-06 13:49:08,553 SequenceTagger predicts: Dictionary with 25 tags: O, S-scope, B-scope, E-scope, I-scope, S-pers, B-pers, E-pers, I-pers, S-work, B-work, E-work, I-work, S-loc, B-loc, E-loc, I-loc, S-date, B-date, E-date, I-date, S-object, B-object, E-object, I-object
2023-10-06 13:49:15,272 
Results:
- F-score (micro) 0.7799
- F-score (macro) 0.4759
- Accuracy 0.6449

By class:
              precision    recall  f1-score   support

       scope     0.7278    0.7616    0.7443       151
        work     0.6949    0.8632    0.7700        95
        pers     0.8036    0.9375    0.8654        96
         loc     0.0000    0.0000    0.0000         3
        date     0.0000    0.0000    0.0000         3

   micro avg     0.7397    0.8247    0.7799       348
   macro avg     0.4453    0.5124    0.4759       348
weighted avg     0.7272    0.8247    0.7719       348

2023-10-06 13:49:15,272 ----------------------------------------------------------------------------------------------------
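The micro and macro aggregates in the table follow the standard definitions: micro-averaging pools true/false positives and negatives over all classes before computing precision and recall, while macro-averaging is the unweighted mean over the five classes (which is why the two zero-support-hit classes, loc and date, pull the macro score down). The reported aggregates can be reproduced from the per-class rows:

```python
# Per-class f1 scores from the table: scope, work, pers, loc, date.
class_f1 = [0.7443, 0.7700, 0.8654, 0.0000, 0.0000]
macro_f1 = sum(class_f1) / len(class_f1)

# Micro precision/recall from the "micro avg" row; f1 is their harmonic mean.
micro_p, micro_r = 0.7397, 0.8247
micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)

print(round(macro_f1, 4))  # 0.4759
print(round(micro_f1, 4))  # 0.7799
```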