Loss scales: [0.0, 0.0, 1.0]
Noise std: 0.01
Use amp for speeding up training
Random masking: True (prob: 0.05)
Load teacher model: clip-ViT-B-32
Teacher model architecture:
Framework(
  (0): CLIPModel()
)
Create student model from output/2stages/1_b32_pt0_100
Training does not need the teacher model, set it to None
Freeze the multimodal encoder of the student model
Student model architecture:
Framework(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'proj_token_embs': True})
  (3): Projector({'in_features': 512, 'out_features': 768, 'bias': True, 'noise_std': 0.01, 'dropout': 0.1, 'noise_prob': 0, 'student_emb_keyname': 'token_embeddings', 'teacher_emb_keyname': 'source_embedding'})
  (4): Decoder({'max_seq_length': 128, 'do_lower_case': False, 'attend_to': ['student'], 'teacher_model_name': 'clip-ViT-B-32'}) with Transformer model: BertLMHeadModel
)
Total Params: 164986107
Trainable Params: 29858811
Load data/corpus/multilingual_cc3m/cc3m_en-zh.tsv
There are 2 langauges: ['en', 'zh']
There are 1111391 lines, one of which is ['a very typical bus station', '一个非常典型的公交车站']
Load data/corpus/multilingual_cc3m/cc3m_en-de.tsv
There are 2 langauges: ['en', 'de']
There are 1111391 lines, one of which is ['tourists take a photo in front of the entrance sign', 'Touristen machen ein Foto vor dem Eingangsschild']
Load data/corpus/multilingual_cc3m/cc3m_en-fr.tsv
There are 2 langauges: ['en', 'fr']
There are 1111391 lines, one of which is ['farmer holding a box with grapes', 'agriculteur tenant une boîte avec des raisins']
Load data/corpus/multilingual_cc3m/cc3m_en.tsv
There are 1 langauges: ['en']
There are 1111391 lines, one of which is ['woman selling flowers to decorate religious offerings at the market']
Epoch: 0 [ 500 / 111704] loss: 8.2068 loss_at_student: 8.2068 max mem: 4697
Epoch: 0 [ 1000 / 111704] loss: 6.9682 loss_at_student: 6.9682 max mem: 4697
Epoch: 0 [ 1500 / 111704] loss: 6.2740 loss_at_student: 6.2740 max mem: 4697
Epoch: 0 [ 2000 / 111704] loss: 5.9496 loss_at_student: 5.9496 max mem: 5278
Epoch: 0 [ 2500 / 111704] loss: 5.5792 loss_at_student: 5.5792 max mem: 5278
Epoch: 0 [ 3000 / 111704] loss: 5.4580 loss_at_student: 5.4580 max mem: 5331
Epoch: 0 [ 3500 / 111704] loss: 5.4396 loss_at_student: 5.4396 max mem: 5331
Epoch: 0 [ 4000 / 111704] loss: 5.3058 loss_at_student: 5.3058 max mem: 5331
Epoch: 0 [ 4500 / 111704] loss: 4.9260 loss_at_student: 4.9260 max mem: 5331
Epoch: 0 [ 5000 / 111704] loss: 4.3719 loss_at_student: 4.3719 max mem: 5331
Epoch: 0 [ 5500 / 111704] loss: 4.8109 loss_at_student: 4.8109 max mem: 5331
Epoch: 0 [ 6000 / 111704] loss: 4.1466 loss_at_student: 4.1466 max mem: 5331
Epoch: 0 [ 6500 / 111704] loss: 3.5785 loss_at_student: 3.5785 max mem: 5331
Epoch: 0 [ 7000 / 111704] loss: 3.8319 loss_at_student: 3.8319 max mem: 5331
Epoch: 0 [ 7500 / 111704] loss: 3.9597 loss_at_student: 3.9597 max mem: 5331
Epoch: 0 [ 8000 / 111704] loss: 3.9326 loss_at_student: 3.9326 max mem: 5331
Epoch: 0 [ 8500 / 111704] loss: 3.7734 loss_at_student: 3.7734 max mem: 5331
Epoch: 0 [ 9000 / 111704] loss: 3.8664 loss_at_student: 3.8664 max mem: 5331
Epoch: 0 [ 9500 / 111704] loss: 3.8938 loss_at_student: 3.8938 max mem: 5331
Epoch: 0 [ 10000 / 111704] loss: 3.7664 loss_at_student: 3.7664 max mem: 5331
Epoch: 0 [ 10500 / 111704] loss: 3.5440 loss_at_student: 3.5440 max mem: 5331
Epoch: 0 [ 11000 / 111704] loss: 3.9567 loss_at_student: 3.9567 max mem: 5331
Epoch: 0 [ 11500 / 111704] loss: 3.7551 loss_at_student: 3.7551 max mem: 5331
Epoch: 0 [ 12000 / 111704] loss: 3.7860 loss_at_student: 3.7860 max mem: 5331
Epoch: 0 [ 12500 / 111704] loss: 3.7117 loss_at_student: 3.7117 max mem: 5331
Epoch: 0 [ 13000 / 111704] loss: 3.5820 loss_at_student: 3.5820 max mem: 5331
Epoch: 0 [ 13500 / 111704] loss: 3.1427 loss_at_student: 3.1427 max mem: 5331
Epoch: 0 [ 14000 / 111704] loss: 3.5630 loss_at_student: 3.5630 max mem: 5331
Epoch: 0 [ 14500 / 111704] loss: 3.1082 loss_at_student: 3.1082 max mem: 5331
Epoch: 0 [ 15000 / 111704] loss: 3.1631 loss_at_student: 3.1631 max mem: 5331
Epoch: 0 [ 15500 / 111704] loss: 2.9942 loss_at_student: 2.9942 max mem: 5331
Epoch: 0 [ 16000 / 111704] loss: 3.4237 loss_at_student: 3.4237 max mem: 5331
Epoch: 0 [ 16500 / 111704] loss: 3.1513 loss_at_student: 3.1513 max mem: 5381
Epoch: 0 [ 17000 / 111704] loss: 3.5666 loss_at_student: 3.5666 max mem: 5381
Epoch: 0 [ 17500 / 111704] loss: 2.8342 loss_at_student: 2.8342 max mem: 5381
Epoch: 0 [ 18000 / 111704] loss: 3.0434 loss_at_student: 3.0434 max mem: 5381
Epoch: 0 [ 18500 / 111704] loss: 3.2013 loss_at_student: 3.2013 max mem: 5381
Epoch: 0 [ 19000 / 111704] loss: 3.2191 loss_at_student: 3.2191 max mem: 5381
Epoch: 0 [ 19500 / 111704] loss: 3.4190 loss_at_student: 3.4190 max mem: 5381
Epoch: 0 [ 20000 / 111704] loss: 2.3678 loss_at_student: 2.3678 max mem: 5381
Epoch: 0 [ 20500 / 111704] loss: 2.7529 loss_at_student: 2.7529 max mem: 5381
Epoch: 0 [ 21000 / 111704] loss: 2.7684 loss_at_student: 2.7684 max mem: 5381
Epoch: 0 [ 21500 / 111704] loss: 3.6253 loss_at_student: 3.6253 max mem: 5381
Epoch: 0 [ 22000 / 111704] loss: 2.6291 loss_at_student: 2.6291 max mem: 5381
Epoch: 0 [ 22500 / 111704] loss: 3.3023 loss_at_student: 3.3023 max mem: 5381
Epoch: 0 [ 23000 / 111704] loss: 2.8599 loss_at_student: 2.8599 max mem: 5921
Epoch: 0 [ 23500 / 111704] loss: 2.8037 loss_at_student: 2.8037 max mem: 5921
Epoch: 0 [ 24000 / 111704] loss: 2.6373 loss_at_student: 2.6373 max mem: 5921
Epoch: 0 [ 24500 / 111704] loss: 2.7971 loss_at_student: 2.7971 max mem: 5921
Epoch: 0 [ 25000 / 111704] loss: 3.4813 loss_at_student: 3.4813 max mem: 5921
Epoch: 0 [ 25500 / 111704] loss: 2.7579 loss_at_student: 2.7579 max mem: 5921
Epoch: 0 [ 26000 / 111704] loss: 2.9715 loss_at_student: 2.9715 max mem: 5921
Epoch: 0 [ 26500 / 111704] loss: 2.6519 loss_at_student: 2.6519 max mem: 5921
Epoch: 0 [ 27000 / 111704] loss: 2.4888 loss_at_student: 2.4888 max mem: 5921
Epoch: 0 [ 27500 / 111704] loss: 2.9470 loss_at_student: 2.9470 max mem: 5921
Epoch: 0 [ 28000 / 111704] loss: 2.6671 loss_at_student: 2.6671 max mem: 5921
Epoch: 0 [ 28500 / 111704] loss: 2.6985 loss_at_student: 2.6985 max mem: 5921
Epoch: 0 [ 29000 / 111704] loss: 2.6624 loss_at_student: 2.6624 max mem: 5921
Epoch: 0 [ 29500 / 111704] loss: 2.5963 loss_at_student: 2.5963 max mem: 5921
Epoch: 0 [ 30000 / 111704] loss: 2.9424 loss_at_student: 2.9424 max mem: 5921
Epoch: 0 [ 30500 / 111704] loss: 2.8727 loss_at_student: 2.8727 max mem: 5921
Epoch: 0 [ 31000 / 111704] loss: 2.2663 loss_at_student: 2.2663 max mem: 5921
Epoch: 0 [ 31500 / 111704] loss: 2.8550 loss_at_student: 2.8550 max mem: 5921
Epoch: 0 [ 32000 / 111704] loss: 2.9150 loss_at_student: 2.9150 max mem: 5921
Epoch: 0 [ 32500 / 111704] loss: 2.7366 loss_at_student: 2.7366 max mem: 5921
Epoch: 0 [ 33000 / 111704] loss: 2.5707 loss_at_student: 2.5707 max mem: 5921
Epoch: 0 [ 33500 / 111704] loss: 2.5773 loss_at_student: 2.5773 max mem: 5921
Epoch: 0 [ 34000 / 111704] loss: 3.0549 loss_at_student: 3.0549 max mem: 5921
Epoch: 0 [ 34500 / 111704] loss: 2.4977 loss_at_student: 2.4977 max mem: 5921
Epoch: 0 [ 35000 / 111704] loss: 2.3043 loss_at_student: 2.3043 max mem: 5921
Epoch: 0 [ 35500 / 111704] loss: 2.2521 loss_at_student: 2.2521 max mem: 5921
Epoch: 0 [ 36000 / 111704] loss: 2.7505 loss_at_student: 2.7505 max mem: 7885
Epoch: 0 [ 36500 / 111704] loss: 2.6632 loss_at_student: 2.6632 max mem: 7885
Epoch: 0 [ 37000 / 111704] loss: 2.5639 loss_at_student: 2.5639 max mem: 7885
Epoch: 0 [ 37500 / 111704] loss: 2.4880 loss_at_student: 2.4880 max mem: 7885
Epoch: 0 [ 38000 / 111704] loss: 2.4661 loss_at_student: 2.4661 max mem: 7885
Epoch: 0 [ 38500 / 111704] loss: 2.6395 loss_at_student: 2.6395 max mem: 7885
Epoch: 0 [ 39000 / 111704] loss: 2.6352 loss_at_student: 2.6352 max mem: 7885
Epoch: 0 [ 39500 / 111704] loss: 2.2113 loss_at_student: 2.2113 max mem: 7885
Epoch: 0 [ 40000 / 111704] loss: 2.2068 loss_at_student: 2.2068 max mem: 7885
Epoch: 0 [ 40500 / 111704] loss: 2.6822 loss_at_student: 2.6822 max mem: 7885
Epoch: 0 [ 41000 / 111704] loss: 2.5215 loss_at_student: 2.5215 max mem: 7885
Epoch: 0 [ 41500 / 111704] loss: 2.1203 loss_at_student: 2.1203 max mem: 7885
Epoch: 0 [ 42000 / 111704] loss: 2.4508 loss_at_student: 2.4508 max mem: 7885
Epoch: 0 [ 42500 / 111704] loss: 2.1632 loss_at_student: 2.1632 max mem: 7885
Epoch: 0 [ 43000 / 111704] loss: 2.2341 loss_at_student: 2.2341 max mem: 7885
Epoch: 0 [ 43500 / 111704] loss: 2.1546 loss_at_student: 2.1546 max mem: 7885
Epoch: 0 [ 44000 / 111704] loss: 2.7187 loss_at_student: 2.7187 max mem: 7885
Epoch: 0 [ 44500 / 111704] loss: 2.4024 loss_at_student: 2.4024 max mem: 7885
Epoch: 0 [ 45000 / 111704] loss: 2.4576 loss_at_student: 2.4576 max mem: 7885
Epoch: 0 [ 45500 / 111704] loss: 2.8137 loss_at_student: 2.8137 max mem: 7885
Epoch: 0 [ 46000 / 111704] loss: 2.3388 loss_at_student: 2.3388 max mem: 7885
Epoch: 0 [ 46500 / 111704] loss: 2.0659 loss_at_student: 2.0659 max mem: 7885
Epoch: 0 [ 47000 / 111704] loss: 1.6754 loss_at_student: 1.6754 max mem: 7885
Epoch: 0 [ 47500 / 111704] loss: 2.5369 loss_at_student: 2.5369 max mem: 7885
Epoch: 0 [ 48000 / 111704] loss: 2.0495 loss_at_student: 2.0495 max mem: 7885
Epoch: 0 [ 48500 / 111704] loss: 2.3402 loss_at_student: 2.3402 max mem: 7885
Epoch: 0 [ 49000 / 111704] loss: 2.2744 loss_at_student: 2.2744 max mem: 7885
Epoch: 0 [ 49500 / 111704] loss: 2.2694 loss_at_student: 2.2694 max mem: 7885
Epoch: 0 [ 50000 / 111704] loss: 2.5531 loss_at_student: 2.5531 max mem: 7885
Epoch: 0 [ 50500 / 111704] loss: 2.5372 loss_at_student: 2.5372 max mem: 7885
Epoch: 0 [ 51000 / 111704] loss: 1.9494 loss_at_student: 1.9494 max mem: 7885
Epoch: 0 [ 51500 / 111704] loss: 2.1987 loss_at_student: 2.1987 max mem: 7885
Epoch: 0 [ 52000 / 111704] loss: 2.3048 loss_at_student: 2.3048 max mem: 7885
Epoch: 0 [ 52500 / 111704] loss: 2.1703 loss_at_student: 2.1703 max mem: 7885
Epoch: 0 [ 53000 / 111704] loss: 1.9291 loss_at_student: 1.9291 max mem: 7885
Epoch: 0 [ 53500 / 111704] loss: 2.3182 loss_at_student: 2.3182 max mem: 7885
Epoch: 0 [ 54000 / 111704] loss: 1.7825 loss_at_student: 1.7825 max mem: 7885
Epoch: 0 [ 54500 / 111704] loss: 1.7283 loss_at_student: 1.7283 max mem: 7885
Epoch: 0 [ 55000 / 111704] loss: 2.1121 loss_at_student: 2.1121 max mem: 7885
Epoch: 0 [ 55500 / 111704] loss: 1.9788 loss_at_student: 1.9788 max mem: 7885
Epoch: 0 [ 56000 / 111704] loss: 1.6989 loss_at_student: 1.6989 max mem: 7885
Epoch: 0 [ 56500 / 111704] loss: 2.1385 loss_at_student: 2.1385 max mem: 7885
Epoch: 0 [ 57000 / 111704] loss: 1.9384 loss_at_student: 1.9384 max mem: 7885
Epoch: 0 [ 57500 / 111704] loss: 2.3501 loss_at_student: 2.3501 max mem: 7885
Epoch: 0 [ 58000 / 111704] loss: 2.6766 loss_at_student: 2.6766 max mem: 7885
Epoch: 0 [ 58500 / 111704] loss: 2.2485 loss_at_student: 2.2485 max mem: 7885
Epoch: 0 [ 59000 / 111704] loss: 2.3328 loss_at_student: 2.3328 max mem: 7885
Epoch: 0 [ 59500 / 111704] loss: 2.1173 loss_at_student: 2.1173 max mem: 7885
Epoch: 0 [ 60000 / 111704] loss: 1.8708 loss_at_student: 1.8708 max mem: 7885
Epoch: 0 [ 60500 / 111704] loss: 1.8741 loss_at_student: 1.8741 max mem: 7885
Epoch: 0 [ 61000 / 111704] loss: 2.2457 loss_at_student: 2.2457 max mem: 7885
Epoch: 0 [ 61500 / 111704] loss: 1.7808 loss_at_student: 1.7808 max mem: 7885
Epoch: 0 [ 62000 / 111704] loss: 1.9480 loss_at_student: 1.9480 max mem: 7885
Epoch: 0 [ 62500 / 111704] loss: 2.3274 loss_at_student: 2.3274 max mem: 7885
Epoch: 0 [ 63000 / 111704] loss: 2.3754 loss_at_student: 2.3754 max mem: 7885
Epoch: 0 [ 63500 / 111704] loss: 1.9464 loss_at_student: 1.9464 max mem: 7885
Epoch: 0 [ 64000 / 111704] loss: 2.2527 loss_at_student: 2.2527 max mem: 7885
Epoch: 0 [ 64500 / 111704] loss: 2.1652 loss_at_student: 2.1652 max mem: 7885
Epoch: 0 [ 65000 / 111704] loss: 2.5002 loss_at_student: 2.5002 max mem: 7885
Epoch: 0 [ 65500 / 111704] loss: 2.0991 loss_at_student: 2.0991 max mem: 7885
Epoch: 0 [ 66000 / 111704] loss: 2.0110 loss_at_student: 2.0110 max mem: 7885
Epoch: 0 [ 66500 / 111704] loss: 1.8287 loss_at_student: 1.8287 max mem: 7885
Epoch: 0 [ 67000 / 111704] loss: 2.1918 loss_at_student: 2.1918 max mem: 7885
Epoch: 0 [ 67500 / 111704] loss: 2.2245 loss_at_student: 2.2245 max mem: 7885
Epoch: 0 [ 68000 / 111704] loss: 2.1029 loss_at_student: 2.1029 max mem: 7885
Epoch: 0 [ 68500 / 111704] loss: 1.9577 loss_at_student: 1.9577 max mem: 7885
Epoch: 0 [ 69000 / 111704] loss: 2.2646 loss_at_student: 2.2646 max mem: 7885
Epoch: 0 [ 69500 / 111704] loss: 1.7756 loss_at_student: 1.7756 max mem: 7885
Epoch: 0 [ 70000 / 111704] loss: 1.7679 loss_at_student: 1.7679 max mem: 7885
Epoch: 0 [ 70500 / 111704] loss: 1.7923 loss_at_student: 1.7923 max mem: 7885
Epoch: 0 [ 71000 / 111704] loss: 2.0989 loss_at_student: 2.0989 max mem: 7885
Epoch: 0 [ 71500 / 111704] loss: 2.0133 loss_at_student: 2.0133 max mem: 7885
Epoch: 0 [ 72000 / 111704] loss: 2.1860 loss_at_student: 2.1860 max mem: 7885
Epoch: 0 [ 72500 / 111704] loss: 2.0189 loss_at_student: 2.0189 max mem: 7885
Epoch: 0 [ 73000 / 111704] loss: 1.8084 loss_at_student: 1.8084 max mem: 7885
Epoch: 0 [ 73500 / 111704] loss: 1.9966 loss_at_student: 1.9966 max mem: 7885
Epoch: 0 [ 74000 / 111704] loss: 2.0784 loss_at_student: 2.0784 max mem: 7885
Epoch: 0 [ 74500 / 111704] loss: 1.8213 loss_at_student: 1.8213 max mem: 7885
Epoch: 0 [ 75000 / 111704] loss: 1.8853 loss_at_student: 1.8853 max mem: 7885
Epoch: 0 [ 75500 / 111704] loss: 1.6783 loss_at_student: 1.6783 max mem: 7885
Epoch: 0 [ 76000 / 111704] loss: 2.1612 loss_at_student: 2.1612 max mem: 7885
Epoch: 0 [ 76500 / 111704] loss: 2.1659 loss_at_student: 2.1659 max mem: 7885
Epoch: 0 [ 77000 / 111704] loss: 1.8682 loss_at_student: 1.8682 max mem: 7885
Epoch: 0 [ 77500 / 111704] loss: 2.2028 loss_at_student: 2.2028 max mem: 7885
Epoch: 0 [ 78000 / 111704] loss: 1.7463 loss_at_student: 1.7463 max mem: 7885
Epoch: 0 [ 78500 / 111704] loss: 1.9757 loss_at_student: 1.9757 max mem: 7885
Epoch: 0 [ 79000 / 111704] loss: 2.2468 loss_at_student: 2.2468 max mem: 7885
Epoch: 0 [ 79500 / 111704] loss: 1.5780 loss_at_student: 1.5780 max mem: 7885
Epoch: 0 [ 80000 / 111704] loss: 1.7393 loss_at_student: 1.7393 max mem: 7885
Epoch: 0 [ 80500 / 111704] loss: 1.9418 loss_at_student: 1.9418 max mem: 7885
Epoch: 0 [ 81000 / 111704] loss: 2.0138 loss_at_student: 2.0138 max mem: 7885
Epoch: 0 [ 81500 / 111704] loss: 2.2736 loss_at_student: 2.2736 max mem: 7885
Epoch: 0 [ 82000 / 111704] loss: 2.1573 loss_at_student: 2.1573 max mem: 7885
Epoch: 0 [ 82500 / 111704] loss: 1.9955 loss_at_student: 1.9955 max mem: 7885
Epoch: 0 [ 83000 / 111704] loss: 1.8516 loss_at_student: 1.8516 max mem: 7885
Epoch: 0 [ 83500 / 111704] loss: 1.8913 loss_at_student: 1.8913 max mem: 7885
Epoch: 0 [ 84000 / 111704] loss: 2.2534 loss_at_student: 2.2534 max mem: 7885
Epoch: 0 [ 84500 / 111704] loss: 1.6223 loss_at_student: 1.6223 max mem: 7885
Epoch: 0 [ 85000 / 111704] loss: 1.9669 loss_at_student: 1.9669 max mem: 7885
Epoch: 0 [ 85500 / 111704] loss: 1.7339 loss_at_student: 1.7339 max mem: 7885
Epoch: 0 [ 86000 / 111704] loss: 1.8391 loss_at_student: 1.8391 max mem: 7885
Epoch: 0 [ 86500 / 111704] loss: 1.7012 loss_at_student: 1.7012 max mem: 7885
Epoch: 0 [ 87000 / 111704] loss: 1.9247 loss_at_student: 1.9247 max mem: 7885
Epoch: 0 [ 87500 / 111704] loss: 1.4296 loss_at_student: 1.4296 max mem: 7885
Epoch: 0 [ 88000 / 111704] loss: 1.8593 loss_at_student: 1.8593 max mem: 7885
Epoch: 0 [ 88500 / 111704] loss: 1.9755 loss_at_student: 1.9755 max mem: 7885
Epoch: 0 [ 89000 / 111704] loss: 1.6217 loss_at_student: 1.6217 max mem: 7885
Epoch: 0 [ 89500 / 111704] loss: 1.8927 loss_at_student: 1.8927 max mem: 7885
Epoch: 0 [ 90000 / 111704] loss: 1.9222 loss_at_student: 1.9222 max mem: 7885
Epoch: 0 [ 90500 / 111704] loss: 1.9381 loss_at_student: 1.9381 max mem: 7885
Epoch: 0 [ 91000 / 111704] loss: 2.1338 loss_at_student: 2.1338 max mem: 7885
Epoch: 0 [ 91500 / 111704] loss: 2.1099 loss_at_student: 2.1099 max mem: 7885
Epoch: 0 [ 92000 / 111704] loss: 1.7370 loss_at_student: 1.7370 max mem: 7885
Epoch: 0 [ 92500 / 111704] loss: 2.0145 loss_at_student: 2.0145 max mem: 7885
Epoch: 0 [ 93000 / 111704] loss: 1.8534 loss_at_student: 1.8534 max mem: 7885
Epoch: 0 [ 93500 / 111704] loss: 1.9676 loss_at_student: 1.9676 max mem: 7885
Epoch: 0 [ 94000 / 111704] loss: 1.7733 loss_at_student: 1.7733 max mem: 7885
Epoch: 0 [ 94500 / 111704] loss: 1.7180 loss_at_student: 1.7180 max mem: 7885
Epoch: 0 [ 95000 / 111704] loss: 1.8319 loss_at_student: 1.8319 max mem: 7885
Epoch: 0 [ 95500 / 111704] loss: 1.8236 loss_at_student: 1.8236 max mem: 7885
Epoch: 0 [ 96000 / 111704] loss: 1.5127 loss_at_student: 1.5127 max mem: 7885
Epoch: 0 [ 96500 / 111704] loss: 1.8960 loss_at_student: 1.8960 max mem: 7885
Epoch: 0 [ 97000 / 111704] loss: 1.5385 loss_at_student: 1.5385 max mem: 7885
Epoch: 0 [ 97500 / 111704] loss: 1.8908 loss_at_student: 1.8908 max mem: 7885
Epoch: 0 [ 98000 / 111704] loss: 1.5084 loss_at_student: 1.5084 max mem: 7885
Epoch: 0 [ 98500 / 111704] loss: 1.6181 loss_at_student: 1.6181 max mem: 7885
Epoch: 0 [ 99000 / 111704] loss: 2.1156 loss_at_student: 2.1156 max mem: 7885
Epoch: 0 [ 99500 / 111704] loss: 1.9344 loss_at_student: 1.9344 max mem: 7885
Epoch: 0 [100000 / 111704] loss: 1.8235 loss_at_student: 1.8235 max mem: 7885
Epoch: 0 [100500 / 111704] loss: 1.9422 loss_at_student: 1.9422 max mem: 7885
Epoch: 0 [101000 / 111704] loss: 1.6484 loss_at_student: 1.6484 max mem: 7885
Epoch: 0 [101500 / 111704] loss: 1.6752 loss_at_student: 1.6752 max mem: 7885
Epoch: 0 [102000 / 111704] loss: 1.6797 loss_at_student: 1.6797 max mem: 7885
Epoch: 0 [102500 / 111704] loss: 1.8975 loss_at_student: 1.8975 max mem: 7885
Epoch: 0 [103000 / 111704] loss: 1.6693 loss_at_student: 1.6693 max mem: 7885
Epoch: 0 [103500 / 111704] loss: 1.7632 loss_at_student: 1.7632 max mem: 7885
Epoch: 0 [104000 / 111704] loss: 1.7862 loss_at_student: 1.7862 max mem: 7885
Epoch: 0 [104500 / 111704] loss: 1.7263 loss_at_student: 1.7263 max mem: 7885
Epoch: 0 [105000 / 111704] loss: 2.1264 loss_at_student: 2.1264 max mem: 7885
Epoch: 0 [105500 / 111704] loss: 2.0321 loss_at_student: 2.0321 max mem: 7885
Epoch: 0 [106000 / 111704] loss: 1.4918 loss_at_student: 1.4918 max mem: 7885
Epoch: 0 [106500 / 111704] loss: 1.6619 loss_at_student: 1.6619 max mem: 7885
Epoch: 0 [107000 / 111704] loss: 2.0295 loss_at_student: 2.0295 max mem: 7885
Epoch: 0 [107500 / 111704] loss: 1.9354 loss_at_student: 1.9354 max mem: 7885
Epoch: 0 [108000 / 111704] loss: 1.7017 loss_at_student: 1.7017 max mem: 7885
Epoch: 0 [108500 / 111704] loss: 1.6422 loss_at_student: 1.6422 max mem: 7885
Epoch: 0 [109000 / 111704] loss: 1.6435 loss_at_student: 1.6435 max mem: 7885
Epoch: 0 [109500 / 111704] loss: 2.0792 loss_at_student: 2.0792 max mem: 7885
Epoch: 0 [110000 / 111704] loss: 2.0901 loss_at_student: 2.0901 max mem: 7885
Epoch: 0 [110500 / 111704] loss: 1.7540 loss_at_student: 1.7540 max mem: 7885
Epoch: 0 [111000 / 111704] loss: 1.9545 loss_at_student: 1.9545 max mem: 7885
Epoch: 0 [111500 / 111704] loss: 1.5822 loss_at_student: 1.5822 max mem: 7885
Averaged stats: loss: 2.5256 loss_at_student: 2.5256
Train epoch time: 4:20:00
Epoch: 1 [ 296 / 111704] loss: 1.5964 loss_at_student: 1.5964 max mem: 7885
Epoch: 1 [ 796 / 111704] loss: 1.6584 loss_at_student: 1.6584 max mem: 7885
Epoch: 1 [ 1296 / 111704] loss: 1.5702 loss_at_student: 1.5702 max mem: 7885
Epoch: 1 [ 1796 / 111704] loss: 1.6388 loss_at_student: 1.6388 max mem: 7885
Epoch: 1 [ 2296 / 111704] loss: 1.5477 loss_at_student: 1.5477 max mem: 7885
Epoch: 1 [ 2796 / 111704] loss: 1.9416 loss_at_student: 1.9416 max mem: 7885
Epoch: 1 [ 3296 / 111704] loss: 1.8267 loss_at_student: 1.8267 max mem: 7885
Epoch: 1 [ 3796 / 111704] loss: 1.6739 loss_at_student: 1.6739 max mem: 7885
Epoch: 1 [ 4296 / 111704] loss: 1.1963 loss_at_student: 1.1963 max mem: 7885
Epoch: 1 [ 4796 / 111704] loss: 1.6128 loss_at_student: 1.6128 max mem: 7885
Epoch: 1 [ 5296 / 111704] loss: 1.4139 loss_at_student: 1.4139 max mem: 7885
Epoch: 1 [ 5796 / 111704] loss: 1.5477 loss_at_student: 1.5477 max mem: 7885
Epoch: 1 [ 6296 / 111704] loss: 1.8331 loss_at_student: 1.8331 max mem: 7885
Epoch: 1 [ 6796 / 111704] loss: 1.8043 loss_at_student: 1.8043 max mem: 7885
Epoch: 1 [ 7296 / 111704] loss: 1.5117 loss_at_student: 1.5117 max mem: 7885
Epoch: 1 [ 7796 / 111704] loss: 1.5023 loss_at_student: 1.5023 max mem: 7885
Epoch: 1 [ 8296 / 111704] loss: 1.6997 loss_at_student: 1.6997 max mem: 7885
Epoch: 1 [ 8796 / 111704] loss: 1.8608 loss_at_student: 1.8608 max mem: 7885
Epoch: 1 [ 9296 / 111704] loss: 1.6205 loss_at_student: 1.6205 max mem: 7885
Epoch: 1 [ 9796 / 111704] loss: 1.6804 loss_at_student: 1.6804 max mem: 7885
Epoch: 1 [ 10296 / 111704] loss: 1.4314 loss_at_student: 1.4314 max mem: 7885
Epoch: 1 [ 10796 / 111704] loss: 1.6279 loss_at_student: 1.6279 max mem: 7885
Epoch: 1 [ 11296 / 111704] loss: 2.0100 loss_at_student: 2.0100 max mem: 7885
Epoch: 1 [ 11796 / 111704] loss: 1.4804 loss_at_student: 1.4804 max mem: 7885
Epoch: 1 [ 12296 / 111704] loss: 1.6175 loss_at_student: 1.6175 max mem: 7885
Epoch: 1 [ 12796 / 111704] loss: 1.9216 loss_at_student: 1.9216 max mem: 7885
Epoch: 1 [ 13296 / 111704] loss: 2.1243 loss_at_student: 2.1243 max mem: 7885
Epoch: 1 [ 13796 / 111704] loss: 1.4190 loss_at_student: 1.4190 max mem: 7885
Epoch: 1 [ 14296 / 111704] loss: 1.5584 loss_at_student: 1.5584 max mem: 7885
Epoch: 1 [ 14796 / 111704] loss: 1.6569 loss_at_student: 1.6569 max mem: 7885
Epoch: 1 [ 15296 / 111704] loss: 1.5739 loss_at_student: 1.5739 max mem: 7885
Epoch: 1 [ 15796 / 111704] loss: 1.4883 loss_at_student: 1.4883 max mem: 7885
Epoch: 1 [ 16296 / 111704] loss: 1.7525 loss_at_student: 1.7525 max mem: 7885
Epoch: 1 [ 16796 / 111704] loss: 1.8083 loss_at_student: 1.8083 max mem: 7885
Epoch: 1 [ 17296 / 111704] loss: 1.5389 loss_at_student: 1.5389 max mem: 7885
Epoch: 1 [ 17796 / 111704] loss: 1.6778 loss_at_student: 1.6778 max mem: 7885
Epoch: 1 [ 18296 / 111704] loss: 1.7479 loss_at_student: 1.7479 max mem: 7885
Epoch: 1 [ 18796 / 111704] loss: 1.8185 loss_at_student: 1.8185 max mem: 7885
Epoch: 1 [ 19296 / 111704] loss: 1.7637 loss_at_student: 1.7637 max mem: 7885
Epoch: 1 [ 19796 / 111704] loss: 1.7022 loss_at_student: 1.7022 max mem: 7885
Epoch: 1 [ 20296 / 111704] loss: 1.7412 loss_at_student: 1.7412 max mem: 7885
Epoch: 1 [ 20796 / 111704] loss: 1.4965 loss_at_student: 1.4965 max mem: 7885
Epoch: 1 [ 21296 / 111704] loss: 1.8364 loss_at_student: 1.8364 max mem: 7885
Epoch: 1 [ 21796 / 111704] loss: 1.4768 loss_at_student: 1.4768 max mem: 7885
Epoch: 1 [ 22296 / 111704] loss: 1.8058 loss_at_student: 1.8058 max mem: 7885
Epoch: 1 [ 22796 / 111704] loss: 1.5257 loss_at_student: 1.5257 max mem: 7885
Epoch: 1 [ 23296 / 111704] loss: 1.4485 loss_at_student: 1.4485 max mem: 7885
Epoch: 1 [ 23796 / 111704] loss: 1.8154 loss_at_student: 1.8154 max mem: 7885
Epoch: 1 [ 24296 / 111704] loss: 1.4007 loss_at_student: 1.4007 max mem: 7885
Epoch: 1 [ 24796 / 111704] loss: 1.5790 loss_at_student: 1.5790 max mem: 7885
Epoch: 1 [ 25296 / 111704] loss: 1.5631 loss_at_student: 1.5631 max mem: 7885
Epoch: 1 [ 25796 / 111704] loss: 1.5708 loss_at_student: 1.5708 max mem: 7885
Epoch: 1 [ 26296 / 111704] loss: 1.3905 loss_at_student: 1.3905 max mem: 7885
Epoch: 1 [ 26796 / 111704] loss: 1.5397 loss_at_student: 1.5397 max mem: 7885
Epoch: 1 [ 27296 / 111704] loss: 1.8935 loss_at_student: 1.8935 max mem: 7885
Epoch: 1 [ 27796 / 111704] loss: 1.7837 loss_at_student: 1.7837 max mem: 7885
Epoch: 1 [ 28296 / 111704] loss: 1.6033 loss_at_student: 1.6033 max mem: 7885
Epoch: 1 [ 28796 / 111704] loss: 1.4798 loss_at_student: 1.4798 max mem: 7885
Epoch: 1 [ 29296 / 111704] loss: 1.3596 loss_at_student: 1.3596 max mem: 7885
Epoch: 1 [ 29796 / 111704] loss: 1.3375 loss_at_student: 1.3375 max mem: 7885
Epoch: 1 [ 30296 / 111704] loss: 1.6612 loss_at_student: 1.6612 max mem: 7885
Epoch: 1 [ 30796 / 111704] loss: 1.4007 loss_at_student: 1.4007 max mem: 7885
Epoch: 1 [ 31296 / 111704] loss: 1.7263 loss_at_student: 1.7263 max mem: 7885
Epoch: 1 [ 31796 / 111704] loss: 1.6703 loss_at_student: 1.6703 max mem: 7885
Epoch: 1 [ 32296 / 111704] loss: 1.6320 loss_at_student: 1.6320 max mem: 7885
Epoch: 1 [ 32796 / 111704] loss: 1.4789 loss_at_student: 1.4789 max mem: 7885
Epoch: 1 [ 33296 / 111704] loss: 1.8357 loss_at_student: 1.8357 max mem: 7885
Epoch: 1 [ 33796 / 111704] loss: 1.2447 loss_at_student: 1.2447 max mem: 7885
Epoch: 1 [ 34296 / 111704] loss: 1.6781 loss_at_student: 1.6781 max mem: 7885
Epoch: 1 [ 34796 / 111704] loss: 1.7863 loss_at_student: 1.7863 max mem: 7885
Epoch: 1 [ 35296 / 111704] loss: 1.8048 loss_at_student: 1.8048 max mem: 7885
Epoch: 1 [ 35796 / 111704] loss: 1.6177 loss_at_student: 1.6177 max mem: 7885
Epoch: 1 [ 36296 / 111704] loss: 1.4588 loss_at_student: 1.4588 max mem: 7885
Epoch: 1 [ 36796 / 111704] loss: 1.3951 loss_at_student: 1.3951 max mem: 7885
Epoch: 1 [ 37296 / 111704] loss: 1.2153 loss_at_student: 1.2153 max mem: 7885
Epoch: 1 [ 37796 / 111704] loss: 1.7087 loss_at_student: 1.7087 max mem: 7885
Epoch: 1 [ 38296 / 111704] loss: 1.3987 loss_at_student: 1.3987 max mem: 7885
Epoch: 1 [ 38796 / 111704] loss: 1.1659 loss_at_student: 1.1659 max mem: 7885
Epoch: 1 [ 39296 / 111704] loss: 1.6336 loss_at_student: 1.6336 max mem: 7885
Epoch: 1 [ 39796 / 111704] loss: 1.6324 loss_at_student: 1.6324 max mem: 7885
Epoch: 1 [ 40296 / 111704] loss: 1.5198 loss_at_student: 1.5198 max mem: 7885
Epoch: 1 [ 40796 / 111704] loss: 1.2581 loss_at_student: 1.2581 max mem: 7885
Epoch: 1 [ 41296 / 111704] loss: 1.6281 loss_at_student: 1.6281 max mem: 7885
Epoch: 1 [ 41796 / 111704] loss: 1.7171 loss_at_student: 1.7171 max mem: 7885
Epoch: 1 [ 42296 / 111704] loss: 1.5720 loss_at_student: 1.5720 max mem: 7885
Epoch: 1 [ 42796 / 111704] loss: 1.6448 loss_at_student: 1.6448 max mem: 7885
Epoch: 1 [ 43296 / 111704] loss: 2.0315 loss_at_student: 2.0315 max mem: 7885
Epoch: 1 [ 43796 / 111704] loss: 1.7094 loss_at_student: 1.7094 max mem: 7885
Epoch: 1 [ 44296 / 111704] loss: 1.5274 loss_at_student: 1.5274 max mem: 7885
Epoch: 1 [ 44796 / 111704] loss: 1.4676 loss_at_student: 1.4676 max mem: 7885
Epoch: 1 [ 45296 / 111704] loss: 1.5356 loss_at_student: 1.5356 max mem: 7885
Epoch: 1 [ 45796 / 111704] loss: 1.6486 loss_at_student: 1.6486 max mem: 7885
Epoch: 1 [ 46296 / 111704] loss: 1.6760 loss_at_student: 1.6760 max mem: 7885
Epoch: 1 [ 46796 / 111704] loss: 1.3822 loss_at_student: 1.3822 max mem: 7885
Epoch: 1 [ 47296 / 111704] loss: 1.7793 loss_at_student: 1.7793 max mem: 7885
Epoch: 1 [ 47796 / 111704] loss: 1.3519 loss_at_student: 1.3519 max mem: 7885
Epoch: 1 [ 48296 / 111704] loss: 1.6506 loss_at_student: 1.6506 max mem: 7885
Epoch: 1 [ 48796 / 111704] loss: 1.6530 loss_at_student: 1.6530 max mem: 7885
Epoch: 1 [ 49296 / 111704] loss: 1.3909 loss_at_student: 1.3909 max mem: 7885
Epoch: 1 [ 49796 / 111704] loss: 1.3181 loss_at_student: 1.3181 max mem: 7885
Epoch: 1 [ 50296 / 111704] loss: 1.5075 loss_at_student: 1.5075 max mem: 7885
Epoch: 1 [ 50796 / 111704] loss: 1.1827 loss_at_student: 1.1827 max mem: 7885
Epoch: 1 [ 51296 / 111704] loss: 1.7227 loss_at_student: 1.7227 max mem: 7885
Epoch: 1 [ 51796 / 111704] loss: 2.1807 loss_at_student: 2.1807 max mem: 7885
Epoch: 1 [ 52296 / 111704] loss: 1.6247 loss_at_student: 1.6247 max mem: 7885
Epoch: 1 [ 52796 / 111704] loss: 1.8438 loss_at_student: 1.8438 max mem: 7885
Epoch: 1 [ 53296 / 111704] loss: 1.6573 loss_at_student: 1.6573 max mem: 7885
Epoch: 1 [ 53796 / 111704] loss: 1.5322 loss_at_student: 1.5322 max mem: 7885
Epoch: 1 [ 54296 / 111704] loss: 1.4187 loss_at_student: 1.4187 max mem: 7885
Epoch: 1 [ 54796 / 111704] loss: 1.4325 loss_at_student: 1.4325 max mem: 7885
Epoch: 1 [ 55296 / 111704] loss: 1.3314 loss_at_student: 1.3314 max mem: 7885
Epoch: 1 [ 55796 / 111704] loss: 1.1822 loss_at_student: 1.1822 max mem: 7885
Epoch: 1 [ 56296 / 111704] loss: 1.2834 loss_at_student: 1.2834 max mem: 7885
Epoch: 1 [ 56796 / 111704] loss: 1.6707 loss_at_student: 1.6707 max mem: 7885
Epoch: 1 [ 57296 / 111704] loss: 1.4194 loss_at_student: 1.4194 max mem: 7885
Epoch: 1 [ 57796 / 111704] loss: 1.7395 loss_at_student: 1.7395 max mem: 7885
Epoch: 1 [ 58296 / 111704] loss: 1.6739 loss_at_student: 1.6739 max mem: 7885
Epoch: 1 [ 58796 / 111704] loss: 1.4504 loss_at_student: 1.4504 max mem: 7885
Epoch: 1 [ 59296 / 111704] loss: 1.4906 loss_at_student: 1.4906 max mem: 7885
Epoch: 1 [ 59796 / 111704] loss: 1.2224 loss_at_student: 1.2224 max mem: 7885
Epoch: 1 [ 60296 / 111704] loss: 1.8449 loss_at_student: 1.8449 max mem: 7885
Epoch: 1 [ 60796 / 111704] loss: 1.3573 loss_at_student: 1.3573 max mem: 7885
Epoch: 1 [ 61296 / 111704] loss: 1.5656 loss_at_student: 1.5656 max mem: 7885
Epoch: 1 [ 61796 / 111704] loss: 1.7896 loss_at_student: 1.7896 max mem: 7885
Epoch: 1 [ 62296 / 111704] loss: 1.3131 loss_at_student: 1.3131 max mem: 7885
Epoch: 1 [ 62796 / 111704] loss: 1.4363 loss_at_student: 1.4363 max mem: 7885
Epoch: 1 [ 63296 / 111704] loss: 1.6246 loss_at_student: 1.6246 max mem: 7885
Epoch: 1 [ 63796 / 111704] loss: 1.9420 loss_at_student: 1.9420 max mem: 7885
Epoch: 1 [ 64296 / 111704] loss: 1.8540 loss_at_student: 1.8540 max mem: 7885
Epoch: 1 [ 64796 / 111704] loss: 1.4684 loss_at_student: 1.4684 max mem: 7885
Epoch: 1 [ 65296 / 111704] loss: 1.5762 loss_at_student: 1.5762 max mem: 7885
Epoch: 1 [ 65796 / 111704] loss: 1.3750 loss_at_student: 1.3750 max mem: 7885
Epoch: 1 [ 66296 / 111704] loss: 1.3773 loss_at_student: 1.3773 max mem: 7885
Epoch: 1 [ 66796 / 111704] loss: 1.7930 loss_at_student: 1.7930 max mem: 7885
Epoch: 1 [ 67296 / 111704] loss: 1.3381 loss_at_student: 1.3381 max mem: 7885
Epoch: 1 [ 67796 / 111704] loss: 1.1926 loss_at_student: 1.1926 max mem: 7885
Epoch: 1 [ 68296 / 111704] loss: 1.5563 loss_at_student: 1.5563 max mem: 7885
Epoch: 1 [ 68796 / 111704] loss: 1.5247 loss_at_student: 1.5247 max mem: 7885
Epoch: 1 [ 69296 / 111704] loss: 1.4615 loss_at_student: 1.4615 max mem: 7885
Epoch: 1 [ 69796 / 111704] loss: 1.6890 loss_at_student: 1.6890 max mem: 7885
Epoch: 1 [ 70296 / 111704] loss: 1.6093 loss_at_student: 1.6093 max mem: 7885
Epoch: 1 [ 70796 / 111704] loss: 1.4446 loss_at_student: 1.4446 max mem: 7885
Epoch: 1 [ 71296 / 111704] loss: 1.2723 loss_at_student: 1.2723 max mem: 7885
Epoch: 1 [ 71796 / 111704] loss: 1.4305 loss_at_student: 1.4305 max mem: 7885
Epoch: 1 [ 72296 / 111704] loss: 1.5579 loss_at_student: 1.5579 max mem: 7885
Epoch: 1 [ 72796 / 111704] loss: 1.8145 loss_at_student: 1.8145 max mem: 7885
Epoch: 1 [ 73296 / 111704] loss: 1.8190 loss_at_student: 1.8190 max mem: 7885
Epoch: 1 [ 73796 / 111704] loss: 1.2100 loss_at_student: 1.2100 max mem: 7885
Epoch: 1 [ 74296 / 111704] loss: 1.2670 loss_at_student: 1.2670 max mem: 7885
Epoch: 1 [ 74796 / 111704] loss: 1.3125 loss_at_student: 1.3125 max mem: 7885
Epoch: 1 [ 75296 / 111704] loss: 1.9159 loss_at_student: 1.9159 max mem: 7885
Epoch: 1 [ 75796 / 111704] loss: 1.4874 loss_at_student: 1.4874 max mem: 7885
Epoch: 1 [ 76296 / 111704] loss: 1.6697 loss_at_student: 1.6697 max mem: 7885
Epoch: 1 [ 76796 / 111704] loss: 1.3324 loss_at_student: 1.3324 max mem: 7885
Epoch: 1 [ 77296 / 111704] loss: 1.4904 loss_at_student: 1.4904 max mem: 7885
Epoch: 1 [ 77796 / 111704] loss: 1.4579 loss_at_student: 1.4579 max mem: 7885
Epoch: 1 [ 78296 / 111704] loss: 1.4399 loss_at_student: 1.4399 max mem: 7885
Epoch: 1 [ 78796 / 111704] loss: 1.2946 loss_at_student: 1.2946 max mem: 7885
Epoch: 1 [ 79296 / 111704] loss: 1.6377 loss_at_student: 1.6377 max mem: 7885
Epoch: 1 [ 79796 / 111704] loss: 1.5727 loss_at_student: 1.5727 max mem: 7885
Epoch: 1 [ 80296 / 111704] loss: 1.2826 loss_at_student: 1.2826 max mem: 7885
Epoch: 1 [ 80796 / 111704] loss: 1.5092 loss_at_student: 1.5092 max mem: 7885
Epoch: 1 [ 81296 / 111704] loss: 1.5226 loss_at_student: 1.5226 max mem: 7885
Epoch: 1 [ 81796 / 111704] loss: 1.4934 loss_at_student: 1.4934 max mem: 7885
Epoch: 1 [ 82296 / 111704] loss: 1.6465 loss_at_student: 1.6465 max mem: 7885
Epoch: 1 [ 82796 / 111704] loss: 1.2723 loss_at_student: 1.2723 max mem: 7885
Epoch: 1 [ 83296 / 111704] loss: 1.3623 loss_at_student: 1.3623 max mem: 7885
Epoch: 1 [ 83796 / 111704] loss: 1.2419 loss_at_student: 1.2419 max mem: 7885
Epoch: 1 [ 84296 / 111704] loss: 1.6096 loss_at_student: 1.6096 max mem: 7885
Epoch: 1 [ 84796 / 111704] loss: 1.4745 loss_at_student: 1.4745 max mem: 7885
Epoch: 1 [ 85296 / 111704] loss: 1.7058 loss_at_student: 1.7058 max mem: 7885
Epoch: 1 [ 85796 / 111704] loss: 1.7492 loss_at_student: 1.7492 max mem: 7885
Epoch: 1 [ 86296 / 111704] loss: 1.5544 loss_at_student: 1.5544 max mem: 7885
Epoch: 1 [ 86796 / 111704] loss: 1.3029 loss_at_student: 1.3029 max mem: 7885
Epoch: 1 [ 87296 / 111704] loss: 1.3829 loss_at_student: 1.3829 max mem: 7885
Epoch: 1 [ 87796 / 111704] loss: 1.3481 loss_at_student: 1.3481 max mem: 7885
Epoch: 1 [ 88296 / 111704] loss: 1.3784 loss_at_student: 1.3784 max mem: 7885
Epoch: 1 [ 88796 / 111704] loss: 1.2837 loss_at_student: 1.2837 max mem: 7885
Epoch: 1 [ 89296 / 111704] loss: 1.7100 loss_at_student: 1.7100 max mem: 7885
Epoch: 1 [ 89796 / 111704] loss: 1.6818 loss_at_student: 1.6818 max mem: 7885
Epoch: 1 [ 90296 / 111704] loss: 1.3685 loss_at_student: 1.3685 max mem: 7885
Epoch: 1 [ 90796 / 111704] loss: 1.3460 loss_at_student: 1.3460 max mem: 7885
Epoch: 1 [ 91296 / 111704] loss: 1.4747 loss_at_student: 1.4747 max mem: 7885
Epoch: 1 [ 91796 / 111704] loss: 1.7435 loss_at_student: 1.7435 max mem: 7885
Epoch: 1 [ 92296 / 111704] loss: 1.3324 loss_at_student: 1.3324 max mem: 7885
Epoch: 1 [ 92796 / 111704] loss: 1.3700 loss_at_student: 1.3700 max mem: 7885
Epoch: 1 [ 93296 / 111704] loss: 1.4171 loss_at_student: 1.4171 max mem: 7885
Epoch: 1 [ 93796 / 111704] loss: 1.3072 loss_at_student: 1.3072 max mem: 7885
Epoch: 1 [ 94296 / 111704] loss: 1.4185 loss_at_student: 1.4185 max mem: 7885
Epoch: 1 [ 94796 / 111704] loss: 1.5841 loss_at_student: 1.5841 max mem: 7885
Epoch: 1 [ 95296 / 111704] loss: 1.6591 loss_at_student: 1.6591 max mem: 7885
Epoch: 1 [ 95796 / 111704] loss: 1.3510 loss_at_student: 1.3510 max mem: 7885
Epoch: 1 [ 96296 / 111704] loss: 1.3744 loss_at_student: 1.3744 max mem: 7885
Epoch: 1 [ 96796 / 111704] loss: 1.2236 loss_at_student: 1.2236 max mem: 7885
Epoch: 1 [ 97296 / 111704] loss: 1.7428 loss_at_student: 1.7428 max mem: 7885
Epoch: 1 [ 97796 / 111704] loss: 1.4455 loss_at_student: 1.4455 max mem: 7885
Epoch: 1 [ 98296 / 111704] loss: 1.5568 loss_at_student: 1.5568 max mem: 7885
Epoch: 1 [ 98796 / 111704] loss: 1.1846 loss_at_student: 1.1846 max mem: 7885
Epoch: 1 [ 99296 / 111704] loss: 1.3425 loss_at_student: 1.3425 max mem: 7885
Epoch: 1 [ 99796 / 111704] loss: 1.1393 loss_at_student: 1.1393 max mem: 7885
Epoch: 1 [100296 / 111704] loss: 1.4696 loss_at_student: 1.4696 max mem: 7885
Epoch: 1 [100796 / 111704] loss: 1.8636 loss_at_student: 1.8636 max mem: 7885
Epoch: 1 [101296 / 111704] loss: 1.6428 loss_at_student: 1.6428 max mem: 7885
Epoch: 1 [101796 / 111704] loss: 1.5086 loss_at_student: 1.5086 max mem: 7885
Epoch: 1 [102296 / 111704] loss: 1.3229 loss_at_student: 1.3229 max mem: 7885
Epoch: 1 [102796 / 111704] loss: 1.2695 loss_at_student: 1.2695 max mem: 7885
Epoch: 1 [103296 / 111704] loss: 1.6607 loss_at_student: 1.6607 max mem: 7885
Epoch: 1 [103796 / 111704] loss: 1.6075 loss_at_student: 1.6075 max mem: 7885
Epoch: 1 [104296 / 111704] loss: 1.3136 loss_at_student: 1.3136 max mem: 7885
Epoch: 1 [104796 / 111704] loss: 1.2826 loss_at_student: 1.2826 max mem: 7885
Epoch: 1 [105296 / 111704] loss: 1.2794 loss_at_student: 1.2794 max mem: 7885
Epoch: 1 [105796 / 111704] loss: 1.2884 loss_at_student: 1.2884 max mem: 7885
Epoch: 1 [106296 / 111704] loss: 1.4801 loss_at_student: 1.4801 max mem: 7885
Epoch: 1 [106796 / 111704] loss: 1.4394 loss_at_student: 1.4394 max mem: 7885
Epoch: 1 [107296 / 111704] loss: 1.6210 loss_at_student: 1.6210 max mem: 7885
Epoch: 1 [107796 / 111704] loss: 1.3792 loss_at_student: 1.3792 max mem: 7885
Epoch: 1 [108296 / 111704] loss: 1.1660 loss_at_student: 1.1660 max mem: 7885
Epoch: 1 [108796 / 111704] loss: 1.4648 loss_at_student: 1.4648 max mem: 7885
Epoch: 1 [109296 / 111704] loss: 1.6664 loss_at_student: 1.6664 max mem: 7885
Epoch: 1 [109796 / 111704] loss: 1.2893 loss_at_student: 1.2893 max mem: 7885
Epoch: 1 [110296 / 111704] loss: 1.6977 loss_at_student: 1.6977 max mem: 7885
Epoch: 1 [110796 / 111704] loss: 1.4789 loss_at_student: 1.4789 max mem: 7885
Epoch: 1 [111296 / 111704] loss: 1.2818 loss_at_student: 1.2818 max mem: 7885
Averaged stats: loss: 1.5469 loss_at_student: 1.5469
Train epoch time: 4:16:02
Epoch: 2 [ 92 / 111704] loss: 1.1870 loss_at_student: 1.1870 max mem: 7885
Epoch: 2 [ 592 / 111704] loss: 1.6786 loss_at_student: 1.6786 max mem: 7885
Epoch: 2 [ 1092 / 111704] loss: 1.3686 loss_at_student: 1.3686 max mem: 7885
Epoch: 2 [ 1592 / 111704] loss: 1.3965 loss_at_student: 1.3965 max mem: 7885
Epoch: 2 [ 2092 / 111704] loss: 1.2709 loss_at_student: 1.2709 max mem: 7885
Epoch: 2 [ 2592 / 111704] loss: 1.4958 loss_at_student: 1.4958 max mem: 7885
Epoch: 2 [ 3092 / 111704] loss: 1.5505 loss_at_student: 1.5505 max mem: 7885
Epoch: 2 [ 3592 / 111704] loss: 1.5489 loss_at_student: 1.5489 max mem: 7885
Epoch: 2 [ 4092 / 111704] loss: 1.1801 loss_at_student: 1.1801 max mem: 7885
Epoch: 2 [ 4592 / 111704] loss: 1.2196 loss_at_student: 1.2196 max mem: 7885
Epoch: 2 [ 5092 / 111704] loss: 1.6848 loss_at_student: 1.6848 max mem: 7885
Epoch: 2 [ 5592 / 111704] loss: 1.4150 loss_at_student: 1.4150 max mem: 7885
Epoch: 2 [ 6092 / 111704]
loss: 1.2298 loss_at_student: 1.2298 max mem: 7885 Epoch: 2 [ 6592 / 111704] loss: 1.0772 loss_at_student: 1.0772 max mem: 7885 Epoch: 2 [ 7092 / 111704] loss: 1.3144 loss_at_student: 1.3144 max mem: 7885 Epoch: 2 [ 7592 / 111704] loss: 1.3659 loss_at_student: 1.3659 max mem: 7885 Epoch: 2 [ 8092 / 111704] loss: 1.2112 loss_at_student: 1.2112 max mem: 7885 Epoch: 2 [ 8592 / 111704] loss: 1.4838 loss_at_student: 1.4838 max mem: 7885 Epoch: 2 [ 9092 / 111704] loss: 1.4352 loss_at_student: 1.4352 max mem: 7885 Epoch: 2 [ 9592 / 111704] loss: 1.3633 loss_at_student: 1.3633 max mem: 7885 Epoch: 2 [ 10092 / 111704] loss: 1.1888 loss_at_student: 1.1888 max mem: 7885 Epoch: 2 [ 10592 / 111704] loss: 1.5132 loss_at_student: 1.5132 max mem: 7885 Epoch: 2 [ 11092 / 111704] loss: 1.5450 loss_at_student: 1.5450 max mem: 7885 Epoch: 2 [ 11592 / 111704] loss: 1.4293 loss_at_student: 1.4293 max mem: 7885 Epoch: 2 [ 12092 / 111704] loss: 1.2772 loss_at_student: 1.2772 max mem: 7885 Epoch: 2 [ 12592 / 111704] loss: 1.6412 loss_at_student: 1.6412 max mem: 7885 Epoch: 2 [ 13092 / 111704] loss: 1.0101 loss_at_student: 1.0101 max mem: 7885 Epoch: 2 [ 13592 / 111704] loss: 1.3436 loss_at_student: 1.3436 max mem: 7885 Epoch: 2 [ 14092 / 111704] loss: 1.3178 loss_at_student: 1.3178 max mem: 7885 Epoch: 2 [ 14592 / 111704] loss: 1.5054 loss_at_student: 1.5054 max mem: 7885 Epoch: 2 [ 15092 / 111704] loss: 1.1839 loss_at_student: 1.1839 max mem: 7885 Epoch: 2 [ 15592 / 111704] loss: 1.6556 loss_at_student: 1.6556 max mem: 7885 Epoch: 2 [ 16092 / 111704] loss: 1.2037 loss_at_student: 1.2037 max mem: 7885 Epoch: 2 [ 16592 / 111704] loss: 1.5082 loss_at_student: 1.5082 max mem: 7885 Epoch: 2 [ 17092 / 111704] loss: 1.4019 loss_at_student: 1.4019 max mem: 7885 Epoch: 2 [ 17592 / 111704] loss: 1.3759 loss_at_student: 1.3759 max mem: 7885 Epoch: 2 [ 18092 / 111704] loss: 1.3439 loss_at_student: 1.3439 max mem: 7885 Epoch: 2 [ 18592 / 111704] loss: 1.6156 loss_at_student: 1.6156 max mem: 7885 
Epoch: 2 [ 19092 / 111704] loss: 1.1424 loss_at_student: 1.1424 max mem: 7885 Epoch: 2 [ 19592 / 111704] loss: 1.6736 loss_at_student: 1.6736 max mem: 7885 Epoch: 2 [ 20092 / 111704] loss: 1.4349 loss_at_student: 1.4349 max mem: 7885 Epoch: 2 [ 20592 / 111704] loss: 1.5039 loss_at_student: 1.5039 max mem: 7885 Epoch: 2 [ 21092 / 111704] loss: 1.7404 loss_at_student: 1.7404 max mem: 7885 Epoch: 2 [ 21592 / 111704] loss: 1.0968 loss_at_student: 1.0968 max mem: 7885 Epoch: 2 [ 22092 / 111704] loss: 1.1918 loss_at_student: 1.1918 max mem: 7885 Epoch: 2 [ 22592 / 111704] loss: 1.4414 loss_at_student: 1.4414 max mem: 7885 Epoch: 2 [ 23092 / 111704] loss: 1.3244 loss_at_student: 1.3244 max mem: 7885 Epoch: 2 [ 23592 / 111704] loss: 1.4269 loss_at_student: 1.4269 max mem: 7885 Epoch: 2 [ 24092 / 111704] loss: 1.1858 loss_at_student: 1.1858 max mem: 7885 Epoch: 2 [ 24592 / 111704] loss: 1.6356 loss_at_student: 1.6356 max mem: 7885 Epoch: 2 [ 25092 / 111704] loss: 1.3557 loss_at_student: 1.3557 max mem: 7885 Epoch: 2 [ 25592 / 111704] loss: 1.3701 loss_at_student: 1.3701 max mem: 7885 Epoch: 2 [ 26092 / 111704] loss: 1.3868 loss_at_student: 1.3868 max mem: 7885 Epoch: 2 [ 26592 / 111704] loss: 1.2932 loss_at_student: 1.2932 max mem: 7885 Epoch: 2 [ 27092 / 111704] loss: 1.4632 loss_at_student: 1.4632 max mem: 7885 Epoch: 2 [ 27592 / 111704] loss: 1.2855 loss_at_student: 1.2855 max mem: 7885 Epoch: 2 [ 28092 / 111704] loss: 1.5383 loss_at_student: 1.5383 max mem: 7885 Epoch: 2 [ 28592 / 111704] loss: 1.4850 loss_at_student: 1.4850 max mem: 7885 Epoch: 2 [ 29092 / 111704] loss: 1.1254 loss_at_student: 1.1254 max mem: 7885 Epoch: 2 [ 29592 / 111704] loss: 1.6584 loss_at_student: 1.6584 max mem: 7885 Epoch: 2 [ 30092 / 111704] loss: 1.4810 loss_at_student: 1.4810 max mem: 7885 Epoch: 2 [ 30592 / 111704] loss: 1.4971 loss_at_student: 1.4971 max mem: 7885 Epoch: 2 [ 31092 / 111704] loss: 1.5722 loss_at_student: 1.5722 max mem: 7885 Epoch: 2 [ 31592 / 111704] loss: 1.3945 
loss_at_student: 1.3945 max mem: 7885 Epoch: 2 [ 32092 / 111704] loss: 1.3151 loss_at_student: 1.3151 max mem: 7885 Epoch: 2 [ 32592 / 111704] loss: 1.2023 loss_at_student: 1.2023 max mem: 7885 Epoch: 2 [ 33092 / 111704] loss: 1.0153 loss_at_student: 1.0153 max mem: 7885 Epoch: 2 [ 33592 / 111704] loss: 1.3770 loss_at_student: 1.3770 max mem: 7885 Epoch: 2 [ 34092 / 111704] loss: 1.3195 loss_at_student: 1.3195 max mem: 7885 Epoch: 2 [ 34592 / 111704] loss: 1.3372 loss_at_student: 1.3372 max mem: 7885 Epoch: 2 [ 35092 / 111704] loss: 1.4390 loss_at_student: 1.4390 max mem: 7885 Epoch: 2 [ 35592 / 111704] loss: 1.2944 loss_at_student: 1.2944 max mem: 7885 Epoch: 2 [ 36092 / 111704] loss: 1.3752 loss_at_student: 1.3752 max mem: 7885 Epoch: 2 [ 36592 / 111704] loss: 1.2923 loss_at_student: 1.2923 max mem: 7885 Epoch: 2 [ 37092 / 111704] loss: 1.5220 loss_at_student: 1.5220 max mem: 7885 Epoch: 2 [ 37592 / 111704] loss: 1.1468 loss_at_student: 1.1468 max mem: 7885 Epoch: 2 [ 38092 / 111704] loss: 1.3883 loss_at_student: 1.3883 max mem: 7885 Epoch: 2 [ 38592 / 111704] loss: 1.4976 loss_at_student: 1.4976 max mem: 7885 Epoch: 2 [ 39092 / 111704] loss: 1.1786 loss_at_student: 1.1786 max mem: 7885 Epoch: 2 [ 39592 / 111704] loss: 1.3614 loss_at_student: 1.3614 max mem: 7885 Epoch: 2 [ 40092 / 111704] loss: 1.4200 loss_at_student: 1.4200 max mem: 7885 Epoch: 2 [ 40592 / 111704] loss: 1.4272 loss_at_student: 1.4272 max mem: 7885 Epoch: 2 [ 41092 / 111704] loss: 1.3598 loss_at_student: 1.3598 max mem: 7885 Epoch: 2 [ 41592 / 111704] loss: 1.5351 loss_at_student: 1.5351 max mem: 7885 Epoch: 2 [ 42092 / 111704] loss: 1.1653 loss_at_student: 1.1653 max mem: 7885 Epoch: 2 [ 42592 / 111704] loss: 1.3799 loss_at_student: 1.3799 max mem: 7885 Epoch: 2 [ 43092 / 111704] loss: 1.0620 loss_at_student: 1.0620 max mem: 7885 Epoch: 2 [ 43592 / 111704] loss: 1.4741 loss_at_student: 1.4741 max mem: 7885 Epoch: 2 [ 44092 / 111704] loss: 1.4677 loss_at_student: 1.4677 max mem: 7885 Epoch: 2 [ 
44592 / 111704] loss: 1.6032 loss_at_student: 1.6032 max mem: 7885 Epoch: 2 [ 45092 / 111704] loss: 1.3592 loss_at_student: 1.3592 max mem: 7885 Epoch: 2 [ 45592 / 111704] loss: 1.3748 loss_at_student: 1.3748 max mem: 7885 Epoch: 2 [ 46092 / 111704] loss: 1.1872 loss_at_student: 1.1872 max mem: 7885 Epoch: 2 [ 46592 / 111704] loss: 1.6745 loss_at_student: 1.6745 max mem: 7885 Epoch: 2 [ 47092 / 111704] loss: 1.3234 loss_at_student: 1.3234 max mem: 7885 Epoch: 2 [ 47592 / 111704] loss: 1.7143 loss_at_student: 1.7143 max mem: 7885 Epoch: 2 [ 48092 / 111704] loss: 1.2710 loss_at_student: 1.2710 max mem: 7885 Epoch: 2 [ 48592 / 111704] loss: 1.3994 loss_at_student: 1.3994 max mem: 7885 Epoch: 2 [ 49092 / 111704] loss: 1.7256 loss_at_student: 1.7256 max mem: 7885 Epoch: 2 [ 49592 / 111704] loss: 1.3446 loss_at_student: 1.3446 max mem: 7885 Epoch: 2 [ 50092 / 111704] loss: 1.4180 loss_at_student: 1.4180 max mem: 7885 Epoch: 2 [ 50592 / 111704] loss: 1.0719 loss_at_student: 1.0719 max mem: 7885 Epoch: 2 [ 51092 / 111704] loss: 1.4062 loss_at_student: 1.4062 max mem: 7885 Epoch: 2 [ 51592 / 111704] loss: 1.5881 loss_at_student: 1.5881 max mem: 7885 Epoch: 2 [ 52092 / 111704] loss: 1.3286 loss_at_student: 1.3286 max mem: 7885 Epoch: 2 [ 52592 / 111704] loss: 1.6040 loss_at_student: 1.6040 max mem: 7885 Epoch: 2 [ 53092 / 111704] loss: 1.2944 loss_at_student: 1.2944 max mem: 7885 Epoch: 2 [ 53592 / 111704] loss: 1.2210 loss_at_student: 1.2210 max mem: 7885 Epoch: 2 [ 54092 / 111704] loss: 1.3470 loss_at_student: 1.3470 max mem: 7885 Epoch: 2 [ 54592 / 111704] loss: 1.3174 loss_at_student: 1.3174 max mem: 7885 Epoch: 2 [ 55092 / 111704] loss: 1.2321 loss_at_student: 1.2321 max mem: 7885 Epoch: 2 [ 55592 / 111704] loss: 1.2629 loss_at_student: 1.2629 max mem: 7885 Epoch: 2 [ 56092 / 111704] loss: 1.2638 loss_at_student: 1.2638 max mem: 7885 Epoch: 2 [ 56592 / 111704] loss: 1.3526 loss_at_student: 1.3526 max mem: 7885 Epoch: 2 [ 57092 / 111704] loss: 1.1926 loss_at_student: 
1.1926 max mem: 7885 Epoch: 2 [ 57592 / 111704] loss: 1.4457 loss_at_student: 1.4457 max mem: 7885 Epoch: 2 [ 58092 / 111704] loss: 1.1414 loss_at_student: 1.1414 max mem: 7885 Epoch: 2 [ 58592 / 111704] loss: 1.6264 loss_at_student: 1.6264 max mem: 7885 Epoch: 2 [ 59092 / 111704] loss: 1.2854 loss_at_student: 1.2854 max mem: 7885 Epoch: 2 [ 59592 / 111704] loss: 1.4883 loss_at_student: 1.4883 max mem: 7885 Epoch: 2 [ 60092 / 111704] loss: 1.4095 loss_at_student: 1.4095 max mem: 7885 Epoch: 2 [ 60592 / 111704] loss: 1.5695 loss_at_student: 1.5695 max mem: 7885 Epoch: 2 [ 61092 / 111704] loss: 1.5068 loss_at_student: 1.5068 max mem: 7885 Epoch: 2 [ 61592 / 111704] loss: 1.6300 loss_at_student: 1.6300 max mem: 7885 Epoch: 2 [ 62092 / 111704] loss: 1.0883 loss_at_student: 1.0883 max mem: 7885 Epoch: 2 [ 62592 / 111704] loss: 1.2440 loss_at_student: 1.2440 max mem: 7885 Epoch: 2 [ 63092 / 111704] loss: 1.3880 loss_at_student: 1.3880 max mem: 7885 Epoch: 2 [ 63592 / 111704] loss: 1.4643 loss_at_student: 1.4643 max mem: 7885 Epoch: 2 [ 64092 / 111704] loss: 1.2704 loss_at_student: 1.2704 max mem: 7885 Epoch: 2 [ 64592 / 111704] loss: 1.5147 loss_at_student: 1.5147 max mem: 7885 Epoch: 2 [ 65092 / 111704] loss: 1.1173 loss_at_student: 1.1173 max mem: 7885 Epoch: 2 [ 65592 / 111704] loss: 1.2575 loss_at_student: 1.2575 max mem: 7885 Epoch: 2 [ 66092 / 111704] loss: 1.1828 loss_at_student: 1.1828 max mem: 7885 Epoch: 2 [ 66592 / 111704] loss: 1.1572 loss_at_student: 1.1572 max mem: 7885 Epoch: 2 [ 67092 / 111704] loss: 1.6350 loss_at_student: 1.6350 max mem: 7885 Epoch: 2 [ 67592 / 111704] loss: 1.3798 loss_at_student: 1.3798 max mem: 7885 Epoch: 2 [ 68092 / 111704] loss: 1.4497 loss_at_student: 1.4497 max mem: 7885 Epoch: 2 [ 68592 / 111704] loss: 1.2884 loss_at_student: 1.2884 max mem: 7885 Epoch: 2 [ 69092 / 111704] loss: 1.5228 loss_at_student: 1.5228 max mem: 7885 Epoch: 2 [ 69592 / 111704] loss: 1.4353 loss_at_student: 1.4353 max mem: 7885 Epoch: 2 [ 70092 / 111704] 
loss: 1.4263 loss_at_student: 1.4263 max mem: 7885 Epoch: 2 [ 70592 / 111704] loss: 1.4143 loss_at_student: 1.4143 max mem: 7885 Epoch: 2 [ 71092 / 111704] loss: 1.2121 loss_at_student: 1.2121 max mem: 7885 Epoch: 2 [ 71592 / 111704] loss: 1.5210 loss_at_student: 1.5210 max mem: 7885 Epoch: 2 [ 72092 / 111704] loss: 1.3068 loss_at_student: 1.3068 max mem: 7885 Epoch: 2 [ 72592 / 111704] loss: 1.4361 loss_at_student: 1.4361 max mem: 7885 Epoch: 2 [ 73092 / 111704] loss: 1.0697 loss_at_student: 1.0697 max mem: 7885 Epoch: 2 [ 73592 / 111704] loss: 1.5055 loss_at_student: 1.5055 max mem: 7885 Epoch: 2 [ 74092 / 111704] loss: 1.3399 loss_at_student: 1.3399 max mem: 7885 Epoch: 2 [ 74592 / 111704] loss: 1.2531 loss_at_student: 1.2531 max mem: 7885 Epoch: 2 [ 75092 / 111704] loss: 0.8334 loss_at_student: 0.8334 max mem: 7885 Epoch: 2 [ 75592 / 111704] loss: 1.6065 loss_at_student: 1.6065 max mem: 7885 Epoch: 2 [ 76092 / 111704] loss: 1.4578 loss_at_student: 1.4578 max mem: 7885 Epoch: 2 [ 76592 / 111704] loss: 1.4140 loss_at_student: 1.4140 max mem: 7885 Epoch: 2 [ 77092 / 111704] loss: 1.4089 loss_at_student: 1.4089 max mem: 7885 Epoch: 2 [ 77592 / 111704] loss: 1.4560 loss_at_student: 1.4560 max mem: 7885 Epoch: 2 [ 78092 / 111704] loss: 1.2775 loss_at_student: 1.2775 max mem: 7885 Epoch: 2 [ 78592 / 111704] loss: 1.2154 loss_at_student: 1.2154 max mem: 7885 Epoch: 2 [ 79092 / 111704] loss: 1.0304 loss_at_student: 1.0304 max mem: 7885 Epoch: 2 [ 79592 / 111704] loss: 1.3747 loss_at_student: 1.3747 max mem: 7885 Epoch: 2 [ 80092 / 111704] loss: 1.3985 loss_at_student: 1.3985 max mem: 7885 Epoch: 2 [ 80592 / 111704] loss: 1.3049 loss_at_student: 1.3049 max mem: 7885 Epoch: 2 [ 81092 / 111704] loss: 1.4076 loss_at_student: 1.4076 max mem: 7885 Epoch: 2 [ 81592 / 111704] loss: 1.4567 loss_at_student: 1.4567 max mem: 7885 Epoch: 2 [ 82092 / 111704] loss: 1.1410 loss_at_student: 1.1410 max mem: 7885 Epoch: 2 [ 82592 / 111704] loss: 1.0147 loss_at_student: 1.0147 max mem: 
7885 Epoch: 2 [ 83092 / 111704] loss: 1.1666 loss_at_student: 1.1666 max mem: 7885 Epoch: 2 [ 83592 / 111704] loss: 1.1989 loss_at_student: 1.1989 max mem: 7885 Epoch: 2 [ 84092 / 111704] loss: 1.2449 loss_at_student: 1.2449 max mem: 7885 Epoch: 2 [ 84592 / 111704] loss: 1.2657 loss_at_student: 1.2657 max mem: 7885 Epoch: 2 [ 85092 / 111704] loss: 1.4767 loss_at_student: 1.4767 max mem: 7885 Epoch: 2 [ 85592 / 111704] loss: 1.5114 loss_at_student: 1.5114 max mem: 7885 Epoch: 2 [ 86092 / 111704] loss: 1.2506 loss_at_student: 1.2506 max mem: 7885 Epoch: 2 [ 86592 / 111704] loss: 1.5702 loss_at_student: 1.5702 max mem: 7885 Epoch: 2 [ 87092 / 111704] loss: 1.1429 loss_at_student: 1.1429 max mem: 7885 Epoch: 2 [ 87592 / 111704] loss: 1.3348 loss_at_student: 1.3348 max mem: 7885 Epoch: 2 [ 88092 / 111704] loss: 1.1995 loss_at_student: 1.1995 max mem: 7885 Epoch: 2 [ 88592 / 111704] loss: 1.3872 loss_at_student: 1.3872 max mem: 7885 Epoch: 2 [ 89092 / 111704] loss: 1.2124 loss_at_student: 1.2124 max mem: 7885 Epoch: 2 [ 89592 / 111704] loss: 1.5326 loss_at_student: 1.5326 max mem: 7885 Epoch: 2 [ 90092 / 111704] loss: 1.3498 loss_at_student: 1.3498 max mem: 7885 Epoch: 2 [ 90592 / 111704] loss: 1.4594 loss_at_student: 1.4594 max mem: 7885 Epoch: 2 [ 91092 / 111704] loss: 1.3599 loss_at_student: 1.3599 max mem: 7885 Epoch: 2 [ 91592 / 111704] loss: 1.1560 loss_at_student: 1.1560 max mem: 7885 Epoch: 2 [ 92092 / 111704] loss: 1.2257 loss_at_student: 1.2257 max mem: 7885 Epoch: 2 [ 92592 / 111704] loss: 1.3833 loss_at_student: 1.3833 max mem: 7885 Epoch: 2 [ 93092 / 111704] loss: 1.1509 loss_at_student: 1.1509 max mem: 7885 Epoch: 2 [ 93592 / 111704] loss: 1.2436 loss_at_student: 1.2436 max mem: 7885 Epoch: 2 [ 94092 / 111704] loss: 1.2453 loss_at_student: 1.2453 max mem: 7885 Epoch: 2 [ 94592 / 111704] loss: 1.5166 loss_at_student: 1.5166 max mem: 7885 Epoch: 2 [ 95092 / 111704] loss: 1.3175 loss_at_student: 1.3175 max mem: 7885 Epoch: 2 [ 95592 / 111704] loss: 1.6542 
loss_at_student: 1.6542 max mem: 7885 Epoch: 2 [ 96092 / 111704] loss: 1.2026 loss_at_student: 1.2026 max mem: 7885 Epoch: 2 [ 96592 / 111704] loss: 0.9813 loss_at_student: 0.9813 max mem: 7885 Epoch: 2 [ 97092 / 111704] loss: 1.4833 loss_at_student: 1.4833 max mem: 7885 Epoch: 2 [ 97592 / 111704] loss: 1.3172 loss_at_student: 1.3172 max mem: 7885 Epoch: 2 [ 98092 / 111704] loss: 1.1913 loss_at_student: 1.1913 max mem: 7885 Epoch: 2 [ 98592 / 111704] loss: 1.2214 loss_at_student: 1.2214 max mem: 7885 Epoch: 2 [ 99092 / 111704] loss: 1.4009 loss_at_student: 1.4009 max mem: 7885 Epoch: 2 [ 99592 / 111704] loss: 1.2742 loss_at_student: 1.2742 max mem: 7885 Epoch: 2 [100092 / 111704] loss: 1.3582 loss_at_student: 1.3582 max mem: 7885 Epoch: 2 [100592 / 111704] loss: 1.2499 loss_at_student: 1.2499 max mem: 7885 Epoch: 2 [101092 / 111704] loss: 1.0386 loss_at_student: 1.0386 max mem: 7885 Epoch: 2 [101592 / 111704] loss: 1.3682 loss_at_student: 1.3682 max mem: 7885 Epoch: 2 [102092 / 111704] loss: 1.2101 loss_at_student: 1.2101 max mem: 7885 Epoch: 2 [102592 / 111704] loss: 1.4907 loss_at_student: 1.4907 max mem: 7885 Epoch: 2 [103092 / 111704] loss: 1.1144 loss_at_student: 1.1144 max mem: 7885 Epoch: 2 [103592 / 111704] loss: 1.3349 loss_at_student: 1.3349 max mem: 7885 Epoch: 2 [104092 / 111704] loss: 1.3147 loss_at_student: 1.3147 max mem: 7885 Epoch: 2 [104592 / 111704] loss: 1.0516 loss_at_student: 1.0516 max mem: 7885 Epoch: 2 [105092 / 111704] loss: 1.0690 loss_at_student: 1.0690 max mem: 7885 Epoch: 2 [105592 / 111704] loss: 1.4174 loss_at_student: 1.4174 max mem: 7885 Epoch: 2 [106092 / 111704] loss: 1.4570 loss_at_student: 1.4570 max mem: 7885 Epoch: 2 [106592 / 111704] loss: 1.3727 loss_at_student: 1.3727 max mem: 7885 Epoch: 2 [107092 / 111704] loss: 1.1224 loss_at_student: 1.1224 max mem: 7885 Epoch: 2 [107592 / 111704] loss: 1.1112 loss_at_student: 1.1112 max mem: 7885 Epoch: 2 [108092 / 111704] loss: 1.5306 loss_at_student: 1.5306 max mem: 7885 Epoch: 2 
[108592 / 111704] loss: 1.0281 loss_at_student: 1.0281 max mem: 7885
Epoch: 2 [109092 / 111704] loss: 1.3664 loss_at_student: 1.3664 max mem: 7885
Epoch: 2 [109592 / 111704] loss: 1.1933 loss_at_student: 1.1933 max mem: 7885
Epoch: 2 [110092 / 111704] loss: 1.3259 loss_at_student: 1.3259 max mem: 7885
Epoch: 2 [110592 / 111704] loss: 1.3943 loss_at_student: 1.3943 max mem: 7885
Epoch: 2 [111092 / 111704] loss: 1.2282 loss_at_student: 1.2282 max mem: 7885
Epoch: 2 [111592 / 111704] loss: 1.3776 loss_at_student: 1.3776 max mem: 7885
Averaged stats: loss: 1.3411 loss_at_student: 1.3411
Train epoch time: 4:11:44
Train time: 12:50:12
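
The progress records above follow a fixed pattern (epoch, step counter, loss, loss_at_student, max mem), so they can be pulled out of the wrapped text with a regular expression. The following is a minimal sketch, not part of the training code: the names `RECORD`, `parse_records`, and `mean_loss_by_epoch` are hypothetical, and the mean it computes covers only the logged records (one every 500 steps), so it will likely differ from the "Averaged stats" values the trainer reports.

```python
import re
from statistics import mean

# Matches one progress record, e.g.
# "Epoch: 2 [111592 / 111704] loss: 1.3776 loss_at_student: 1.3776 max mem: 7885"
# (hypothetical helper; the pattern is inferred from the log text only)
RECORD = re.compile(
    r"Epoch:\s*(\d+)\s*\[\s*(\d+)\s*/\s*(\d+)\]\s*"
    r"loss:\s*([\d.]+)\s*loss_at_student:\s*([\d.]+)\s*max mem:\s*(\d+)"
)


def parse_records(text):
    """Return (epoch, step, total_steps, loss, max_mem) tuples found in text."""
    return [
        (int(epoch), int(step), int(total), float(loss), int(mem))
        for epoch, step, total, loss, _student, mem in RECORD.findall(text)
    ]


def mean_loss_by_epoch(records):
    """Average the logged loss values per epoch (logged records only)."""
    by_epoch = {}
    for epoch, _step, _total, loss, _mem in records:
        by_epoch.setdefault(epoch, []).append(loss)
    return {epoch: mean(losses) for epoch, losses in by_epoch.items()}


# Three records copied from the end of epoch 2 in the log above.
sample = (
    "Epoch: 2 [110592 / 111704] loss: 1.3943 loss_at_student: 1.3943 max mem: 7885 "
    "Epoch: 2 [111092 / 111704] loss: 1.2282 loss_at_student: 1.2282 max mem: 7885 "
    "Epoch: 2 [111592 / 111704] loss: 1.3776 loss_at_student: 1.3776 max mem: 7885"
)
records = parse_records(sample)
print(mean_loss_by_epoch(records))
```

Because the pattern requires the `Epoch:` prefix, the "Averaged stats", "Train epoch time", and "Train time" summary lines are skipped; because `\s*` also matches newlines, records that wrap across lines are still found when the whole file is read as one string.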