2024-04-25 23:49:16,511 INFO StreamThr :212584 [internal.py:wandb_internal():86] W&B internal server running at pid: 212584, started at: 2024-04-25 23:49:16.510614
2024-04-25 23:49:16,512 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status
2024-04-25 23:49:16,519 INFO WriterThread:212584 [datastore.py:open_for_write():87] open: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/run-ozdw63qu.wandb
2024-04-25 23:49:16,520 DEBUG SenderThread:212584 [sender.py:send():379] send: header
2024-04-25 23:49:16,537 DEBUG SenderThread:212584 [sender.py:send():379] send: run
2024-04-25 23:49:16,772 INFO SenderThread:212584 [dir_watcher.py:__init__():211] watching files in: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files
2024-04-25 23:49:16,772 INFO SenderThread:212584 [sender.py:_start_run_threads():1124] run started: ozdw63qu with start time 1714088956.516191
2024-04-25 23:49:16,778 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: check_version
2024-04-25 23:49:16,778 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: check_version
2024-04-25 23:49:16,833 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: run_start
2024-04-25 23:49:16,892 DEBUG HandlerThread:212584 [system_info.py:__init__():26] System info init
2024-04-25 23:49:16,892 DEBUG HandlerThread:212584 [system_info.py:__init__():41] System info init done
2024-04-25 23:49:16,892 INFO HandlerThread:212584 [system_monitor.py:start():194] Starting system monitor
2024-04-25 23:49:16,893 INFO SystemMonitor:212584 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-04-25 23:49:16,893 INFO HandlerThread:212584 [system_monitor.py:probe():214] Collecting system info
2024-04-25 23:49:16,893 INFO SystemMonitor:212584 [interfaces.py:start():190] Started cpu monitoring
2024-04-25 23:49:16,893 INFO SystemMonitor:212584 [interfaces.py:start():190] Started disk monitoring
2024-04-25 23:49:16,894 INFO SystemMonitor:212584 [interfaces.py:start():190] Started gpu monitoring
2024-04-25 23:49:16,894 INFO SystemMonitor:212584 [interfaces.py:start():190] Started memory monitoring
2024-04-25 23:49:16,895 INFO SystemMonitor:212584 [interfaces.py:start():190] Started network monitoring
2024-04-25 23:49:16,939 DEBUG HandlerThread:212584 [system_info.py:probe():150] Probing system
2024-04-25 23:49:16,942 DEBUG HandlerThread:212584 [system_info.py:_probe_git():135] Probing git
2024-04-25 23:49:16,961 DEBUG HandlerThread:212584 [system_info.py:_probe_git():143] Probing git done
2024-04-25 23:49:16,961 DEBUG HandlerThread:212584 [system_info.py:probe():198] Probing system done
2024-04-25 23:49:16,961 DEBUG HandlerThread:212584 [system_monitor.py:probe():223] {'os': 'Linux-5.15.0-1048-aws-x86_64-with-glibc2.31', 'python': '3.11.9', 'heartbeatAt': '2024-04-25T23:49:16.939535', 'startedAt': '2024-04-25T23:49:16.496864', 'docker': None, 'cuda': None, 'args': ('./config_full.yaml',), 'state': 'running', 'program': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/./run_sft.py', 'codePathLocal': 'run_sft.py', 'codePath': 'run_sft.py', 'git': {'remote': 'https://huggingface.co/sanchit-gandhi/distil-zephyr-1.5b-ssft-ultrachat', 'commit': 'cbea69c6b95c970317a1e47c3f614b55b33f8ed9'}, 'email': None, 'root': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat', 'host': 'ip-26-0-167-177', 'username': 'sanchit', 'executable': '/fsx/sanchit/miniconda3/envs/alignment/bin/python', 'cpu_count': 96, 'cpu_count_logical': 96, 'cpu_freq': {'current': 2718.7540416666648, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 2603.523, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min':
0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3596.748, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3596.723, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, 
{'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.872, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3590.698, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2688.441, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3582.227, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3596.99, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.34, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 
'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 290.7472343444824, 'used': 58.58917236328125}}, 'gpu': 'NVIDIA H100 80GB HBM3', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}], 'memory': {'total': 1999.9855155944824}}
2024-04-25 23:49:16,962 INFO HandlerThread:212584 [system_monitor.py:probe():224] Finished collecting system info
2024-04-25 23:49:16,962 INFO HandlerThread:212584 [system_monitor.py:probe():227] Publishing system info
2024-04-25 23:49:16,962 DEBUG HandlerThread:212584 [system_info.py:_save_conda():207] Saving list of conda packages installed into the current environment
2024-04-25 23:49:17,774 INFO Thread-12 :212584 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/conda-environment.yaml
2024-04-25 23:49:20,884 DEBUG HandlerThread:212584 [system_info.py:_save_conda():222] Saving conda packages done
2024-04-25 23:49:20,887 INFO HandlerThread:212584 [system_monitor.py:probe():229] Finished publishing system info
2024-04-25 23:49:20,915 DEBUG SenderThread:212584 [sender.py:send():379] send: files
2024-04-25 23:49:20,915 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-metadata.json with policy now
2024-04-25 23:49:21,061 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: python_packages
2024-04-25 23:49:21,061 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: python_packages
2024-04-25 23:49:21,061 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-25 23:49:21,062 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: stop_status
2024-04-25 23:49:21,063 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: stop_status
2024-04-25 23:49:21,184 INFO wandb-upload_0:212584 [upload_job.py:push():131] Uploaded file /tmp/tmp9wuipw18wandb/r4fodxap-wandb-metadata.json
2024-04-25 23:49:21,185 DEBUG SenderThread:212584 [sender.py:send():379] send: telemetry
2024-04-25 23:49:21,185 DEBUG SenderThread:212584 [sender.py:send():379] send: config
2024-04-25 23:49:21,187 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:21,188 DEBUG SenderThread:212584 [sender.py:send():379] send: telemetry
2024-04-25 23:49:21,188 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:21,188 WARNING SenderThread:212584 [sender.py:send_metric():1341] Seen metric with glob (shouldn't happen)
2024-04-25 23:49:21,188 DEBUG SenderThread:212584 [sender.py:send():379] send: telemetry
2024-04-25 23:49:21,778 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/conda-environment.yaml
2024-04-25 23:49:21,778 INFO Thread-12 :212584 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/wandb-metadata.json
2024-04-25 23:49:21,778 INFO Thread-12 :212584 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/requirements.txt
2024-04-25 23:49:21,778 INFO Thread-12 :212584 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:22,189 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:49:23,780 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:24,232 DEBUG SenderThread:212584 [sender.py:send():379] send: telemetry
2024-04-25 23:49:24,233 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,234 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: partial_history
2024-04-25 23:49:24,236 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,237 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,238 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,238 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,239 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,239 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,240 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,240 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,242 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,242 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:24,242 DEBUG SenderThread:212584 [sender.py:send():379] send: history
2024-04-25 23:49:24,242 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:24,243 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:24,782 INFO Thread-12 :212584 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/wandb-summary.json
2024-04-25 23:49:25,754 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: partial_history
2024-04-25 23:49:25,757 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:25,758 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:25,758 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:25,758 DEBUG SenderThread:212584 [sender.py:send():379] send: metric
2024-04-25 23:49:25,758 DEBUG SenderThread:212584 [sender.py:send():379] send: history
2024-04-25 23:49:25,758 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:49:25,760 INFO SenderThread:212584 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:49:25,783 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/wandb-summary.json
2024-04-25 23:49:25,784 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:26,785 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:27,764 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:49:27,787 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:31,791 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:33,688 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:49:36,062 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-25 23:49:36,063 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: stop_status
2024-04-25 23:49:36,063 DEBUG SenderThread:212584 [sender.py:send_request():406] send_request: stop_status
2024-04-25 23:49:37,798 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:39,684 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:49:39,800 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:40,801 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:41,802 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:42,804 INFO Thread-12 :212584 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_234916-ozdw63qu/files/output.log
2024-04-25 23:49:45,208 DEBUG HandlerThread:212584 [handler.py:handle_request():146] handle_request: status_report