2024-04-24 15:43:39,468 INFO StreamThr :1848599 [internal.py:wandb_internal():86] W&B internal server running at pid: 1848599, started at: 2024-04-24 15:43:39.467078
2024-04-24 15:43:39,469 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: status
2024-04-24 15:43:39,473 INFO WriterThread:1848599 [datastore.py:open_for_write():85] open: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/run-mwp0iutr.wandb
2024-04-24 15:43:39,476 DEBUG SenderThread:1848599 [sender.py:send():382] send: header
2024-04-24 15:43:39,521 DEBUG SenderThread:1848599 [sender.py:send():382] send: run
2024-04-24 15:43:39,793 INFO SenderThread:1848599 [dir_watcher.py:__init__():211] watching files in: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files
2024-04-24 15:43:39,793 INFO SenderThread:1848599 [sender.py:_start_run_threads():1136] run started: mwp0iutr with start time 1713973419.470656
2024-04-24 15:43:39,798 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: check_version
2024-04-24 15:43:39,799 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: check_version
2024-04-24 15:43:39,851 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: run_start
2024-04-24 15:43:39,908 DEBUG HandlerThread:1848599 [system_info.py:__init__():32] System info init
2024-04-24 15:43:39,908 DEBUG HandlerThread:1848599 [system_info.py:__init__():47] System info init done
2024-04-24 15:43:39,908 INFO HandlerThread:1848599 [system_monitor.py:start():194] Starting system monitor
2024-04-24 15:43:39,908 INFO SystemMonitor:1848599 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-04-24 15:43:39,909 INFO HandlerThread:1848599 [system_monitor.py:probe():214] Collecting system info
2024-04-24 15:43:39,909 INFO SystemMonitor:1848599 [interfaces.py:start():190] Started cpu monitoring
2024-04-24 15:43:39,909 INFO SystemMonitor:1848599 [interfaces.py:start():190] Started disk monitoring
2024-04-24 15:43:39,910 INFO SystemMonitor:1848599 [interfaces.py:start():190] Started gpu monitoring
2024-04-24 15:43:39,911 INFO SystemMonitor:1848599 [interfaces.py:start():190] Started memory monitoring
2024-04-24 15:43:39,911 INFO SystemMonitor:1848599 [interfaces.py:start():190] Started network monitoring
2024-04-24 15:43:39,965 DEBUG HandlerThread:1848599 [system_info.py:probe():196] Probing system
2024-04-24 15:43:39,967 DEBUG HandlerThread:1848599 [system_info.py:_probe_git():181] Probing git
2024-04-24 15:43:39,987 DEBUG HandlerThread:1848599 [system_info.py:_probe_git():189] Probing git done
2024-04-24 15:43:39,987 DEBUG HandlerThread:1848599 [system_info.py:probe():244] Probing system done
2024-04-24 15:43:39,987 DEBUG HandlerThread:1848599 [system_monitor.py:probe():223] {'os': 'Linux-5.15.0-1048-aws-x86_64-with-glibc2.31', 'python': '3.11.5', 'heartbeatAt': '2024-04-24T15:43:39.965097', 'startedAt': '2024-04-24T15:43:39.449266', 'docker': None, 'cuda': None, 'args': ('config_full.yaml',), 'state': 'running', 'program': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/run_sft.py', 'codePathLocal': 'run_sft.py', 'codePath': 'run_sft.py', 'git': {'remote': 'https://huggingface.co/sanchit-gandhi/distil-zephyr-1.5b-ssft-ultrachat', 'commit': 'cbea69c6b95c970317a1e47c3f614b55b33f8ed9'}, 'email': None, 'root': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat', 'host': 'ip-26-0-162-233', 'username': 'sanchit', 'executable': '/fsx/sanchit/miniconda3/envs/venv/bin/python', 'cpu_count': 96, 'cpu_count_logical': 96, 'cpu_freq': {'current': 2721.9698645833337, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 3590.538, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3595.996, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3597.59, 'min': 0.0, 
'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3399.936, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3598.273, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3597.284, 'min': 0.0, 'max': 0.0}, {'current': 3036.337, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3597.887, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 3598.442, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 
0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}, {'current': 2650.0, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 290.7472343444824, 'used': 59.263893127441406}}, 'gpu': 'NVIDIA H100 80GB HBM3', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}], 'memory': {'total': 1999.9855270385742}}
2024-04-24 15:43:39,988 INFO HandlerThread:1848599 [system_monitor.py:probe():224] Finished collecting system info
2024-04-24 15:43:39,988 INFO HandlerThread:1848599 [system_monitor.py:probe():227] Publishing system info
2024-04-24 15:43:39,988 DEBUG HandlerThread:1848599 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-04-24 15:43:39,989 DEBUG HandlerThread:1848599 [system_info.py:_save_pip():68] Saving pip packages done
2024-04-24 15:43:39,990 DEBUG HandlerThread:1848599 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-04-24 15:43:40,795 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/conda-environment.yaml
2024-04-24 15:43:40,796 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/requirements.txt
2024-04-24 15:43:45,799 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/conda-environment.yaml
2024-04-24 15:43:45,805 DEBUG HandlerThread:1848599 [system_info.py:_save_conda():87] Saving conda packages done
2024-04-24 15:43:45,807 INFO HandlerThread:1848599 [system_monitor.py:probe():229] Finished publishing system info
2024-04-24 15:43:45,857 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: status_report
2024-04-24 15:43:45,857 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: keepalive
2024-04-24 15:43:45,858 DEBUG SenderThread:1848599 [sender.py:send():382] send: files
2024-04-24 15:43:45,858 INFO SenderThread:1848599 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-04-24 15:43:45,864 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: stop_status
2024-04-24 15:43:45,865 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: stop_status
2024-04-24 15:43:45,867 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-24 15:43:45,993 DEBUG SenderThread:1848599 [sender.py:send():382] send: telemetry
2024-04-24 15:43:45,993 DEBUG SenderThread:1848599 [sender.py:send():382] send: config
2024-04-24 15:43:45,993 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:45,994 DEBUG SenderThread:1848599 [sender.py:send():382] send: telemetry
2024-04-24 15:43:45,994 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:45,994 WARNING SenderThread:1848599 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-04-24 15:43:45,994 DEBUG SenderThread:1848599 [sender.py:send():382] send: telemetry
2024-04-24 15:43:46,179 INFO wandb-upload_0:1848599 [upload_job.py:push():131] Uploaded file /tmp/tmphsb5r9cdwandb/sgr8lmob-wandb-metadata.json
2024-04-24 15:43:46,800 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-metadata.json
2024-04-24 15:43:46,801 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:43:48,803 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:43:50,251 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: status_report
2024-04-24 15:43:52,783 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: partial_history
2024-04-24 15:43:52,785 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:52,785 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:52,786 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:52,786 DEBUG SenderThread:1848599 [sender.py:send():382] send: metric
2024-04-24 15:43:52,786 DEBUG SenderThread:1848599 [sender.py:send():382] send: history
2024-04-24 15:43:52,786 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: summary_record
2024-04-24 15:43:52,788 INFO SenderThread:1848599 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-04-24 15:43:52,807 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-summary.json
2024-04-24 15:43:54,212 DEBUG SenderThread:1848599 [sender.py:send():382] send: exit
2024-04-24 15:43:54,212 INFO SenderThread:1848599 [sender.py:send_exit():589] handling exit code: 1
2024-04-24 15:43:54,212 INFO SenderThread:1848599 [sender.py:send_exit():591] handling runtime: 14
2024-04-24 15:43:54,213 INFO SenderThread:1848599 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-04-24 15:43:54,213 INFO SenderThread:1848599 [sender.py:send_exit():597] send defer
2024-04-24 15:43:54,213 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:54,213 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 0
2024-04-24 15:43:54,214 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:54,214 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 0
2024-04-24 15:43:54,214 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 1
2024-04-24 15:43:54,214 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:54,214 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 1
2024-04-24 15:43:54,214 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:54,214 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 1
2024-04-24 15:43:54,214 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 2
2024-04-24 15:43:54,214 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:54,214 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 2
2024-04-24 15:43:54,214 INFO HandlerThread:1848599 [system_monitor.py:finish():203] Stopping system monitor
2024-04-24 15:43:54,214 DEBUG SystemMonitor:1848599 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-04-24 15:43:54,215 DEBUG SystemMonitor:1848599 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-04-24 15:43:54,215 DEBUG SystemMonitor:1848599 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-04-24 15:43:54,215 INFO HandlerThread:1848599 [interfaces.py:finish():202] Joined cpu monitor
2024-04-24 15:43:54,217 INFO HandlerThread:1848599 [interfaces.py:finish():202] Joined disk monitor
2024-04-24 15:43:54,810 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-summary.json
2024-04-24 15:43:54,810 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:43:56,812 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:43:57,141 INFO HandlerThread:1848599 [interfaces.py:finish():202] Joined gpu monitor
2024-04-24 15:43:57,142 INFO HandlerThread:1848599 [interfaces.py:finish():202] Joined memory monitor
2024-04-24 15:43:57,142 INFO HandlerThread:1848599 [interfaces.py:finish():202] Joined network monitor
2024-04-24 15:43:57,142 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: poll_exit
2024-04-24 15:43:57,143 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: status_report
2024-04-24 15:43:57,143 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,143 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 2
2024-04-24 15:43:57,143 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 3
2024-04-24 15:43:57,144 DEBUG SenderThread:1848599 [sender.py:send():382] send: stats
2024-04-24 15:43:57,144 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:57,144 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: poll_exit
2024-04-24 15:43:57,145 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 3
2024-04-24 15:43:57,146 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,146 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 3
2024-04-24 15:43:57,146 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 4
2024-04-24 15:43:57,146 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:57,146 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 4
2024-04-24 15:43:57,147 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,147 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 4
2024-04-24 15:43:57,147 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 5
2024-04-24 15:43:57,147 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:57,147 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 5
2024-04-24 15:43:57,147 DEBUG SenderThread:1848599 [sender.py:send():382] send: summary
2024-04-24 15:43:57,149 INFO SenderThread:1848599 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-04-24 15:43:57,149 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,149 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 5
2024-04-24 15:43:57,149 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 6
2024-04-24 15:43:57,149 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:57,149 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 6
2024-04-24 15:43:57,149 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,149 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 6
2024-04-24 15:43:57,152 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: status_report
2024-04-24 15:43:57,275 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 7
2024-04-24 15:43:57,275 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:57,275 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 7
2024-04-24 15:43:57,275 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:57,275 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 7
2024-04-24 15:43:57,814 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/config.yaml
2024-04-24 15:43:57,814 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-summary.json
2024-04-24 15:43:58,791 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 8
2024-04-24 15:43:58,792 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:58,792 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 8
2024-04-24 15:43:58,792 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:43:58,792 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 8
2024-04-24 15:43:58,792 INFO SenderThread:1848599 [job_builder.py:build():298] Attempting to build job artifact
2024-04-24 15:43:58,794 INFO SenderThread:1848599 [job_builder.py:_get_source_type():428] is repo sourced job
2024-04-24 15:43:58,815 INFO Thread-12 :1848599 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:43:58,832 INFO SenderThread:1848599 [job_builder.py:build():404] adding wandb-job metadata file
2024-04-24 15:43:58,858 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 9
2024-04-24 15:43:58,859 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:43:58,859 DEBUG SenderThread:1848599 [sender.py:send():382] send: artifact
2024-04-24 15:43:58,859 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 9
2024-04-24 15:43:59,524 INFO wandb-upload_0:1848599 [upload_job.py:push():89] Uploaded file /admin/home/sanchit/.local/share/wandb/artifacts/staging/tmp1vajxumh
2024-04-24 15:43:59,530 INFO wandb-upload_1:1848599 [upload_job.py:push():89] Uploaded file /admin/home/sanchit/.local/share/wandb/artifacts/staging/tmp824ipvc5
2024-04-24 15:44:00,093 INFO SenderThread:1848599 [sender.py:send_artifact():1470] sent artifact job-https___huggingface.co_sanchit-gandhi_distil-zephyr-1.5b-ssft-ultrachat_run_sft.py - {'id': 'QXJ0aWZhY3Q6ODA4NTQyNDIx', 'state': 'PENDING', 'artifactSequence': {'id': 'QXJ0aWZhY3RDb2xsZWN0aW9uOjE2NjI0NzU4Nw==', 'latestArtifact': None}}
2024-04-24 15:44:00,093 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:00,093 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 9
2024-04-24 15:44:00,093 INFO SenderThread:1848599 [dir_watcher.py:finish():358] shutting down directory watcher
2024-04-24 15:44:00,213 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: keepalive
2024-04-24 15:44:00,816 INFO SenderThread:1848599 [dir_watcher.py:finish():388] scan: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files
2024-04-24 15:44:00,817 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/conda-environment.yaml conda-environment.yaml
2024-04-24 15:44:00,817 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-summary.json wandb-summary.json
2024-04-24 15:44:00,817 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log output.log
2024-04-24 15:44:00,821 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/config.yaml config.yaml
2024-04-24 15:44:00,824 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/requirements.txt requirements.txt
2024-04-24 15:44:00,826 INFO SenderThread:1848599 [dir_watcher.py:finish():402] scan save: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-metadata.json wandb-metadata.json
2024-04-24 15:44:00,826 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 10
2024-04-24 15:44:00,828 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:44:00,828 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 10
2024-04-24 15:44:00,828 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:00,828 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 10
2024-04-24 15:44:00,828 INFO SenderThread:1848599 [file_pusher.py:finish():175] shutting down file pusher
2024-04-24 15:44:01,006 INFO wandb-upload_0:1848599 [upload_job.py:push():131] Uploaded file /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/conda-environment.yaml
2024-04-24 15:44:01,059 INFO wandb-upload_1:1848599 [upload_job.py:push():131] Uploaded file /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/wandb-summary.json
2024-04-24 15:44:01,161 INFO wandb-upload_2:1848599 [upload_job.py:push():131] Uploaded file /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/output.log
2024-04-24 15:44:01,169 INFO wandb-upload_3:1848599 [upload_job.py:push():131] Uploaded file /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/config.yaml
2024-04-24 15:44:01,184 INFO wandb-upload_4:1848599 [upload_job.py:push():131] Uploaded file /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/files/requirements.txt
2024-04-24 15:44:01,384 INFO Thread-11 (_thread_body):1848599 [sender.py:transition_state():617] send defer: 11
2024-04-24 15:44:01,385 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:44:01,385 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 11
2024-04-24 15:44:01,385 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:01,385 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 11
2024-04-24 15:44:01,385 INFO SenderThread:1848599 [file_pusher.py:join():181] waiting for file pusher
2024-04-24 15:44:01,385 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 12
2024-04-24 15:44:01,385 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:44:01,385 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 12
2024-04-24 15:44:01,385 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:01,385 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 12
2024-04-24 15:44:01,386 INFO SenderThread:1848599 [file_stream.py:finish():595] file stream finish called
2024-04-24 15:44:01,445 INFO SenderThread:1848599 [file_stream.py:finish():599] file stream finish is done
2024-04-24 15:44:01,445 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 13
2024-04-24 15:44:01,445 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:44:01,445 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 13
2024-04-24 15:44:01,445 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:01,445 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 13
2024-04-24 15:44:01,445 INFO SenderThread:1848599 [sender.py:transition_state():617] send defer: 14
2024-04-24 15:44:01,446 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: defer
2024-04-24 15:44:01,446 DEBUG SenderThread:1848599 [sender.py:send():382] send: final
2024-04-24 15:44:01,446 INFO HandlerThread:1848599 [handler.py:handle_request_defer():172] handle defer: 14
2024-04-24 15:44:01,446 DEBUG SenderThread:1848599 [sender.py:send():382] send: footer
2024-04-24 15:44:01,446 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: defer
2024-04-24 15:44:01,446 INFO SenderThread:1848599 [sender.py:send_request_defer():613] handle sender defer: 14
2024-04-24 15:44:01,447 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: poll_exit
2024-04-24 15:44:01,447 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: poll_exit
2024-04-24 15:44:01,447 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: server_info
2024-04-24 15:44:01,447 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: get_summary
2024-04-24 15:44:01,448 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: server_info
2024-04-24 15:44:01,449 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: sampled_history
2024-04-24 15:44:01,449 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-24 15:44:01,450 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: job_info
2024-04-24 15:44:01,507 DEBUG SenderThread:1848599 [sender.py:send_request():409] send_request: job_info
2024-04-24 15:44:01,508 INFO MainThread:1848599 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-04-24 15:44:01,508 INFO MainThread:1848599 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-04-24 15:44:01,508 INFO MainThread:1848599 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-04-24 15:44:01,508 DEBUG HandlerThread:1848599 [handler.py:handle_request():146] handle_request: shutdown
2024-04-24 15:44:01,508 INFO HandlerThread:1848599 [handler.py:finish():866] shutting down handler
2024-04-24 15:44:02,450 INFO WriterThread:1848599 [datastore.py:close():294] close: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240424_154339-mwp0iutr/run-mwp0iutr.wandb
2024-04-24 15:44:02,508 INFO SenderThread:1848599 [sender.py:finish():1548] shutting down sender
2024-04-24 15:44:02,508 INFO SenderThread:1848599 [file_pusher.py:finish():175] shutting down file pusher
2024-04-24 15:44:02,508 INFO SenderThread:1848599 [file_pusher.py:join():181] waiting for file pusher