torch.distributed.DistNetworkError

#75
by yu19920006607 - opened

When I run torchrun --nnodes 2 --nproc-per-node 8 --node-rank 200 --master-addr 100 generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200, I get the following error:


torch.distributed.DistNetworkError: The client socket has failed to connect to any network address of (100, 29500). The client socket has failed to connect to 0.0.0.100:29500 (errno: 110 - Connection timed out).
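The error shows the value passed to --master-addr, 100, being resolved to 0.0.0.100. A minimal sketch of why that can happen, assuming the address goes through the classic IPv4 text parser, which reads a bare number as a 32-bit address. The helper name legacy_ipv4 is hypothetical and used only for illustration; it is not part of torchrun or PyTorch.

```python
import socket


def legacy_ipv4(text: str) -> str:
    """Return the dotted-quad address that the classic IPv4 parser yields for `text`.

    Hypothetical helper for illustration only. socket.inet_aton accepts
    strings with fewer than three dots, so a bare number such as "100"
    is treated as a single 32-bit value.
    """
    packed = socket.inet_aton(text)   # 4 packed bytes in network byte order
    return socket.inet_ntoa(packed)   # back to dotted-quad text


if __name__ == "__main__":
    # Show what the bare value from the command line turns into.
    print(legacy_ipv4("100"))
```

If this is what happens here, --master-addr would need the full IP address or hostname of the rank-0 node rather than only its last number.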
