Unable to run with vLLM

#1
by yaronr - opened

Hi,
I am getting the following error when running this model with the latest vLLM in Docker.
Here are my runtime parameters:

    "command": "--port=8000
                --model=fsaudm/Meta-Llama-3.1-70B-Instruct-INT8
                --tensor-parallel-size=4
                --pipeline-parallel-size=1
                --disable-log-requests
                --enable-chunked-prefill
                --num-scheduler-steps=10
                --enable-prefix-caching
                --max-num-batched-tokens=16192
                --max-model-len=16192
                --max-seq-len-to-capture=16192
                --gpu-memory-utilization=0.95"

Here's the error:

    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method load_model.
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] Traceback (most recent call last):
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     output = executor(*args, **kwargs)
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]              ^^^^^^^^^^^^^^^^^^^^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     self.model_runner.load_model()
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/multi_step_model_runner.py", line 645, in load_model
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     return self._base_model_runner.load_model()
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1058, in load_model
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     self.model = get_model(model_config=self.model_config,
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 19, in get_model
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     return loader.load_model(model_config=model_config,
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 402, in load_model
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     model.load_weights(self._get_all_weights(model_config, model))
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 582, in load_weights
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     loader.load_weights(
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 203, in load_weights
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     autoloaded_weights = list(self._load_module("", self.module, weights))
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 182, in _load_module
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     yield from self._load_module(prefix,
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 169, in _load_module
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     module_load_weights(weights)
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/llama.py", line 414, in load_weights
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]     param = params_dict[name]
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229]             ~~~~~~~~~~~^^^^^^
    (VllmWorkerProcess pid=148) ERROR 10-30 01:05:12 multiproc_worker_utils.py:229] KeyError: 'layers.0.mlp.down_proj.SCB'
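As far as I can tell, `.SCB` is the name bitsandbytes (LLM.int8()) uses for the per-row scale tensor that transformers saves next to each 8-bit weight. If so, the KeyError suggests vLLM is loading this checkpoint as a plain, unquantized Llama and has no parameter to put those scale tensors into. A rough, stdlib-only way to check whether a local snapshot of the checkpoint contains such tensors is to read the JSON header of each `.safetensors` shard and list the names ending in `.SCB`. This is a sketch under assumptions: the model has already been downloaded as `.safetensors` files into one directory (pass its path as the first argument), and the helper names `safetensors_keys` and `find_bnb_scale_keys` are hypothetical, not part of vLLM or any library.

```python
import json
import struct
import sys
from pathlib import Path


def safetensors_keys(path):
    """Return the tensor names stored in one .safetensors file.

    The file starts with an 8-byte little-endian unsigned header length,
    followed by a JSON header that maps tensor names to dtype/shape/offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def find_bnb_scale_keys(model_dir):
    """Collect tensor names ending in '.SCB' across all shards in model_dir."""
    hits = []
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        hits.extend(k for k in safetensors_keys(shard) if k.endswith(".SCB"))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python check_scb.py /path/to/local/snapshot
    scb = find_bnb_scale_keys(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(scb)} '.SCB' tensors found")
    for name in scb[:5]:
        print("  ", name)
```

Only the headers are read, so this stays fast even on a 70B checkpoint; a non-zero count would be consistent with a bitsandbytes-format checkpoint rather than one vLLM can load as ordinary Llama weights.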

I am also getting the same error.

@vijaydeshpande I switched to Hugging Face TGI (ghcr.io/huggingface/text-generation-inference:latest). It may be a little less performant in some edge cases, but it just works with very little parameter tuning: it figures out the configuration on its own and runs.
