
Troubleshooting notes: running the Qwen3-VL-32B model with vLLM

The error output was as follows:

-k sampling. For the best performance, please install FlashInfer.
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [gpu_model_runner.py:2602] Starting to load model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct...
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [gpu_model_runner.py:2602] Starting to load model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct...
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [cuda.py:366] Using Flash Attention backend on V1 engine.
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards:   0% Completed | 0/14 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   7% Completed | 1/14 [00:02<00:32,  2.52s/it]
Loading safetensors checkpoint shards:  14% Completed | 2/14 [00:03<00:19,  1.66s/it]
Loading safetensors checkpoint shards:  21% Completed | 3/14 [00:05<00:19,  1.81s/it]
Loading safetensors checkpoint shards:  29% Completed | 4/14 [00:07<00:20,  2.05s/it]
Loading safetensors checkpoint shards:  36% Completed | 5/14 [00:08<00:14,  1.63s/it]
Loading safetensors checkpoint shards:  43% Completed | 6/14 [00:12<00:17,  2.15s/it]
Loading safetensors checkpoint shards:  50% Completed | 7/14 [00:14<00:16,  2.29s/it]
Loading safetensors checkpoint shards:  57% Completed | 8/14 [00:17<00:14,  2.38s/it]
Loading safetensors checkpoint shards:  64% Completed | 9/14 [00:18<00:09,  1.92s/it]
Loading safetensors checkpoint shards:  71% Completed | 10/14 [00:20<00:08,  2.20s/it]
Loading safetensors checkpoint shards:  79% Completed | 11/14 [00:21<00:05,  1.84s/it]
Loading safetensors checkpoint shards:  86% Completed | 12/14 [00:23<00:03,  1.71s/it]
Loading safetensors checkpoint shards:  93% Completed | 13/14 [00:23<00:01,  1.33s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:27<00:00,  1.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:27<00:00,  1.94s/it]
(Worker_TP0 pid=3295275) 
(Worker_TP0 pid=3295275) INFO 10-24 17:58:10 [default_loader.py:267] Loading weights took 27.26 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:10 [default_loader.py:267] Loading weights took 27.18 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:11 [gpu_model_runner.py:2653] Model loading took 31.4550 GiB and 27.412074 seconds
(Worker_TP0 pid=3295275) INFO 10-24 17:58:11 [gpu_model_runner.py:2653] Model loading took 31.4550 GiB and 27.493010 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:11 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size.
(Worker_TP0 pid=3295275) INFO 10-24 17:58:11 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size.
(Worker_TP1 pid=3295276) INFO 10-24 17:58:38 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/932ff2018f/rank_1_0/backbone for vLLM's torch.compile
(Worker_TP1 pid=3295276) INFO 10-24 17:58:38 [backends.py:559] Dynamo bytecode transform time: 13.50 s
(Worker_TP0 pid=3295275) INFO 10-24 17:58:39 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/932ff2018f/rank_0_0/backbone for vLLM's torch.compile
(Worker_TP0 pid=3295275) INFO 10-24 17:58:39 [backends.py:559] Dynamo bytecode transform time: 13.77 s
(Worker_TP1 pid=3295276) INFO 10-24 17:58:43 [backends.py:197] Cache the graph for dynamic shape for later use
(Worker_TP0 pid=3295275) INFO 10-24 17:58:44 [backends.py:197] Cache the graph for dynamic shape for later use
(EngineCore_DP0 pid=3295096) INFO 10-24 17:59:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(Worker_TP1 pid=3295276) INFO 10-24 17:59:36 [backends.py:218] Compiling a graph for dynamic shape takes 56.77 s
(Worker_TP0 pid=3295275) INFO 10-24 17:59:36 [backends.py:218] Compiling a graph for dynamic shape takes 56.82 s
(EngineCore_DP0 pid=3295096) INFO 10-24 18:00:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(Worker_TP1 pid=3295276) INFO 10-24 18:00:17 [monitor.py:34] torch.compile takes 70.28 s in total
(Worker_TP0 pid=3295275) INFO 10-24 18:00:17 [monitor.py:34] torch.compile takes 70.60 s in total
(Worker_TP0 pid=3295275) INFO 10-24 18:00:18 [gpu_worker.py:298] Available KV cache memory: 0.67 GiB
(Worker_TP1 pid=3295276) INFO 10-24 18:00:18 [gpu_worker.py:298] Available KV cache memory: 0.67 GiB
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 199, in _initialize_kv_caches
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 716, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708]     raise ValueError(
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] ValueError: To serve at least one request with the models's max seq len (262144), (32.00 GiB KV cache is needed, which is larger than the available KV cache memory (0.67 GiB). Based on the available memory, the estimated maximum model length is 5472. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:21 [multiproc_executor.py:154] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=3295096) Process EngineCore_DP0:
(EngineCore_DP0 pid=3295096) Traceback (most recent call last):
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=3295096)     self.run()
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=3295096)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=3295096)     raise e
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3295096)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3295096)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3295096)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=3295096)     self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 199, in _initialize_kv_caches
(EngineCore_DP0 pid=3295096)     kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=3295096)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=3295096)     check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=3295096)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 716, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=3295096)     raise ValueError(
(EngineCore_DP0 pid=3295096) ValueError: To serve at least one request with the models's max seq len (262144), (32.00 GiB KV cache is needed, which is larger than the available KV cache memory (0.67 GiB). Based on the available memory, the estimated maximum model length is 5472. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
(APIServer pid=3294801) Traceback (most recent call last):
(APIServer pid=3294801)   File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=3294801)   File "<frozen runpy>", line 88, in _run_code
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1953, in <module>
(APIServer pid=3294801)     uvloop.run(run_server(args))
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=3294801)     return runner.run(wrapper())
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=3294801)     return self._loop.run_until_complete(task)
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3294801)     return await main
(APIServer pid=3294801)            ^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=3294801)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=3294801)     async with build_async_engine_client(
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=3294801)     return await anext(self.gen)
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=3294801)     async with build_async_engine_client_from_engine_args(
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=3294801)     return await anext(self.gen)
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=3294801)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=3294801)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=3294801)     return fn(*args, **kwargs)
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=3294801)     return cls(
(APIServer pid=3294801)            ^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=3294801)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=3294801)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=3294801)     return AsyncMPClient(*client_args)
(APIServer pid=3294801)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=3294801)     super().__init__(
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=3294801)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=3294801)     next(self.gen)
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=3294801)     wait_for_engine_startup(
(APIServer pid=3294801)   File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=3294801)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=3294801) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
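The root cause is the ValueError in the traceback: at the model's default max sequence length (262144 tokens), the KV cache alone needs 32 GiB per tensor-parallel worker, but after loading ~31.5 GiB of weights per GPU only 0.67 GiB was left. The numbers in the log can be reproduced with a back-of-envelope calculation. The config values below (64 layers, 8 GQA KV heads, head dim 128, bfloat16) are assumptions taken from the published Qwen3-32B spec, not from this log; verify them against the checkpoint's config.json:

```python
# Back-of-envelope KV cache estimate reproducing the figures in the error above.
# Assumed Qwen3-VL-32B text-backbone config (verify against config.json):
NUM_LAYERS = 64
NUM_KV_HEADS = 8        # GQA key/value heads, sharded across TP workers
HEAD_DIM = 128
DTYPE_BYTES = 2         # bfloat16
MAX_MODEL_LEN = 262144  # default max seq len from the error message
GiB = 1024 ** 3

def kv_bytes_per_token(tp_size: int) -> int:
    """KV cache bytes per token per tensor-parallel worker (K and V tensors)."""
    return 2 * NUM_LAYERS * (NUM_KV_HEADS // tp_size) * HEAD_DIM * DTYPE_BYTES

# The failing run used two workers (Worker_TP0 / Worker_TP1 in the log):
need = kv_bytes_per_token(tp_size=2) * MAX_MODEL_LEN / GiB
print(f"KV cache needed per worker at TP=2: {need:.2f} GiB")  # 32.00, matching the error

# Only 0.67 GiB was left per worker, which caps the servable sequence length:
max_len = int(0.67 * GiB) // kv_bytes_per_token(tp_size=2)
print(f"estimated max model len: {max_len}")  # 5488; the log's 5472 uses the unrounded memory figure
```

If the real config differs from these assumed values, the per-token cost shifts proportionally, but the shape of the problem is the same: 262144 tokens of KV cache cannot fit in 0.67 GiB.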

There is no real shortage of GPU memory here — this machine has eight L20 cards, plenty for this model. The failing run (two tensor-parallel workers, per the Worker_TP0/TP1 lines above) just had no room left for the KV cache after loading the weights, so the fix is to spread the model across more GPUs:

python -m vllm.entrypoints.openai.api_server --model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct --tensor-parallel-size 4

That did it — the server started successfully.
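Why `--tensor-parallel-size 4` is enough can be checked with the same arithmetic the error message uses: going from two to four workers halves both the per-GPU weight footprint (the log reports 31.455 GiB per worker at TP=2) and the per-worker KV bytes per token, since the KV heads are sharded too. The layer/head/dtype figures below are assumptions from the published Qwen3-32B spec; the weight total is taken from the log:

```python
# Rough per-GPU memory budget on a 48 GB L20, comparing TP=2 and TP=4.
# Assumed model config (verify against the checkpoint's config.json):
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, DTYPE_BYTES = 64, 8, 128, 2
MAX_MODEL_LEN = 262144
TOTAL_WEIGHT_GIB = 2 * 31.455   # per-worker weight footprint reported in the log at TP=2
GiB = 1024 ** 3

def kv_need_gib(tp_size: int) -> float:
    """KV cache (GiB) per worker to serve one request of MAX_MODEL_LEN tokens."""
    per_token = 2 * NUM_LAYERS * (NUM_KV_HEADS // tp_size) * HEAD_DIM * DTYPE_BYTES
    return per_token * MAX_MODEL_LEN / GiB

for tp in (2, 4):
    weights = TOTAL_WEIGHT_GIB / tp
    print(f"TP={tp}: ~{weights:.1f} GiB weights/GPU, needs {kv_need_gib(tp):.1f} GiB KV/worker")
# TP=2: 32 GiB KV on top of ~31.5 GiB weights -> cannot fit on a 48 GB card.
# TP=4: 16 GiB KV on top of ~15.7 GiB weights -> fits with headroom.
```

If adding GPUs is not an option, the error message itself suggests the alternatives: lower `--max-model-len` (few workloads need the full 262144-token context) or raise `--gpu-memory-utilization`.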


Copyright notice: published on Gao Jiufeng's personal blog; please credit the source when reposting.

Permalink: https://blog.20230611.cn/post/916.html

