The error message is as follows:
-k sampling. For the best performance, please install FlashInfer.
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [gpu_model_runner.py:2602] Starting to load model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct...
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [gpu_model_runner.py:2602] Starting to load model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct...
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP0 pid=3295275) INFO 10-24 17:57:43 [cuda.py:366] Using Flash Attention backend on V1 engine.
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [gpu_model_runner.py:2634] Loading model from scratch...
(Worker_TP1 pid=3295276) INFO 10-24 17:57:43 [cuda.py:366] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/14 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 7% Completed | 1/14 [00:02<00:32, 2.52s/it]
Loading safetensors checkpoint shards: 14% Completed | 2/14 [00:03<00:19, 1.66s/it]
Loading safetensors checkpoint shards: 21% Completed | 3/14 [00:05<00:19, 1.81s/it]
Loading safetensors checkpoint shards: 29% Completed | 4/14 [00:07<00:20, 2.05s/it]
Loading safetensors checkpoint shards: 36% Completed | 5/14 [00:08<00:14, 1.63s/it]
Loading safetensors checkpoint shards: 43% Completed | 6/14 [00:12<00:17, 2.15s/it]
Loading safetensors checkpoint shards: 50% Completed | 7/14 [00:14<00:16, 2.29s/it]
Loading safetensors checkpoint shards: 57% Completed | 8/14 [00:17<00:14, 2.38s/it]
Loading safetensors checkpoint shards: 64% Completed | 9/14 [00:18<00:09, 1.92s/it]
Loading safetensors checkpoint shards: 71% Completed | 10/14 [00:20<00:08, 2.20s/it]
Loading safetensors checkpoint shards: 79% Completed | 11/14 [00:21<00:05, 1.84s/it]
Loading safetensors checkpoint shards: 86% Completed | 12/14 [00:23<00:03, 1.71s/it]
Loading safetensors checkpoint shards: 93% Completed | 13/14 [00:23<00:01, 1.33s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:27<00:00, 1.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:27<00:00, 1.94s/it]
(Worker_TP0 pid=3295275)
(Worker_TP0 pid=3295275) INFO 10-24 17:58:10 [default_loader.py:267] Loading weights took 27.26 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:10 [default_loader.py:267] Loading weights took 27.18 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:11 [gpu_model_runner.py:2653] Model loading took 31.4550 GiB and 27.412074 seconds
(Worker_TP0 pid=3295275) INFO 10-24 17:58:11 [gpu_model_runner.py:2653] Model loading took 31.4550 GiB and 27.493010 seconds
(Worker_TP1 pid=3295276) INFO 10-24 17:58:11 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size.
(Worker_TP0 pid=3295275) INFO 10-24 17:58:11 [gpu_model_runner.py:3344] Encoder cache will be initialized with a budget of 153600 tokens, and profiled with 1 video items of the maximum feature size.
(Worker_TP1 pid=3295276) INFO 10-24 17:58:38 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/932ff2018f/rank_1_0/backbone for vLLM's torch.compile
(Worker_TP1 pid=3295276) INFO 10-24 17:58:38 [backends.py:559] Dynamo bytecode transform time: 13.50 s
(Worker_TP0 pid=3295275) INFO 10-24 17:58:39 [backends.py:548] Using cache directory: /root/.cache/vllm/torch_compile_cache/932ff2018f/rank_0_0/backbone for vLLM's torch.compile
(Worker_TP0 pid=3295275) INFO 10-24 17:58:39 [backends.py:559] Dynamo bytecode transform time: 13.77 s
(Worker_TP1 pid=3295276) INFO 10-24 17:58:43 [backends.py:197] Cache the graph for dynamic shape for later use
(Worker_TP0 pid=3295275) INFO 10-24 17:58:44 [backends.py:197] Cache the graph for dynamic shape for later use
(EngineCore_DP0 pid=3295096) INFO 10-24 17:59:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(Worker_TP1 pid=3295276) INFO 10-24 17:59:36 [backends.py:218] Compiling a graph for dynamic shape takes 56.77 s
(Worker_TP0 pid=3295275) INFO 10-24 17:59:36 [backends.py:218] Compiling a graph for dynamic shape takes 56.82 s
(EngineCore_DP0 pid=3295096) INFO 10-24 18:00:11 [shm_broadcast.py:466] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation).
(Worker_TP1 pid=3295276) INFO 10-24 18:00:17 [monitor.py:34] torch.compile takes 70.28 s in total
(Worker_TP0 pid=3295275) INFO 10-24 18:00:17 [monitor.py:34] torch.compile takes 70.60 s in total
(Worker_TP0 pid=3295275) INFO 10-24 18:00:18 [gpu_worker.py:298] Available KV cache memory: 0.67 GiB
(Worker_TP1 pid=3295276) INFO 10-24 18:00:18 [gpu_worker.py:298] Available KV cache memory: 0.67 GiB
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 199, in _initialize_kv_caches
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 716, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] raise ValueError(
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:18 [core.py:708] ValueError: To serve at least one request with the models's max seq len (262144), (32.00 GiB KV cache is needed, which is larger than the available KV cache memory (0.67 GiB). Based on the available memory, the estimated maximum model length is 5472. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
(EngineCore_DP0 pid=3295096) ERROR 10-24 18:00:21 [multiproc_executor.py:154] Worker proc VllmWorker-1 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=3295096) Process EngineCore_DP0:
(EngineCore_DP0 pid=3295096) Traceback (most recent call last):
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=3295096) self.run()
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=3295096) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=3295096) raise e
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=3295096) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=3295096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=3295096) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 92, in __init__
(EngineCore_DP0 pid=3295096) self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 199, in _initialize_kv_caches
(EngineCore_DP0 pid=3295096) kv_cache_configs = get_kv_cache_configs(vllm_config, kv_cache_specs,
(EngineCore_DP0 pid=3295096) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 1243, in get_kv_cache_configs
(EngineCore_DP0 pid=3295096) check_enough_kv_cache_memory(vllm_config, kv_cache_spec_one_worker,
(EngineCore_DP0 pid=3295096) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/core/kv_cache_utils.py", line 716, in check_enough_kv_cache_memory
(EngineCore_DP0 pid=3295096) raise ValueError(
(EngineCore_DP0 pid=3295096) ValueError: To serve at least one request with the models's max seq len (262144), (32.00 GiB KV cache is needed, which is larger than the available KV cache memory (0.67 GiB). Based on the available memory, the estimated maximum model length is 5472. Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
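The figures in the ValueError are worth sanity-checking before reaching for a fix. With the model's default max sequence length of 262144 tokens, vLLM wants 32 GiB of KV cache per worker, which works out to 128 KiB per token; the 0.67 GiB actually left after loading weights therefore only covers a ~5.5k-token context. A quick back-of-envelope check, using only numbers taken from the log above:

```python
# Sanity-check the figures reported in the ValueError above.
GIB = 1024 ** 3

max_model_len = 262_144      # model's default max seq len (from the log)
kv_needed_gib = 32.0         # KV cache needed for one full-length request
kv_available_gib = 0.67      # memory left after weights + activations

# Per-token KV cache cost implied by the log: 32 GiB / 262144 tokens
bytes_per_token = kv_needed_gib * GIB / max_model_len
print(f"{bytes_per_token / 1024:.0f} KiB per token")  # 128 KiB

# Longest context the leftover memory can actually hold.  This lands at
# ~5488 rather than the log's 5472 only because 0.67 GiB is a rounded figure.
est_max_len = int(kv_available_gib * GIB / bytes_per_token)
print(est_max_len)
```

So the error message's two suggested knobs (`gpu_memory_utilization`, `max_model_len`) both attack the same gap: either free more memory for the KV cache, or shrink how much of it a single request can demand.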
(APIServer pid=3294801) Traceback (most recent call last):
(APIServer pid=3294801) File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=3294801) File "<frozen runpy>", line 88, in _run_code
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1953, in <module>
(APIServer pid=3294801) uvloop.run(run_server(args))
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=3294801) return runner.run(wrapper())
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=3294801) return self._loop.run_until_complete(task)
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3294801) return await main
(APIServer pid=3294801) ^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=3294801) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=3294801) async with build_async_engine_client(
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=3294801) return await anext(self.gen)
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=3294801) async with build_async_engine_client_from_engine_args(
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=3294801) return await anext(self.gen)
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=3294801) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=3294801) return fn(*args, **kwargs)
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=3294801) return cls(
(APIServer pid=3294801) ^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=3294801) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=3294801) return AsyncMPClient(*client_args)
(APIServer pid=3294801) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=3294801) super().__init__(
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=3294801) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=3294801) next(self.gen)
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=3294801) wait_for_engine_startup(
(APIServer pid=3294801) File "/www/server/pyporject_evn/versions/3.11.13/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=3294801) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=3294801)     RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

The machine itself has no shortage of GPU memory — it is an 8× L20 server — yet with this launch each card was left with only 0.67 GiB for the KV cache after loading 31.45 GiB of weights. So the fix is to spread the model over more cards with tensor parallelism:
python -m vllm.entrypoints.openai.api_server --model /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-32B-Instruct --tensor-parallel-size 4
That did the trick — the server started successfully.
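Why `--tensor-parallel-size 4` works comes down to sharding: tensor parallelism splits both the weights and the KV cache (by attention head) evenly across ranks. The sketch below uses the 31.4550 GiB per-rank weight figure the log reports at TP=2; everything else follows from even sharding. (That the cards are 48 GiB L20s is taken from the post's own description of the machine; the exact non-weight overhead per rank is not shown here, so treat the numbers as rough.)

```python
# Per-rank weight footprint under tensor parallelism.  The 31.4550 GiB
# figure at TP=2 comes from the log above; TP shards weights evenly.
weights_at_tp2 = 31.4550            # GiB per rank, from the log
total_weights = weights_at_tp2 * 2  # ~62.9 GiB for the whole model

for tp in (2, 4, 8):
    per_rank = total_weights / tp
    print(f"TP={tp}: ~{per_rank:.1f} GiB of weights per GPU")
```

Going from TP=2 to TP=4 frees roughly 15.7 GiB of weights per card, and because the KV cache is sharded too, each rank now needs only about half of the original 32 GiB for a full-length request — which is why the same model suddenly fits at its full 262144 context.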