Created May 11, 2024 05:53
/home/corvo/.local/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
2024-05-11 05:51:50,062 INFO worker.py:1749 -- Started a local Ray instance.
INFO 05-11 05:51:51 llm_engine.py:99] Initializing an LLM engine (v0.4.1) with config: model='/models/mistral-7b', speculative_config=None, tokenizer='/models/mistral-7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0)
(pid=390908) /home/corvo/.local/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
(pid=390908) warnings.warn(
(pid=391238) /home/corvo/.local/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(pid=391238) warnings.warn( [repeated 3x across cluster]
(pid=391481) /home/corvo/.local/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead. [repeated 3x across cluster]
(pid=391481) warnings.warn( [repeated 3x across cluster]
INFO 05-11 05:52:07 utils.py:620] Found nccl from library /home/corvo/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:07 utils.py:620] Found nccl from library /home/corvo/.config/vllm/nccl/cu12/libnccl.so.2.18.1
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:11 selector.py:28] Using FlashAttention-2 backend.
INFO 05-11 05:52:12 selector.py:28] Using FlashAttention-2 backend.
INFO 05-11 05:52:13 pynccl_utils.py:43] vLLM is using nccl==2.18.1
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:13 pynccl_utils.py:43] vLLM is using nccl==2.18.1
(RayWorkerWrapper pid=391562) INFO 05-11 05:52:07 utils.py:620] Found nccl from library /home/corvo/.config/vllm/nccl/cu12/libnccl.so.2.18.1 [repeated 6x across cluster]
INFO 05-11 05:52:26 utils.py:130] reading GPU P2P access cache from /home/corvo/.config/vllm_cache/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:26 utils.py:130] reading GPU P2P access cache from /home/corvo/.config/vllm_cache/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json
(RayWorkerWrapper pid=391562) INFO 05-11 05:52:12 selector.py:28] Using FlashAttention-2 backend. [repeated 6x across cluster]
(RayWorkerWrapper pid=391562) INFO 05-11 05:52:13 pynccl_utils.py:43] vLLM is using nccl==2.18.1 [repeated 6x across cluster]
INFO 05-11 05:52:32 model_runner.py:174] Loading model weights took 1.6958 GB
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:32 model_runner.py:174] Loading model weights took 1.6958 GB
(RayWorkerWrapper pid=391562) INFO 05-11 05:52:26 utils.py:130] reading GPU P2P access cache from /home/corvo/.config/vllm_cache/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json [repeated 6x across cluster]
INFO 05-11 05:52:32 models.py:361] Took 0.03276944160461426 seconds to load lora 1 from cpu to gpu memory
(RayWorkerWrapper pid=390990) INFO 05-11 05:52:32 models.py:361] Took 0.027169466018676758 seconds to load lora 1 from cpu to gpu memory
[rank0]:[E ProcessGroupNCCL.cpp:1414] [PG 1 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd37f0cf897 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fd37f07fb25 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fd37f1a7718 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fd332e4ae36 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7fd332e4ef38 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x77c (0x7fd332e545ac in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fd332e5531c in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7fd37e8b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7fd37fd8bac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fd37fe1d850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
[2024-05-11 05:52:37,595 E 386179 391997] logging.cc:101: Unhandled exception: N3c1016DistBackendErrorE. what(): [PG 1 Rank 0] Process group watchdog thread terminated with exception: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd37f0cf897 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fd37f07fb25 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7fd37f1a7718 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7fd332e4ae36 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x58 (0x7fd332e4ef38 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x77c (0x7fd332e545ac in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7fd332e5531c in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xdc253 (0x7fd37e8b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #8: <unknown function> + 0x94ac3 (0x7fd37fd8bac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #9: <unknown function> + 0x126850 (0x7fd37fe1d850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
Exception raised from ncclCommWatchdog at ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1418 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fd37f0cf897 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe32e33 (0x7fd332ad7e33 in /home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0xdc253 (0x7fd37e8b0253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #3: <unknown function> + 0x94ac3 (0x7fd37fd8bac3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #4: <unknown function> + 0x126850 (0x7fd37fe1d850 in /usr/lib/x86_64-linux-gnu/libc.so.6)
ERROR 05-11 05:52:37 worker_base.py:147] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
ERROR 05-11 05:52:37 worker_base.py:147] Traceback (most recent call last):
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/worker/worker_base.py", line 139, in execute_method
ERROR 05-11 05:52:37 worker_base.py:147] return executor(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-11 05:52:37 worker_base.py:147] return func(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/worker/worker.py", line 150, in determine_num_available_blocks
ERROR 05-11 05:52:37 worker_base.py:147] self.model_runner.profile_run()
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-11 05:52:37 worker_base.py:147] return func(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/worker/model_runner.py", line 841, in profile_run
ERROR 05-11 05:52:37 worker_base.py:147] self.execute_model(seqs, kv_caches)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-11 05:52:37 worker_base.py:147] return func(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/worker/model_runner.py", line 761, in execute_model
ERROR 05-11 05:52:37 worker_base.py:147] hidden_states = model_executable(**execute_model_kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return self._call_impl(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return forward_call(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 364, in forward
ERROR 05-11 05:52:37 worker_base.py:147] hidden_states = self.model(input_ids, positions, kv_caches,
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return self._call_impl(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return forward_call(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 291, in forward
ERROR 05-11 05:52:37 worker_base.py:147] hidden_states, residual = layer(
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return self._call_impl(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return forward_call(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 243, in forward
ERROR 05-11 05:52:37 worker_base.py:147] hidden_states = self.mlp(hidden_states)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return self._call_impl(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return forward_call(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 79, in forward
ERROR 05-11 05:52:37 worker_base.py:147] x, _ = self.down_proj(x)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return self._call_impl(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 05-11 05:52:37 worker_base.py:147] return forward_call(*args, **kwargs)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/lora/layers.py", line 978, in forward
ERROR 05-11 05:52:37 worker_base.py:147] output_parallel = self.apply(input_parallel)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/lora/layers.py", line 945, in apply
ERROR 05-11 05:52:37 worker_base.py:147] output = self.base_layer.quant_method.apply(self.base_layer, x)
ERROR 05-11 05:52:37 worker_base.py:147] File "/home/corvo/vllm-project/vllm/model_executor/layers/linear.py", line 98, in apply
ERROR 05-11 05:52:37 worker_base.py:147] return F.linear(x, weight, bias)
ERROR 05-11 05:52:37 worker_base.py:147] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 05-11 05:52:37 worker_base.py:147] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 05-11 05:52:37 worker_base.py:147] For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
ERROR 05-11 05:52:37 worker_base.py:147] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 05-11 05:52:37 worker_base.py:147]
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/corvo/a.py", line 3, in <module>
[rank0]: llm = LLM(
[rank0]: File "/home/corvo/vllm-project/vllm/entrypoints/llm.py", line 118, in __init__
[rank0]: self.llm_engine = LLMEngine.from_engine_args(
[rank0]: File "/home/corvo/vllm-project/vllm/engine/llm_engine.py", line 290, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/corvo/vllm-project/vllm/engine/llm_engine.py", line 170, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/corvo/vllm-project/vllm/engine/llm_engine.py", line 247, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: File "/home/corvo/vllm-project/vllm/executor/distributed_gpu_executor.py", line 27, in determine_num_available_blocks
[rank0]: num_blocks = self._run_workers("determine_num_available_blocks", )
[rank0]: File "/home/corvo/vllm-project/vllm/executor/ray_gpu_executor.py", line 260, in _run_workers
[rank0]: driver_worker_output = self.driver_worker.execute_method(
[rank0]: File "/home/corvo/vllm-project/vllm/worker/worker_base.py", line 148, in execute_method
[rank0]: raise e
[rank0]: File "/home/corvo/vllm-project/vllm/worker/worker_base.py", line 139, in execute_method
[rank0]: return executor(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/worker/worker.py", line 150, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/worker/model_runner.py", line 841, in profile_run
[rank0]: self.execute_model(seqs, kv_caches)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/worker/model_runner.py", line 761, in execute_model
[rank0]: hidden_states = model_executable(**execute_model_kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 364, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 291, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 243, in forward
[rank0]: hidden_states = self.mlp(hidden_states)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/model_executor/models/llama.py", line 79, in forward
[rank0]: x, _ = self.down_proj(x)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: File "/home/corvo/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: File "/home/corvo/vllm-project/vllm/lora/layers.py", line 978, in forward
[rank0]: output_parallel = self.apply(input_parallel)
[rank0]: File "/home/corvo/vllm-project/vllm/lora/layers.py", line 945, in apply
[rank0]: output = self.base_layer.quant_method.apply(self.base_layer, x)
[rank0]: File "/home/corvo/vllm-project/vllm/model_executor/layers/linear.py", line 98, in apply
[rank0]: return F.linear(x, weight, bias)
[rank0]: RuntimeError: CUDA error: an illegal memory access was encountered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[2024-05-11 05:52:37,618 E 386179 391997] logging.cc:108: Stack trace:
/home/corvo/.local/lib/python3.10/site-packages/ray/_raylet.so(+0x100fc3a) [0x7fd21d510c3a] ray::operator<<()
/home/corvo/.local/lib/python3.10/site-packages/ray/_raylet.so(+0x10126f8) [0x7fd21d5136f8] ray::TerminateHandler()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c) [0x7fd37e88220c]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277) [0x7fd37e882277]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae1fe) [0x7fd37e8821fe]
/home/corvo/.local/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so(+0xe32ee4) [0x7fd332ad7ee4] c10d::ProcessGroupNCCL::ncclCommWatchdog()
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7fd37e8b0253]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd37fd8bac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7fd37fe1d850]
*** SIGABRT received at time=1715406757 on cpu 53 ***
PC: @ 0x7fd37fd8d9fc (unknown) pthread_kill
@ 0x7fd37fd39520 (unknown) (unknown)
[2024-05-11 05:52:37,618 E 386179 391997] logging.cc:365: *** SIGABRT received at time=1715406757 on cpu 53 ***
[2024-05-11 05:52:37,618 E 386179 391997] logging.cc:365: PC: @ 0x7fd37fd8d9fc (unknown) pthread_kill
[2024-05-11 05:52:37,619 E 386179 391997] logging.cc:365: @ 0x7fd37fd39520 (unknown) (unknown)
Fatal Python error: Aborted
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, charset_normalizer.md, yaml._yaml, psutil._psutil_linux, psutil._psutil_posix, sentencepiece._sentencepiece, msgpack._cmsgpack, google._upb._message, setproctitle, uvloop.loop, ray._raylet, regex._regex, cython.cimports.libc.math, pyarrow.lib, pyarrow._hdfsio, pyarrow._json, PIL._imaging (total: 36)
Aborted (core dumped)
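The log points to three environment knobs that can make a rerun easier to read. CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the Python trace should stop at the kernel that actually faulted; the log warns that the trace above "might be incorrect" without it. RAY_DEDUP_LOGS=0 turns off the "[repeated Nx across cluster]" folding, so every Ray worker's output is shown. HF_HOME is the suggested replacement for the deprecated TRANSFORMERS_CACHE. The helper below is a minimal sketch that builds such an environment. The function name `debug_env` and the idea of dropping TRANSFORMERS_CACHE once HF_HOME is set are illustrative assumptions, not part of vLLM or Ray.

```python
import os
from typing import Mapping, Optional


def debug_env(base: Optional[Mapping[str, str]] = None,
              hf_home: Optional[str] = None) -> dict:
    """Return a copy of an environment with the debugging variables the log suggests.

    Hypothetical helper: `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    # Synchronous kernel launches: the reported stack trace should then point
    # at the CUDA call that hit the illegal memory access.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # Disable Ray's cross-worker log deduplication so each worker's lines appear.
    env["RAY_DEDUP_LOGS"] = "0"
    if hf_home is not None:
        # HF_HOME supersedes the deprecated TRANSFORMERS_CACHE (assumption: the
        # old variable is removed only to silence the FutureWarning).
        env["HF_HOME"] = hf_home
        env.pop("TRANSFORMERS_CACHE", None)
    return env
```

For example, `subprocess.run([sys.executable, "a.py"], env=debug_env())` would rerun the failing script, assuming it is the `/home/corvo/a.py` from the traceback, with these settings. Because the variables go into a child process's environment, they are guaranteed to be in place before CUDA and Ray initialize.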