Created August 2, 2022 14:27
(partial) EasyBuild log for failed build of /tmp/easybuild-tmp/eb-Qgl3gB/files_pr15919/p/PyTorch/PyTorch-1.9.0-foss-2020b.eb (PR(s) #15919)
[W context.cpp:154] Ignoring error since GraphTask is no longer valid: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
ERROR | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #2 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #3 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #6 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #4 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #5 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #6 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #7 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #7 to worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #8 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #9 to worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
test_backward_node_failure_python_udf (__main__.TensorPipeDistAutogradTestWithSpawn) ... [W tensorpipe_agent.cpp:1025] RPC agent for worker3 encountered error when sending outgoing request #4 to worker2: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:157) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #5 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #10 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #7 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker1 encountered error when reading incoming response from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #8 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker3 encountered error when sending outgoing request #6 to worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
This test simulates the situation where the 'backward' call might throw ... [W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
test_error_in_context (__main__.TensorPipeDistAutogradTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6) | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6) | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6) | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6) | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[E container.cpp:257] Could not release Dist Autograd Context on node 2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error running optimizer.') | |
raise ValueError("Error running optimizer.") | |
ValueError: Error running optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
ValueError('Error creating optimizer.') | |
raise ValueError("Error creating optimizer.") | |
ValueError: Error creating optimizer. | |
test_async_function_remote_multi (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch. | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
*** Error in `/sw/installed/Python/3.8.6-GCCcore-10.2.0/bin/python': corrupted double-linked list: 0x0000200368007620 *** | |
[E request_callback_no_python.cpp:667] Received error while processing request type 263: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch. | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
RuntimeError: Error on Node 0: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch. | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
ERROR | |
test_call_python_function_remotely_from_script_not_supported (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: attempted to get undefined function python_function | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
test_call_script_function_that_not_exists_remotely_from_script (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: attempted to get undefined function nonexisting_script | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
test_call_script_function_that_raises_remotely_from_script (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
test_remote_script_module (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
RuntimeError('Expected error') | |
raise RuntimeError("Expected error") | |
RuntimeError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
test_handle_send_exceptions (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #1 to worker2: EPIPE: broken pipe (this error originated at tensorpipe/transport/uv/connection_impl.cc:157) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #2 to worker2: EPIPE: broken pipe (this error originated at tensorpipe/transport/uv/connection_impl.cc:157) | |
test_local_shutdown (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
test_local_shutdown_with_rpc (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
ValueError('\nFirst line of error \n next line of error \n last line of error') | |
First line of error | |
next line of error | |
last line of error | |
ValueError('\nFirst line of error \n next line of error \n last line of error') | |
First line of error | |
next line of error | |
last line of error | |
ValueError('\nFirst line of error \n next line of error \n last line of error') | |
First line of error | |
next line of error | |
last line of error | |
ValueError('\nFirst line of error \n next line of error \n last line of error') | |
First line of error | |
next line of error | |
last line of error | |
ValueError('Expected error') | |
raise ValueError("Expected error") | |
ValueError: Expected error | |
ValueError('Expected error') | |
raise ValueError("Expected error") | |
ValueError: Expected error | |
ValueError('Expected error') | |
raise ValueError("Expected error") | |
ValueError: Expected error | |
ValueError('Expected error') | |
raise ValueError("Expected error") | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=0, name=worker0):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=3, name=worker3):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=1, name=worker1):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=2, name=worker2):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=0, name=worker0):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=1, name=worker1):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=3, name=worker3):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('On WorkerInfo(id=2, name=worker2):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n') | |
ValueError('Expected error') | |
ValueError: Expected error | |
test_wait_all_exit_early_builtin (__main__.TensorPipeRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0 | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
test_wait_all_exit_early_script_function (__main__.TensorPipeRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter. | |
RuntimeError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
ValueError('Expected error') | |
ValueError: Expected error | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
Failed to respond to 'Shutdown Proceed' in time, got error Followers ['worker2', 'worker3', 'worker1'] timed out in _all_gather after 5.00 seconds. The first exception is eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
Failed to respond to 'Shutdown Proceed' in time, got error Followers ['worker3', 'worker1', 'worker2'] timed out in _all_gather after 5.00 seconds. The first exception is eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
test_infer_backend_from_options (__main__.TensorPipeTensorPipeAgentRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
ERROR: test_backward_multiple_round_trips (__main__.TensorPipeDistAutogradTestWithSpawn) | |
RuntimeError: Process 2 exited with error code 10 and exception: | |
RuntimeError: Error on Node 2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259) | |
ERROR: test_async_function_remote_multi (__main__.TensorPipeJitRpcTestWithSpawn) | |
RuntimeError: Process 3 exited with error code 10 and exception: | |
RuntimeError: Error on Node 0: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch. | |
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so) | |
FAILED (errors=2, skipped=2) | |
distributed/rpc/test_tensorpipe_agent failed! | |
test_add_done_callback_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: ValueError: Expected error | |
test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given | |
test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: ValueError: Expected error | |
test_fx failed! Received signal: SIGSEGV | |
If an error occurs during packaging, it should not be shadowed by the allow_empty error. ... ok | |
Test that an error is thrown when a extern glob is specified with allow_empty=True ... ok | |
Failure to handle all dependencies should lead to an error. ... ok | |
Test that an error is thrown when a mock glob is specified with allow_empty=True ... ok | |
Directly saving/requiring an PackageImported module should raise a specific error message. ... skipped 'Tests that use temporary files are disabled in fbcode' | |
distributed/rpc/test_tensorpipe_agent failed! | |
test_fx failed! Received signal: SIGSEGV) | |
== 2022-08-02 16:26:19,461 filetools.py:382 INFO Path /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS successfully removed. | |
== 2022-08-02 16:26:22,012 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:124 in __init__): 1 test (out of 55803) failed: | |
* distributed/rpc/test_tensorpipe_agent (at easybuild/easyblocks/p/pytorch.py:282 in test_step) | |
== 2022-08-02 16:26:22,012 build_log.py:265 INFO ... (took 2 hours 28 mins 17 secs) | |
== 2022-08-02 16:26:22,013 easyblock.py:4098 WARNING build failed (first 300 chars): 1 test (out of 55803) failed: | |
* distributed/rpc/test_tensorpipe_agent | |
== 2022-08-02 16:26:22,013 easyblock.py:318 INFO Closing log for application name PyTorch version 1.9.0 |