
@Flamefire
Created August 2, 2022 14:27
(partial) EasyBuild log for failed build of /tmp/easybuild-tmp/eb-Qgl3gB/files_pr15919/p/PyTorch/PyTorch-1.9.0-foss-2020b.eb (PR(s) #15919)
[W context.cpp:154] Ignoring error since GraphTask is no longer valid: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ERROR
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #2 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132)
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #3 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #6 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #4 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #5 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #6 to worker1: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:132)
[W tensorpipe_agent.cpp:1025] RPC agent for worker2 encountered error when sending outgoing request #7 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #7 to worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #8 to worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #9 to worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[E container.cpp:257] Could not release Dist Autograd Context on node 1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
test_backward_node_failure_python_udf (__main__.TensorPipeDistAutogradTestWithSpawn) ... [W tensorpipe_agent.cpp:1025] RPC agent for worker3 encountered error when sending outgoing request #4 to worker2: ECONNRESET: connection reset by peer (this error originated at tensorpipe/transport/uv/connection_impl.cc:157)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #5 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker0 encountered error when sending outgoing request #10 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #7 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:1049] RPC agent for worker1 encountered error when reading incoming response from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #8 to worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker3 encountered error when sending outgoing request #6 to worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
This test simulates the situation where the 'backward' call might throw ... [W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
test_error_in_context (__main__.TensorPipeDistAutogradTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
[E request_callback_no_python.cpp:667] Received error while processing request type 271: mat1 and mat2 shapes cannot be multiplied (3x3 and 6x6)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[E container.cpp:257] Could not release Dist Autograd Context on node 2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error running optimizer.')
raise ValueError("Error running optimizer.")
ValueError: Error running optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
ValueError('Error creating optimizer.')
raise ValueError("Error creating optimizer.")
ValueError: Error creating optimizer.
test_async_function_remote_multi (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch.
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
*** Error in `/sw/installed/Python/3.8.6-GCCcore-10.2.0/bin/python': corrupted double-linked list: 0x0000200368007620 ***
[E request_callback_no_python.cpp:667] Received error while processing request type 263: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch.
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
RuntimeError: Error on Node 0: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch.
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ERROR
test_call_python_function_remotely_from_script_not_supported (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: attempted to get undefined function python_function
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
test_call_script_function_that_not_exists_remotely_from_script (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: attempted to get undefined function nonexisting_script
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
test_call_script_function_that_raises_remotely_from_script (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
raise RuntimeError("Expected error")
RuntimeError: Expected error
test_remote_script_module (__main__.TensorPipeJitRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
[E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
RuntimeError('Expected error')
raise RuntimeError("Expected error")
RuntimeError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
test_handle_send_exceptions (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #1 to worker2: EPIPE: broken pipe (this error originated at tensorpipe/transport/uv/connection_impl.cc:157)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1025] RPC agent for worker1 encountered error when sending outgoing request #2 to worker2: EPIPE: broken pipe (this error originated at tensorpipe/transport/uv/connection_impl.cc:157)
test_local_shutdown (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
test_local_shutdown_with_rpc (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
ValueError('\nFirst line of error \n next line of error \n last line of error')
First line of error
next line of error
last line of error
ValueError('\nFirst line of error \n next line of error \n last line of error')
First line of error
next line of error
last line of error
ValueError('\nFirst line of error \n next line of error \n last line of error')
First line of error
next line of error
last line of error
ValueError('\nFirst line of error \n next line of error \n last line of error')
First line of error
next line of error
last line of error
ValueError('Expected error')
raise ValueError("Expected error")
ValueError: Expected error
ValueError('Expected error')
raise ValueError("Expected error")
ValueError: Expected error
ValueError('Expected error')
raise ValueError("Expected error")
ValueError: Expected error
ValueError('Expected error')
raise ValueError("Expected error")
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('On WorkerInfo(id=0, name=worker0):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
ValueError('Expected error')
ValueError: Expected error
ValueError('On WorkerInfo(id=3, name=worker3):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
ValueError('Expected error')
ValueError: Expected error
ValueError('On WorkerInfo(id=1, name=worker1):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
ValueError('Expected error')
ValueError: Expected error
ValueError('On WorkerInfo(id=2, name=worker2):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('Expected error')
ValueError: Expected error
ValueError('On WorkerInfo(id=0, name=worker0):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
ValueError('On WorkerInfo(id=3, name=worker3):\nValueError(\'Expected error\')\nTraceback (most recent call last):\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 210, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 336, in raise_func\n raise ValueError(expected_err)\nValueError: Expected error\n')
test_wait_all_exit_early_builtin (__main__.TensorPipeRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The size of tensor a (10) must match the size of tensor b (5) at non-singleton dimension 0
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
ValueError('Expected error')
ValueError: Expected error
test_wait_all_exit_early_script_function (__main__.TensorPipeRpcTestWithSpawn) ... [E request_callback_no_python.cpp:667] Received error while processing request type 256: The following operation failed in the TorchScript interpreter.
RuntimeError: Expected error
ValueError('Expected error')
ValueError: Expected error
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
Failed to respond to 'Shutdown Proceed' in time, got error Followers ['worker2', 'worker3', 'worker1'] timed out in _all_gather after 5.00 seconds. The first exception is eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker3 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:843] RPC agent for worker2 encountered error when reading incoming request from worker0: pipe closed (this error originated at tensorpipe/core/pipe_impl.cc:356)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker1: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:1049] RPC agent for worker0 encountered error when reading incoming response from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
Failed to respond to 'Shutdown Proceed' in time, got error Followers ['worker3', 'worker1', 'worker2'] timed out in _all_gather after 5.00 seconds. The first exception is eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
test_infer_backend_from_options (__main__.TensorPipeTensorPipeAgentRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:843] RPC agent for worker0 encountered error when reading incoming request from worker3: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
[W tensorpipe_agent.cpp:843] RPC agent for worker1 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ERROR: test_backward_multiple_round_trips (__main__.TensorPipeDistAutogradTestWithSpawn)
RuntimeError: Process 2 exited with error code 10 and exception:
RuntimeError: Error on Node 2: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ERROR: test_async_function_remote_multi (__main__.TensorPipeJitRpcTestWithSpawn)
RuntimeError: Process 3 exited with error code 10 and exception:
RuntimeError: Error on Node 0: Error on Node 1: rINTERNAL ASSERT FAILED at "../aten/src/ATen/core/jit_type_base.h":172, please report a bug to PyTorch.
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xf8 (0x20000364bfb8 in /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS/lib/python3.8/site-packages/torch/lib/libc10.so)
FAILED (errors=2, skipped=2)
distributed/rpc/test_tensorpipe_agent failed!
test_add_done_callback_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: ValueError: Expected error
test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:200] Got the following error when running the callback: ValueError: Expected error
test_fx failed! Received signal: SIGSEGV
If an error occurs during packaging, it should not be shadowed by the allow_empty error. ... ok
Test that an error is thrown when a extern glob is specified with allow_empty=True ... ok
Failure to handle all dependencies should lead to an error. ... ok
Test that an error is thrown when a mock glob is specified with allow_empty=True ... ok
Directly saving/requiring an PackageImported module should raise a specific error message. ... skipped 'Tests that use temporary files are disabled in fbcode'
distributed/rpc/test_tensorpipe_agent failed!
test_fx failed! Received signal: SIGSEGV
== 2022-08-02 16:26:19,461 filetools.py:382 INFO Path /tmp/easybuild-tmp/eb-Qgl3gB/tmp9KaKpS successfully removed.
== 2022-08-02 16:26:22,012 build_log.py:169 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:124 in __init__): 1 test (out of 55803) failed:
* distributed/rpc/test_tensorpipe_agent (at easybuild/easyblocks/p/pytorch.py:282 in test_step)
== 2022-08-02 16:26:22,012 build_log.py:265 INFO ... (took 2 hours 28 mins 17 secs)
== 2022-08-02 16:26:22,013 easyblock.py:4098 WARNING build failed (first 300 chars): 1 test (out of 55803) failed:
* distributed/rpc/test_tensorpipe_agent
== 2022-08-02 16:26:22,013 easyblock.py:318 INFO Closing log for application name PyTorch version 1.9.0
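Note on the log above: the many `ValueError('Expected error')` lines are deliberate failures from `rpc_test.py`'s `raise_func`, which each RPC worker invokes on purpose; only the final `distributed/rpc/test_tensorpipe_agent` failure is the real problem. A minimal stand-alone sketch of that expected-failure pattern (plain Python, no torch; `run_remote` and its wrapping format are assumptions mimicking how the RPC layer reports remote exceptions, not the actual PyTorch implementation):

```python
# Sketch (assumption: mirrors rpc_test.py's raise_func pattern without torch)
# of why the log is flooded with benign "Expected error" lines.
import traceback

expected_err = "Expected error"

def raise_func():
    # The test's remote user-defined function: it always fails on purpose.
    raise ValueError(expected_err)

def run_remote(worker_id, worker_name, func):
    """Hypothetical stand-in for an RPC call: run func and wrap any
    exception with the remote worker's identity, matching the
    "On WorkerInfo(id=..., name=...)" lines seen in the log."""
    try:
        func()
    except Exception as exc:
        tb = traceback.format_exc()
        raise ValueError(
            f"On WorkerInfo(id={worker_id}, name={worker_name}):\n"
            f"{exc!r}\n{tb}"
        ) from None

caught = None
try:
    run_remote(1, "worker1", raise_func)
except ValueError as e:
    caught = str(e)

print(caught.splitlines()[0])
```

The wrapped message reproduces the shape of the log's remote tracebacks: the caller sees the worker identity first, then the repr of the original exception, then the worker-side traceback.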