@czgdp1807
Created November 2, 2021 10:40
(ray_dev) C:\Users\gagan\ray_project\ray>pytest -v -s --count=1 python\ray\tests\test_multinode_failures_2.py::test_actor_creation_node_failure
Test session starts (platform: win32, Python 3.8.11, pytest 5.4.3, pytest-sugar 0.9.4)
cachedir: .pytest_cache
rootdir: C:\Users\gagan\ray_project\ray\python
plugins: anyio-3.3.2, asyncio-0.15.1, lazy-fixture-0.6.3, repeat-0.9.1, rerunfailures-10.2, sugar-0.9.4, timeout-1.4.2
collecting ...
2021-11-02 10:14:14,937 INFO worker.py:838 -- Connecting to existing Ray cluster at address: 127.0.0.1:6379
2021-11-02 10:14:36,953 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffae0bd0c1a64ad1765b178ede01000000 Worker ID: fa8e061419f1162b00067461c6bc66bfde0b6fb61388f2b95b7e50dc Node ID: f5036d1f1684f4ddfd568cd76dd63289447cb2781a8d28449d908ab8 Worker IP address: 127.0.0.1 Worker port: 53256 Worker PID: 11284

――――――――――――――――――――― test_actor_creation_node_failure[ray_start_cluster0] ―――――――――――――――――――――

2021-11-02 10:38:44,509 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff48ffebec2b5787bc344d73ad01000000 Worker ID: 5c86fa5f45db9992fb82254bc9b184f6ca3f4638617d0bd4e0afc845 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53296 Worker PID: 8784
2021-11-02 10:38:44,509 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff82b7742008bdf710055519f301000000 Worker ID: 1039f96a0f767350c1a48acd9d87a91110edd5a629173b48409abc95 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53311 Worker PID: 8512
2021-11-02 10:38:44,509 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff4c3313e0022f6ad6f7cb442901000000 Worker ID: f8866a810d6bbb06441375e5fb050543b1537f326c85edd555b84d43 Node ID: 070019f351d7299024ffac98817cd27c75e18695e67b5e4fba4da174 Worker IP address: 127.0.0.1 Worker port: 53326 Worker PID: 3832
2021-11-02 10:38:44,524 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffee46e1f0bf2d6505bce826cf01000000 Worker ID: 2f2a3ef95d196f453e0902989770aba83b1d4ed4315149ad23777bbf Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53304 Worker PID: 232
2021-11-02 10:38:44,524 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff033a2b37e78afc0ce82ee9d701000000 Worker ID: 5bc3b9175dedc94145be8a5719005373944eb543d8b9fb1f801873d5 Node ID: f5036d1f1684f4ddfd568cd76dd63289447cb2781a8d28449d908ab8 Worker IP address: 127.0.0.1 Worker port: 53429 Worker PID: 9524
2021-11-02 10:38:44,571 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffa8570e08c617c6dea57a983401000000 Worker ID: 8d1f1066884781aa246d421af6f2f2dbaeabd82b329e8b2edb0f0e55 Node ID: f5036d1f1684f4ddfd568cd76dd63289447cb2781a8d28449d908ab8 Worker IP address: 127.0.0.1 Worker port: 53438 Worker PID: 8024
2021-11-02 10:38:44,590 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffd5e240c27f5038729286186101000000 Worker ID: 4314807ed14a9a6e2fa2f28d43c427cfc23a32a3df8c107df90ba00d Node ID: 070019f351d7299024ffac98817cd27c75e18695e67b5e4fba4da174 Worker IP address: 127.0.0.1 Worker port: 53453 Worker PID: 11316
2021-11-02 10:38:44,590 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: fffffffffffffffffdcd68ec7417cb4194ad77b101000000 Worker ID: e9c6c669592c2cbd620e873e5e1caea2334a7d4bc9ad3e55528e3702 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53471 Worker PID: 11372
2021-11-02 10:38:44,731 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffe2b3a0959942db2ef065496d01000000 Worker ID: 791d6abe1d80a9ce2c77352181c8a0a44b1355c10044bfcdb9a649c4 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53486 Worker PID: 11692
2021-11-02 10:38:44,731 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffeffa03c835af3ee332a7edda01000000 Worker ID: efc12a1e0af9662e19ea85e8a2120e80c7bebeb89ffb87dbe16dec14 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53591 Worker PID: 3188
2021-11-02 10:38:44,731 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffd9e8e2da121274b5ab8b703701000000 Worker ID: aae722df4ad38baee0bcbbbbee4cc62f2aa30f91fc8178d223b5fc74 Node ID: 070019f351d7299024ffac98817cd27c75e18695e67b5e4fba4da174 Worker IP address: 127.0.0.1 Worker port: 53626 Worker PID: 7948
2021-11-02 10:38:44,731 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffd4a74bf6aac6724c8e20f48001000000 Worker ID: 7852fb1d813dc22ad064ec9540b413c3d99a7e1909f209df29968307 Node ID: f5036d1f1684f4ddfd568cd76dd63289447cb2781a8d28449d908ab8 Worker IP address: 127.0.0.1 Worker port: 53629 Worker PID: 1256
2021-11-02 10:38:44,746 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffdc123dc06f9b9919c326333901000000 Worker ID: bd6ed7e6903f064811661c5abdf210f6a63052c968e121a67f08659f Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53710 Worker PID: 7728
2021-11-02 10:38:44,746 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffdec929a2b58d1ecc72d67da801000000 Worker ID: 13e774cffa21b355833c947467ae16e171e820655ee096fb3b00b2a3 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53592 Worker PID: 7960
2021-11-02 10:38:44,762 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffff06aaba414ca1cbeb9db70cb301000000 Worker ID: 2b6be0239748d0e936bff9594840fc5762b96ceab46e1cbd72f85517 Node ID: 070019f351d7299024ffac98817cd27c75e18695e67b5e4fba4da174 Worker IP address: 127.0.0.1 Worker port: 53280 Worker PID: 8096
2021-11-02 10:38:44,790 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffef0e9fa5af3e0a5df2ec657e01000000 Worker ID: 5ccbfa3ff5230781d3789772d63c943c304064420fbbc25bc4db9190 Node ID: 575f423ffa24c16f9872628242726c2cc3a27ccb1d9e691563ac9a22 Worker IP address: 127.0.0.1 Worker port: 53604 Worker PID: 216
2021-11-02 10:38:44,790 WARNING worker.py:1239 -- A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. RayTask ID: ffffffffffffffffd74d17c6d11e5f497765e7d001000000 Worker ID: a103a41221f9fb24398c1bae5d632c0db819ca9a25966c3edaed08ab Node ID: f5036d1f1684f4ddfd568cd76dd63289447cb2781a8d28449d908ab8 Worker IP address: 127.0.0.1 Worker port: 53269 Worker PID: 11556

ray_start_cluster = <ray.cluster_utils.Cluster object at 0x000001A15BB879D0>

    @pytest.mark.parametrize(
        "ray_start_cluster", [{
            "num_cpus": 4,
            "num_nodes": 3,
            "do_init": True
        }],
        indirect=True)
    def test_actor_creation_node_failure(ray_start_cluster):
        # TODO(swang): Refactor test_raylet_failed, etc to reuse the below code.
        cluster = ray_start_cluster

        @ray.remote
        class Child:
            def __init__(self, death_probability):
                self.death_probability = death_probability

            def ping(self):
                # Exit process with some probability.
                exit_chance = np.random.rand()
                if exit_chance < self.death_probability:
                    sys.exit(-1)

        num_children = 25
        # Children actors will die about half the time.
        death_probability = 0.5

        children = [Child.remote(death_probability) for _ in range(num_children)]
        while len(cluster.list_all_nodes()) > 1:
            for j in range(2):
                # Submit some tasks on the actors. About half of the actors will
                # fail.
                children_out = [child.ping.remote() for child in children]
                # Wait a while for all the tasks to complete. This should trigger
                # reconstruction for any actor creation tasks that were forwarded
                # to nodes that then failed.
                ready, _ = ray.wait(
                    children_out, num_returns=len(children_out), timeout=5 * 60.0)
>               assert len(ready) == len(children_out)
E               assert 14 == 25
E                 +14
E                 -25

python\ray\tests\test_multinode_failures_2.py:112: AssertionError

(Child pid=11284) 2021-11-02 10:14:36,937 ERROR worker.py:426 -- SystemExit was raised from the worker
(Child pid=11284) Traceback (most recent call last):
(Child pid=11284)   File "python\ray\_raylet.pyx", line 757, in ray._raylet.task_execution_handler
(Child pid=11284)     execute_task(task_type, task_name, ray_function, c_resources,
(Child pid=11284)   File "python\ray\_raylet.pyx", line 580, in ray._raylet.execute_task
(Child pid=11284)     with core_worker.profile_event(b"task", extra_data=extra_data):
(Child pid=11284)   File "python\ray\_raylet.pyx", line 618, in ray._raylet.execute_task
(Child pid=11284)     with core_worker.profile_event(b"task:execute"):
(Child pid=11284)   File "python\ray\_raylet.pyx", line 625, in ray._raylet.execute_task
(Child pid=11284)     with ray.worker._changeproctitle(title, next_title):
(Child pid=11284)   File "python\ray\_raylet.pyx", line 629, in ray._raylet.execute_task
(Child pid=11284)     outputs = function_executor(*args, **kwargs)
(Child pid=11284)   File "python\ray\_raylet.pyx", line 578, in ray._raylet.execute_task.function_executor
(Child pid=11284)     return function(actor, *arguments, **kwarguments)
(Child pid=11284)   File "c:\users\gagan\ray_project\ray\python\ray\_private\function_manager.py", line 594, in actor_method_executor
(Child pid=11284)     return method(__ray_actor, *args, **kwargs)
(Child pid=11284)   File "c:\users\gagan\ray_project\ray\python\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
(Child pid=11284)     return method(self, *_args, **_kwargs)
(Child pid=11284)   File "C:\Users\gagan\ray_project\ray\python\ray\tests\test_multinode_failures_2.py", line 95, in ping
(Child pid=11284)     sys.exit(-1)
(Child pid=11284) SystemExit: -1

(Child pid=8784) 2021-11-02 10:14:37,718 ERROR worker.py:426 -- SystemExit was raised from the worker
(Child pid=8784) Traceback (most recent call last):
(Child pid=8784)   File "python\ray\_raylet.pyx", line 757, in ray._raylet.task_execution_handler
(Child pid=8784)     execute_task(task_type, task_name, ray_function, c_resources,
(Child pid=8784)   File "python\ray\_raylet.pyx", line 580, in ray._raylet.execute_task
(Child pid=8784)     with core_worker.profile_event(b"task", extra_data=extra_data):
(Child pid=8784)   File "python\ray\_raylet.pyx", line 618, in ray._raylet.execute_task
(Child pid=8784)     with core_worker.profile_event(b"task:execute"):
(Child pid=8784)   File "python\ray\_raylet.pyx", line 625, in ray._raylet.execute_task
(Child pid=8784)     with ray.worker._changeproctitle(title, next_title):
(Child pid=8784)   File "python\ray\_raylet.pyx", line 629, in ray._raylet.execute_task
(Child pid=8784)     outputs = function_executor(*args, **kwargs)
(Child pid=8784)   File "python\ray\_raylet.pyx", line 578, in ray._raylet.execute_task.function_executor
(Child pid=8784)     return function(actor, *arguments, **kwarguments)
(Child pid=8784)   File "c:\users\gagan\ray_project\ray\python\ray\_private\function_manager.py", line 594, in actor_method_executor
(Child pid=8784)     return method(__ray_actor, *args, **kwargs)
(Child pid=8784)   File "c:\users\gagan\ray_project\ray\python\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
(Child pid=8784)     return method(self, *_args, **_kwargs)
(Child pid=8784)   File "C:\Users\gagan\ray_project\ray\python\ray\tests\test_multinode_failures_2.py", line 95, in ping
(Child pid=8784)     sys.exit(-1)
(Child pid=8784) SystemExit: -1

(Child pid=8512) 2021-11-02 10:14:37,846 ERROR worker.py:426 -- SystemExit was raised from the worker
(Child pid=8512) Traceback (most recent call last):
(Child pid=8512)   File "python\ray\_raylet.pyx", line 757, in ray._raylet.task_execution_handler
(Child pid=8512)     execute_task(task_type, task_name, ray_function, c_resources,
(Child pid=8512)   File "python\ray\_raylet.pyx", line 580, in ray._raylet.execute_task
(Child pid=8512)     with core_worker.profile_event(b"task", extra_data=extra_data):
(Child pid=8512)   File "python\ray\_raylet.pyx", line 618, in ray._raylet.execute_task
(Child pid=8512)     with core_worker.profile_event(b"task:execute"):
(Child pid=8512)   File "python\ray\_raylet.pyx", line 625, in ray._raylet.execute_task
(Child pid=8512)     with ray.worker._changeproctitle(title, next_title):
(Child pid=8512)   File "python\ray\_raylet.pyx", line 629, in ray._raylet.execute_task
(Child pid=8512)     outputs = function_executor(*args, **kwargs)
(Child pid=8512)   File "python\ray\_raylet.pyx", line 578, in ray._raylet.execute_task.function_executor
(Child pid=8512)     return function(actor, *arguments, **kwarguments)
(Child pid=8512)   File "c:\users\gagan\ray_project\ray\python\ray\_private\function_manager.py", line 594, in actor_method_executor
(Child pid=8512)     return method(__ray_actor, *args, **kwargs)
(Child pid=8512)   File "c:\users\gagan\ray_project\ray\python\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
(Child pid=8512)     return method(self, *_args, **_kwargs)
(Child pid=8512)   File "C:\Users\gagan\ray_project\ray\python\ray\tests\test_multinode_failures_2.py", line 95, in ping
(Child pid=8512)     sys.exit(-1)
(Child pid=8512) SystemExit: -1

(pid=None) Traceback (most recent call last):
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\workers/default_worker.py", line 185, in <module>
(pid=None)     node = ray.node.Node(
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\node.py", line 221, in __init__
(pid=None)     self.metrics_agent_port = self._get_cached_port(
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\node.py", line 669, in _get_cached_port
(pid=None)     ports_by_node.update(json.load(f))
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\__init__.py", line 293, in load
(pid=None)     return loads(fp.read(),
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\__init__.py", line 357, in loads
(pid=None)     return _default_decoder.decode(s)
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\decoder.py", line 337, in decode
(pid=None)     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\decoder.py", line 355, in raw_decode
(pid=None)     raise JSONDecodeError("Expecting value", s, err.value) from None
(pid=None) json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

(pid=None) Traceback (most recent call last):
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\workers/default_worker.py", line 185, in <module>
(pid=None)     node = ray.node.Node(
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\node.py", line 221, in __init__
(pid=None)     self.metrics_agent_port = self._get_cached_port(
(pid=None)   File "c:\users\gagan\ray_project\ray\python\ray\node.py", line 669, in _get_cached_port
(pid=None)     ports_by_node.update(json.load(f))
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\__init__.py", line 293, in load
(pid=None)     return loads(fp.read(),
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\__init__.py", line 357, in loads
(pid=None)     return _default_decoder.decode(s)
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\decoder.py", line 337, in decode
(pid=None)     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
(pid=None)   File "c:\programdata\anaconda3\envs\ray_dev\lib\json\decoder.py", line 355, in raw_decode
(pid=None)     raise JSONDecodeError("Expecting value", s, err.value) from None
(pid=None) json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
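The `(pid=None)` tracebacks show worker startup failing inside `ray.node.Node._get_cached_port`, where `json.load(f)` raised `Expecting value: line 1 column 1 (char 0)`. That is exactly what Python's `json` module raises for empty input, which suggests (an assumption, not confirmed by this log) that the worker read the cached-ports file while it was empty or mid-rewrite by another process. The minimal sketch below reproduces the error and shows one common mitigation: write through a temporary file swapped in with `os.replace`, and treat an empty file as an empty cache. The helpers `read_port_cache` / `write_port_cache` and the file name `port_cache.json` are hypothetical; this is not Ray's code.

```python
import json
import os
import tempfile


def read_port_cache(path):
    """Hypothetical helper: load a JSON port cache, treating a missing or
    empty file as an empty cache instead of raising JSONDecodeError."""
    try:
        with open(path, "r") as f:
            text = f.read()
    except FileNotFoundError:
        return {}
    if not text.strip():
        # Empty input is what makes json raise
        # "Expecting value: line 1 column 1 (char 0)".
        return {}
    return json.loads(text)


def write_port_cache(path, data):
    """Hypothetical helper: write to a temp file in the same directory, then
    swap it in with os.replace so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if serialization or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Reproduce the worker's error: decoding an empty string.
    try:
        json.loads("")
    except json.JSONDecodeError as exc:
        print(exc)  # Expecting value: line 1 column 1 (char 0)

    with tempfile.TemporaryDirectory() as tmp_dir:
        cache = os.path.join(tmp_dir, "port_cache.json")  # hypothetical name
        open(cache, "w").close()       # simulate an empty/truncated cache file
        print(read_port_cache(cache))  # {}
        write_port_cache(cache, {"node-a": {"metrics_agent_port": 53000}})
        print(read_port_cache(cache))  # {'node-a': {'metrics_agent_port': 53000}}
```

On Windows, `os.replace` can itself raise `PermissionError` while another process holds the destination open, so writers shared across processes may still need a file lock around the swap.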
 ray\tests\test_multinode_failures_2.py::test_actor_creation_node_failure[ray_start_cluster0] ⨯100% ██████████
=================================== short test summary info ====================================
FAILED python\ray\tests\test_multinode_failures_2.py::test_actor_creation_node_failure[ray_start_cluster0]

Results (1478.75s):
       1 failed
         - ray\tests/test_multinode_failures_2.py:75 test_actor_creation_node_failure[ray_start_cluster0]
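The failing check compares how many of the 25 `ping` results `ray.wait` reported ready within the `5 * 60.0` second timeout against the number submitted. A ref whose actor died still resolves (to an error), so about half the actors calling `sys.exit(-1)` should not by itself shrink `ready`; `assert 14 == 25` means 11 refs never resolved before the timeout. The sketch below swaps in the standard library's `concurrent.futures.wait` as a stand-in for `ray.wait` (an analogy, not Ray's implementation) to show a timed wait returning a partial result when some calls fail fast and others hang. `NUM_HUNG = 11` is an assumption chosen to mirror this log; whether the hung calls correspond to workers that failed to start is not established by the output above.

```python
import concurrent.futures as cf
import random
import threading

NUM_CHILDREN = 25        # num_children in the test
DEATH_PROBABILITY = 0.5  # death_probability in the test
NUM_HUNG = 11            # assumption: chosen so the result mirrors 14 == 25


def ping(seed, death_probability):
    """Stand-in for Child.ping: fail, like a dead actor, with some probability."""
    if random.Random(seed).random() < death_probability:
        raise RuntimeError("actor died")  # an error result still counts as done
    return "pong"


def hung_ping(release):
    """Stand-in for a call that never gets a result before the timeout."""
    release.wait()
    return "late pong"


def timed_wait(timeout):
    release = threading.Event()
    with cf.ThreadPoolExecutor(max_workers=NUM_CHILDREN) as pool:
        futures = [pool.submit(hung_ping, release) for _ in range(NUM_HUNG)]
        futures += [
            pool.submit(ping, seed, DEATH_PROBABILITY)
            for seed in range(NUM_CHILDREN - NUM_HUNG)
        ]
        # Like waiting for all refs with a timeout: return when every future
        # is finished or when the timeout expires, whichever comes first.
        done, not_done = cf.wait(futures, timeout=timeout,
                                 return_when=cf.ALL_COMPLETED)
        failed = sum(1 for f in done if f.exception() is not None)
        release.set()  # unblock the hung calls so the pool can shut down
    return len(done), len(not_done), failed


if __name__ == "__main__":
    ready, pending, failed = timed_wait(timeout=2.0)
    print(f"ready={ready} pending={pending} failed_but_ready={failed}")
```

Calls that raised are counted in `done`, just as errored refs count as ready, so only the calls that never finish make the equivalent of `len(ready) == len(children_out)` fail.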