Skip to content

Instantly share code, notes, and snippets.

@audrow
Last active January 7, 2022 01:57
Show Gist options
  • Save audrow/4d0333094676fc21ad8609e76c30cf1f to your computer and use it in GitHub Desktop.
Save audrow/4d0333094676fc21ad8609e76c30cf1f to your computer and use it in GitHub Desktop.
Working to fix error created by switching ROS 2's default DDS. #open-robotics #active

Debugging RMW Errors

ROS 2 Topic Bad QoS settings

Tried understanding what's going on

  • Easiest way to run the test is to delete all other test files and run

    colcon test --packages-select ros2topic --event-handlers console_direct+
    
  • How the test works

    • It seems that the test just looks at the output of the command. It seems that the test expects the command to be empty but that it has failed to shutdown text for output
      Traceback (most recent call last):
      File "/home/audrow/Documents/ros2_rolling/src/ros2/ros2cli/ros2topic/test/test_echo_pub.py", line 171, in test_echo_basic
        assert not command.output, (
      AssertionError: ('Echo CLI should not have received anything with incompatible QoS', 'failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241
      ')
      assert not 'failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241\n'
      +  where 'failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241\n' = <launch_testing.tools.process.ProcessProxy object at 0x7fabcc39b940>.output
      
  • Questions

    • Why is this not occurring for other tests?
    • What happened since foxy?
  • Trying on Foxy

    • Using Docker
      docker pull osrf/ros:foxy-desktop
      
    • The test passes on foxy (pulled docker, cloned repo, built workspace, deleted all but relevant test, ran test)
    • Diffed the tests
      • Seems that adding message lost was added - tried commenting this out, but test still fails with the same error
    • Maybe see what's changed since foxy in ros2topic
      • A lot has changed since foxy
  • Tried creating publisher and subscriber on my local machine to see if the same thing was happening

    • Created a publisher with the following QOS profile
      QoSProfile(
        depth=10,
        reliability=ReliabilityPolicy.BEST_EFFORT,
        durability=DurabilityPolicy.VOLATILE)
      
    • Echo, like the test command, for publisher topic and message type
      ros2 topic echo --qos-reliability reliable --qos-durability transient_local /chatter  std_msgs/String
      
    • Hit ctrl-c and I see the suspect line:
      failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241
      
  • Test that I can do this without running the publisher

    • Confirmed
      • Note, if I run echo for a couple of seconds it produces the line
      • If I run echo for very short time, it doesn't produce the line
      • Note that I also see the line when running without QoS options
        ros2 topic echo /foo std_msgs/String
        
  • I don't see the line when running the same command in foxy

  • Let's see about galactic

    • No problem in galactic
  • I also see this problem with ros2 topic bw /foo!

2021-12-20

  • Digging into it
    • Can I create a minimal example? For example, using ros2/ros2cli/ros2topic/ros2topic/verb/bw.py
      • From reproducing in a python file and calling directly, I get this error with no publisher topic:
        WARNING: topic [foo] does not appear to be published yet
        ^CTraceback (most recent call last):
          File "reproduce.py", line 146, in <module>
            _rostopic_bw(node.node, topic)
          File "reproduce.py", line 118, in _rostopic_bw
            msg_class = get_msg_class(node, topic, blocking=True, include_hidden_topics=True)
          File "/home/audrow/Documents/ros2_rolling/build/ros2topic/ros2topic/api/__init__.py", line 108, in get_msg_class
            sleep(0.1)
        KeyboardInterrupt
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "reproduce.py", line 146, in <module>
            _rostopic_bw(node.node, topic)
          File "/home/audrow/Documents/ros2_rolling/build/ros2cli/ros2cli/node/direct.py", line 87, in __exit__
            rclpy.shutdown()
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/__init__.py", line 123, in shutdown
            _shutdown(context=context)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/utilities.py", line 58, in shutdown
            return context.shutdown()
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/context.py", line 102, in shutdown
            self.__context.shutdown()
        rclpy._rclpy_pybind11.RCLError: failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241
        
        And this message with a publisher topic:
        Subscribed to [chatter]
        31 B/s from 2 messages
                Message size mean: 24 B min: 24 B max: 24 B
        28 B/s from 3 messages
                Message size mean: 24 B min: 24 B max: 24 B
        27 B/s from 4 messages
                Message size mean: 24 B min: 24 B max: 24 B
        ^CTraceback (most recent call last):
          File "reproduce.py", line 146, in <module>
            _rostopic_bw(node.node, topic)
          File "reproduce.py", line 135, in _rostopic_bw
            rclpy.spin_once(node)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/__init__.py", line 199, in spin_once
            executor.spin_once(timeout_sec=timeout_sec)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/executors.py", line 704, in spin_once
            handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/executors.py", line 690, in wait_for_ready_callbacks
            return next(self._cb_iter)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/executors.py", line 588, in _wait_for_ready_callbacks
            wait_set.wait(timeout_nsec)
        KeyboardInterrupt
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "reproduce.py", line 146, in <module>
            _rostopic_bw(node.node, topic)
          File "/home/audrow/Documents/ros2_rolling/build/ros2cli/ros2cli/node/direct.py", line 87, in __exit__
            rclpy.shutdown()
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/__init__.py", line 123, in shutdown
            _shutdown(context=context)
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/utilities.py", line 58, in shutdown
            return context.shutdown()
          File "/home/audrow/Documents/ros2_rolling/install/rclpy/lib/python3.8/site-packages/rclpy/context.py", line 102, in shutdown
            self.__context.shutdown()
        rclpy._rclpy_pybind11.RCLError: failed to shutdown: rcl_shutdown already called on the given context, at /home/audrow/Documents/ros2_rolling/src/ros2/rcl/rcl/src/rcl/init.c:241
        
      • It seems that I can get the same error with the following code:
        import rclpy
        import sys
        from ros2cli.node.direct import DirectNode
        
        if __name__ == '__main__':
          args = sys.argv[1:]
          with DirectNode(args) as node:
            while rclpy.ok():
              rclpy.spin_once(node)
      • Cannot reproduce with regular nodes
      • Cannot reproduce it with direct nodes without with context
      • I can actually produce it with the following and it seems to be blocking
        with DirectNode(args) as node:
          rclpy.spin_once(node)
        
      • Works with the following
        import sys
        from ros2cli.node.direct import DirectNode
        
        args = sys.argv[1:]
        
        with DirectNode(args) as node:
          from rclpy.executors import SingleThreadedExecutor
          executor = SingleThreadedExecutor()
          executor.add_node(node)
          executor.spin_once(timeout_sec=None)
      • Exits silently when timeout works (e.g., not None)
      • I tried to reproduce with cyclone and was able to. Let me try in a Docker container as maybe I'm doing something weird with building
      • Running with Galactic
        ^CTraceback (most recent call last):
          File "repro.py", line 12, in <module>
            executor.spin_once(timeout_sec=None)
          File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 699, in spin_once
            handler, entity, node = self.wait_for_ready_callbacks(timeout_sec=timeout_sec)
          File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 685, in wait_for_ready_callbacks
            return next(self._cb_iter)
          File "/opt/ros/galactic/lib/python3.8/site-packages/rclpy/executors.py", line 583, in _wait_for_ready_callbacks
            _rclpy.rclpy_wait(wait_set, timeout_nsec)
        KeyboardInterrupt
        
      • Doesn't seem to work outside of context
        node = DirectNode(args)
        from rclpy.executors import SingleThreadedExecutor
        executor = SingleThreadedExecutor()
        executor.add_node(node)
        executor.spin_once(timeout_sec=None)
        node.node.destroy_node()
        rclpy.shutdown()
      • Ivan fixed with ros2/ros2cli#683, which revealed more issues

        After #683 there's a new issue, some tests fail because we're calling init() multiple times in the same context (e.g. try colcon test --packages-select ros2cli)

        • Once this is good, I'll rerun the tests and merge the DDS PRs in.

Unknown Windows Error

2022-01-06

  • Fail message: https://ci.ros2.org/job/ci_windows/16176/console
    Summary: 326 packages finished [1h 45min 13s]
      1 package failed: tf2_ros_py
      77 packages had stderr output: ament_clang_format ament_clang_tidy ament_copyright ament_cpplint ament_flake8 ament_index_python ament_lint ament_lint_cmake ament_mypy ament_pclint ament_pep257 ament_uncrustify ament_xmllint demo_nodes_py domain_coordinator examples_rclpy_executors examples_rclpy_guard_conditions examples_rclpy_minimal_action_client examples_rclpy_minimal_action_server examples_rclpy_minimal_client examples_rclpy_minimal_publisher examples_rclpy_minimal_service examples_rclpy_minimal_subscriber examples_rclpy_pointcloud_publisher examples_tf2_py launch launch_pytest launch_ros launch_testing launch_testing_examples launch_testing_ros launch_xml launch_yaml osrf_pycommon quality_of_service_demo_py rclpy ros2bag ros2cli ros2component ros2doctor ros2launch ros2multicast ros2param ros2run ros2test ros2trace rosbag2_transport rosidl_cli rosidl_runtime_py rpyutils rqt rqt_action rqt_bag rqt_bag_plugins rqt_console rqt_gui rqt_gui_py rqt_msg rqt_plot rqt_publisher rqt_py_console rqt_reconfigure rqt_service_caller rqt_shell rqt_srv rqt_top rqt_topic sensor_msgs_py sros2 test_launch_ros test_tracetools_launch tf2_ros_py topic_monitor tracetools_launch tracetools_read tracetools_test tracetools_trace
    
  • Error in tf2_ros_py
      Traceback (most recent call last):
    
        File "C:\Python38\lib\threading.py", line 932, in _bootstrap_inner
    
          self.run()
    
        File "C:\Python38\lib\threading.py", line 870, in run
    
          self._target(*self._args, **self._kwargs)
    
        File "C:\ci\ws\src\ros2\geometry2\tf2_ros_py\tf2_ros\buffer_client.py", line 229, in unblock_by_timeout
    
          if clock.now() > start_time + timeout + timeout_padding:
    
      TypeError: unsupported operand type(s) for +: 'Time' and 'float'
    
    test/test_buffer_client.py::TestBufferClient::test_lookup_transform_true
    
    • Not supporting + for Time and float
  • Looking at the file: https://github.com/ros2/geometry2/blob/7a660094d0da9c463be5fea1df60d836b3349e9b/tf2_ros_py/tf2_ros/buffer_client.py#L229
    if clock.now() > start_time + timeout + timeout_padding:
    # makes TypeError: unsupported operand type(s) for +: 'Time' and 'float'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment