Presentation on experiences with ros2

ROS2 - October 2018 - March 2019

There and back again

Started using ROS2 toward the end of the bouncy release, upgraded to crystal shortly after it came out, and have now returned to ros1.

Maybe will try again when Dashing is released (June 2019?), or will wait for certain key features (some of these may already have been achieved, or were already achieved while I was using it but the right way wasn't clear to me):

  • ros2 topic echo /image_raw shouldn't take 100% cpu, rqt_image_view shouldn't take 50% cpu. This may be fixed by https://github.com/ros2/rosidl_python/pull/35
  • ros2 node info should handle namespaces
  • dead nodes (and their topics) should disappear when their processes die, not remain listed in ros2 node/topic list. Shouldn't have to kill the ros2 daemon to clear them out.
  • ros2 node list shouldn't show multiple entries for the same named node (or were some of those dead nodes from previous runs?)
  • Composition - ability to load nodes and parameters into any namespace.
  • Launch support for composition. Expect launch to look very different when I revisit ros2 (maybe they'll have brought back xml); don't want to continually relearn and rewrite launch files.

Longer term issues that may not be addressed any time soon:

  • python nodes seem very cpu hungry - they were already on the high side in ros1, and now seem much worse.
  • CPU usage even in C++ nodes seems high in many situations (TODO apples-to-apples comparisons)
  • Responsiveness of localhost systems in general, including the command line tools
  • sourcing setup.bash takes a long time, avoid sourcing it in .bashrc
  • localhost inter-process communications - dds doesn't seem to be very good at this vs. TCP.
  • colcon build seems slower (TODO need to test that) and clunkier than catkin build: compare catkin build foo (then rebuild and run) vs. colcon build --mixin my_release --packages-select foo

going off the rails

Ran into performance issues with megapixel image pipelines at modest framerates. See lucasw/ros2_cpp_py#3 and https://answers.ros.org/question/312964/ros2-megapixel-image-pubsub-cpu-usage-is-very-high/

There is some capability analogous to ros1 nodelets: publish a unique_ptr rather than a shared_ptr, and the message isn't copied if the subscribing node is in the same process. Also, some non-default dds implementations have a shared memory capability? Didn't investigate either too much - the unique_ptr path is still limited to a 1:1 pipeline, one subscriber for one publisher.
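A rough sketch of what the unique_ptr path looks like (written against the post-crystal rclcpp API from memory, so the exact signatures are illustrative rather than authoritative):

```cpp
#include <chrono>
#include <memory>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

// Publish a unique_ptr so a subscriber in the same process can take ownership
// of the message without a copy (only works 1:1, as noted above).
class Producer : public rclcpp::Node
{
public:
  Producer()
  : Node("producer", rclcpp::NodeOptions().use_intra_process_comms(true))
  {
    pub_ = create_publisher<std_msgs::msg::String>("chatter", 10);
    timer_ = create_wall_timer(std::chrono::milliseconds(100), [this]() {
      auto msg = std::make_unique<std_msgs::msg::String>();
      msg->data = "hello";
      // ownership is handed to the middleware - don't touch msg after this
      pub_->publish(std::move(msg));
    });
  }

private:
  rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<Producer>());
  rclcpp::shutdown();
  return 0;
}
```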

Made my own internal publish/subscribe system to bypass ros2 publishing, with a relatively easy way to toggle ros2 publishing back on for certain topics when desired. This worked fine for a single node, then started looking into composition.

Had to subclass rclcpp::Node and convert all the nodes that dealt with images to use this internal pub/sub.
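Roughly the shape of that internal pub/sub, as a hypothetical minimal sketch (not the actual code; a real version needs locking, unsubscribe, and so on):

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-process pub/sub registry: every subscriber gets the same
// shared_ptr the publisher hands in, so megapixel images are never copied
// or serialized.
template <typename MsgT>
class InternalPubSub
{
public:
  using MsgConstPtr = std::shared_ptr<const MsgT>;
  using Callback = std::function<void(const MsgConstPtr &)>;

  // subscribers register a callback against a topic name
  void subscribe(const std::string & topic, Callback cb)
  {
    subs_[topic].push_back(std::move(cb));
  }

  // publishing hands the same pointer to every registered callback
  void publish(const std::string & topic, const MsgConstPtr & msg)
  {
    for (auto & cb : subs_[topic]) {
      cb(msg);
    }
  }

private:
  std::map<std::string, std::vector<Callback>> subs_;
};
```

The node subclass holds a shared instance of something like this, and can optionally also create a real ros2 publisher for the same topic when the toggle is on.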

Then had to fork rclcpp::Node so that namespaces, parameters, and remappings could still be used.

It was becoming unclear how much of ros2 was still being used - services (though those sometimes seemed unreliable? there's an rtps bug about that), and rviz somewhat, but I would have had to fork rviz too to avoid its huge cpu usage. (Though there are many situations in ros1 where rviz cpu usage seems higher than it ought to be.)

Ran into a bug involving the forked rclcpp support for namespaces in composed nodes - support is non-existent in that realm - so re-evaluated the past few months of experience and went back to ros1.

other

6 month intensive release cycle:

  • The people most knowledgeable about the current release are already far beyond it, deep in development of the next release.
  • Not a lot of time left for ros2 questions on answers.ros.org.
  • Lots of breakage between releases - better to wait for stability.
  • Time spent on workarounds for the current version is quickly made redundant if the issue is fixed at the source.
  • New OSRF staff are taking on workload from the founders, with some learning curve there. (Is force pushing into a pull request standard practice anywhere? It can be disabled in github; make a backup of any branch you want to be able to continue using, and use the backup in the pr.)

Presentations from companies like Cruise Automation (worth billions of dollars and with hundreds of engineers) cherry-picking what they want from ros2 and rewriting all the rest (TODO link to the roscon 2018 video, verify it was Cruise and that they were using ros2) aren't encouraging for those of us with much more limited resources.

Big companies (Amazon, Microsoft, Google) are funding ros2, but it's not clear whether they are actually using it.

image performance issues with ros1

https://answers.ros.org/question/232919/image-related-nodes-eating-up-majority-of-cpu-until-system-becomes-unresponsive/

Whatever that was, it seemed to go away.

https://answers.ros.org/question/219510/ros-traffic-over-gigabit-connection-is-renegotiated-to-100-mbps/

I believe this was exactly the sort of problem ros2 and dds were meant to address - an underperforming network link.

But I don't want to compromise all the local traffic because of that one link; I just want better handling of the weak link - dds where I want it, localhost TCP where I don't, and transparent node composition to outdo localhost TCP.

DDS

The issue with lossy networks and ROS 1 was that it used TCP almost exclusively, and if you lost data, TCP would try to resend it, which would further stress the network and you could end up saturating the network and not even keeping up at all. Especially since the common use case for this was streaming some sensor data over wifi to a workstation to visualize it in rviz, in which case you don't care if you miss a few messages. ROS 1 does have a UDP transport, but it had several issues, for example being unreliable for large data and not being supported uniformly (python never supported it).

DDS has unreliable and reliable communication and graceful degradation, i.e. a reliable publisher can send data to an unreliable subscriber (but not the other way around). But more importantly, DDS's reliable communication happens over UDP with a custom protocol on top (DDSI-RTPS), which has the advantage over TCP that you can control things like how long it will retry to send data, how long it will wait for a NAK, how it will buffer data before sending (like Nagle's algorithm), etc...

Basically, the idea is that DDS's configuration options allow it to be many things between TCP and simple UDP, including a more flexible version of TCP, which in turn allows you to fine tune your communication settings to better work on lossy networks.

This comes at the cost of complexity and some performance (TCP on the local host is really good), but should allow knowledgeable users to get good results in more situations.

https://answers.ros.org/question/319218/how-does-ros2-implement-its-network-design/
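For reference, this is roughly how those reliability choices surface in rclcpp (a sketch against the Dashing-era API; earlier releases pass an rmw_qos_profile_t such as rmw_qos_profile_sensor_data instead):

```cpp
#include <rclcpp/rclcpp.hpp>
#include <sensor_msgs/msg/image.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("image_listener");

  // Best-effort, small queue: dropped frames over a lossy link are acceptable,
  // retransmission storms are not.
  auto sub = node->create_subscription<sensor_msgs::msg::Image>(
    "image_raw", rclcpp::SensorDataQoS(),
    [](sensor_msgs::msg::Image::SharedPtr msg) {
      RCLCPP_INFO(rclcpp::get_logger("image_listener"),
        "got %ux%u image", msg->width, msg->height);
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```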


lucasw commented Mar 24, 2019

Imgui instead of qt

There are some other interesting guis- nuklear and others in the 'single header file library' school.

Personally don't like using qt (aesthetic mismatch), so trying Imgui https://github.com/ocornut/imgui - it 'feels like programming' and doesn't take the kitchen-sink approach. Some inefficiency when nothing is happening, since the gui is redrawn every frame.

It's a single-developer project (though maybe with enough Patreon supporters, or more usage...), which is inspirational if you are ever a lone developer, but not as appealing if you are an institution that prefers institution-driven solutions.

It's heavily oriented toward developer-facing guis rather than customer-facing ones, but everyone uses web tools for the latter anyhow - the qt ros tools exist for the developers using ros.
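A taste of the immediate-mode style, condensed from imgui's stock GLFW + OpenGL3 example (backend headers and init calls vary between imgui versions):

```cpp
#include <GLFW/glfw3.h>

#include "imgui.h"
#include "imgui_impl_glfw.h"
#include "imgui_impl_opengl3.h"

int main()
{
  glfwInit();
  GLFWwindow * window = glfwCreateWindow(640, 480, "demo", nullptr, nullptr);
  glfwMakeContextCurrent(window);

  ImGui::CreateContext();
  ImGui_ImplGlfw_InitForOpenGL(window, true);
  ImGui_ImplOpenGL3_Init("#version 130");

  float gain = 1.0f;
  while (!glfwWindowShouldClose(window)) {
    glfwPollEvents();
    ImGui_ImplOpenGL3_NewFrame();
    ImGui_ImplGlfw_NewFrame();
    ImGui::NewFrame();

    // the entire gui is re-declared every frame - 'feels like programming'
    ImGui::Begin("tuning");
    ImGui::SliderFloat("gain", &gain, 0.0f, 10.0f);
    if (ImGui::Button("reset")) {
      gain = 1.0f;
    }
    ImGui::End();

    ImGui::Render();
    glClear(GL_COLOR_BUFFER_BIT);
    ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData());
    glfwSwapBuffers(window);
  }

  ImGui_ImplOpenGL3_Shutdown();
  ImGui_ImplGlfw_Shutdown();
  ImGui::DestroyContext();
  glfwDestroyWindow(window);
  glfwTerminate();
  return 0;
}
```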


lucasw commented Mar 25, 2019

ROS nodes and topics on a GPU?

Also using OpenGL and glsl a lot here.
If an image is mostly getting processed on one system and is going to end up being copied to the gpu for display anyway, why not copy it there earlier and do the entire image processing pipeline there?
Would probably want to use OpenCL though, and would need a means of moving data from OpenCL to OpenGL/glsl without copying back to the cpu.

Haven't pursued this - maybe will wait for Vulkan, which means waiting until I have a laptop with a vulkan-friendly gpu.

DMA bytes directly from ethernet nic to graphics memory?

Nvidia is probably already there with their robotics middleware, but with all the typical problems of hardware-vendor-driven software - long-term commitment may be lacking, and adoption is limited.


lucasw commented Mar 26, 2019

DDS

... Since DDS is implemented, by default, on UDP, it does not depend on a reliable transport or hardware for communication. This means that DDS has to reinvent the reliability wheel (basically TCP plus or minus some features), but in exchange DDS gains portability and control over the behavior. Control over several parameters of reliability, what DDS calls Quality of Service (QoS), gives maximum flexibility in controlling the behavior of communication. For example, if you are concerned about latency, like for soft real-time, you can basically tune DDS to be just a UDP blaster. In another scenario you might need something that behaves like TCP, but needs to be more tolerant to long dropouts, and with DDS all of these things can be controlled by changing the QoS parameters.

Though the default implementation of DDS is over UDP, and only requires that level of functionality from the transport, OMG also added support for DDS over TCP in version 1.2 of their specification. Only looking briefly, two of the vendors (RTI and PrismTech) both support DDS over TCP.

https://design.ros2.org/articles/ros_on_dds.html

Also look at:

http://community.rti.com/docs/html/tcp_transport/main.html#configuring

Efficient Transport Alternatives
In ROS 1.x there was never a standard shared-memory transport because it is negligibly faster than localhost TCP loop-back connections. It is possible to get non-trivial performance improvements from carefully doing zero-copy style shared-memory between processes, but anytime a task required faster than localhost TCP in ROS 1.x, nodelets were used. Nodelets allow publishers and subscribers to share data by passing around boost::shared_ptrs to messages. This intraprocess communication is almost certainly faster than any interprocess communication options and is orthogonal to the discussion of the network publish-subscribe implementation.

In the context of DDS, most vendors will optimize message traffic (even between processes) using shared-memory in a transparent way, only using the wire protocol and UDP sockets when leaving the localhost. This provides a considerable performance increase for DDS, whereas it did not for ROS 1.x, because the localhost networking optimization happens at the call to send. For ROS 1.x the process was: serialize the message into one large buffer, call TCP’s send on the buffer once. For DDS the process would be more like: serialize the message, break the message into potentially many UDP packets, call UDP’s send many times. In this way sending many UDP datagrams does not benefit from the same speed up as one large TCP send. Therefore, many DDS vendors will short circuit this process for localhost messages and use a blackboard style shared-memory mechanism to communicate efficiently between processes.
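For comparison, the boost::shared_ptr passing mentioned above looks something like this in ros1 (a plain node is shown for brevity; inside a nodelet manager the same publish() hands the pointer to in-process subscribers with no copy or serialization):

```cpp
#include <boost/make_shared.hpp>
#include <ros/ros.h>
#include <sensor_msgs/Image.h>

int main(int argc, char ** argv)
{
  ros::init(argc, argv, "image_source");
  ros::NodeHandle nh;
  ros::Publisher pub = nh.advertise<sensor_msgs::Image>("image_raw", 2);

  ros::Rate rate(30.0);
  while (ros::ok()) {
    // allocate the message as a shared_ptr and never touch it after publish() -
    // subscribers in the same nodelet manager may receive this exact pointer
    sensor_msgs::ImagePtr msg = boost::make_shared<sensor_msgs::Image>();
    msg->header.stamp = ros::Time::now();
    msg->width = 1920;
    msg->height = 1080;
    msg->encoding = "rgb8";
    msg->step = msg->width * 3;
    msg->data.resize(msg->step * msg->height);
    pub.publish(msg);

    ros::spinOnce();
    rate.sleep();
  }
  return 0;
}
```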

'most vendors will optimize message traffic', but not the ros2 default fastrtps?

Tried out connext (which performed much worse) and opensplice (about the same as fastrtps), but saw no automatic transparent shared memory or any other apparent performance gains.

However, not all DDS vendors are the same in this respect, so ROS would not rely on this “intelligent” behavior for efficient intraprocess communication. Additionally, if the ROS message format is kept, which is discussed in the next section, it would not be possible to prevent a conversion to the DDS message type for intraprocess topics. Therefore a custom intraprocess communication system would need to be developed for ROS which would never serialize nor convert messages, but instead would pass pointers (to shared in-process memory) between publishers and subscribers using DDS topics. This same intraprocess communication mechanism would be needed for a custom middleware built on ZeroMQ, for example.

The point to take away here is that efficient intraprocess communication will be addressed regardless of the network/interprocess implementation of the middleware.

Addressing only intraprocess leaves out command line tools like ros2 topic echo, the rqt gui tools, and rviz - those seem to suffer greatly from having to use the default dds with default settings, so efficient interprocess communication needs to be on an equal footing with intraprocess.
