We are studying existing performance measurement projects, especially from a real-time perspective.
- [1] CI: buildfarm_perf_tests + Apex.AI performance_test (forked from https://gitlab.com/ApexAI/performance_test)
  - Results are available at http://build.ros2.org/job/Fci__nightly-performance_ubuntu_focal_amd64/
  - This has 3 tests. We refer only to Test1 and Test2 because they measure communication (they have separate columns in the table below).
- [2] iRobot ros2-performance
  - Results for crystal: https://github.com/irobot-ros/ros2-performance/tree/master/performances/experiments/crystal
  - This can test more complex situations than [1]. We can specify the network topology, the number of nodes and topics, and so on.
  - The sample topology "sierra-nevada" has 10 nodes, 13 topics, and more publishers and subscribers. We refer to this as "SN" in the [2] column.
- [3] pendulum_control
We have created 3 tables: one table for function comparison and two tables for metrics as described below.
- function comparison table
- metrics comparison table (behavior)
- metrics comparison table (resource)
We plan to post an article on ROS Discourse about which policy is preferred for each item.
Roughly speaking, a ROS2 program has the following layers.
+----------------------------+
| Publisher / Subscription   |
| rclcpp (Executor/Nodes)    |
| DDS                        |  ROS2 layer
+----------------------------+
+----------------------------+
| Process and RT-setting     |
+----------------------------+
+----------------------------+
| HW / OS                    |
+----------------------------+
We separate the table into categories and subcategories according to these layers.
Function comparison table (`<-` means the same as the cell to its left):

No | Category | Subcategory | name | [1] Test1 | [1] Test2 | [2] | [3] |
---|---|---|---|---|---|---|---|
1 | HW/OS | kernel | PREEMPT_RT patch | - | - | - | O |
2 | | kernel thread | adjust CPU core | - | - | - | - |
3 | Process | common | duration | 30 [s] ("--max_runtime 30" specified) | <- | 5 [s] (default) | 1000 or 7000 [s] (1M-7M cycles, 1kHz) |
4 | | | # of processes | 1 | 2 | 1 process / 1 json file | 1 for Realtime, 2 for non-Realtime |
5 | | | # of threads | 2 threads (main for statistics, child for pub/sub) | <- | option. We can separate threads per executor. | - |
6 | RT-setting | scheduling | scheduling policy | - (use_rt_prio does not appear to be specified) | <- | CFS | SCHED_RR (but DDS threads are CFS) |
7 | | CPU affinity | CPU affinity | - (use_rt_cpus does not appear to be specified) | <- | CFS | |
8 | | memory | page fault guard | - | <- | - | set by rttest_lock_and_prefault_dynamic() |
9 | DDS/RMW | | supported RMW | connext, cyclone, fastrtps_{cpp,dynamic} | <- | use RMW_IMPLEMENTATION | opensplice & connext are in README |
10 | | | supported DDS (direct call) | Cyclone, FastRTPS | - | NA | NA |
11 | | | heterogeneous communication | - | O (rmw_*) | undescribed (looks impossible because of 1 process) | undescribed (looks impossible) |
12 | rclcpp | init() option | (nothing) | | | | |
13 | | Executor | Executor class | SingleThreadedExecutor | <- | StaticSingleThreadedExecutor | RttExecutor (loop by clock_nanosleep) |
14 | | Node | # of nodes | 1 | 1 per process | 10 in SN | 2 |
15 | | | use_intra_process_comms | ON | <- | option | OFF |
16 | Communication detail | Communication style | 1-way/2-way | 1-way | 2-way | more complex | controller and simulator |
17 | | QoS | policy, depth | KeepAll. KeepLast(10) if topic >= 4 MB | <- | KeepLast(10) | KeepLast(1) |
18 | | | Reliability | BestEffort | <- | Reliable | BestEffort |
19 | | | Durability | volatile | <- | volatile | volatile (transient for setpoint) |
20 | | # of topics | # of topics | | | 13 (SN) | |
21 | | data size pattern | data size pattern | 1,4,16,32,64,512K, 1,2,4,8,8M | <- | 8-250B, 1-600KB, 1,4,8 MB. almost -100 B(SN) | 64bit val * 10 = 80 Byte (roughly) |
22 | | Hz | Hz | 1000 | <- | 2, 10, 100 (SN) | 1000 |
23 | Publisher / Timer | publisher | # of publishers | 1 | pub process only | many per node (over 10 in SN) | 3 |
24 | | data | ptr_type | shared_ptr | <- | option (unique by default, shared in SN) | ConstSharedPtr |
25 | | | data allocation | allocated first | <- | allocate in each loop | sensor, command: instance val. logger: local val |
26 | | | internal api (borrow etc) | (unknown) | <- | (unknown) | (unknown) |
27 | | Timer | periodic wakeup mechanism | by std::this_thread::sleep_for | <- | rclcpp::Node::create_wall_timer | clock_nanosleep (see Executor above) |
28 | Subscription / Callback | subscriber | # of subscribers | 1 | sub process only | many per node (over 10 in SN) | 3 |
29 | | | spin | spin_once, calculates statistics in each loop | <- | Executor::spin | see Executor above |
30 | | data | ptr_type | shared_ptr | <- | option. not specified so shared_ptr(NV) | ConstSharedPtr |
31 | | | data allocation (recv buf) | (unknown) | <- | (unknown) | MessagePoolMemoryStrategy |
32 | | | internal api (borrow etc) | (unknown) | <- | (unknown) | (unknown) |
33 | Other functions | Client/Server | has test? | - | <- | can select | - |
34 | | Parameter | has test? | - | <- | - | - |
35 | | Action | has test? | - | <- | - | - |
36 | | Lifecycle | has test? | - | <- | x | - |
37 | Measurement | measurement under discovery | measurement under discovery | ignore first 3 seconds | <- | discovery wait | - |
38 | | system stress | stress tool | ? | <- | ? | by stress command |
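
Rows 6 to 8 of the function comparison table (scheduling policy, CPU affinity, page fault guard) correspond to a few Linux system calls. The following is a minimal sketch of those settings using plain POSIX/Linux APIs; it is not the code of any of the three projects, and the helper names (`clamp_priority`, `use_round_robin`, `pin_to_cpu`, `lock_all_memory`) are hypothetical. It assumes Linux with glibc and g++, which defines `_GNU_SOURCE` by default (needed for the `CPU_*` macros).

```cpp
#include <sched.h>
#include <sys/mman.h>

#include <algorithm>
#include <cassert>

// Hypothetical helper: clamp a requested priority into the range that the
// kernel accepts for the given scheduling policy.
int clamp_priority(int policy, int requested) {
  const int lo = sched_get_priority_min(policy);
  const int hi = sched_get_priority_max(policy);
  assert(lo != -1 && hi != -1);  // -1 means the policy is not valid
  return std::min(std::max(requested, lo), hi);
}

// "scheduling policy" row: switch the caller to SCHED_RR.
// Usually needs root or CAP_SYS_NICE; returns false (errno is set) otherwise.
bool use_round_robin(int requested_priority) {
  sched_param param{};
  param.sched_priority = clamp_priority(SCHED_RR, requested_priority);
  return sched_setscheduler(0, SCHED_RR, &param) == 0;
}

// "CPU affinity" row: restrict the caller to a single CPU core.
bool pin_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// "page fault guard" row: lock current and future pages into RAM.
// May fail if RLIMIT_MEMLOCK is too low for an unprivileged process.
bool lock_all_memory() {
  return mlockall(MCL_CURRENT | MCL_FUTURE) == 0;
}
```

Each helper reports failure instead of aborting, because a benchmark may be started without the required privileges. Note that `mlockall` only locks pages; as its name suggests, `rttest_lock_and_prefault_dynamic()` used by [3] also pre-faults memory, which this sketch does not attempt.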

Metrics comparison table (behavior):

No | Category | name | [1] | [2] | [3] |
---|---|---|---|---|---|
1 | communication quality | total sent | Test12 | O | - |
2 | | total recv | Test12 | O | O |
3 | | losses | Test12 | O | - |
4 | | late/too_late | - | O | - |
5 | | trip time stats | Test12 | - | - |
6 | program latency | PDP/EDP discovery | - | O | - |
7 | | timer jitter (nanosleep) | - | - | O |
8 | | callback jitter | - | - | - |
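
The "timer jitter (nanosleep)" metric can be obtained by sleeping until absolute deadlines and recording how late each wake-up is. The sketch below illustrates that general technique with `clock_nanosleep`; it is an assumption about the approach rather than the implementation used by [3], and `JitterStats`, `summarize`, and `measure_wakeup_latency` are hypothetical names.

```cpp
#include <time.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

constexpr std::int64_t kNsPerSec = 1000000000LL;

// Summary of wake-up latencies (actual wake-up time minus planned deadline).
struct JitterStats {
  std::int64_t min_ns;
  std::int64_t max_ns;
  double mean_ns;
};

// Pure helper: summarize a non-empty list of latencies given in nanoseconds.
JitterStats summarize(const std::vector<std::int64_t>& latencies_ns) {
  assert(!latencies_ns.empty());
  const std::int64_t lo = *std::min_element(latencies_ns.begin(), latencies_ns.end());
  const std::int64_t hi = *std::max_element(latencies_ns.begin(), latencies_ns.end());
  const std::int64_t sum =
      std::accumulate(latencies_ns.begin(), latencies_ns.end(), std::int64_t{0});
  return {lo, hi, static_cast<double>(sum) / static_cast<double>(latencies_ns.size())};
}

// Sleep until absolute deadlines spaced period_ns apart and record, for each
// cycle, how many nanoseconds after the deadline the thread actually woke up.
std::vector<std::int64_t> measure_wakeup_latency(std::int64_t period_ns, int cycles) {
  assert(period_ns > 0 && cycles > 0);
  std::vector<std::int64_t> latencies;
  latencies.reserve(static_cast<std::size_t>(cycles));
  timespec next{};
  clock_gettime(CLOCK_MONOTONIC, &next);
  for (int i = 0; i < cycles; ++i) {
    // Advance the deadline by one period and normalize the nanosecond field.
    next.tv_sec += period_ns / kNsPerSec;
    next.tv_nsec += period_ns % kNsPerSec;
    if (next.tv_nsec >= kNsPerSec) {
      next.tv_nsec -= kNsPerSec;
      ++next.tv_sec;
    }
    // clock_nanosleep returns the error number directly; retry when interrupted.
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr) == EINTR) {
    }
    timespec now{};
    clock_gettime(CLOCK_MONOTONIC, &now);
    latencies.push_back((now.tv_sec - next.tv_sec) * kNsPerSec +
                        (now.tv_nsec - next.tv_nsec));
  }
  return latencies;
}
```

For a 1 kHz loop as in [3], `summarize(measure_wakeup_latency(1000000, 1000))` would give the minimum, maximum, and mean wake-up latency over 1000 cycles. Using absolute deadlines keeps a late wake-up in one cycle from shifting all later deadlines.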

Metrics comparison table (resource):

No | Category | name | CI(g1+g3) | g2 | pendulum |
---|---|---|---|---|---|
1 | CPU | CPU Usage | Test123 | O | - |
2 | Memory | maxrss | Test12 | - | - |
3 | | Phy | Test23 | O | - |
4 | | RES | Test23 | O (rss in resource) | - |
5 | | Virt | Test23 | O (vsz in resource) | - |
6 | | arena | - | O | - |
7 | | in use | - | O | - |
8 | | mmap | - | O | - |
9 | Page Fault | minor_pagefaults | - | - | O |
10 | | major_pagefaults | - | - | O |
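
The `maxrss` and page-fault rows of the resource table can be read from the kernel with `getrusage`. Below is a small sketch, assuming Linux (where `ru_maxrss` is reported in kilobytes); `ResourceSnapshot`, `FaultDelta`, `take_snapshot`, and `faults_between` are hypothetical names, and the code is not taken from any of the three projects.

```cpp
#include <sys/resource.h>

#include <cassert>

// Resource counters of the calling process at one point in time.
struct ResourceSnapshot {
  long maxrss_kb;     // peak resident set size (kilobytes on Linux)
  long minor_faults;  // page faults served without disk I/O
  long major_faults;  // page faults that required disk I/O
};

// Page faults that occurred between two snapshots.
struct FaultDelta {
  long minor_faults;
  long major_faults;
};

// Read the current counters for this process.
ResourceSnapshot take_snapshot() {
  rusage usage{};
  const int rc = getrusage(RUSAGE_SELF, &usage);
  assert(rc == 0);
  (void)rc;  // keep release builds (NDEBUG) free of unused-variable warnings
  return {usage.ru_maxrss, usage.ru_minflt, usage.ru_majflt};
}

// Subtract the fault counters of an earlier snapshot from a later one.
FaultDelta faults_between(const ResourceSnapshot& before, const ResourceSnapshot& after) {
  return {after.minor_faults - before.minor_faults,
          after.major_faults - before.major_faults};
}
```

Taking one snapshot before and one after the measured loop gives the number of page faults caused by the loop itself. `maxrss_kb` is a peak value, so it is compared directly rather than subtracted.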