This note summarizes a performance evaluation of the Kudu C++ client library. The measurements show how the new version of the library performs in simple write-only bulk workloads, and include results for the previous version of the library and for the Kudu Java client library as well.
This evaluation uses a very simple 'push as much data as possible' scenario: the client generates and sends data to the server as fast as it can. The client and the server run on the same machine and communicate via the loopback network interface. The idea is to quickly pinpoint issues (if any) in the new code for the use case of maximum possible throughput to a single tablet server. This simplest max-throughput scenario focuses the stress testing on the write data path and measures the maximum achievable data throughput.
The Kudu server side uses the simplest configuration: a single master and a single tablet server.
The old and the new versions of the Kudu C++ client library were compiled in release configuration. The C++ test application was compiled in release configuration and linked with the Kudu C++ client library. The server-side components were compiled in release configuration as well.
For every iteration, a table with the following columns is created prior to running the test:
- "key" of type INT32, primary key
- "int_val" of type INT32, not null
- "string_val" of type STRING, not null
- "long_string_val" of type STRING
- "non_null_with_default" of type INT32, not null, default 12345
The test application fills all integer columns with sequential numbers (no duplicate values). The string columns are populated with a pre-set 32-byte string. The raw (wire) size of the resulting write operation is between 115 and 118 bytes.
The code of the C++ client application used for performance comparison can be found at https://gerrit.cloudera.org/#/c/4412/
All builds and measurements were performed on ve0518.halxg.cloudera.com. The machine was not busy with any other significant load (compilation, other tests, puppet activity, etc.) while the tests ran. At least 3 runs (usually 5) were performed for every set of configuration parameters. The final value in a 'Time total'/'Time per row' cell is the average across the corresponding runs.
This measurement is to ensure that the performance of the Kudu C++ client library did not degrade due to the introduction of the AUTO_FLUSH_BACKGROUND functionality.
The table below contains results of performance measurements for the old and the new C++ client library (i.e. before and after the AUTO_FLUSH_BACKGROUND-related changes).
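The 'Total buffer size', 'Max number of batchers', and flush-mode parameters used in the tests map onto setters of kudu::client::KuduSession. A minimal configuration sketch follows; the method names come from the Kudu C++ client API, but the fragment needs the Kudu client library and a running cluster, so it is illustrative rather than standalone-runnable:

```cpp
#include "kudu/client/client.h"

using kudu::client::KuduSession;
using kudu::Status;

// Configure a session roughly as in one of the NEW rows of the table:
// a 7 MiB total buffer, at most 2 batchers, AUTO_FLUSH_BACKGROUND mode.
Status ConfigureSession(const kudu::client::sp::shared_ptr<KuduSession>& session) {
  Status s = session->SetMutationBufferSpace(7 * 1024 * 1024);
  if (!s.ok()) return s;
  s = session->SetMutationBufferMaxNum(2);  // 0 means 'no limit on batchers'
  if (!s.ok()) return s;
  s = session->SetMutationBufferFlushWatermark(0.5);  // flush at 50% utilization
  if (!s.ok()) return s;
  return session->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND);
}
```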
Source code | Total buffer size (MiB) | Batcher size (MiB) | Max number of batchers | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
OLD | unlimited | 7 | unlimited | 129.5% | 845.6% | 11757.8 | 0.001469 |
NEW | 128 | n/a | unlimited | 136.7% | 588.6% | 12057.3 | 0.001507 |
NEW | 7 | n/a | 16 | 119.2% | 282.2% | 12098.1 | 0.001512 |
NEW | 7 | n/a | 2 | 74.6% | 152.2% | 19034.6 | 0.002379 |
The slight (about 2.5%) performance drop in the new version is being addressed.
The table below contains information on the measured performance of the new code. The paired tests allow comparing the performance of Kudu client sessions running in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.
In the table, the value of the 'Flush every N rows' column is 'n/a' for tests run in AUTO_FLUSH_BACKGROUND mode. The column's values for the MANUAL_FLUSH mode are set to correspond to the buffer size limit and the flush watermark of the paired test run in AUTO_FLUSH_BACKGROUND mode.
Total buffer size (MiB) | Max number of batchers | Buffer flush watermark | Flush every N rows | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
7 | 2 | n/a | 31000 | 115.1% | 117.4% | 23749.3 | 0.002968 |
7 | 2 | 50% | n/a | 116.1% | 108.1% | 24353.6 | 0.003044 |
7 | 4 | n/a | 15500 | 116.4% | 114.0% | 22736.6 | 0.002842 |
7 | 4 | 25% | n/a | 116.7% | 113.0% | 23817.7 | 0.002977 |
7 | 4 | n/a | 1000 | 118.4% | 296.7% | 13203.1 | 0.001650 |
7 | 4 | 25% | n/a | 131.3% | 271.8% | 11250.9 | 0.001406 |
The difference in performance can be explained as follows: with comparable parameters, a session in AUTO_FLUSH_BACKGROUND mode is slightly less performant than a session in MANUAL_FLUSH mode because more checks are executed and more synchronization is involved on the hot paths in AUTO_FLUSH_BACKGROUND mode.
Conversely, a session in AUTO_FLUSH_BACKGROUND mode performs better than a session in MANUAL_FLUSH mode when its buffer can accommodate substantially more operations than the MANUAL_FLUSH session flushes at once.
The tables below contain information on the measured performance of the Kudu C++ and Java client test applications running in AUTO_FLUSH_BACKGROUND mode. The Java client code is a slightly modified version of the InsertLoadgen from kudu-examples. The minor modifications make it a better match for the C++ test application and squeeze more performance from the code (the original InsertLoadgen example performs about 10 times slower in the described scenario).
The Kudu Java client API does not allow specifying the buffer size limit as an amount of memory used by serialized write operations. Instead, it uses the notion of a maximum number of buffered write operations; by default, the limit is 1000 operations. Given the size of the serialized write operations used throughout the tests, the buffer size for the C++ client was set accordingly. The idea is to impose the same buffer size limit on both the C++ and the Java client, to perform an 'apples-to-apples' comparison. Besides the 'apples-to-apples' comparison, every table contains two more rows with additional measurements for the C++ test application:
- a twice-higher limit on the maximum number of batchers and a twice-lower flush watermark
- the default set of buffering/batching parameters for the Kudu C++ client library
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 80.5% | 82.2% | 34316.5 | 0.004290 |
117000 bytes | 2 | 50 | C++ | 108.9% | 116.2% | 29055.6 | 0.003631 |
117000 bytes | 4 | 25 | C++ | 115.7% | 150.1% | 27724.7 | 0.003393 |
7 MiB | 2 | 50 | C++ | 114.1% | 110.5% | 23909.8 | 0.002988 |
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 93.9% | 196.8% | 24687.0 | 0.003086 |
117000 bytes | 2 | 50 | C++ | 192.2% | 210.4% | 21154.9 | 0.002644 |
117000 bytes | 4 | 25 | C++ | 217.5% | 312.7% | 17524.5 | 0.002190 |
7 MiB | 2 | 50 | C++ | 220.1% | 210.5% | 18405.5 | 0.002300 |
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 149.4% | 308.8% | 15175.3 | 0.001897 |
117000 bytes | 2 | 50 | C++ | 378.6% | 353.2% | 14360.3 | 0.001795 |
117000 bytes | 4 | 25 | C++ | 422.6% | 437.2% | 13229.6 | 0.001653 |
7 MiB | 2 | 50 | C++ | 398.7% | 355.9% | 12620.8 | 0.001577 |