Kudu C++ client library: performance comparison for AUTO_FLUSH_BACKGROUND mode

Summary

This note contains information on a performance evaluation of the Kudu C++ client library. The measurements show how the new version of the library performs in a simplistic write-only bulk workload. For comparison, results for the previous version of the library and for the Kudu Java client library are included as well.

Introduction

This performance evaluation uses a very simple 'push-as-much-data-as-possible' scenario: the client generates and sends data to the server as fast as it can. The client and the server run on the same machine and communicate via the loopback network interface. The idea is to quickly pinpoint issues (if any) in the new code for the use case of maximum possible throughput to a single tablet server. This simplest max-throughput scenario makes it possible to focus on stress testing the write data path and to measure the maximum achievable data throughput.

The Kudu server side uses the simplest configuration: a single master and a single tablet server.
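For reference, a minimal sketch of pointing a C++ client at such a single-master deployment (the master address is a placeholder, not the actual test setup):

```cpp
#include <kudu/client/client.h>

using kudu::client::KuduClient;
using kudu::client::KuduClientBuilder;

int main() {
  kudu::client::sp::shared_ptr<KuduClient> client;
  // A single-master deployment needs only one master address.
  kudu::Status s = KuduClientBuilder()
      .add_master_server_addr("127.0.0.1:7051")  // placeholder address
      .Build(&client);
  return s.ok() ? 0 : 1;
}
```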

The old and the new versions of the Kudu C++ client library were compiled in release configuration, as were the server-side components. The C++ test application was also compiled in release configuration and linked with the Kudu C++ client library.

For every iteration, a table with the following columns is created prior to running the test:

  • "key" of type INT32, primary key

  • "int_val" of type INT32, not null

  • "string_val" of type STRING, not null

  • "long_string_val" of type STRING

  • "non_null_with_default": of type INT32, not null, default 12345

The test application fills all integer columns with sequential numbers (no duplicate values). The string columns are populated with a pre-set 32-byte string. The raw (wire) size of the resulting write operation is between 115 and 118 bytes. A sketch of the table creation and row generation follows.
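The sketch below illustrates how such a table could be created and populated via the Kudu C++ client API; it is an illustration of the schema above, not the actual test code (which is in the gerrit change linked below):

```cpp
#include <kudu/client/client.h>
#include <kudu/client/schema.h>
#include <kudu/client/value.h>
#include <kudu/common/partial_row.h>

#include <memory>
#include <string>

using namespace kudu::client;

// Create the test table with the five columns described above.
kudu::Status CreateTestTable(const sp::shared_ptr<KuduClient>& client,
                             const std::string& table_name) {
  KuduSchemaBuilder b;
  b.AddColumn("key")->Type(KuduColumnSchema::INT32)->NotNull()->PrimaryKey();
  b.AddColumn("int_val")->Type(KuduColumnSchema::INT32)->NotNull();
  b.AddColumn("string_val")->Type(KuduColumnSchema::STRING)->NotNull();
  b.AddColumn("long_string_val")->Type(KuduColumnSchema::STRING);
  b.AddColumn("non_null_with_default")->Type(KuduColumnSchema::INT32)
      ->NotNull()->Default(KuduValue::FromInt(12345));
  KuduSchema schema;
  kudu::Status s = b.Build(&schema);
  if (!s.ok()) return s;

  std::unique_ptr<KuduTableCreator> creator(client->NewTableCreator());
  return creator->table_name(table_name)
      .schema(&schema)
      .set_range_partition_columns({ "key" })
      .num_replicas(1)  // single tablet server
      .Create();
}

// Generate one row: sequential integers, fixed 32-byte strings.
kudu::Status InsertRow(KuduTable* table, KuduSession* session, int32_t i) {
  static const std::string kStr(32, 'x');  // pre-set 32-byte string
  KuduInsert* insert = table->NewInsert();
  KuduPartialRow* row = insert->mutable_row();
  row->SetInt32("key", i);
  row->SetInt32("int_val", i);
  row->SetStringCopy("string_val", kStr);
  row->SetStringCopy("long_string_val", kStr);
  row->SetInt32("non_null_with_default", i);
  return session->Apply(insert);  // Apply() takes ownership of 'insert'
}
```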

The code of the C++ client application used for performance comparison can be found at https://gerrit.cloudera.org/#/c/4412/

All builds and measurements were performed on ve0518.halxg.cloudera.com. The machine was not busy with any other significant load (compilation, other tests, puppet activity, etc.) while the tests were running. At least 3 runs (usually 5) were performed for every set of configuration parameters. The final value in a 'Time total'/'Time per row' cell is the average across the corresponding runs.

Comparison between the old and the new code in MANUAL_FLUSH mode

This measurement ensures that the performance of the Kudu C++ client library did not degrade due to the introduction of the AUTO_FLUSH_BACKGROUND functionality.
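For context, a sketch of what the MANUAL_FLUSH test loop with an asynchronous flush every 1000 rows (as in Table 1) could look like; InsertRow() is the helper sketched above, and the real driver is the test application from the gerrit change:

```cpp
#include <kudu/client/client.h>
#include <kudu/client/callbacks.h>

using namespace kudu::client;

// No-op completion callback: the test measures throughput only;
// errors would be surfaced by the final synchronous Flush().
class IgnoreResultCB : public KuduStatusCallback {
 public:
  void Run(const kudu::Status& /*status*/) override {}
};

void RunManualFlushLoop(KuduSession* session, KuduTable* table, int num_rows) {
  static IgnoreResultCB cb;
  session->SetFlushMode(KuduSession::MANUAL_FLUSH);
  for (int i = 0; i < num_rows; ++i) {
    InsertRow(table, session, i);  // helper from the sketch above
    if ((i + 1) % 1000 == 0) {
      session->FlushAsync(&cb);    // asynchronous flush every 1000 rows
    }
  }
  session->Flush();                // flush the remaining tail synchronously
}
```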

The table below contains the results of performance measurements for the old and the new C++ client library (i.e., before and after the AUTO_FLUSH_BACKGROUND-related changes).

Table 1. The old and new library in MANUAL_FLUSH mode: 1 thread inserting 8M rows, async flush every 1000 rows
| Source code | Total buffer size (MiB) | Batcher size (MiB) | Max number of batchers | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
|-------------|-------------------------|--------------------|------------------------|--------------------------|------------------------|-----------------|-------------------|
| OLD         | unlimited               | 7                  | unlimited              | 129.5%                   | 845.6%                 | 11757.8         | 0.001469          |
| NEW         | 128                     | n/a                | unlimited              | 136.7%                   | 588.6%                 | 12057.3         | 0.001507          |
| NEW         | 7                       | n/a                | 16                     | 119.2%                   | 282.2%                 | 12098.1         | 0.001512          |
| NEW         | 7                       | n/a                | 2                      | 74.6%                    | 152.2%                 | 19034.6         | 0.002379          |

The slight (about 2.5%) performance drop in the new version is being addressed.

The new Kudu C++ library: MANUAL_FLUSH vs AUTO_FLUSH_BACKGROUND mode

The table below contains information on the measured performance of the new code. The paired tests compare the performance of Kudu client sessions running in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.

In the table, the value of the 'Flush every N rows' column is 'n/a' for tests run in AUTO_FLUSH_BACKGROUND mode. For MANUAL_FLUSH mode, the column's values are set to correspond to the buffer size limit and the flush watermark of the paired test run in AUTO_FLUSH_BACKGROUND mode.
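For reference, the tunables in the table map onto KuduSession setters in the new API under test; a sketch using the 7 MiB / 4 batchers / 25% watermark combination from the runs below:

```cpp
#include <kudu/client/client.h>

using namespace kudu::client;

// Configure a session the way the AUTO_FLUSH_BACKGROUND runs below do:
// 7 MiB total buffer, at most 4 batchers, flush watermark at 25%.
kudu::Status ConfigureBackgroundFlush(KuduSession* session) {
  kudu::Status s = session->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND);
  if (!s.ok()) return s;
  s = session->SetMutationBufferSpace(7 * 1024 * 1024);   // total buffer size
  if (!s.ok()) return s;
  s = session->SetMutationBufferMaxNum(4);                // max number of batchers
  if (!s.ok()) return s;
  return session->SetMutationBufferFlushWatermark(0.25);  // flush at 25% full
}
```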

Table 2. The new C++ client library: MANUAL_FLUSH vs AUTO_FLUSH_BACKGROUND mode, 1 thread inserting 8M rows
| Total buffer size (MiB) | Max number of batchers | Buffer flush watermark | Flush every N rows | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
|-------------------------|------------------------|------------------------|--------------------|--------------------------|------------------------|-----------------|-------------------|
| 7                       | 2                      | n/a                    | 31000              | 115.1%                   | 117.4%                 | 23749.3         | 0.002968          |
| 7                       | 2                      | 50%                    | n/a                | 116.1%                   | 108.1%                 | 24353.6         | 0.003044          |
| 7                       | 4                      | n/a                    | 15500              | 116.4%                   | 114.0%                 | 22736.6         | 0.002842          |
| 7                       | 4                      | 25%                    | n/a                | 116.7%                   | 113.0%                 | 23817.7         | 0.002977          |
| 7                       | 4                      | n/a                    | 1000               | 118.4%                   | 296.7%                 | 13203.1         | 0.001650          |
| 7                       | 4                      | 25%                    | n/a                | 131.3%                   | 271.8%                 | 11250.9         | 0.001406          |

The difference in performance can be explained as follows: with comparable parameters, a session in AUTO_FLUSH_BACKGROUND mode is slightly less performant than a session in MANUAL_FLUSH mode because the hot paths in AUTO_FLUSH_BACKGROUND mode execute more checks and involve more synchronization.

However, a session in AUTO_FLUSH_BACKGROUND mode performs better than a session in MANUAL_FLUSH mode when its buffer can accommodate substantially more operations than the MANUAL_FLUSH session flushes at once.

C++ vs Java client in AUTO_FLUSH_BACKGROUND mode

The table below contains information on the measured performance of the Kudu C++ and Java client test applications running in AUTO_FLUSH_BACKGROUND mode. The Java client code is a slightly modified version of InsertLoadgen from kudu-examples. The minor modifications make it a better match for the C++ test application and squeeze more performance from the code (the original InsertLoadgen example runs about 10 times slower in the described scenario).

Kudu's Java client API does not allow specifying the buffer size limit as an amount of memory used by serialized write operations. Instead, it uses the notion of a maximum number of buffered write operations; by default, the limit is set to 1000 operations. Given the size of the serialized write operations used throughout the tests, the buffer size of the C++ client was set accordingly (see the sketch after this list), so that both the C++ and the Java client have the same effective limit on the buffer size for an 'apples-to-apples' comparison. Besides the 'apples-to-apples' comparison, every table contains two more rows with additional measurements for the C++ test application:

  • a doubled limit on the maximum number of batchers and a halved flush watermark

  • default set of buffering/batching parameters for the Kudu C++ client library
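A sketch of the buffer sizing used for the 'apples-to-apples' rows: with ~117-byte serialized operations, matching the Java client's default 1000-operation buffer comes to roughly 117000 bytes on the C++ side:

```cpp
#include <kudu/client/client.h>

#include <cstddef>

using namespace kudu::client;

// The Java client caps the buffer by operation count (1000 by default),
// the C++ client by bytes; with ~117-byte serialized operations the
// equivalent byte limit is about 117000 bytes.
kudu::Status MatchJavaDefaultBuffer(KuduSession* session) {
  const size_t kWireOpSize = 117;       // measured serialized operation size
  const size_t kJavaDefaultOps = 1000;  // Java client's default buffer limit
  return session->SetMutationBufferSpace(kWireOpSize * kJavaDefaultOps);
}
```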

Table 3. C++ vs Java client in AUTO_FLUSH_BACKGROUND mode, 1 inserter thread, 8M rows per thread
| Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
|--------------------|------------------------|----------------------------|----------------|--------------------------|------------------------|-----------------|-------------------|
| 1000 ops           | 2                      | 50                         | Java           | 80.5%                    | 82.2%                  | 34316.5         | 0.004290          |
| 117000 bytes       | 2                      | 50                         | C++            | 108.9%                   | 116.2%                 | 29055.6         | 0.003631          |
| 117000 bytes       | 4                      | 25                         | C++            | 115.7%                   | 150.1%                 | 27724.7         | 0.003393          |
| 7 MiB              | 2                      | 50                         | C++            | 114.1%                   | 110.5%                 | 23909.8         | 0.002988          |

Table 4. C++ vs Java client in AUTO_FLUSH_BACKGROUND mode, 2 inserter threads, 4M rows per thread
| Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
|--------------------|------------------------|----------------------------|----------------|--------------------------|------------------------|-----------------|-------------------|
| 1000 ops           | 2                      | 50                         | Java           | 93.9%                    | 196.8%                 | 24687           | 0.003086          |
| 117000 bytes       | 2                      | 50                         | C++            | 192.2%                   | 210.4%                 | 21154.9         | 0.002644          |
| 117000 bytes       | 4                      | 25                         | C++            | 217.5%                   | 312.7%                 | 17524.5         | 0.002190          |
| 7 MiB              | 2                      | 50                         | C++            | 220.1%                   | 210.5%                 | 18405.5         | 0.002300          |

Table 5. C++ vs Java client in AUTO_FLUSH_BACKGROUND mode, 4 inserter threads, 2M rows per thread
| Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
|--------------------|------------------------|----------------------------|----------------|--------------------------|------------------------|-----------------|-------------------|
| 1000 ops           | 2                      | 50                         | Java           | 149.4%                   | 308.8%                 | 15175.3         | 0.001897          |
| 117000 bytes       | 2                      | 50                         | C++            | 378.6%                   | 353.2%                 | 14360.3         | 0.001795          |
| 117000 bytes       | 4                      | 25                         | C++            | 422.6%                   | 437.2%                 | 13229.6         | 0.001653          |
| 7 MiB              | 2                      | 50                         | C++            | 398.7%                   | 355.9%                 | 12620.8         | 0.001577          |
