This note summarizes a performance evaluation of the Kudu C++ client library. The measurements show how the new version of the library performs in simple write-only bulk workloads, and include results for the previous version of the library and for the Kudu Java client library as well.
This evaluation uses a very simple 'push as much data as possible' scenario: the client generates and sends data to the server as fast as it can. The client and the server run on the same machine and communicate via the loopback network interface. The idea is to quickly pinpoint issues (if any) in the new code for the use case of maximum possible throughput to a single tablet server. This simplest max-throughput scenario focuses the stress testing on the write data path and measures the maximum achievable data throughput.
The Kudu server side uses the simplest configuration: a single master and a single tablet server.
The old and the new versions of the Kudu C++ client library were compiled in release configuration. The C++ test application was compiled in release configuration and linked with the Kudu C++ client library. The server-side components were compiled in release configuration as well.
For every iteration, a table with the following columns is created prior to running the test:
- "key" of type INT32, primary key
- "int_val" of type INT32, not null
- "string_val" of type STRING, not null
- "long_string_val" of type STRING
- "non_null_with_default" of type INT32, not null, default 12345
The test application fills all integer columns with sequential numbers (no duplicate values). The string columns are populated with a pre-set 32-byte string. The raw (wire) size of the resulting write operation is between 115 and 118 bytes.
The code of the C++ client application used for performance comparison can be found at https://gerrit.cloudera.org/#/c/4412/
All builds and measurements were performed on ve0518.halxg.cloudera.com. The machine was not busy with any other significant load (compilation, other tests, puppet activity, etc.) while the tests ran. At least 3 runs (usually 5) were performed for every set of configuration parameters. The final value in a 'Time total'/'Time per row' cell is the average across the corresponding runs.
This measurement is to ensure that the performance of the Kudu C++ client library did not degrade due to the introduction of the AUTO_FLUSH_BACKGROUND functionality.
The table below contains results of performance measurements for the old and the new C++ client library (i.e. before and after the AUTO_FLUSH_BACKGROUND-related changes).
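The 'Total buffer size', 'Max number of batchers', and flush-mode parameters used in the tests map onto setters of kudu::client::KuduSession. A minimal configuration sketch follows; the method names come from the Kudu C++ client API, but the fragment needs the Kudu client library and a running cluster, so it is illustrative rather than standalone-runnable:

```cpp
#include "kudu/client/client.h"

using kudu::client::KuduSession;
using kudu::Status;

// Configure a session roughly as in one of the NEW rows of the table:
// a 7 MiB total buffer, at most 2 batchers, AUTO_FLUSH_BACKGROUND mode.
Status ConfigureSession(const kudu::client::sp::shared_ptr<KuduSession>& session) {
  Status s = session->SetMutationBufferSpace(7 * 1024 * 1024);
  if (!s.ok()) return s;
  s = session->SetMutationBufferMaxNum(2);  // 0 means 'no limit on batchers'
  if (!s.ok()) return s;
  s = session->SetMutationBufferFlushWatermark(0.5);  // flush at 50% utilization
  if (!s.ok()) return s;
  return session->SetFlushMode(KuduSession::AUTO_FLUSH_BACKGROUND);
}
```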
Source code | Total buffer size (MiB) | Batcher size (MiB) | Max number of batchers | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
OLD | unlimited | 7 | unlimited | 129.5% | 845.6% | 11757.8 | 0.001469 |
NEW | 128 | n/a | unlimited | 136.7% | 588.6% | 12057.3 | 0.001507 |
NEW | 7 | n/a | 16 | 119.2% | 282.2% | 12098.1 | 0.001512 |
NEW | 7 | n/a | 2 | 74.6% | 152.2% | 19034.6 | 0.002379 |
The slight (about 2.5%) performance drop in the new version is being addressed.
The table below contains information on the measured performance of the new code. The paired tests allow comparing the performance of Kudu client sessions running in MANUAL_FLUSH and AUTO_FLUSH_BACKGROUND modes.
In the table, the value of the 'Flush every N rows' column is 'n/a' for tests run in AUTO_FLUSH_BACKGROUND mode. The column's values for the MANUAL_FLUSH mode are set to correspond to the buffer size limit and the flush watermark of the paired test run in AUTO_FLUSH_BACKGROUND mode.
Total buffer size (MiB) | Max number of batchers | Buffer flush watermark | Flush every N rows | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
7 | 2 | n/a | 31000 | 115.1% | 117.4% | 23749.3 | 0.002968 |
7 | 2 | 50% | n/a | 116.1% | 108.1% | 24353.6 | 0.003044 |
7 | 4 | n/a | 15500 | 116.4% | 114.0% | 22736.6 | 0.002842 |
7 | 4 | 25% | n/a | 116.7% | 113.0% | 23817.7 | 0.002977 |
7 | 4 | n/a | 1000 | 118.4% | 296.7% | 13203.1 | 0.001650 |
7 | 4 | 25% | n/a | 131.3% | 271.8% | 11250.9 | 0.001406 |
The difference in performance can be explained as follows: with comparable parameters, a session in AUTO_FLUSH_BACKGROUND mode is slightly less performant than a session in MANUAL_FLUSH mode because more checks are executed and more synchronization is involved on the hot paths in AUTO_FLUSH_BACKGROUND mode.
Conversely, a session in AUTO_FLUSH_BACKGROUND mode performs better than a session in MANUAL_FLUSH mode when its buffer can accommodate substantially more operations than the MANUAL_FLUSH session flushes at once.
The tables below contain information on the measured performance of the Kudu C++ and Java client test applications running in AUTO_FLUSH_BACKGROUND mode. The Java client code is a slightly modified version of the InsertLoadgen from kudu-examples. The minor modifications make it a better match for the C++ test application and squeeze more performance from the code (the original InsertLoadgen example performs about 10 times slower in the described scenario).
The Kudu Java client API does not allow specifying the buffer size limit as an amount of memory used by serialized write operations. Instead, it uses the notion of a maximum number of buffered write operations; by default, the limit is 1000 operations. Given the size of the serialized write operations used throughout the tests, the buffer size for the C++ client was set accordingly. The idea is to impose the same buffer size limit on both the C++ and the Java client, to perform an 'apples-to-apples' comparison. Besides the 'apples-to-apples' comparison, every table contains two more rows with additional measurements for the C++ test application:
- a twice-higher limit on the maximum number of batchers and a twice-lower flush watermark
- the default set of buffering/batching parameters for the Kudu C++ client library
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 80.5% | 82.2% | 34316.5 | 0.004290 |
117000 bytes | 2 | 50 | C++ | 108.9% | 116.2% | 29055.6 | 0.003631 |
117000 bytes | 4 | 25 | C++ | 115.7% | 150.1% | 27724.7 | 0.003393 |
7 MiB | 2 | 50 | C++ | 114.1% | 110.5% | 23909.8 | 0.002988 |
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 93.9% | 196.8% | 24687.0 | 0.003086 |
117000 bytes | 2 | 50 | C++ | 192.2% | 210.4% | 21154.9 | 0.002644 |
117000 bytes | 4 | 25 | C++ | 217.5% | 312.7% | 17524.5 | 0.002190 |
7 MiB | 2 | 50 | C++ | 220.1% | 210.5% | 18405.5 | 0.002300 |
Total buffer limit | Max number of batchers | Buffer flush watermark (%) | Type of client | CPU usage insert_loadgen | CPU usage kudu-tserver | Time total (ms) | Time per row (ms) |
---|---|---|---|---|---|---|---|
1000 ops | 2 | 50 | Java | 149.4% | 308.8% | 15175.3 | 0.001897 |
117000 bytes | 2 | 50 | C++ | 378.6% | 353.2% | 14360.3 | 0.001795 |
117000 bytes | 4 | 25 | C++ | 422.6% | 437.2% | 13229.6 | 0.001653 |
7 MiB | 2 | 50 | C++ | 398.7% | 355.9% | 12620.8 | 0.001577 |