
@scriptnull
Last active February 10, 2020 12:00
Benchmarking TiKV

Before starting, I want to lay out that I think I am doing something wrong, and I would very much appreciate suggestions and help in identifying my mistakes.

Hardware

1 Google Cloud Platform VM instance of type n1-standard-8 (8 vCPUs + 30 GB RAM) attached with a local SSD.

I wanted to set up just a single node and do a basic benchmark before going all in on setting up many nodes.

  • Maybe this is a mistake?
  • Should I be running multiple nodes?
  • Is 1 PD + 1 TiKV not enough for just using the Raw KV API?

Software

Set up just 1 PD and 1 TiKV in Docker containers.

sudo docker run -d --name pd1 \
  -p 2379:2379 \
  -p 2380:2380 \
  -v /etc/localtime:/etc/localtime:ro \
  -v /mnt/disks/localssd/data:/data \
  pingcap/pd:latest \
  --name="pd1" \
  --data-dir="/data/pd1" \
  --client-urls="http://0.0.0.0:2379" \
  --advertise-client-urls="http://benchmark-tikv-server:2379" \
  --peer-urls="http://0.0.0.0:2380" \
  --advertise-peer-urls="http://benchmark-tikv-server:2380" \
  --initial-cluster=""

sudo docker run -d --name tikv1 \
  -p 20160:20160 \
  -v /etc/localtime:/etc/localtime:ro \
  -v /mnt/disks/localssd/data:/data \
  pingcap/tikv:latest \
  --addr="0.0.0.0:20160" \
  --advertise-addr="10.132.15.225:20160" \
  --data-dir="/data/tikv1" \
  --pd="benchmark-tikv-server:2379" 

/data is mounted to a path on a local SSD on the host machine.

Benchmarking client

A YCSB fork using the Java TiKV client (available at https://github.com/scriptnull/YCSB/tree/tikv ). It uses the Raw KV API to get and set data.
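The Raw KV calls behind the benchmark look roughly like this. This is a minimal sketch assuming the org.tikv client-java library and a PD reachable at benchmark-tikv-server:2379; exact signatures vary between client versions (for example, get returns Optional<ByteString> in newer releases), and it needs a running cluster:

```java
import org.tikv.common.TiConfiguration;
import org.tikv.common.TiSession;
import org.tikv.raw.RawKVClient;
import com.google.protobuf.ByteString;

public class RawKvExample {
    public static void main(String[] args) throws Exception {
        // PD address matches the setup above; change it for your cluster.
        TiConfiguration conf = TiConfiguration.createRawDefault("benchmark-tikv-server:2379");
        try (TiSession session = TiSession.create(conf)) {
            RawKVClient client = session.createRawClient();
            ByteString key = ByteString.copyFromUtf8("user1");
            // Raw KV set and get, as used by the YCSB insert/read workloads.
            client.put(key, ByteString.copyFromUtf8("value1"));
            ByteString value = client.get(key);
            System.out.println(value.toStringUtf8());
        }
    }
}
```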

Results

Insert workload

When trying to do 1 million inserts, the following YCSB results were observed.

[OVERALL], RunTime(ms), 896967
[OVERALL], Throughput(ops/sec), 1114.8682170024092
[TOTAL_GCS_G1_Young_Generation], Count, 10
[TOTAL_GC_TIME_G1_Young_Generation], Time(ms), 293
[TOTAL_GC_TIME_%_G1_Young_Generation], Time(%), 0.03266563875817059
[TOTAL_GCS_G1_Old_Generation], Count, 0
[TOTAL_GC_TIME_G1_Old_Generation], Time(ms), 0
[TOTAL_GC_TIME_%_G1_Old_Generation], Time(%), 0.0
[TOTAL_GCs], Count, 10
[TOTAL_GC_TIME], Time(ms), 293
[TOTAL_GC_TIME_%], Time(%), 0.03266563875817059
[CLEANUP], Operations, 20
[CLEANUP], AverageLatency(us), 2.2
[CLEANUP], MinLatency(us), 1
[CLEANUP], MaxLatency(us), 7
[CLEANUP], 95thPercentileLatency(us), 2
[CLEANUP], 99thPercentileLatency(us), 7
[INSERT], Operations, 1000000
[INSERT], AverageLatency(us), 17890.24202
[INSERT], MinLatency(us), 4816
[INSERT], MaxLatency(us), 565247
[INSERT], 95thPercentileLatency(us), 24687
[INSERT], 99thPercentileLatency(us), 28735
[INSERT], Return=OK, 1000000
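As a sanity check on the numbers above, YCSB's OVERALL throughput is just the operation count divided by the runtime in seconds; a quick recomputation reproduces the reported figure:

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        long ops = 1_000_000;     // [INSERT] Operations
        long runtimeMs = 896_967; // [OVERALL] RunTime(ms)
        double throughput = ops / (runtimeMs / 1000.0);
        System.out.println(throughput); // ~1114.87 ops/sec, matching the report
    }
}
```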

Read + Update workload

When trying to do 1 million operations with a 50% read / 50% update mix, the following results were obtained.

[OVERALL], RunTime(ms), 466074
[OVERALL], Throughput(ops/sec), 2145.5820320378307
[TOTAL_GCS_G1_Young_Generation], Count, 10
[TOTAL_GC_TIME_G1_Young_Generation], Time(ms), 251
[TOTAL_GC_TIME_%_G1_Young_Generation], Time(%), 0.05385410900414955
[TOTAL_GCS_G1_Old_Generation], Count, 0
[TOTAL_GC_TIME_G1_Old_Generation], Time(ms), 0
[TOTAL_GC_TIME_%_G1_Old_Generation], Time(%), 0.0
[TOTAL_GCs], Count, 10
[TOTAL_GC_TIME], Time(ms), 251
[TOTAL_GC_TIME_%], Time(%), 0.05385410900414955
[READ], Operations, 499216
[READ], AverageLatency(us), 706.6268669273421
[READ], MinLatency(us), 376
[READ], MaxLatency(us), 305663
[READ], 95thPercentileLatency(us), 963
[READ], 99thPercentileLatency(us), 1145
[READ], Return=OK, 499216
[CLEANUP], Operations, 20
[CLEANUP], AverageLatency(us), 2.4
[CLEANUP], MinLatency(us), 1
[CLEANUP], MaxLatency(us), 7
[CLEANUP], 95thPercentileLatency(us), 6
[CLEANUP], 99thPercentileLatency(us), 7
[UPDATE], Operations, 500784
[UPDATE], AverageLatency(us), 17746.126349883383
[UPDATE], MinLatency(us), 4352
[UPDATE], MaxLatency(us), 295167
[UPDATE], 95thPercentileLatency(us), 23647
[UPDATE], 99thPercentileLatency(us), 27807
[UPDATE], Return=OK, 500784

System stats

Had a tmux session with htop + iostat -x 1 running during the benchmarking process.

[screenshot: htop and iostat -x 1 output during the benchmark]

  • Is it normal to have such low CPU usage?
  • RAM usage is also low. I tried increasing the block cache size to 10 GB, but only 1 GB of it is getting used.
  • %util in iostat -x 1 is way too low. It flickers between 0.0 and 0.40. For other key-value stores I was able to achieve 90+%.
    • Will increasing this lead to better throughput?
    • How do we tune TiKV to increase this factor?
  • Will decreasing w_await in iostat -x 1 cause more utilization and lead to better throughput? If so, what tuning in TiKV can help with this?
@zhangjinpeng87

  • Please confirm the block-cache capacity is set correctly; it should be [storage.block-cache] capacity = "10GB".
  • Please check whether the dataset size is too small.
  • Please check whether the bottleneck is the YCSB client machine.
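For reference, the block cache setting above lives in TiKV's config file. A sketch of the relevant fragment, assuming a file name of tikv.toml (the 10GB figure comes from this thread):

```toml
# tikv.toml -- block cache capacity, as suggested above
[storage.block-cache]
capacity = "10GB"
```

The file can then be mounted into the container and passed to tikv-server via its --config flag.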

@scriptnull
Author

Thanks @zhangjinpeng1987! I was able to get higher RAM and CPU usage with the help of your comment.

Went from 1,114 ops per second to 26,510 ops per second!
