@bobrik bobrik/README.md
Last active Mar 1, 2018

KPTI Redis benchmark on bare metal

Redis KPTI benchmark

Other benchmarks make me sad: people run them in the cloud, where VMs can migrate between hosts on reboots and often run old kernels. Here we use bare metal with the same recent kernel to remove extra variables and measure the KPTI impact, and only the KPTI impact.

Setup

  • Kernel: 4.14.11
  • CPU: 2 x Xeon E5-2630 v3 @ 2.40GHz (32 logical cores)

Some prerequisites:

  • Every measurement is taken with a freshly restarted Redis without persistence
  • There were no other services that could use any significant CPU
  • The machine was rebooted multiple times to check consistency, but only results from one boot are included

The following command was used to launch the server:

$ docker run --rm -it --net host redis:4.0.6 redis-server --save "" --appendonly no

With PTI

$ dmesg -T | fgrep 'page tables isolation'
[Sat Jan 13 04:28:34 2018] Kernel/User page tables isolation: enabled

Regular (1M GET + 1M SET)

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 04:49:25 UTC 2018 try 1
SET: 94948.73 requests per second
GET: 95648.02 requests per second

Sat Jan 13 04:49:46 UTC 2018 try 2
SET: 96627.70 requests per second
GET: 95593.15 requests per second

Sat Jan 13 04:50:07 UTC 2018 try 3
SET: 95410.74 requests per second
GET: 94446.54 requests per second
  • SET difference between low and high: (96627.70 - 94948.73) / 94948.73 * 100 = 1.77%
  • GET difference between low and high: (95648.02 - 94446.54) / 94446.54 * 100 = 1.27%
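The low/high spread is plain percent arithmetic; it can be reproduced with a quick awk sketch (SET numbers hardcoded from the three runs above):

```shell
# spread between the best and worst SET result with PTI
awk 'BEGIN { low = 94948.73; high = 96627.70; printf "%.2f%%\n", (high - low) / low * 100 }'
# prints 1.77%
```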

Bonus perf stats

$ sudo perf stat -d -p 3144
^C
 Performance counter stats for process id '3144':

      58904.467362      task-clock (msec)         #    0.804 CPUs utilized
           308,662      context-switches          #    0.005 M/sec
                 4      cpu-migrations            #    0.000 K/sec
            23,828      page-faults               #    0.405 K/sec
   183,214,645,662      cycles                    #    3.110 GHz                      (50.10%)
   141,173,051,301      instructions              #    0.77  insn per cycle           (62.62%)
    27,713,334,373      branches                  #  470.479 M/sec                    (62.59%)
       171,263,576      branch-misses             #    0.62% of all branches          (62.50%)
    42,720,689,753      L1-dcache-loads           #  725.254 M/sec                    (62.42%)
     3,632,866,438      L1-dcache-load-misses     #    8.50% of all L1-dcache hits    (24.95%)
     1,014,432,733      LLC-loads                 #   17.222 M/sec                    (25.05%)
       426,430,439      LLC-load-misses           #   42.04% of all LL-cache hits     (37.58%)

      73.309047198 seconds time elapsed
$ sudo perf stat -d -p 4398
^C
 Performance counter stats for process id '4398':

      58918.071006      task-clock (msec)         #    0.866 CPUs utilized
           475,459      context-switches          #    0.008 M/sec
                 5      cpu-migrations            #    0.000 K/sec
            23,780      page-faults               #    0.404 K/sec
   182,551,645,439      cycles                    #    3.098 GHz                      (50.13%)
   142,093,978,992      instructions              #    0.78  insn per cycle           (62.68%)
    27,936,369,361      branches                  #  474.156 M/sec                    (62.57%)
       173,701,977      branch-misses             #    0.62% of all branches          (62.44%)
    43,069,902,734      L1-dcache-loads           #  731.013 M/sec                    (62.43%)
     3,745,293,484      L1-dcache-load-misses     #    8.70% of all L1-dcache hits    (25.01%)
     1,077,645,288      LLC-loads                 #   18.291 M/sec                    (25.10%)
       432,856,886      LLC-load-misses           #   40.17% of all LL-cache hits     (37.57%)

      68.039181383 seconds time elapsed

Pipelined (10M GET + 10M SET)

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -P 32 -n 10000000 -t set,get -r 1000000; done
Sat Jan 13 04:52:03 UTC 2018 try 1
SET: 863632.38 requests per second
GET: 1037344.38 requests per second

Sat Jan 13 04:52:24 UTC 2018 try 2
SET: 859254.19 requests per second
GET: 1027643.62 requests per second

Sat Jan 13 04:52:46 UTC 2018 try 3
SET: 862738.38 requests per second
GET: 1016156.94 requests per second
  • SET difference between low and high: (863632.38 - 859254.19) / 859254.19 * 100 = 0.51%
  • GET difference between low and high: (1037344.38 - 1016156.94) / 1016156.94 * 100 = 2.01%

Without PTI (nopti kernel cmdline)

$ dmesg -T | fgrep 'page tables isolation'
[Sat Jan 13 04:56:44 2018] Kernel/User page tables isolation: disabled on command line.

Regular (1M GET + 1M SET)

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 05:02:32 UTC 2018 try 1
SET: 109745.38 requests per second
GET: 110277.90 requests per second

Sat Jan 13 05:02:50 UTC 2018 try 2
SET: 113934.14 requests per second
GET: 108003.02 requests per second

Sat Jan 13 05:03:08 UTC 2018 try 3
SET: 112637.98 requests per second
GET: 112498.59 requests per second
  • SET difference between low and high: (113934.14 - 109745.38) / 109745.38 * 100 = 3.82%
  • GET difference between low and high: (112498.59 - 108003.02) / 108003.02 * 100 = 4.16%

Bonus perf stats

$ sudo perf stat -d -p 9498
^C
 Performance counter stats for process id '9498':

      51859.650102      task-clock (msec)         #    0.853 CPUs utilized
           171,547      context-switches          #    0.003 M/sec
                 4      cpu-migrations            #    0.000 K/sec
            23,724      page-faults               #    0.457 K/sec
   162,137,087,109      cycles                    #    3.126 GHz                      (49.95%)
   137,847,228,034      instructions              #    0.85  insn per cycle           (62.46%)
    26,968,561,543      branches                  #  520.030 M/sec                    (62.46%)
       167,141,761      branch-misses             #    0.62% of all branches          (62.50%)
    41,819,831,836      L1-dcache-loads           #  806.404 M/sec                    (62.50%)
     3,393,654,584      L1-dcache-load-misses     #    8.11% of all L1-dcache hits    (25.03%)
       953,380,618      LLC-loads                 #   18.384 M/sec                    (24.99%)
       430,826,721      LLC-load-misses           #   45.19% of all LL-cache hits     (37.49%)

      60.789997981 seconds time elapsed
$ sudo perf stat -d -p 11934
^C
 Performance counter stats for process id '11934':

      52241.822970      task-clock (msec)         #    0.745 CPUs utilized
           251,987      context-switches          #    0.005 M/sec
                 4      cpu-migrations            #    0.000 K/sec
            23,686      page-faults               #    0.453 K/sec
   162,948,839,545      cycles                    #    3.119 GHz                      (49.94%)
   138,960,314,319      instructions              #    0.85  insn per cycle           (62.48%)
    27,243,372,526      branches                  #  521.486 M/sec                    (62.48%)
       167,741,490      branch-misses             #    0.62% of all branches          (62.49%)
    42,188,015,450      L1-dcache-loads           #  807.553 M/sec                    (62.54%)
     3,429,701,506      L1-dcache-load-misses     #    8.13% of all L1-dcache hits    (25.02%)
       962,609,210      LLC-loads                 #   18.426 M/sec                    (24.99%)
       432,863,320      LLC-load-misses           #   44.97% of all LL-cache hits     (37.45%)

      70.121123867 seconds time elapsed

Pipelined (10M GET + 10M SET)

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -P 32 -n 10000000 -t set,get -r 1000000; done
Sat Jan 13 05:04:10 UTC 2018 try 1
SET: 931532.38 requests per second
GET: 1127777.25 requests per second

Sat Jan 13 05:04:30 UTC 2018 try 2
SET: 914243.94 requests per second
GET: 1135718.25 requests per second

Sat Jan 13 05:04:49 UTC 2018 try 3
SET: 921489.12 requests per second
GET: 1127650.00 requests per second
  • SET difference between low and high: (931532.38 - 914243.94) / 914243.94 * 100 = 1.89%
  • GET difference between low and high: (1135718.25 - 1127650.00) / 1127650.00 * 100 = 0.72%

Comparison

Minimum throughput comparison across runs is presented below.

For each, we calculate how much faster the no-PTI setup is than its PTI equivalent.

Regular

  • SET: (109745.38 - 94948.73) / 94948.73 * 100 = 15.58% (13.48% slower)
  • GET: (108003.02 - 94446.54) / 94446.54 * 100 = 14.35% (12.55% slower)

Pipelined

  • SET: (914243.94 - 859254.19) / 859254.19 * 100 = 6.40% (6.02% slower)
  • GET: (1127650.00 - 1016156.94) / 1016156.94 * 100 = 10.98% (9.90% slower)
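The two percentages on each line are the same delta divided by different bases: "faster" divides by the PTI minimum, "slower" divides by the no-PTI minimum. A minimal awk sketch using the regular SET minimums from above:

```shell
# no-PTI speedup vs PTI, and the equivalent PTI slowdown,
# using the minimum regular SET throughput from each setup
awk 'BEGIN {
  pti = 94948.73; nopti = 109745.38
  printf "no PTI faster by %.2f%%\n", (nopti - pti) / pti * 100
  printf "PTI slower by %.2f%%\n", (nopti - pti) / nopti * 100
}'
# prints:
# no PTI faster by 15.58%
# PTI slower by 13.48%
```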

Notes

Keep in mind that while these numbers look high, there were fluctuations between runs within the same setup, so some amount of noise is present, especially in the non-pipelined version.

IPC in perf stat went down from 0.85 insn per cycle without PTI to 0.77-0.78 with PTI (roughly 9% down).
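The IPC figures can be rechecked directly from the cycle and instruction counters in the perf stat output above (first PTI run vs first no-PTI run):

```shell
# instructions per cycle from the perf counters quoted above
awk 'BEGIN {
  printf "PTI IPC: %.2f\n", 141173051301 / 183214645662
  printf "no-PTI IPC: %.2f\n", 137847228034 / 162137087109
}'
# prints:
# PTI IPC: 0.77
# no-PTI IPC: 0.85
```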

Conclusion

There is no conclusion; make your own.

@bobrik commented Jan 13, 2018

Local vs remote

Same setup as above, but here the client runs over a 2x10G link under the same ToR switch.

The network is two ports of an Intel 82599ES 10-Gigabit SFI/SFP+ NIC bonded together.

There is no magical packet steering involved.

KPTI

Local

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 18:12:06 UTC 2018 try 1
SET: 96227.87 requests per second
GET: 96274.19 requests per second

Sat Jan 13 18:12:27 UTC 2018 try 2
SET: 97304.66 requests per second
GET: 95283.47 requests per second

Sat Jan 13 18:12:48 UTC 2018 try 3
SET: 97522.92 requests per second
GET: 95565.75 requests per second
  • SET difference between low and high: (97522.92 - 96227.87) / 96227.87 * 100 = 1.35%
  • GET difference between low and high: (96274.19 - 95283.47) / 95283.47 * 100 = 1.04%

Network

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -h 36s152 -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 18:13:47 UTC 2018 try 1
SET: 100553.05 requests per second
GET: 97370.98 requests per second

Sat Jan 13 18:14:08 UTC 2018 try 2
SET: 96237.13 requests per second
GET: 95556.62 requests per second

Sat Jan 13 18:14:29 UTC 2018 try 3
SET: 99373.95 requests per second
GET: 97675.33 requests per second
  • SET difference between low and high: (100553.05 - 96237.13) / 96237.13 * 100 = 4.48%
  • GET difference between low and high: (97675.33 - 95556.62) / 95556.62 * 100 = 2.22%

No KPTI

Local

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 18:29:51 UTC 2018 try 1
SET: 104810.81 requests per second
GET: 102469.52 requests per second

Sat Jan 13 18:30:11 UTC 2018 try 2
SET: 104920.78 requests per second
GET: 104504.12 requests per second

Sat Jan 13 18:30:30 UTC 2018 try 3
SET: 103444.71 requests per second
GET: 101122.46 requests per second
  • SET difference between low and high: (104920.78 - 103444.71) / 103444.71 * 100 = 1.43%
  • GET difference between low and high: (104504.12 - 101122.46) / 101122.46 * 100 = 3.34%

Network

$ for i in $(seq 1 3); do echo $(date) "try $i"; docker run --rm -it --net host redis:4.0.6 redis-benchmark -q -h 36s152 -n 1000000 -t set,get -r 1000000; done
Sat Jan 13 18:31:10 UTC 2018 try 1
SET: 97143.97 requests per second
GET: 99492.59 requests per second

Sat Jan 13 18:31:30 UTC 2018 try 2
SET: 96581.03 requests per second
GET: 98541.59 requests per second

Sat Jan 13 18:31:51 UTC 2018 try 3
SET: 102103.33 requests per second
GET: 100020.00 requests per second
  • SET difference between low and high: (102103.33 - 96581.03) / 96581.03 * 100 = 5.72%
  • GET difference between low and high: (100020.00 - 98541.59) / 98541.59 * 100 = 1.50%

Comparison

Minimum throughput comparison below.

For each, we calculate how much faster the no-PTI setup is than its PTI equivalent.

Local

  • SET: (103444.71 - 96227.87) / 96227.87 * 100 = 7.50% (6.98% slower, previously 13.48% slower)
  • GET: (101122.46 - 95283.47) / 95283.47 * 100 = 6.13% (5.77% slower, previously 12.55% slower)

Network

  • SET: (96581.03 - 96237.13) / 96237.13 * 100 = 0.36% (0.36% slower)
  • GET: (98541.59 - 95556.62) / 95556.62 * 100 = 3.12% (3.03% slower)

Conclusion

  • Impact here is 2x smaller on local tests, even though nothing changed
  • Impact over the network is within the noise, smaller than the run-to-run variation between tests