@marshallpierce
Last active January 13, 2017 18:41
base64 benchmarks

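These benchmarks compare the perf-optimization branch of rust-base64 (https://github.com/marshallpierce/rust-base64/tree/perf-optimization) with aklomp's C base64 library (https://github.com/aklomp/base64). Each section below is headed by a commit hash and its message, followed by that commit's decode benchmarks.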

All tests were run on an i7-6850K.

f17906e 4 bytes at a time, read individually

test decode_100b            ... bench:         154 ns/iter (+/- 0) = 649 MB/s
test decode_100b_reuse_buf  ... bench:         128 ns/iter (+/- 0) = 781 MB/s
test decode_10mib           ... bench:  14,392,694 ns/iter (+/- 177,189) = 728 MB/s
test decode_10mib_reuse_buf ... bench:  12,921,534 ns/iter (+/- 708,616) = 811 MB/s
test decode_30mib           ... bench:  43,486,014 ns/iter (+/- 477,785) = 723 MB/s
test decode_30mib_reuse_buf ... bench:  38,836,887 ns/iter (+/- 42,624) = 809 MB/s
test decode_3b              ... bench:          37 ns/iter (+/- 0) = 108 MB/s
test decode_3b_reuse_buf    ... bench:          14 ns/iter (+/- 0) = 285 MB/s
test decode_3kib            ... bench:       3,708 ns/iter (+/- 5) = 828 MB/s
test decode_3kib_reuse_buf  ... bench:       3,665 ns/iter (+/- 8) = 838 MB/s
test decode_3mib            ... bench:   4,196,733 ns/iter (+/- 36,054) = 749 MB/s
test decode_3mib_reuse_buf  ... bench:   3,806,653 ns/iter (+/- 10,710) = 826 MB/s
test decode_500b            ... bench:         626 ns/iter (+/- 1) = 798 MB/s
test decode_500b_reuse_buf  ... bench:         608 ns/iter (+/- 1) = 822 MB/s
test decode_50b             ... bench:         160 ns/iter (+/- 68) = 325 MB/s
test decode_50b_reuse_buf   ... bench:          66 ns/iter (+/- 0) = 787 MB/s
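
For reading the figures, the MB/s column appears to be the benchmark's byte count times 1000, divided by ns/iter and truncated to a whole number. The byte count is not stated here; working backwards from the numbers, it looks like the size in the test name rounded up to a multiple of 4 (so 3b counts as 4 bytes and 50b as 52), which is an inference rather than something the listing confirms. A minimal sketch of that arithmetic, with a hypothetical function name:

```rust
/// Throughput in MB/s (10^6 bytes per second) for `bytes` processed in
/// `ns_per_iter` nanoseconds. bytes / ns is bytes per nanosecond, and
/// multiplying by 1000 converts that to MB/s. Integer division truncates,
/// which matches the whole-number figures in the listing.
/// The name and signature are illustrative, not taken from the benchmark code.
fn throughput_mb_per_s(bytes: u64, ns_per_iter: u64) -> u64 {
    // Guard against a zero divisor for extremely fast iterations.
    bytes * 1000 / ns_per_iter.max(1)
}
```

With this, 100 bytes at 154 ns/iter gives 649 MB/s and 3 KiB (3072 bytes) at 3,708 ns/iter gives 828 MB/s, as in the decode_100b and decode_3kib rows above.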

4d6d81a naive aklomp-style loop accumulating bytes in a u64

test decode_100b            ... bench:         232 ns/iter (+/- 6) = 431 MB/s
test decode_100b_reuse_buf  ... bench:         205 ns/iter (+/- 3) = 487 MB/s
test decode_10mib           ... bench:  23,344,214 ns/iter (+/- 896,645) = 449 MB/s
test decode_10mib_reuse_buf ... bench:  21,912,356 ns/iter (+/- 11,511,975) = 478 MB/s
test decode_30mib           ... bench:  69,619,614 ns/iter (+/- 3,055,922) = 451 MB/s
test decode_30mib_reuse_buf ... bench:  65,532,939 ns/iter (+/- 309,728) = 480 MB/s
test decode_3b              ... bench:          37 ns/iter (+/- 29) = 108 MB/s
test decode_3b_reuse_buf    ... bench:          27 ns/iter (+/- 1) = 148 MB/s
test decode_3kib            ... bench:       6,578 ns/iter (+/- 327) = 467 MB/s
test decode_3kib_reuse_buf  ... bench:       6,041 ns/iter (+/- 67) = 508 MB/s
test decode_3mib            ... bench:   6,782,024 ns/iter (+/- 134,704) = 463 MB/s
test decode_3mib_reuse_buf  ... bench:   6,462,232 ns/iter (+/- 3,234,497) = 486 MB/s
test decode_500b            ... bench:       1,018 ns/iter (+/- 36) = 491 MB/s
test decode_500b_reuse_buf  ... bench:       1,040 ns/iter (+/- 15) = 480 MB/s
test decode_50b             ... bench:         134 ns/iter (+/- 1) = 388 MB/s
test decode_50b_reuse_buf   ... bench:         109 ns/iter (+/- 3) = 477 MB/s
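
As a rough illustration of what "accumulating bytes in a u64" can mean, the sketch below decodes 8 input symbols at a time: each symbol contributes 6 bits to a u64, and the resulting 48 bits are pushed out as 6 bytes. This is not the code from the commit; the function names, the table construction, and the restriction to unpadded input whose length is a multiple of 8 are all assumptions made to keep the example short.

```rust
/// Sentinel stored in the table for bytes outside the alphabet.
const INVALID: u8 = 0xFF;

/// Build a 256-entry lookup table for the standard base64 alphabet.
fn decode_table() -> [u8; 256] {
    let alphabet = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut table = [INVALID; 256];
    for (value, &symbol) in alphabet.iter().enumerate() {
        table[symbol as usize] = value as u8;
    }
    table
}

/// Decode unpadded input whose length is a multiple of 8.
/// On failure, returns the index of the first invalid input byte.
fn decode_chunks_u64(input: &[u8], out: &mut Vec<u8>) -> Result<(), usize> {
    assert!(input.len() % 8 == 0, "sketch only handles whole 8-symbol chunks");
    let table = decode_table();
    for (chunk_index, chunk) in input.chunks(8).enumerate() {
        let mut acc: u64 = 0;
        for (i, &b) in chunk.iter().enumerate() {
            let v = table[b as usize];
            if v == INVALID {
                return Err(chunk_index * 8 + i);
            }
            // Shift in 6 bits per symbol; 8 symbols fill the low 48 bits.
            acc = (acc << 6) | v as u64;
        }
        // Emit the 48 bits as 6 bytes, most significant first, one push each.
        for shift in (0..6).rev() {
            out.push((acc >> (shift * 8)) as u8);
        }
    }
    Ok(())
}
```

Calling `decode_chunks_u64(b"aGVsbG8h", &mut out)` appends the bytes of `hello!` to `out`. The per-byte `push` and the error branch inside the inner loop are the parts the later commits appear to target.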

8361c1e use byteorder to write instead of push()

test decode_100b            ... bench:         189 ns/iter (+/- 8) = 529 MB/s
test decode_100b_reuse_buf  ... bench:         167 ns/iter (+/- 2) = 598 MB/s
test decode_10mib           ... bench:  19,623,929 ns/iter (+/- 2,479,538) = 534 MB/s
test decode_10mib_reuse_buf ... bench:  17,921,543 ns/iter (+/- 1,501,785) = 585 MB/s
test decode_30mib           ... bench:  58,865,367 ns/iter (+/- 162,679) = 534 MB/s
test decode_30mib_reuse_buf ... bench:  53,721,337 ns/iter (+/- 95,730) = 585 MB/s
test decode_3b              ... bench:          37 ns/iter (+/- 0) = 108 MB/s
test decode_3b_reuse_buf    ... bench:          15 ns/iter (+/- 0) = 266 MB/s
test decode_3kib            ... bench:       5,140 ns/iter (+/- 11) = 597 MB/s
test decode_3kib_reuse_buf  ... bench:       5,120 ns/iter (+/- 8) = 600 MB/s
test decode_3mib            ... bench:   5,708,706 ns/iter (+/- 53,053) = 551 MB/s
test decode_3mib_reuse_buf  ... bench:   5,289,426 ns/iter (+/- 12,308) = 594 MB/s
test decode_500b            ... bench:         861 ns/iter (+/- 4) = 580 MB/s
test decode_500b_reuse_buf  ... bench:         841 ns/iter (+/- 1) = 594 MB/s
test decode_50b             ... bench:         114 ns/iter (+/- 0) = 456 MB/s
test decode_50b_reuse_buf   ... bench:          93 ns/iter (+/- 0) = 559 MB/s

f3c8891 write via mutable slice

test decode_100b            ... bench:         113 ns/iter (+/- 4) = 884 MB/s
test decode_100b_reuse_buf  ... bench:          96 ns/iter (+/- 9) = 1041 MB/s
test decode_10mib           ... bench:  10,833,452 ns/iter (+/- 248,055) = 967 MB/s
test decode_10mib_reuse_buf ... bench:   9,396,666 ns/iter (+/- 107,273) = 1115 MB/s
test decode_30mib           ... bench:  32,653,938 ns/iter (+/- 351,150) = 963 MB/s
test decode_30mib_reuse_buf ... bench:  28,486,029 ns/iter (+/- 133,783) = 1104 MB/s
test decode_3b              ... bench:          41 ns/iter (+/- 10) = 97 MB/s
test decode_3b_reuse_buf    ... bench:          19 ns/iter (+/- 0) = 210 MB/s
test decode_3kib            ... bench:       2,623 ns/iter (+/- 11) = 1171 MB/s
test decode_3kib_reuse_buf  ... bench:       2,589 ns/iter (+/- 15) = 1186 MB/s
test decode_3mib            ... bench:   3,153,039 ns/iter (+/- 80,266) = 997 MB/s
test decode_3mib_reuse_buf  ... bench:   2,725,100 ns/iter (+/- 14,198) = 1154 MB/s
test decode_500b            ... bench:         447 ns/iter (+/- 1) = 1118 MB/s
test decode_500b_reuse_buf  ... bench:         432 ns/iter (+/- 1) = 1157 MB/s
test decode_50b             ... bench:          78 ns/iter (+/- 0) = 666 MB/s
test decode_50b_reuse_buf   ... bench:          58 ns/iter (+/- 0) = 896 MB/s
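
One plausible reading of "write via mutable slice" is to size the output once and then copy each decoded group into a sub-slice, rather than growing the Vec one byte at a time. The sketch below shows that pattern with the standard library only; the function names are hypothetical, and `to_be_bytes` stands in for whatever the commit actually uses to serialize the accumulator.

```rust
/// Write the low 48 bits of `acc`, big-endian, into the first 6 bytes of `dst`.
fn write_48_bits(acc: u64, dst: &mut [u8]) {
    // to_be_bytes yields 8 bytes; the top 2 are unused padding in this scheme.
    dst[..6].copy_from_slice(&acc.to_be_bytes()[2..]);
}

/// Append `groups` (each holding 48 decoded bits) to `out` without per-byte push.
fn append_groups(groups: &[u64], out: &mut Vec<u8>) {
    let start = out.len();
    // Grow once, then fill the new region through mutable 6-byte sub-slices.
    out.resize(start + groups.len() * 6, 0);
    for (group, dst) in groups.iter().zip(out[start..].chunks_mut(6)) {
        write_48_bits(*group, dst);
    }
}
```

For example, `append_groups(&[0x6865_6c6c_6f21], &mut out)` appends the six bytes of `hello!`. Resizing up front means the capacity check happens once per call instead of once per output byte, which may be part of why this commit's numbers improve, though the listing itself does not say.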

87d62a9 read chunk via read_u64

test decode_100b            ... bench:         106 ns/iter (+/- 1) = 943 MB/s
test decode_100b_reuse_buf  ... bench:          86 ns/iter (+/- 1) = 1162 MB/s
test decode_10mib           ... bench:   9,420,025 ns/iter (+/- 505,041) = 1113 MB/s
test decode_10mib_reuse_buf ... bench:   8,051,450 ns/iter (+/- 66,643) = 1302 MB/s
test decode_30mib           ... bench:  28,706,956 ns/iter (+/- 101,182) = 1095 MB/s
test decode_30mib_reuse_buf ... bench:  24,469,885 ns/iter (+/- 110,068) = 1285 MB/s
test decode_3b              ... bench:          41 ns/iter (+/- 0) = 97 MB/s
test decode_3b_reuse_buf    ... bench:          18 ns/iter (+/- 1) = 222 MB/s
test decode_3kib            ... bench:       2,300 ns/iter (+/- 11) = 1335 MB/s
test decode_3kib_reuse_buf  ... bench:       2,183 ns/iter (+/- 10) = 1407 MB/s
test decode_3mib            ... bench:   2,738,919 ns/iter (+/- 21,591) = 1148 MB/s
test decode_3mib_reuse_buf  ... bench:   2,321,102 ns/iter (+/- 8,093) = 1355 MB/s
test decode_500b            ... bench:         385 ns/iter (+/- 0) = 1298 MB/s
test decode_500b_reuse_buf  ... bench:         368 ns/iter (+/- 1) = 1358 MB/s
test decode_50b             ... bench:          71 ns/iter (+/- 0) = 732 MB/s
test decode_50b_reuse_buf   ... bench:          50 ns/iter (+/- 0) = 1040 MB/s
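
The commit message suggests the 8 input symbols are loaded as a single u64 and then picked apart with shifts, instead of indexing the input slice 8 times. A minimal sketch of that idea follows, using the standard library's `u64::from_be_bytes` as a stand-in for `read_u64` (presumably from the byteorder crate mentioned earlier); the function names are hypothetical.

```rust
use std::convert::TryInto;

/// Load the first 8 bytes of `chunk` as one big-endian u64.
/// Panics if `chunk` is shorter than 8 bytes.
fn read_u64_be(chunk: &[u8]) -> u64 {
    u64::from_be_bytes(chunk[..8].try_into().expect("slice of length 8"))
}

/// Extract input byte `i` (0 = first byte read) from a word loaded as above.
/// `i` must be in 0..8.
fn byte_at(word: u64, i: u32) -> u8 {
    (word >> (56 - 8 * i)) as u8
}
```

Each extracted byte would then index the decode table as before. The point of the single load is that the slice bounds check happens once per chunk rather than once per symbol.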

3c9bc2b Move error return outside of loop for another 10%

test decode_100b            ... bench:         105 ns/iter (+/- 0) = 952 MB/s
test decode_100b_reuse_buf  ... bench:          83 ns/iter (+/- 0) = 1204 MB/s
test decode_10mib           ... bench:   9,114,564 ns/iter (+/- 91,280) = 1150 MB/s
test decode_10mib_reuse_buf ... bench:   7,723,483 ns/iter (+/- 64,869) = 1357 MB/s
test decode_30mib           ... bench:  27,708,742 ns/iter (+/- 183,745) = 1135 MB/s
test decode_30mib_reuse_buf ... bench:  23,482,819 ns/iter (+/- 35,089) = 1339 MB/s
test decode_3b              ... bench:          41 ns/iter (+/- 0) = 97 MB/s
test decode_3b_reuse_buf    ... bench:          18 ns/iter (+/- 1) = 222 MB/s
test decode_3kib            ... bench:       2,108 ns/iter (+/- 4) = 1457 MB/s
test decode_3kib_reuse_buf  ... bench:       2,146 ns/iter (+/- 12) = 1431 MB/s
test decode_3mib            ... bench:   2,641,237 ns/iter (+/- 24,619) = 1191 MB/s
test decode_3mib_reuse_buf  ... bench:   2,225,719 ns/iter (+/- 13,955) = 1413 MB/s
test decode_500b            ... bench:         372 ns/iter (+/- 1) = 1344 MB/s
test decode_500b_reuse_buf  ... bench:         353 ns/iter (+/- 0) = 1416 MB/s
test decode_50b             ... bench:          68 ns/iter (+/- 1) = 764 MB/s
test decode_50b_reuse_buf   ... bench:          49 ns/iter (+/- 0) = 1061 MB/s
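
One way to move the error return out of the hot loop is to OR every lookup result together and test the combined value once per chunk, locating the offending byte only after a failure is known. The sketch below illustrates that pattern under stated assumptions: the `match`-based `decode_symbol` stands in for a table lookup, 0xFF is an assumed sentinel for invalid input, and none of the names come from the commit.

```rust
/// Assumed sentinel for bytes outside the alphabet; valid values are 0..=63.
const INVALID: u8 = 0xFF;

/// Stand-in for a 256-entry table lookup on the standard alphabet.
fn decode_symbol(b: u8) -> u8 {
    match b {
        b'A'..=b'Z' => b - b'A',
        b'a'..=b'z' => b - b'a' + 26,
        b'0'..=b'9' => b - b'0' + 52,
        b'+' => 62,
        b'/' => 63,
        _ => INVALID,
    }
}

/// Decode one 8-symbol chunk into 48 bits with no error branch per symbol.
/// Returns None if any symbol was invalid.
fn decode_chunk_deferred(chunk: &[u8; 8]) -> Option<u64> {
    let mut acc: u64 = 0;
    let mut seen: u8 = 0;
    for &b in chunk {
        let v = decode_symbol(b);
        seen |= v; // remember any bits outside the 6-bit range
        acc = (acc << 6) | (v & 0x3F) as u64;
    }
    // Valid values never set the top two bits; the sentinel always does.
    if seen & 0xC0 != 0 { None } else { Some(acc) }
}

/// Cold path: find the first invalid symbol once a chunk is known to be bad.
fn first_invalid(chunk: &[u8]) -> Option<usize> {
    chunk.iter().position(|&b| decode_symbol(b) == INVALID)
}
```

Here `decode_chunk_deferred(b"aGVsbG8h")` yields `Some(0x6865_6c6c_6f21)`, while a chunk containing `*` yields `None` and `first_invalid` reports its position. Whether the commit uses this exact trick is not stated in the listing.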

dfb2864 Calculate error byte at error time rather than writing to a local that's read later.

test decode_100b            ... bench:          96 ns/iter (+/- 5) = 1041 MB/s
test decode_100b_reuse_buf  ... bench:          75 ns/iter (+/- 1) = 1333 MB/s
test decode_10mib           ... bench:   8,754,671 ns/iter (+/- 274,348) = 1197 MB/s
test decode_10mib_reuse_buf ... bench:   7,334,820 ns/iter (+/- 129,159) = 1429 MB/s
test decode_30mib           ... bench:  26,617,411 ns/iter (+/- 622,087) = 1181 MB/s
test decode_30mib_reuse_buf ... bench:  22,387,310 ns/iter (+/- 182,809) = 1405 MB/s
test decode_3b              ... bench:          42 ns/iter (+/- 2) = 95 MB/s
test decode_3b_reuse_buf    ... bench:          19 ns/iter (+/- 1) = 210 MB/s
test decode_3kib            ... bench:       2,004 ns/iter (+/- 37) = 1532 MB/s
test decode_3kib_reuse_buf  ... bench:       1,976 ns/iter (+/- 40) = 1554 MB/s
test decode_3mib            ... bench:   2,536,762 ns/iter (+/- 217,774) = 1240 MB/s
test decode_3mib_reuse_buf  ... bench:   2,114,590 ns/iter (+/- 23,871) = 1487 MB/s
test decode_500b            ... bench:         352 ns/iter (+/- 10) = 1420 MB/s
test decode_500b_reuse_buf  ... bench:         341 ns/iter (+/- 11) = 1466 MB/s
test decode_50b             ... bench:          67 ns/iter (+/- 0) = 776 MB/s
test decode_50b_reuse_buf   ... bench:          46 ns/iter (+/- 0) = 1130 MB/s

aklomp base64 in C, compiled with gcc 4.9.4 (only the plain, non-SIMD codec matters for now):

% make -C test benchmark && ./test/benchmark | grep -E '(buffer|plain)'
Filling buffer with 10.0 MB of random data...
Testing with buffer size 10 MB, fastest of 10 * 1
plain   encode  1870.24 MB/sec
plain   decode  1788.04 MB/sec
Testing with buffer size 1 MB, fastest of 10 * 10
plain   encode  1883.38 MB/sec
plain   decode  1801.25 MB/sec
Testing with buffer size 100 KB, fastest of 10 * 100
plain   encode  1884.75 MB/sec
plain   decode  1799.54 MB/sec
Testing with buffer size 10 KB, fastest of 100 * 100
plain   encode  1880.95 MB/sec
plain   decode  1799.46 MB/sec
Testing with buffer size 1 KB, fastest of 100 * 1000
plain   encode  1762.49 MB/sec
plain   decode  1751.79 MB/sec

The same C benchmark compiled with clang 3.7.1:

Filling buffer with 10.0 MB of random data...
Testing with buffer size 10 MB, fastest of 10 * 1
plain   encode  1594.81 MB/sec
plain   decode  1595.16 MB/sec
Testing with buffer size 1 MB, fastest of 10 * 10
plain   encode  1601.13 MB/sec
plain   decode  1603.28 MB/sec
Testing with buffer size 100 KB, fastest of 10 * 100
plain   encode  1600.60 MB/sec
plain   decode  1602.10 MB/sec
Testing with buffer size 10 KB, fastest of 100 * 100
plain   encode  1603.04 MB/sec
plain   decode  1602.35 MB/sec
Testing with buffer size 1 KB, fastest of 100 * 1000
plain   encode  1512.35 MB/sec
plain   decode  1565.88 MB/sec