-
-
Save neuschaefer/e752af1bbdd057704e02e282e3e082c5 to your computer and use it in GitHub Desktop.
gsoc foo
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Subject: My GSoC contributions to WireGuard | |
Hi, | |
over the past few months, I was a GSoC[1] student in the WireGuard project | |
(nominally the Linux Foundation). Together with Thomas Gschwantner and Gauvain | |
Roussel-Tarbouriech (who were also GSoC students), and of course Jason | |
Donenfeld, I tried to implement some of the performance optimizations from the | |
TODO list[2]. | |
TL;DR: Here are the commits that I contributed during that time, which were | |
also merged (not all of them were): | |
- https://git.zx2c4.com/WireGuard/commit/?h=b289d122d8fe935de7138f77a8495dca76555b0e | |
- https://git.zx2c4.com/WireGuard/commit/?h=590c41089fd77bf43efda375c0e6bd3913cb9350 | |
- https://git.zx2c4.com/WireGuard/commit/?h=0f8718b4fcbb89abc8b8b25c09510e1f7fac9625 | |
- https://git.zx2c4.com/WireGuard/commit/?h=18822b84f88422f11611c598a0c23c56652693d9 | |
- https://git.zx2c4.com/WireGuard/commit/?h=6008eacbf2c7a5f31b0c9d5d0a629cbdfbb8f222 | |
Our first task was to change the trie (prefix tree) used in allowedips.c[3] | |
to look up peers based on their IP(v6) address, to use native endianness | |
(little endian on the x86 systems prevalent today) instead of big endian, | |
which is how the bytes in a IP(v6) address are stored in a packet. This | |
saves a few byte swaps per lookup in situations with many peers, because the | |
number is reduced from one byte swap per edge of trie that was walked to one | |
byte swap per complete traversal through the trie. | |
We implemented and discussed a few versions of this optimization, and | |
finally Jason committed his[4], and a selftest to help ensure that the new | |
implementation is correct[5]. I noticed that the debugging code that can | |
print the allowedips trie in Graphviz[6] format was now wrong, and fixed[7] | |
it. | |
The next goal was to replace the spinlock-based ring buffer implementation[8] | |
used in WireGuard to implement the receive/transmit and encrypt/decrypt | |
queues with a lock-free version. For this, we looked at ConcurrencyKit[9], | |
and reimplemented their algorithms. However, after facing no clear | |
performance improvement, and in some cases, serious performance degradation, | |
we gave up on this goal. The code, along with a selftest[10] can be found on | |
the jn/mpmc-wip branch[11]. | |
Next, I made use[12] of the NAPI[13] infrastructure of the Linux networking | |
subsystem, to improve the performance of the receive path, with some | |
success[14], but also a performance regression on some systems[15], which | |
Jason appears to have solved by using GRO[16] before passing packets up | |
through the network stack[17]. | |
Since my integration of NAPI relied one NAPI context per WireGuard peer, and | |
NAPI contexts are stored in a fixed-size hashtable in the kernel[18], Thomas | |
disabled NAPI busy polling, thus disabling the use of this hashtable[19]. | |
This avoids overloading the hashtable when there are many WireGuard peers. | |
My last task was to change WireGuard's hashtables[20] from a fixed-size | |
hashtable to the resizable hashtable implementation (called "rhashtable") | |
that's already in the upstream kernel[21]. This will save some memory when | |
few items are stored, and avoid overloading, and thus slowing down, the | |
hashtables when many items are stored. The main difficulty here stems from | |
the fact that WireGuard's pubkey_hashtable uses the SipHash function to be | |
better protected against hash collision, or "hashtable poisoning" | |
attacks[22], in which the hashed data is chosen in such a way that many | |
items are stored in the same hash bucket, slowing down lookups. SipHash uses | |
a 128-bit random seed to make the hashes less predictable, while the | |
existing rhashtable API only allows 32 bits. Thus, to properly integrate | |
SipHash into rhashtable, I need to extend the rhashtable API to use a longer | |
random seed. A work-in-progress version of this change can be found in the | |
linux-dev repo on git.zx2c4.com[22]. | |
Conclusion: | |
I have not reached all the goals that I initially set out to reach. In | |
particular, here are some of the items that were part of my GSoC project | |
proposal: | |
- Reducing bufferbloat with CoDel/DQL. Since WireGuard temporarily stores | |
packets in queues, and queues can add latency when they're full, it would be | |
interesting to try to limit the added latency with CoDel[23] or Linux's | |
dynamic queue limits library (DQL)[24]. | |
- CPU auto-scaling and NUMA awareness. Currently, WireGuard splits the | |
encryption/decryption workload among all CPUs[25] currently online. For low | |
packet rates it might be a good idea to stick to one CPU, to avoid polluting | |
the other CPUs' caches, and possibly to avoid the other CPUs from sleep | |
states. In larger systems with two or more set of CPUs, each with their own | |
L3 cache and DDR memory controller, i.e. in NUMA[26] systems, it might also | |
be a good idea to confine WireGuard to one NUMA node, to save bus traffic. | |
- Performance profiling and benchmarking. I included performance profiling and | |
benchmarking in my project proposal, because serious performance optimization | |
has to rely on benchmarks to find out which approach is *really* the fastest, | |
and on profiling to figure out where the bottlenecks are. I did, however, do | |
very little in this direction, beyond running iperf3. | |
Even though I did not do everything I intended to do, it was nice to bring | |
WireGuard a bit further and learn something about optimization and the Linux | |
networking stack. Thanks to Jason Donenfeld, the Linux Foundation, and Google | |
for providing this opportunity. | |
Jonathan Neuschäfer | |
[1]: https://summerofcode.withgoogle.com/ | |
[2]: https://www.wireguard.com/todo/ | |
[3]: https://git.zx2c4.com/WireGuard/tree/src/allowedips.c?h=0.0.20180420 | |
[4]: https://git.zx2c4.com/WireGuard/commit/?h=5e3532eb1895ccdd0a53d14b47d7fc85234f7866 | |
[5]: https://git.zx2c4.com/WireGuard/commit/?h=856f10593d2fc8b87992d9c5c556ddf108d313e8 | |
[6]: http://graphviz.org/ | |
[7]: https://git.zx2c4.com/WireGuard/commit/?h=b289d122d8fe935de7138f77a8495dca76555b0e | |
[8]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/ptr_ring.h?h=v4.18-rc8 | |
[9]: https://github.com/concurrencykit/ck/blob/0.6.0/include/ck_ring.h | |
[10]: https://git.zx2c4.com/WireGuard/tree/src/selftest/mpmc_ring.h?h=jn/mpmc-wip | |
[11]: https://git.zx2c4.com/WireGuard/log/?h=jn/mpmc-wip | |
[12]: https://git.zx2c4.com/WireGuard/commit/?h=6008eacbf2c7a5f31b0c9d5d0a629cbdfbb8f222 | |
[13]: https://en.wikipedia.org/wiki/New_API | |
[14]: https://lists.zx2c4.com/pipermail/wireguard/2018-July/003115.html | |
[15]: https://lists.zx2c4.com/pipermail/wireguard/2018-July/003124.html | |
[16]: https://lwn.net/Articles/358910/ | |
[17]: https://git.zx2c4.com/WireGuard/commit/?h=95951af7249912a4356b9a03cf3addc7e3f8f724 | |
[18]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?h=v4.18-rc8#n193 | |
[19]: https://git.zx2c4.com/WireGuard/commit/?h=fe5f0f661797b41648eac64b40e5038b25175047 | |
[20]: https://git.zx2c4.com/WireGuard/tree/src/hashtables.h?h=0.0.20180802 | |
[21]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/rhashtable.h?h=v4.18-rc8 | |
[22]: https://git.zx2c4.com/linux-dev/log/?h=jn/rhashtable-next&id=6f450b7f8f722dec835fe96ff5b99be98cd5622e | |
[23]: https://en.wikipedia.org/wiki/CoDel | |
[24]: https://lwn.net/Articles/454390/ | |
[25]: https://git.zx2c4.com/WireGuard/tree/src/queueing.h?h=0.0.20180802#n119 | |
[26]: https://en.wikipedia.org/wiki/Non-uniform_memory_access |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment