
@smaeul
Created August 28, 2017 15:21
Google Summer of Code 2017 Report

Overview

The project I worked on this summer was divided into two major parts. First, I reworked how the in-kernel WireGuard implementation parallelizes its cryptography. My goals were to improve overall speed, increase multi-core scalability, and fix some hard-to-reproduce race conditions. I spent the second half of the summer writing an Android application that serves as a convenient GUI frontend for WireGuard. I wanted the app to be simple, yet very functional.

Queueing improvements

Get the code!

The work I did for this part of the project is available in the sh/queues-5-for-jason branch of the official repository.

Motivation

We were dissatisfied with the existing padata-based implementation because of several limitations:
- a hard maximum on the number of outstanding work items
- conflicts with softirq processing during work functions
- reliance on a timer to work around race conditions
- a still-mysterious race condition that caused list/memory corruption and had been affecting one of my machines

Overview

The design uses a pair of queues (really, sets of queues) to encrypt packet data in parallel while always keeping packets in order. The first set has one queue per peer. Packets are added to their peer's queue as soon as they are received from userspace. They stay in that queue until they are passed to the underlying physical network interface.

Once a packet is in the per-peer queue, it is also added to one of the per-CPU queues for encryption, with packets distributed round-robin. These per-CPU queues are shared among all peers, because the CPU-intensive encryption step can process packets in any order. Sharing the queues for the "slow" part also prevents one peer from starving the others of CPU time.
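The report does not say exactly how the round-robin choice is made. The sketch below shows one simple way to do it, assuming a single shared atomic counter. The names `next_cpu_counter` and `pick_encryption_cpu` are hypothetical and are not taken from the WireGuard source.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter used to spread packets over per-CPU queues. */
static atomic_uint next_cpu_counter;

/* Returns the index of the per-CPU encryption queue for the next packet,
 * in [0, nr_cpus). Each concurrent caller gets a distinct counter value,
 * so packets are spread evenly regardless of which peer they belong to.
 * When the unsigned counter wraps, the sequence restarts at 0, which
 * briefly skews the distribution if nr_cpus is not a power of two. */
static unsigned int pick_encryption_cpu(unsigned int nr_cpus)
{
	assert(nr_cpus > 0);
	return atomic_fetch_add_explicit(&next_cpu_counter, 1,
					 memory_order_relaxed) % nr_cpus;
}
```

Relaxed ordering is enough here: the counter only decides where work goes, and it does not publish any packet data.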

Once encryption is finished, the packet is marked with a flag (CTX_FINISHED) allowing it to be transmitted. The per-peer transmission function takes all marked packets from the front of the queue, dequeues them, and sends them out together. An equivalent process happens on the receiving side for decryption and forwarding packets to userspace.
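The ordering rule on the transmit side can be sketched as follows. This is a minimal illustration, assuming a fixed-size ring per peer; only the `CTX_FINISHED` flag comes from the report. `CTX_PENDING`, `PEER_QUEUE_CAP`, `mark_finished`, and `take_finished` are hypothetical. Synchronization between multiple producers on `tail` is deliberately left out, so the sketch can focus on how the finished prefix is taken from the front.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PEER_QUEUE_CAP 64  /* hypothetical fixed capacity */

enum { CTX_PENDING = 0, CTX_FINISHED = 1 };  /* CTX_FINISHED is from the report */

struct packet {
	atomic_int state;  /* set to CTX_FINISHED by an encryption worker */
	size_t len;
};

/* Per-peer queue in arrival order. Only the transmit path advances head;
 * synchronizing producers on tail is omitted in this sketch. */
struct peer_queue {
	struct packet *slots[PEER_QUEUE_CAP];
	size_t head, tail;  /* monotonically increasing, indexed modulo capacity */
};

/* Called by a worker on any CPU. Release ordering makes the encrypted
 * payload visible before the flag that allows transmission. */
static void mark_finished(struct packet *pkt)
{
	atomic_store_explicit(&pkt->state, CTX_FINISHED, memory_order_release);
}

/* Moves the finished prefix of the queue into out[] (at most max entries)
 * and returns how many were moved. It stops at the first packet that is
 * still being encrypted, so packets leave in arrival order even when
 * later packets finish first. */
static size_t take_finished(struct peer_queue *q, struct packet **out, size_t max)
{
	size_t n = 0;

	assert(q->tail - q->head <= PEER_QUEUE_CAP);
	while (n < max && q->head != q->tail) {
		struct packet *pkt = q->slots[q->head % PEER_QUEUE_CAP];

		if (atomic_load_explicit(&pkt->state, memory_order_acquire) != CTX_FINISHED)
			break;
		out[n++] = pkt;
		q->head++;
	}
	return n;
}
```

A packet that finishes early simply waits in the ring. It is sent as part of the batch that is taken once every packet ahead of it has also been marked.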

Because both queues are single-reader, multiple-writer, I was able to implement them with simple single-cmpxchg loops (found in src/queue.h) instead of full spinlocks. This helps maximize the multi-core scalability of the data pipeline. Additionally, due to better integration with other parts of the WireGuard codebase, the three-stage send pipeline is reduced to two stages in the vast majority of cases; a separate step associating packets with a keypair only needs to happen when waiting for a handshake to complete.
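To show why a multiple-writer, single-reader queue needs only one compare-and-swap per attempt, here is a sketch of a well-known pattern. It is the same pattern behind the kernel's lock-less lists (`llist_add()`, `llist_del_all()`, `llist_reverse_order()` in include/linux/llist.h). It is not the contents of src/queue.h, and the type and function names below are hypothetical. Producers push with a cmpxchg loop. The single consumer takes the whole list with one atomic exchange and then reverses it into arrival order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

struct mpsc_stack {
	_Atomic(struct node *) head;
};

/* Any number of producers. Each attempt is a single compare-and-swap,
 * and it is retried only if another producer changed head in between. */
static void mpsc_push(struct mpsc_stack *s, struct node *n)
{
	struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);

	assert(n != NULL);
	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
			memory_order_release, memory_order_relaxed));
}

/* Single consumer: detach everything at once, then reverse the list so
 * nodes come out in the order their pushes succeeded. */
static struct node *mpsc_take_all(struct mpsc_stack *s)
{
	struct node *list = atomic_exchange_explicit(&s->head, (struct node *)NULL,
						     memory_order_acquire);
	struct node *fifo = NULL;

	while (list) {
		struct node *next = list->next;

		list->next = fifo;
		fifo = list;
		list = next;
	}
	return fifo;
}
```

The consumer never pops individual nodes. It only swaps the head to NULL, so the ABA problem that affects lock-free pop does not arise.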

Because I was able to remove the dependency on padata, my changes amount to a net reduction of about 850 lines of code. Excluding changes to the compatibility layer, I traded a 1000-line dependency for under a hundred additional lines in WireGuard itself (445 insertions, 361 deletions), while also improving performance.

src/config.c  |   2 +-
src/data.c    | 432 ++++++++++++++++++++++++++++++----------------------------
src/device.c  |  46 +++----
src/device.h  |  12 +-
src/main.c    |  12 +-
src/packets.h |  21 +--
src/peer.c    |  28 +++-
src/peer.h    |   9 +-
src/queue.h   | 139 +++++++++++++++++++
src/receive.c |   5 +-
src/send.c    |  96 ++-----------
src/timers.c  |   4 +-
12 files changed, 445 insertions(+), 361 deletions(-)

Current status

Due to a busy schedule at the end of the summer, Jason, the WireGuard maintainer, did not have an opportunity to fully review and merge my changes before the end of GSoC. However, the changes should be merged soon.

Next steps

  • Integration of the BQL (byte queue limits) queue management system. This library helps keep queues from growing too large by resizing them dynamically based on observed latency. It would track the total number of bytes that WireGuard holds in its queues across all cores. It also allows integration with fq_codel to ensure that bufferbloat does not become a problem.
    • We have already started work on this, but wanted to wait until the main changes were merged before investing too much time in it.
  • Removal of data.c. Now that the padata infrastructure has been removed, all of its functions can and should be moved to send.c or receive.c.
    • Again, I have already done this, but it makes rebasing difficult, so it is best left until the main changes are merged.
  • Some form of GRO (Generic Receive Offload). Currently, when userspace sends a group of packets up to 64KiB long in one send(2) system call, those packets are encrypted together. This minimizes queue lengths and the setup/teardown costs of the cryptography functions. Packets are received one by one, however, so the receive side has more overhead and much longer queues. GRO would let us group packets for the same peer as they are received.
    • Unfortunately, the normal GRO path does not preserve the original packet length, which WireGuard needs in order to find the end of the encrypted data. Thus using it for WireGuard involves "stealing" packets from the networking subsystem and keeping our own list per keypair.
  • Evaluate other methods of dividing encryption/decryption work among CPUs, besides round robin.
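The byte accounting described in the BQL item can be sketched in userspace terms. This is a minimal illustration, assuming one global counter of bytes held in all of WireGuard's queues across all cores, with a fixed limit. It is not the kernel's actual BQL interface, which uses `netdev_tx_sent_queue()` and `netdev_tx_completed_queue()` in include/linux/netdevice.h and adjusts the limit dynamically. `struct byte_limit`, `bytes_reserve`, and `bytes_complete` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting in the spirit of BQL, not the kernel API. */
struct byte_limit {
	atomic_size_t inflight;  /* bytes currently queued, summed over all cores */
	size_t limit;            /* fixed here; BQL would tune this at runtime */
};

/* Reserve len bytes before queueing a packet. Returns false (the caller
 * should drop the packet or stop the queue) if the reservation would exceed
 * the limit. Invariant: inflight <= limit, so limit - cur cannot underflow. */
static bool bytes_reserve(struct byte_limit *bl, size_t len)
{
	size_t cur = atomic_load_explicit(&bl->inflight, memory_order_relaxed);

	do {
		if (len > bl->limit - cur)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&bl->inflight, &cur, cur + len,
			memory_order_relaxed, memory_order_relaxed));
	return true;
}

/* Release the bytes once a packet has left WireGuard's queues. */
static void bytes_complete(struct byte_limit *bl, size_t len)
{
	size_t prev = atomic_fetch_sub_explicit(&bl->inflight, len, memory_order_relaxed);

	assert(prev >= len);
	(void)prev;  /* unused when NDEBUG disables assert */
}
```

The same kind of check also enforces the hard size limit that every queue needs, as the first educational highlight below notes.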

Educational highlights

  • All queues must have size limits, especially when readers and writers compete for CPU time, because unbounded growth can easily consume all available RAM and bring down the system. I spent a while investigating an out-of-memory condition that I thought was caused by RCU grace periods. It turned out to be caused by the accidental removal of a queue length limit.
  • Multithreaded programming in kernelspace is surprisingly easy, considering the three levels of preemption and the rules around them. Lockups are also easy to cause, though, and the built-in kernel debugging tools are essential for finding and fixing them.

Android frontend

Get the code!

Source code for the app is available in the wireguard-android repository.

Motivation

WireGuard was designed from the start for seamless roaming, and it does not need to transmit any packets to maintain a "connection" while not in use. Both traits make it a great fit for a mobile VPN, but it was not yet integrated with any mobile OS. The WireGuard kernel module would have to be ported to each Android device individually. However, another GSoC student wrote a userspace implementation this summer, which will work on any device and function without root access.

Overview

In its current state, the app allows creating and editing WireGuard VPN configurations, enabling and disabling them, as well as renaming and deleting them. All of the different attributes used by the kernel module (through wg(8)) as well as wg-quick(8) are supported, with the exception of FwMark, as it is used internally by Android.

Interfaces can be enabled or disabled from within the app. Multiple configurations can be enabled at the same time, as long as they use different ports for the underlying UDP socket (ListenPort). Additionally, on devices running Android 7.0 Nougat or newer, one configuration can be controlled via a custom quick settings tile.
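The ListenPort rule above amounts to a simple conflict check before a configuration is enabled. The sketch below is hypothetical and is not the app's code. It assumes that a port value of 0 stands for "ListenPort not set", in which case a free port is chosen automatically and cannot clash with a configured one. `can_enable` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: may a configuration with listen_port be enabled
 * alongside the configurations whose ports are in enabled_ports?
 * Assumption: 0 means ListenPort is unset (port chosen automatically). */
static bool can_enable(uint16_t listen_port, const uint16_t *enabled_ports,
		       size_t n_enabled)
{
	assert(enabled_ports != NULL || n_enabled == 0);
	if (listen_port == 0)
		return true;
	for (size_t i = 0; i < n_enabled; i++)
		if (enabled_ports[i] == listen_port)
			return false;  /* UDP port already bound by another interface */
	return true;
}
```

A check like this, run before invoking wg-quick, could also give the user feedback instead of failing silently.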

The app supports all devices running Android 5.0 Lollipop or newer, though the WireGuard module must be available, wg-quick must be installed, and the app must have root access so it can run wg-quick. The app uses material design, and supports both phone and tablet layouts. Keypairs can be generated from within the app, and the public key can be easily exported to the device's clipboard.

Current status

While the app is functional, there is a lot of work left to do. The app was developed and tested on the OnePlus 3/3T running SultanXDA's build of LineageOS 14.1. It should work on other phones and firmwares where root access and the WireGuard module are available. The WireGuard module is compatible with kernels back to Linux 3.10.

Next steps

  • Allow importing a configuration from a file or QR code. This would replace the "add" button with a floating action menu. This is already implemented in the fab branch, but has not been merged yet due to some licensing issues about which FAB implementation to use.
  • Allow comments in config files. This is only relevant for imported configurations.
  • Validate more config attributes (IPs, endpoints, etc.). Currently, only the interface name and private key are fully checked (though other fields have character type or length restrictions).
  • Add an optional notification when a config is enabled. This would be especially useful on older devices where the quick settings tile is not available.
  • Use a switch to toggle configs within the app. This requires implementing a custom View based on the Switch, because it is not possible to hook into the existing switch to programmatically control its state when touched.
  • More robust state checking and error reporting. Currently, when enabling a configuration fails, the app gives no feedback beyond the configuration not transitioning to the "enabled" state. This can happen when the UDP port WireGuard listens on is already in use, or when a configuration attribute contains bad syntax.
  • Show runtime status (uptime, transfer stats) on config detail screen. Basically, copy the relevant parts of what wg(8) shows for a running interface.
  • Support calling wg(8) directly (instead of wg-quick(8)), as well as the userspace Go implementation.
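The validation items above concern the two checks the app already performs fully: the interface name and the private key. Both can be sketched as follows. This is a hypothetical illustration, not the app's code. The key check relies on the fact that a WireGuard key is 32 bytes, which base64 encodes as 43 significant characters plus one '='. The interface-name check assumes the Linux limit of 15 characters (IFNAMSIZ minus the terminating NUL) and the character set [A-Za-z0-9_=+.-], which we assume wg-quick(8) accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* 6-bit value of a standard base64 character, or -1 if invalid. */
static int b64_value(char c)
{
	if (c >= 'A' && c <= 'Z') return c - 'A';
	if (c >= 'a' && c <= 'z') return c - 'a' + 26;
	if (c >= '0' && c <= '9') return c - '0' + 52;
	if (c == '+') return 62;
	if (c == '/') return 63;
	return -1;
}

/* 32 bytes = 256 bits, but 43 base64 characters carry 258 bits, so the
 * low 2 bits of the 43rd character must be zero in a canonical encoding. */
static bool valid_key(const char *s)
{
	assert(s != NULL);
	if (strlen(s) != 44 || s[43] != '=')
		return false;
	for (int i = 0; i < 43; i++) {
		int v = b64_value(s[i]);

		if (v < 0 || (i == 42 && (v & 3) != 0))
			return false;
	}
	return true;
}

/* 1 to 15 characters from the assumed wg-quick character set. */
static bool valid_ifname(const char *s)
{
	size_t len = strlen(s);

	if (len == 0 || len > 15)
		return false;
	for (size_t i = 0; i < len; i++) {
		char c = s[i];

		if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		      (c >= '0' && c <= '9') || strchr("_=+.-", c)))
			return false;
	}
	return true;
}
```

Checks for IPs and endpoints would follow the same shape: reject early, before the configuration reaches wg-quick.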

Educational highlights

  • The Android VpnService framework does not work at all with kernel-space VPNs. I spent the better part of a week crawling through layers upon layers of AOSP code, and there is simply no way to avoid using a TUN device with it. Fortunately, this is fine for the userspace implementation, and the kernelspace implementation would have required root access anyway.
  • Fragments and "responsive" layouts are hard. I rewrote the state machine for the main Activity at least five times, each time thinking "I finally got it working in all cases" before finding some sequence of actions that was broken due to my app's view of its state not matching the Android system's view.
  • Don't put an EditText inside of a ListView. Ever. If you don't trust me, ask the many developers who have tried it. The recommended solution is actually to reimplement most of the ListView yourself. Fortunately, this was not much harder than writing the ListAdapter I had already written to integrate a ListView with data binding.