smaeul/report.md

## report.md

      
    Overview

The project I worked on this summer was divided into two major parts. First, I optimized the
cryptography parallelism of the in-kernel WireGuard implementation. My goal was to improve
overall speed, increase multi-core scalability, and fix some hard-to-reproduce race conditions.
I spent the second half of the summer writing an Android application that serves as a convenient
GUI frontend for WireGuard. My vision was for the app to be simple, yet very functional.
Queueing improvements

Get the code!

The work I did for this part of the project is available in the sh/queues-5-for-jason
branch of the official repository.
Motivation

We were unsatisfied with the current padata-based implementation, due to some of its limitations,
like a hard maximum on the number of outstanding work items, conflicts with softirq processing
during work functions, reliance on a timer to work around race conditions, and a still-mysterious
race condition that caused list/memory corruption and had been affecting one of my machines.
Overview

The design uses a pair of queues (really, sets of queues) to enable parallel encryption of packet
data, while ensuring that packets are always kept in order. One set of queues is per peer. Packets
are added to this queue as soon as they are received from userspace, and they are kept in this queue
until they are passed to the underlying physical network interface.
Once added to the per-peer queue, packets are added round-robin to a per-CPU queue to be encrypted.
These per-CPU queues are shared among all peers, since the CPU-intensive encryption process can work
on packets in any order. Having shared queues for the "slow" part also prevents one peer from
starving any others out of CPU time.
Once encryption is finished, the packet is marked with a flag (CTX_FINISHED) allowing it to be
transmitted. The per-peer transmission function takes all marked packets from the front of the
queue, dequeues them, and sends them out together. An equivalent process happens on the receiving
side for decryption and forwarding packets to userspace.
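
The ordering rule described above can be sketched in a few lines of userspace C. This is a simplified, single-threaded model and not the code from the branch: the names `peer_queue`, `peer_queue_push`, and `peer_queue_flush`, the fixed capacity, and the CPU count are all hypothetical, and the `finished` field merely stands in for the CTX_FINISHED flag.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8 /* hypothetical fixed capacity for this sketch */
#define NUM_CPUS  4 /* hypothetical number of per-CPU work queues */

struct packet {
    int  id;       /* stand-in for the packet data */
    int  cpu;      /* per-CPU work queue this packet was assigned to */
    bool finished; /* models the CTX_FINISHED flag */
};

struct peer_queue {
    struct packet slots[QUEUE_CAP];
    size_t head;     /* index of the oldest packet not yet transmitted */
    size_t len;      /* number of packets currently held */
    int    next_cpu; /* round-robin cursor */
};

/* Append a packet in arrival order and assign it to a CPU round-robin.
 * Returns the queued packet, or NULL if the queue is full. */
static struct packet *peer_queue_push(struct peer_queue *q, int id)
{
    if (q->len == QUEUE_CAP)
        return NULL;
    struct packet *p = &q->slots[(q->head + q->len) % QUEUE_CAP];
    p->id = id;
    p->cpu = q->next_cpu;
    p->finished = false;
    q->next_cpu = (q->next_cpu + 1) % NUM_CPUS;
    q->len++;
    return p;
}

/* Dequeue the run of finished packets at the front of the queue, stopping
 * at the first unfinished one so that order is preserved. Writes the ids
 * of the dequeued packets to out and returns how many were taken. */
static size_t peer_queue_flush(struct peer_queue *q, int *out, size_t out_cap)
{
    size_t sent = 0;
    while (q->len > 0 && sent < out_cap && q->slots[q->head].finished) {
        out[sent++] = q->slots[q->head].id;
        q->head = (q->head + 1) % QUEUE_CAP;
        q->len--;
    }
    return sent;
}
```

In this model, if three packets are pushed and only the second has been marked finished, `peer_queue_flush` takes nothing; once the first is also finished, it takes the first two together, in their original order.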
Because both queues are single-reader, multiple-writer, I was able to implement them with simple
single-cmpxchg loops (found in src/queue.h) instead of full spinlocks. This helps maximize the
multi-core scalability of the data pipeline. Additionally, due to better integration with other
parts of the WireGuard codebase, the three-stage send pipeline is reduced to two stages in the vast
majority of cases; a separate step associating packets with a keypair only needs to happen when
waiting for a handshake to complete.
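
To illustrate the general idea of a compare-and-exchange loop replacing a spinlock, here is a minimal userspace sketch using C11 atomics. It is not the code in src/queue.h, which uses the kernel's own primitives; the names `mpsc_list`, `mpsc_push`, and `mpsc_take_all` are hypothetical, and the list-then-reverse layout is just one simple way to serve many writers and a single reader.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value; /* stand-in for a queued packet */
};

struct mpsc_list {
    _Atomic(struct node *) head; /* most recently pushed node */
};

/* Any number of writers: link the node in front of the current head and
 * retry if another writer changed the head in the meantime. On failure the
 * compare-exchange reloads `old` with the current head. */
static void mpsc_push(struct mpsc_list *l, struct node *n)
{
    struct node *old = atomic_load_explicit(&l->head, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(&l->head, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Single reader: detach everything with one atomic exchange, then reverse
 * the detached list so it is returned oldest-first. */
static struct node *mpsc_take_all(struct mpsc_list *l)
{
    struct node *n = atomic_exchange_explicit(&l->head, (struct node *)NULL,
                                              memory_order_acquire);
    struct node *reversed = NULL;
    while (n) {
        struct node *next = n->next;
        n->next = reversed;
        reversed = n;
        n = next;
    }
    return reversed;
}
```

Writers never block each other here: a writer that loses the race simply retries with the updated head, and the reader needs no lock because it is the only party that ever removes nodes.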
Because I was able to remove the dependency on padata, my changes amount to a net reduction of
about 850 lines of code. Excluding changes to the compatibility layer, however, I traded a
1000-line dependency for under a hundred net new lines in WireGuard itself, while at the same time
improving performance.
src/config.c  |   2 +-
src/data.c    | 432 ++++++++++++++++++++++++++++++----------------------------
src/device.c  |  46 +++----
src/device.h  |  12 +-
src/main.c    |  12 +-
src/packets.h |  21 +--
src/peer.c    |  28 +++-
src/peer.h    |   9 +-
src/queue.h   | 139 +++++++++++++++++++
src/receive.c |   5 +-
src/send.c    |  96 ++-----------
src/timers.c  |   4 +-
12 files changed, 445 insertions(+), 361 deletions(-)

Current status

Due to a busy schedule at the end of the summer, Jason, the WireGuard maintainer, did not have an
opportunity to fully review and merge my changes before the end of GSoC. However, the changes should
be merged soon.
Next steps


Integration of the BQL (byte queue limits) queue management system. This is a library that helps
prevent queues from becoming too large by dynamically resizing them based on observed latency.
It would track the total number of bytes held by WireGuard in its queues on all cores.
It also allows integration with fq_codel to ensure bufferbloat does not become a problem.

We already started work on this, but wanted to wait until the main changes were merged before
putting too much time into it.


Removal of data.c. All of its functions can and should be moved to send.c or receive.c, now
that infrastructure for working with padata is removed.

Again, I have already done this, but it prevents rebasing, so it is best left until the main
changes are merged.


Some form of GRO (generic receive offload). Right now, groups of packets up to 64 KiB long are
encrypted together when sent in one send(2) system call from userspace. This minimizes queue
lengths and the setup/teardown costs of the cryptography functions. However, packets are received
one by one, so the receive side has more overhead and much longer queue lengths. GRO would let us
group packets for the same peer as they are received.

Unfortunately, the normal GRO path does not preserve the original packet length, which
WireGuard needs in order to find the end of the encrypted data. Thus using it for WireGuard
involves "stealing" packets from the networking subsystem and keeping our own list per keypair.


Evaluate other methods of dividing encryption/decryption work among CPUs, besides round robin.
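
As a rough illustration of the byte-accounting idea behind the BQL item above, the following userspace sketch tracks how many bytes are in flight and refuses new packets past a limit. It is a deliberately simplified model, not the kernel's BQL interface: the limit is fixed rather than adapted at runtime, and the names `byte_limit`, `byte_limit_try_queue`, and `byte_limit_complete` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct byte_limit {
    size_t limit;     /* maximum bytes allowed in flight (fixed in this sketch) */
    size_t in_flight; /* bytes queued but not yet completed */
};

/* Account for a packet entering the queues. Returns false, leaving the
 * counter unchanged, when the packet would push the total past the limit.
 * in_flight never exceeds limit, so the subtraction cannot wrap. */
static bool byte_limit_try_queue(struct byte_limit *bl, size_t bytes)
{
    if (bytes > bl->limit - bl->in_flight)
        return false;
    bl->in_flight += bytes;
    return true;
}

/* Account for a packet leaving the queues (for example, after it has been
 * handed to the underlying interface). Clamps at zero. */
static void byte_limit_complete(struct byte_limit *bl, size_t bytes)
{
    bl->in_flight -= bytes < bl->in_flight ? bytes : bl->in_flight;
}
```

Counting bytes rather than packets means that a burst of small packets and a burst of large ones are held to the same memory budget.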

Educational highlights


All queues must have size limits, especially when the readers and writers compete for CPU time,
because unbounded growth can easily consume all available RAM and bring down the system. I spent
a while investigating an out-of-memory condition I thought was due to RCU grace periods that
ended up just being due to the accidental removal of a queue length limit.

Multithreaded programming in kernelspace is surprisingly easy, considering the three levels of
preemption and the rules around them. Lockups are also quite easy, and the built-in kernel
debugging tools are integral to finding and fixing them.

Android frontend

Get the code!

Source code for the app is available in the wireguard-android
repository.
Motivation

WireGuard was designed from the start for seamless roaming, and does not need to transmit any
packets to maintain a "connection" when not in use. Both of these traits make it great for a mobile
VPN solution, but there was not yet integration for any mobile OS. While the WireGuard kernel module
would have to be ported to individual Android devices, another GSoC student wrote a userspace
implementation this summer, which will work on any device and function without root access.
Overview

In its current state, the app allows creating and editing WireGuard VPN configurations, enabling and
disabling them, as well as renaming and deleting them. All of the different attributes used by the
kernel module (through wg(8)) as well as wg-quick(8) are supported, with the exception of
FwMark, as it is used internally by Android.
Interfaces can be enabled or disabled from within the app; multiple configurations can be enabled
simultaneously, as long as they use different ports for the underlying UDP socket (ListenPort).
Additionally, on devices running Android 7.0 Nougat or newer, one configuration can be controlled
via a custom quick settings tile.
The app supports all devices running Android 5.0 Lollipop or newer, though the WireGuard module
must be available, wg-quick must be installed, and the app must have root access so it can run
wg-quick. The app uses Material Design, and supports both phone and tablet layouts. Keypairs can
be generated from within the app, and the public key can be easily exported to the device's
clipboard.
Current status

While the app is functional, there is a lot of work left to do. The app was developed and tested on
the OnePlus 3/3T running SultanXDA's build of LineageOS 14.1. It should work on other phones and
firmwares where root access and the WireGuard module are available. The WireGuard module is
compatible with kernels back to Linux 3.10.
Next steps


Allow importing a configuration from a file or QR code. This would replace the "add" button with
a floating action menu. This is already implemented in the fab branch, but has not been merged
yet due to some licensing issues about which FAB implementation to use.

Allow comments in config files. This is only relevant for imported configurations.

Validate more config attributes (IPs, endpoints, etc.). Currently, only the interface name and
private key are fully checked (though other fields have character type or length restrictions).

Add an optional notification when a config is enabled. This would be especially useful on older
devices where the quick settings tile is not available.

Use a switch to toggle configs within the app. This requires implementing a custom View based on
the Switch, because it is not possible to hook into the existing switch to programmatically
control its state when touched.

More robust state checking and error reporting. Currently, the app does not provide feedback when
enabling a configuration fails, beyond it not transitioning to the "enabled" state. This can
happen when the UDP port WireGuard listens on is reused, or there was bad syntax within a
configuration attribute.

Show runtime status (uptime, transfer stats) on config detail screen. Basically, copy the
relevant parts of what wg(8) shows for a running interface.

Support calling wg(8) directly (instead of wg-quick(8)) and the userspace Go implementation.
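
As an example of the kind of check the validation item above refers to, here is a minimal sketch of a well-formedness test for a WireGuard key string. It relies only on the fact that a key is 32 bytes, which base64 encodes as 44 characters ending in a single `=`. The app's own validation code is not shown in this report, so the function names `b64_value` and `key_is_well_formed` are hypothetical and the sketch is written in C purely for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define WG_KEY_B64_LEN 44 /* 32 raw bytes encode to 44 base64 characters */

/* Map a base64 character to its 6-bit value, or -1 if it is not in the
 * standard alphabet. Assumes an ASCII-compatible character set. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Check the shape of a base64-encoded 32-byte key: exactly 44 characters,
 * a single trailing '=', and 43 characters from the base64 alphabet. */
static bool key_is_well_formed(const char *s)
{
    if (strlen(s) != WG_KEY_B64_LEN || s[WG_KEY_B64_LEN - 1] != '=')
        return false;
    for (size_t i = 0; i < WG_KEY_B64_LEN - 1; i++)
        if (b64_value(s[i]) < 0)
            return false;
    /* 43 characters carry 258 bits; the 2 bits beyond the 256-bit key
     * must be zero for the encoding to be canonical. */
    return (b64_value(s[WG_KEY_B64_LEN - 2]) & 3) == 0;
}
```

A check like this only confirms the string's shape. It says nothing about whether the key is the right one for a given peer.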

Educational highlights


The Android VpnService framework does not work at all with kernel-space VPNs. I spent the
better part of a week crawling through layers upon layers of AOSP code, and there's simply no way
around using a TUN device with it. Fortunately, this is fine for the userspace implementation,
and the kernelspace implementation would have required root access anyway.

Fragments and "responsive" layouts are hard. I rewrote the state machine for the main Activity
at least five times, each time thinking "I finally got it working in all cases" before finding
some sequence of actions that was broken due to my app's view of its state not matching the
Android system's view.

Don't put an EditText inside of a ListView. Ever. If you don't trust me, ask all of these poor
souls who tried it. The recommended solution is actually to reimplement most of the ListView
yourself. Fortunately, this was not much more difficult than writing the ListAdapter I had
already written to integrate a ListView with data binding.