I've spent way too much time building an Elixir wrapper for libtailscale and an accompanying chat app, because Elixir and libtailscale seem meant for each other.
I've been following Tailscale for a while now. Obviously, they're darlings of HN, their business strategy appeals to me and the product is great. I'm using it both for my personal servers and for managing our servers at Leaf and I'm a very happy customer.
Upon initially learning about Tailscale, I thought they had some cool technology, but I wasn't sure what the benefits were over just using a private network on my Hetzner hosts. That changed when, almost two years ago, Xe posted a talk and accompanying blog post titled The Subtle Magic of tsnet about how Tailscale allows you to write services that expose themselves directly to your Tailnet (a Tailscale network, colloquially) with authentication and authorization built in. How cool is it to be able to write internal tooling that never requires a login and can automatically determine your permissions based on your Tailscale connection? There are other articles that say the same thing, and a list of community projects which all expose themselves directly on a Tailnet.
While Tailscale is mostly built in Go, and there has long been a tsnet library that allowed users to build Go applications directly exposing themselves to a Tailnet, Xe's post was the first time I was introduced to libtailscale, a C library that allows other languages to do the same. Ever since I read that post (which coincidentally came out right around the time I was learning Elixir), I've dreamt of a way to use libtailscale from Elixir. And now, finally, that is possible.
After mulling it over for a long time, I finally decided to give writing an Elixir wrapper for libtailscale a go about a month ago. The initial steps were easy enough, since I've written NIFs before. I pretty quickly had an example library that I could interact with, but it depended on NIF calls for everything, even accepting, reading and sending, which means that those operations would block the BEAM scheduler threads they ran on.
That's not the Elixir way, so I started trying to figure out how to turn the raw file descriptors that libtailscale provides into something more resembling gen_tcp. This turned out to be a rabbit hole of enormous proportions, probably greatly exacerbated by my own naivety. I'm hoping that, by writing all this down, someone is able to tell me that there's a much simpler way to do this.
Firstly, while gen_tcp:listen/2 has an fd option that in principle can be used to give it a raw file descriptor, libtailscale seems to use some custom logic to accept new connections on a listener, so gen_tcp:listen/2 didn't work. Secondly, it is also possible to override the TCP module that gen_tcp uses, but there's a predefined list of modules that are allowed, so I couldn't supply my own accept logic. Finally, for a while I thought I could write a simple wrapper around the socket module (which has an open/1 for raw FDs), but ThousandIsland.Transport, which I needed to implement to make my Phoenix app talk directly to the Tailscale network, expects a lot of additional handling, like active and passive mode connections.
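To make the active versus passive distinction concrete, here is a minimal sketch using plain gen_tcp over loopback. It does not use gen_tailscale or libtailscale at all, and the module name ModeDemo is purely illustrative. It only shows the two delivery styles that a gen_tcp-like replacement would have to reproduce.

```elixir
defmodule ModeDemo do
  # Sends one message that is read in passive mode and one that is
  # delivered in active (once) mode, then returns both payloads.
  def run do
    # Listen on an OS-assigned loopback port, starting in passive mode.
    {:ok, listener} =
      :gen_tcp.listen(0, [:binary, active: false, ip: {127, 0, 0, 1}])

    {:ok, port} = :inet.port(listener)

    # The client connection sits in the listen backlog until accepted.
    {:ok, client} =
      :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false], 1_000)

    {:ok, server} = :gen_tcp.accept(listener, 1_000)

    # Passive mode: the owning process pulls data explicitly with recv/3.
    :ok = :gen_tcp.send(client, "passive")
    {:ok, passive_data} = :gen_tcp.recv(server, byte_size("passive"), 1_000)

    # Active-once mode: the next packet arrives as a message in the
    # owner's mailbox, after which the socket falls back to passive.
    :ok = :inet.setopts(server, active: :once)
    :ok = :gen_tcp.send(client, "active")

    active_data =
      receive do
        {:tcp, ^server, data} -> data
      after
        1_000 -> :timeout
      end

    :gen_tcp.close(client)
    :gen_tcp.close(server)
    :gen_tcp.close(listener)

    {passive_data, active_data}
  end
end
```

Calling ModeDemo.run() exercises both paths in one process. A transport built on raw file descriptors has to provide both the blocking recv-style call and the message-based delivery, which is a large part of why a thin wrapper was not enough.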
In the end, I resorted to horrific hacks in a copy of the original gen_tcp and gen_tcp_socket modules. The result is not pretty (I've not removed anything from the ~4000 lines of code, mostly just added extra state where needed) and the connect parts are entirely untouched/unimplemented, but it does the job. After a month of struggling, I was able to implement gen_tailscale, a gen_tcp-like library that talks directly to my Tailnet. There are probably many broken paths through the code and it sorely needs to be rewritten from the ground up (please reach out if you're interested in helping with this), but it works well enough that I was able to implement TailscaleTransport, a ThousandIsland.Transport implementation on top of it.
Now, I had a Phoenix application that was running directly on my Tailnet. The next piece of the puzzle was authentication.
All those nice examples I mentioned earlier use the fact that a Tailscale
connection can be uniquely identified with a Tailscale user. Therefore, the
tsnet
apps were able to authenticate and authorize users just based on their
connection, removing the need for logins or other kinds of
account/profile-handling.
The way this works is that tsnet/libtailscale has a getremoteaddr function, which takes a connection and returns the Tailscale IP address for that connection. Then, Tailscale includes LocalAPI, an underdocumented API that allows you to, among other things, query for information about a given Tailscale IP address. Therefore, I had to go back to libtailscale and gen_tailscale in order to allow calling getremoteaddr and setting up the loopback server for LocalAPI. This entailed more hacking on gen_tailscale, because to run getremoteaddr, you have to supply the initial Tailscale server FD, the listener FD and the connected socket FD. In an ideal world, they would each belong in separate processes in gen_tailscale_socket, but now they're being carried around everywhere.
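As a rough illustration of the lookup step, the sketch below builds a whois request against the LocalAPI loopback server. Everything specific here is an assumption rather than something taken from the libraries described in this post: the /localapi/v0/whois path, the Sec-Tailscale header, and basic auth with an empty username reflect my understanding of how libtailscale's loopback server is meant to be called, and the module name, function names and arguments are hypothetical.

```elixir
defmodule WhoisSketch do
  # Builds {url, headers} for a LocalAPI whois lookup.
  # loopback_addr and local_api_cred are assumed to come from
  # libtailscale's loopback setup; remote_addr is the address
  # returned by getremoteaddr. All three are placeholders here.
  def build_request(loopback_addr, local_api_cred, remote_addr) do
    query = URI.encode_query(%{"addr" => remote_addr})
    url = "http://#{loopback_addr}/localapi/v0/whois?#{query}"

    # Assumption: basic auth with an empty username and the
    # LocalAPI credential as the password.
    auth = Base.encode64(":" <> local_api_cred)

    headers = [
      {"authorization", "Basic " <> auth},
      # Assumption: the loopback server requires this header.
      {"sec-tailscale", "localapi"}
    ]

    {url, headers}
  end

  # Performs the lookup with OTP's built-in :httpc client and returns
  # the raw response body (expected to be JSON) on a 200 status.
  def whois(loopback_addr, local_api_cred, remote_addr) do
    {url, headers} = build_request(loopback_addr, local_api_cred, remote_addr)
    {:ok, _apps} = Application.ensure_all_started(:inets)

    charlist_headers =
      for {k, v} <- headers, do: {String.to_charlist(k), String.to_charlist(v)}

    request = {String.to_charlist(url), charlist_headers}

    case :httpc.request(:get, request, [], body_format: :binary) do
      {:ok, {{_version, 200, _reason}, _resp_headers, body}} -> {:ok, body}
      {:ok, {{_version, status, _reason}, _resp_headers, body}} -> {:error, {status, body}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Splitting the request construction from the HTTP call keeps the assumed parts (path, header, credential format) in one small function that is easy to adjust if the real LocalAPI differs.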
Finally, I added TailscaleTransport.Plug, which adds information about the Tailscale connection to the connection assigns, which can then be accessed from your Phoenix application.
The result is three libraries and an example app for Elixir:
- Libtailscale, a thin NIF wrapper around libtailscale.
- gen_tailscale, a gen_tcp-like wrapper around Libtailscale.
- TailscaleTransport, a ThousandIsland.Transport implementation that can be used in Phoenix/Bandit apps to expose applications directly on a Tailnet.
- tschat, an ephemeral chat client that exposes itself over a Tailscale network and authenticates users based on their connection.
Everything in this chain of packages should be considered proof of concept at this point and should not be used for anything important. That goes especially for gen_tailscale, which has been constructed by crudely hacking the original gen_tcp module to use libtailscale and could use a total rewrite at some point. However, it works well enough that my example application tschat is able to accept connections from different Tailscale users and show their username by retrieving data from the Tailscale connection.
Where to from here?
I'd like to expand TailscaleTransport to be able to serve HTTPS connections over Tailscale, using Tailscale's HTTPS certificates. I think it's possible to extract the necessary certificates from the LocalAPI.
It's also obvious that gen_tailscale should either be cleaned up, to remove remnants of the old gen_tcp that are no longer needed and to fix the connect-related functionality, or be completely rewritten from scratch. Currently, a process is spawned for each listener and connected socket, but e.g. the Tailscale server FD should probably live in its own process, with a proper supervision tree around it. It probably all crashes pretty badly if the Tailscale server suddenly stops running.
Figure out how robust and how efficient the transport is. How does the performance of a Phoenix application using TailscaleTransport compare to a Go project using tsnet?
Start building some useful tools using TailscaleTransport. Fully explore application capabilities.
Do you have any ideas? Would you like to help? Have you tried out tsnet or one of the libraries and found them to be utter trash? Please reach out, I'd love to hear from you.