@grahamwhaley
Last active April 27, 2018 14:59
Setting up vsock/nfs

How to set up vsock/nfs and QEMU

Wed 25 Apr 16:15:47 BST 2018

Setting up the experimental vsock/nfs between a host Linux machine and a KVM/QEMU client is not quite trivial - let's write it down...

Overview

Over in Kata Containers, we use 9pfs to mount host-side filesystems into the QEMU/KVM Virtual Machine.

Whilst that works pretty well, it does have a few limitations, so we are always on the lookout for alternatives. One of those is to use a different network filesystem - say, NFS.

A key to our use of 9pfs is that we mount it through virtio, which is an efficient transport between the host and the VM. Interestingly, there is an experimental implementation of NFS over vsock (rather than going over a 'full' network stack).

Let's spin that up and have a peek. Now, 'spinning that up' is not quite trivial, so here are some of the nitty gritty details...

Note, throughout this document we will refer to the 'host' and the 'client'. The host is your bare metal host machine. The client in this document is a QEMU/KVM Virtual machine - ideally running the same or very similar distro to your host (in that way we can then re-use some binaries and not have to rebuild stuff twice...)

Host machine

For my test I ran up a new clean Fedora 27 desktop install. I didn't want to test this out on my main machines, as you need to patch and install a host side kernel.

If you are looking at doing some measurements on the final installation, then maybe it would be best to install a server version of your chosen distro, to avoid all the overheads of having a GUI running in the background.

Installing tools

Once you have your distro installed, you will need the development tools packages installed - everything you need to build a kernel and an autotools based package. I'm sure you can figure that out ;-), but I partly cheated by using some of the info on this Fedora custom kernel page.
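For reference, the Fedora package set I'd expect to need looks something like the following. The package names here are from memory and may need adjusting for your distro/version; the block just prints the install commands as a dry run:

```shell
# Hypothetical package list for a Fedora kernel + autotools build -
# adjust names for your distro. Printed as a dry run; copy the lines
# out (or pipe to sh) to actually install.
pkgs='gcc make git ncurses-devel openssl-devel bc elfutils-libelf-devel'
echo "sudo dnf group install 'Development Tools'"
echo "sudo dnf install -y ${pkgs}"
```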

Get the kernel

We are going to need a kernel to patch. I successfully applied the patch set to a 4.13.16 kernel, which also happened to align fairly closely with one of my F27 kernels.

I followed the instructions on that above Fedora kernel page to:

  • Grab the upstream vanilla kernel:
    • git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  • Lay the stable tree over that:
    • git remote add -f stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
  • And check out my chosen version:
    • git checkout -b vsock_v4.13.16 v4.13.16

Patch the kernel

The patches we need are the series from Stefan Hajnoczi. I took mine from the kernel patchwork for the linux-nfs list here. There are 14 patches, and I could not see a nice way to grab them all as a single bundle from the patchwork (which is a shame), so I bashed up a nasty little wget script to grab them. Alternatively, you can possibly just use the pre-patched kernel tree direct from Stefan's github here. Note though, I used the patchwork patches and the stable tree kernel, so I cannot verify that the other route will work with these instructions.
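My wget script was along these lines. The patch IDs below are placeholders - substitute the 14 real IDs from the linux-nfs patchwork page - and the wget line is commented out so the sketch is a harmless dry run:

```shell
# Hypothetical sketch: fetch each patch of the series as an mbox file
# from patchwork. The IDs here are PLACEHOLDERS - substitute the real
# IDs from the linux-nfs patchwork page for the series.
patch_ids='9999001 9999002 9999003'
for id in ${patch_ids}; do
    url="https://patchwork.kernel.org/patch/${id}/mbox/"
    echo "would fetch: ${url}"
    # wget -O "${id}.patch" "${url}"
done
```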

Then I applied the patches with a git am -3 *.patch... if you are lucky, all will apply cleanly.

Configure the kernel

Now, we need a kernel config file. As my chosen kernel version was very close to one I had installed, I copied the relevant config file from /boot to my kernel tree as .config and did a make oldconfig.

We are flying a little on 'luck' here - there are a few kernel configs we need to have set to get vsock-nfs up and running - and one of those at least is a new one added by the patchset... Luckily for me, with the F27 config I took, it looks like that new config option is set correctly by default. You might want to check your .config file after your make oldconfig and ensure you have:

  • CONFIG_VSOCKETS=m
  • CONFIG_VIRTIO_VSOCKETS=m
  • CONFIG_VIRTIO_VSOCKETS_COMMON=m
  • CONFIG_SUNRPC_XPRT_VSOCK=y
  • CONFIG_VHOST_VSOCK=m
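If you want to check that from a script rather than eyeballing the file, a little helper like this does it (a sketch - `check_vsock_config` is my own name, not part of any tool):

```shell
# Sketch: report any of the required vsock options that are missing
# from (or set differently in) a kernel .config file.
check_vsock_config() {
    cfg="$1"
    for opt in CONFIG_VSOCKETS=m \
               CONFIG_VIRTIO_VSOCKETS=m \
               CONFIG_VIRTIO_VSOCKETS_COMMON=m \
               CONFIG_SUNRPC_XPRT_VSOCK=y \
               CONFIG_VHOST_VSOCK=m; do
        grep -qx "${opt}" "${cfg}" || echo "missing: ${opt}"
    done
}
# usage (from the kernel tree, after your 'make oldconfig'):
#   check_vsock_config .config
```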

You will also want to edit the EXTRAVERSION field at the top of the kernel Makefile so you can easily identify your newly built kernel - go ahead and add some unique string like -vsock or similar...

And then - build the kernel. Instructions are on that Fedora page (pretty much, a make bzImage followed by a make modules).

Heh, I guess I did not have the most powerful machine - maybe you want to go make a cup of tea at this point, or have lunch, or walk the dogs, or do all those emails you ignored this morning...

Install the kernel

Now you want to install that kernel in your host. You will find on that Fedora page (I told you it would be useful....):

  • sudo make modules_install
  • sudo make install

That should install the relevant files into /lib/modules and /boot. Now, go ahead and reboot your host, jump into that grub menu pretty sharpish, and choose your new kernel. Fingers crossed, your host comes back up and everything works (unlike for me the first time when I did a build of the Fedora source tree - which booted, but a bunch of stuff didn't work - like my network???).

Setting up QEMU

OK, now you have a patched host - great. Next step is to get a patched guest...

Here we use QEMU/KVM to get our VM up and running.

First, let's create a client image. I chose a suitably large image size and used the Fedora server iso for the install (predominantly as it installs faster than the fatter desktop version):

#!/bin/bash

set -x

IMAGE=${HOME}/images/Fedora-Server-dvd-x86_64-27-1.6.iso

qemu-img create -f qcow2 f27.img 20G

qemu-system-x86_64 -m 4G -smp 4 -hda f27.img -cdrom ${IMAGE} -boot d -enable-kvm -show-cursor -cpu host

and then go through the normal install procedure.

Then I boot that image with:

#!/bin/bash

set -x

IMAGE=${HOME}/vsocknfs/f27.img

sudo /usr/bin/qemu-system-x86_64 \
-cpu host -smp 4 -m 4G \
-enable-kvm \
-device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3 \
-virtfs local,path=9p,mount_tag=host0,security_model=mapped,id=host0 \
-hda ${IMAGE} \
-boot c \
-nographic \
-vnc :0

You may want to tweak some of that to your personal system, such as:

  • the path to the IMAGE
  • the RAM size and number of cores you pass in
  • the 9p path - a path to a dir on my machine that I share between the host and client over 9p.

You may also wish to change/choose how you connect to your QEMU guest - I used vnc, and that worked well enough.

Get the kernel into QEMU

Oh, first you will want to connect to your QEMU ;-). I used:

#!/bin/bash

vncviewer :0

Once you have booted your guest, you will need to mount up that 9p shared folder. I used:

#!/bin/bash
sudo mkdir /mnt/my9p
sudo mount -t 9p -o virtio,version=9p2000.L host0 /mnt/my9p

and then a mount should show you a 9p fs mounted on /mnt/my9p.
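If you'd rather check from a script than eyeball the mount output, something like this works - it just greps /proc/mounts for a 9p entry:

```shell
# Sketch: print any 9p mounts, or a warning if there are none
if grep -qs ' 9p ' /proc/mounts; then
    grep ' 9p ' /proc/mounts
else
    echo "no 9p filesystem mounted"
fi
```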

Then, over on the host, I was lazy and did a cp -r of my built linux git tree into my host side 9p folder (sure, you could share the tree directly and not have to copy).

Then, on your guest, navigate to that linux kernel tree on the 9p mount point, and do the same make modules_install and make install you did on the host.

Reboot the guest (well, OK, reboot did not work for me, so maybe just shutdown and then boot again) - and quickly connect your vnc so you can choose which kernel to boot in the grub menu. Sure, you can and maybe should go and make sure that your new kernel is the default boot in grub - I just didn't bother (yet) whilst testing...

Boot your shiny new kernel, and check you are running it in the guest with a uname -a. Hooray - this is a major step - you now have a patched kernel in both the host and guest!

nfs-utils

Now we need a set of nfs utils, on both the host and guest, that support the new feature.

Go get the code from https://github.com/stefanha/nfs-utils/tree/vsock-nfsd - and NOTE the branch.

Follow the README in that repo to do the autogen, configure and then build. Note, I had to install a whole bunch of extra -devel library dependencies to get this to build... and, on F27, it did not build without adding an

#include <stdint.h>

into support/export/hostname.c. If you see a 'UINT32_MAX' undeclared error or similar, that is probably the problem.
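To script that fix, a small guard like this does it (the function name is mine; it prepends the include only if it is not already there):

```shell
# Sketch: prepend '#include <stdint.h>' to a file, only once
add_stdint_include() {
    f="$1"
    grep -q '^#include <stdint.h>$' "$f" || sed -i '1i #include <stdint.h>' "$f"
}
# usage: add_stdint_include support/export/hostname.c
```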

Create the nfs mount point on the host

We need to set up an NFS export on the host side that we will then publish for the guest to mount. Edit /etc/exports, and add a line like:

/home/test	vsock:*(rw,no_root_squash,insecure,subtree_check)

Invoking nfsd on the host

To get the nfsd up on the host, we use a startup script from the linux vsock-nfsd repo, which can be seen at https://github.com/stefanha/linux/blob/vsock-nfsd/go.sh

Note: we have a slightly modified version of this that does not start up the gdb debug session at the end, afaict.

NOTE: you need to do this with no virtio-vsock devices in play - which in our case means you should do this with the VM not running. If you forget, you will see module insert errors because the system default virtio modules are already in use - damhikt.

Invoke that script with:

./go.sh nfs_vsock

You should see a bunch of activity, and no errors...

vhost mount in client

Now, back to the client - and you should be able to execute the following in order to mount that host side nfs share:

#!/bin/bash
sudo mkdir /mnt/test
sudo /mnt/my9p/nfs-utils/utils/mount/mount.nfs 2:/home/test /mnt/test -o clientaddr=3,proto=vsock

selinux?

Aha - gotcha! At this point, on my F27 desktop install, SELinux reared its head and blocked me. I disabled it (temporarily) on the host (you can go look that up yourself...).

And test...

And now, if you run a mount, hopefully you can see you have both the 9p and the vsock-nfs mount points mounted in your guest. Phew. Well done. Time for another cup of tea then...

Appendix

  • filebench doc
  • why nfs sucks doc

#!/bin/bash
pandoc -f markdown_github -t html -o README-vsock-nfs.html README-vsock-nfs.md