Forked from dghubble/
Created June 12, 2017 13:27
Running QEMU/KVM and Nested Kubernetes on Bare-Metal Kubernetes

%title: Kubeception
%author: @dghubble

-> Kubeception

-> Experiments with QEMU/KVM on Kubernetes <-

-> Dalton Hubble <-
-> @dghubble <-


  • QEMU is an open-source machine emulator and virtualizer
  • Combined with KVM, it runs virtual machines at almost native speed
  • KVM (Kernel-based Virtual Machine) is a Linux kernel feature and kernel module
  • It exposes the /dev/kvm interface so userspace programs can use processor virtualization features

-> Using QEMU/KVM

Typically you'd run QEMU/KVM VMs on a Linux host (laptop, CI, etc.)

  • Container Linux docs on running under QEMU/KVM VMs.
  • Testing CoreOS matchbox

You can also run QEMU/KVM VMs on a bare-metal Kubernetes cluster...

-> DEMO: QEMU/KVM in Alpine

Run a privileged alpine container on a bare-metal Kubernetes cluster.

kubectl create -f alpine/deployment.yaml

Snippet from deployment.yaml

  - name: alpine
    image: alpine:3.5
    securityContext:
      privileged: true
    command:
      - sh
      - -c
      - "echo Hello; sleep 36000"

-> Privileged

The privileged securityContext maps to the Docker --privileged flag, which gives the container access to the host's device files.

kubectl exec -it alpine-12345 /bin/ash

Look at the device files and find that /dev/kvm is available.
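
The check can also be scripted. The following is a minimal sketch, not part of the original demo: kvm_status is a hypothetical helper name, and it only tests that the device node exists and is readable and writable, which does not prove that hardware virtualization works.

```shell
#!/bin/sh
# Report whether a KVM device node looks usable from inside the container.
# The path is a parameter and defaults to /dev/kvm.
# kvm_status is a hypothetical helper, not a tool shipped with QEMU.
kvm_status() {
  dev="${1:-/dev/kvm}"
  if [ ! -e "$dev" ]; then
    # The node is absent, e.g. the pod is not privileged or KVM is not loaded.
    echo "missing"
  elif [ -r "$dev" ] && [ -w "$dev" ]; then
    echo "usable"
  else
    echo "no-access"
  fi
}
```

Running kvm_status with no arguments inside the alpine container prints one of "usable", "no-access", or "missing".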

-> Install QEMU

Let's install qemu-system-x86_64 and a few dependencies.

apk add --update qemu-system-x86_64 bzip2 wget

-> Launch a VM

Download a Container Linux image.


Decompress the bz2 image.

bzip2 -d coreos_production_qemu_image.img.bz2

Start a QEMU/KVM instance.

qemu-system-x86_64 -m 1024 -enable-kvm -hda coreos_production_qemu_image.img -nographic

-> Container Linux "Image"

Build and publish a container image for Container Linux.


  • Install QEMU/KVM
  • Download a Container Linux image
  • Add any tools or utilities


  • Setup desired features for your guest VM
  • Resize image to a desired disk size
  • Launch QEMU/KVM VM with desired cpu/memory
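
The entrypoint steps above could look roughly like the sketch below. It prints the commands instead of running them, so it is safe to try anywhere. The function name and its four parameters are hypothetical; qemu-img resize and the -m, -smp, -hda, -enable-kvm, and -nographic options are standard QEMU.

```shell
#!/bin/sh
# Print (rather than run) the commands an entrypoint could execute.
# Arguments (all hypothetical names): image path, disk size, memory in MB, CPUs.
print_launch_plan() {
  image="$1"; disk_size="$2"; memory_mb="$3"; cpus="$4"
  # Step 1: grow the disk image to the requested size before boot.
  echo "qemu-img resize ${image} ${disk_size}"
  # Step 2: boot the guest with KVM acceleration and the requested resources.
  echo "qemu-system-x86_64 -enable-kvm -m ${memory_mb} -smp ${cpus} -hda ${image} -nographic"
}
```

For example, print_launch_plan coreos_production_qemu_image.img 12G 1024 2 prints the resize command followed by the launch command.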

-> Tips: Networking

QEMU has a hostfwd option which forwards local ports to guest ports.

hostfwd=[tcp|udp]:[hostaddr]:hostport-[guestaddr]:guestport
       Redirect incoming TCP or UDP connections to the host port hostport to the guest IP
       address guestaddr on guest port guestport.

For example, hostfwd=tcp::2222-:22 will allow you to SSH from host to guest.

ssh -p 2222 localhost
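
As a rough illustration of where a hostfwd rule goes on the QEMU command line, the sketch below builds the rule and the user-mode networking arguments as strings. The helper names, the netdev id net0, and the choice of the virtio-net-pci device model are assumptions made for this example, not taken from the talk.

```shell
#!/bin/sh
# Build one rule of the form hostfwd=PROTO::HOSTPORT-:GUESTPORT
# (host and guest addresses left empty, as in hostfwd=tcp::2222-:22).
hostfwd_rule() {
  proto="$1"; hostport="$2"; guestport="$3"
  echo "hostfwd=${proto}::${hostport}-:${guestport}"
}

# Print QEMU arguments for a user-mode network carrying the given
# comma-separated hostfwd rules. "net0" and virtio-net-pci are
# illustrative choices.
user_net_args() {
  rules="$1"
  echo "-netdev user,id=net0,${rules} -device virtio-net-pci,netdev=net0"
}
```

Calling user_net_args "$(hostfwd_rule tcp 2222 22)" yields arguments that can be appended to a qemu-system-x86_64 invocation.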

-> Provisioning: Container Linux Config

Container Linux accepts Container Linux Configs (indirectly).

  • Declarative YAML file
  • Provisions disks during early boot
    • Create partitions
    • Write files (systemd units, networkd units, configs)
    • Configure users
  • Caveat: Convert to machine-readable Ignition first

-> Example 1

Add an SSH public key for user "core".

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "ssh-rsa blah"

-> Example 2

Systemd should run the etcd2.service.

systemd:
  units:
    - name: etcd2.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |

-> Tips: QEMU Firmware Config

QEMU has a fw_cfg option which allows a file to be passed to the guest.

fw_cfg [name=]name,file=file
       Add named fw_cfg entry with contents from file file. The fw_cfg entries are passed
       by QEMU through to the guest.

Container Linux can read from the QEMU firmware config device to get user-data.

-fw_cfg name=opt/com.coreos/config,file="${PWD}/ignition.ign" "$@"


Write a Container Linux Config and convert it to Ignition. Pass the result into the guest via fw_cfg to configure the VM.

./ct -in-file $CONTAINER_LINUX_CONFIG_FILE -out-file ${PWD}/ignition.ign
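
A small guard before launch can catch a failed conversion. This is a sketch under assumptions: ignition_fw_cfg_args is a hypothetical helper, and it only checks that the Ignition file exists and is non-empty, not that it is valid JSON. The entry name opt/com.coreos/config is the one used above.

```shell
#!/bin/sh
# Print the -fw_cfg argument for an Ignition file, or fail if the file
# is missing or empty (for example because the ct conversion failed).
# ignition_fw_cfg_args is a hypothetical helper name.
ignition_fw_cfg_args() {
  ign="$1"
  if [ ! -s "$ign" ]; then
    echo "ignition file missing or empty: $ign" >&2
    return 1
  fi
  echo "-fw_cfg name=opt/com.coreos/config,file=${ign}"
}
```

The printed string can be appended to the qemu-system-x86_64 command line once the check passes.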

-> coreos-kvm

Nightly Jenkins pipeline publishes


Environment Variables

  • CONFIG_FILE - provide a Container Linux Config
  • IGNITION_CONFIG_FILE - provide a raw Ignition Config
  • CLOUD_CONFIG_FILE - provide a Cloud-Config
  • VM_NAME - name of the VM
  • VM_MEMORY - amount of VM RAM (4G)
  • VM_DISK_SIZE - size of VM disk (12G)
  • HOSTFWD - port forwards (hostfwd=tcp::2222-:22)
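
One way an entrypoint might choose between the three user-data variables is sketched below. The precedence order (raw Ignition, then Container Linux Config, then Cloud-Config) and the function name are assumptions made for illustration; the actual coreos-kvm image may behave differently.

```shell
#!/bin/sh
# Decide which user-data source to use from the environment variables
# listed above. The precedence is an assumption made for this sketch:
# Ignition first, then Container Linux Config, then Cloud-Config.
select_userdata() {
  if [ -n "${IGNITION_CONFIG_FILE:-}" ]; then
    echo "ignition:${IGNITION_CONFIG_FILE}"
  elif [ -n "${CONFIG_FILE:-}" ]; then
    # A Container Linux Config would still need converting with ct.
    echo "container-linux-config:${CONFIG_FILE}"
  elif [ -n "${CLOUD_CONFIG_FILE:-}" ]; then
    echo "cloud-config:${CLOUD_CONFIG_FILE}"
  else
    echo "none"
  fi
}
```

With only CONFIG_FILE=/userdata/config.yaml set, select_userdata prints container-linux-config:/userdata/config.yaml.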

-> DEMO: coreos-kvm

Create a "VM pod" with user-data in a ConfigMap.

kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml

Access the Container Linux VM via the service's cluster IP.

kubectl get service coreos-kvm
ssh core@10.3.0.X

                           +-----------+
Service                    |  Service  |   10.3.0.X:22
                           +-----------+
                                 |
                    +-------------------------+
                    |        Endpoints        | 10.2.0.X:2222
Pod                 +-------------------------+
                    |   coreos-kvm container  | local
                    |         "host"          | port forwards to
                    |  +-------------------+  | guest :22
                    |  |  Container Linux  |  |
                    |  |   QEMU/KVM guest  |  |
                    |  |                   |  |
                    |  +-------------------+  |
                    +-------------------------+

-> Applications

  • Jenkins executors/workers
  • Docker builds in a clean Container Linux env
  • Arbitrary VMs (QEMU can run almost anything)

-> Kubernetes in a VM

Goal: Single node Kubernetes

  • Write a Kubernetes deployment for a Container Linux QEMU/KVM VM
  • Write a Kubernetes configmap with a Container Linux Config
  • Write a Kubernetes service exposing 22 and 443
  • Add a DNS record resolving to the apiserver (for kubectl)

-> Demo

Create the configmap, deployment, and service.

cd k8s
kubectl create -f configmap.yaml
kubectl create -f deployment.yaml
kubectl create -f service.yaml

Let's take a look at what we've created.

-> Deployment

  • Mounts the Container Linux Config
  • Adds port forwards from 2222 to 22 and 1443 to 443 from host to guest

    env:
      - name: HOSTFWD
        value: "hostfwd=tcp::2222-:22,hostfwd=tcp::1443-:443"
      - name: CONFIG_FILE
        value: /userdata/config.yaml
    ports:
      - name: apiserver
        containerPort: 1443
      - name: ssh
        containerPort: 2222
    volumeMounts:
      - name: config-volume
        mountPath: /userdata

-> Service

  • Expose pod ports 2222 and 1443
  • Assign a fixed service IP (hacky).

apiVersion: v1
kind: Service
metadata:
  name: coreos-k8s
spec:
  selector:
    name: coreos-k8s
  ports:
    - name: ssh
      port: 22
      targetPort: 2222
    - name: api
      port: 443
      targetPort: 1443

-> DNS, TLS, ConfigMap

Add a DNS record resolving to the service IP.

$ dig

Generate TLS certificates

./k8s-certgen -s \
  -m IP.1=,

Write a Container Linux Config and place it in a Kubernetes ConfigMap.

  • Add systemd units for etcd, flanneld, and kubelet
  • Add TLS certificates (hacky: they should be mounted as secrets into the pod and then into the guest)
  • Just modify matchbox examples

-> Fingers Crossed

Show the pod running the Container Linux VM.

kubectl get pods
kubectl get service coreos-k8s

Show that the pod is running a single-node Kubernetes inside.

kubectl get nodes
kubectl get pods --all-namespaces

Let's make it weirder.

kubectl scale deployment coreos-k8s --replicas=3

-> Back to Reality


  • Develop and test federated Kubernetes
  • Provide a (nested) Kubernetes to each developer

Pros and Cons

  • Each "VM pod" is running qemu-system-x86 inside, baked into the image
  • Image must provide the features
    • Customizable cpu, memory, and disk size
    • Providing Container Linux configs to guest
    • Mounting volumes into guests from Kubernetes
    • Snapshots, migrations, etc.


  • rkt has an alternative stage 1 which can use QEMU/KVM
  • kubevirt