I've been reading Kubernetes Up and Running, and got excited when I saw that they include some instructions for configuring a cluster on Raspberry Pis, which I've been dying to play with for some time but never had a project to dig into.
I had been recording my progress in the form of a Twitter thread, but I thought something easier to scan, and editable, might be a better way to record the outcomes and lessons learned.
As I started to dig in, I found the short guide to be different enough from what I was actually seeing that I decided to keep some notes about the tweaks I was required to make.
> In this document, I'll try to use block quotes (like this one) to indicate subjective observations. These blocks could be labeled as hunches, or even suspicions. In other cases, I'll use them to indicate contradictions, or things that don't work as advertised.
This is just a quick list of what is working and what isn't. These topics are expanded upon in the sections that follow.
- Cluster leader has 2 network interfaces, one public (`wlan0`), one private (`eth0`).
- Cluster leader is providing other nodes on the private subnet with addresses via dhcp.
- All cluster nodes are able to reach each other over their private subnet and switch.
- Cluster leader is able to reach the internet; the other nodes initially could not, despite listing the leader as their gateway (fixed later with better iptables rules).
The general layout of the cluster outlined in the book has the leader configured with 2 network interfaces:
- `wlan0`: the public interface you'd normally connect to when using `kubectl`, for example. This interface would be connected to your regular wifi access point, so it'll be on your main network (with internet access).
- `eth0`: the wired interface, connected to a switch, offering dhcp for a private cluster subnet.
The cluster leader would then enable IP forwarding, which should allow traffic on `eth0` to reach addresses outside of the private cluster subnet (i.e., for public internet access).
As I understand it, the other nodes in the cluster would connect their wired interfaces to the switch, and skip wifi entirely. With the cluster leader listed as their gateway (in the network config fed to them via dhcp from the leader), they should be able to route from their `eth0`, through to the cluster leader's `eth0`, which should then forward through to the leader's `wlan0`.
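It can help to sanity-check this routing story from one of the worker nodes. The addresses here are assumptions based on the 10.0.0.0/24 private subnet used later in these notes:

```shell
# On a worker node: the default route should point at the leader
ip route
# expect a line like: default via 10.0.0.1 dev eth0

# And the leader should answer on the private subnet
ping -c 1 10.0.0.1
```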
Unfortunately this didn't work for me at first: the leader was able to reach public addresses, but the other nodes were not. Fixed by finding better iptables rules than were shown in the book.
This was the first snag I ran into. In the book they describe configuring your wifi connection and hostname for the cluster leader by editing a file called `/boot/device-init.yaml`, which does not exist in the current (as of writing) version of the OS image linked to by the book.
Looks like this change was introduced in the last two releases, and while I'm still entertaining the idea of downgrading to bring the book's instructions back into working order, these notes assume v1.7.1, the latest as of this writing. If enough time has passed, the latest might be some other version; see the releases list to determine this.
The old way of configuring these seemed to involve plugging some basic keys and values into a yaml file. The new way, built on a very specific version (v0.7.9) of cloud-init, is not nearly as simple.
The cloud-init way is to edit a file called `/boot/user-data`, which is also a yaml file, but one that includes blocks which are essentially shell scripts, so you have to watch how you format them.
I can appreciate that this is more extensible, in that you can effectively do whatever you want, but it's a little ugly and clumsy for someone who doesn't know exactly what they need (like me).
A point of confusion that remains for me is how cloud-init only runs parts of `/boot/user-data` with each boot, while much of it is one-time, first-boot execution. This means that if you image your Pi and boot it up so you can edit the file (to configure your wifi, for example), you've already missed your opportunity for cloud-init to create and wire up the conf files you'll need to edit, and you'll have to do some of the automation by hand.
I'm not sure how to force cloud-init to re-run itself as if it were booting for the first time, which might smooth this over. Lacking the know-how on this, I'm left with 2 options: either make all my edits to `/boot/user-data` before first boot by mounting the sd card after imaging, or run the conf creation by hand by following the script blocks in the file.
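For what it's worth, more recent cloud-init releases grew a `clean` subcommand for exactly this; on the old v0.7.9 build the rough equivalent is deleting cloud-init's recorded instance state by hand. I haven't verified either path on hypriot, so treat this as a sketch:

```shell
# Newer cloud-init (17.1+): wipe state so first-boot modules re-run
sudo cloud-init clean --logs
sudo reboot

# Older builds (like v0.7.9): remove the recorded instance state by hand
sudo rm -rf /var/lib/cloud/instance /var/lib/cloud/instances/*
sudo reboot
```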
The leader effectively acts as a router in front of the rest of the cluster, so it has some extra configuration that is not required on the rest of the nodes.
Here's an example of the (commented out by default) wifi config:
```yaml
# # WiFi connect to HotSpot
# # - use `wpa_passphrase SSID PASSWORD` to encrypt the psk
write_files:
  - content: |
      allow-hotplug wlan0
      iface wlan0 inet dhcp
      wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
      iface default inet dhcp
      wireless-power off
    path: /etc/network/interfaces.d/wlan0
  - content: |
      ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
      update_config=1
      network={
        ssid="YOUR_WIFI_SSID"
        psk="YOUR_WIFI_PASSPHRASE"
      }
    path: /etc/wpa_supplicant/wpa_supplicant.conf

# These commands will be run once, on first boot only
runcmd:
  # Pick up the hostname changes
  - 'systemctl restart avahi-daemon'
  # # Activate WiFi interface
  - 'ifup wlan0'
```
You'll notice the line just above the wifi config that reads: "use `wpa_passphrase SSID PASSWORD` to encrypt the psk".
This didn't actually work for me, perhaps because of the way my home router is configured, and I had to use a plain text passphrase in this config instead of an encrypted one.
This advice to encrypt the passphrase was actually a little frustrating, considering you'd need to be on a system with this binary to be able to run it. If you were running linux on whatever computer you're doing this work on, you might have access to it, but from a mac or windows machine you might need to ssh into your Pi to run it, which would mean you'd miss your first boot and then have to go edit the `wlan0` conf after the fact to update it with the newly encrypted passphrase.
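For completeness, here's what `wpa_passphrase` does when it works: it derives a hex psk from the SSID and passphrase and prints a ready-made `network` block you can paste into `wpa_supplicant.conf` (the SSID and passphrase below are placeholders):

```shell
$ wpa_passphrase YOUR_WIFI_SSID YOUR_WIFI_PASSPHRASE
# prints a network={...} block whose psk= value is a 64-hex-digit
# derivation of the passphrase, in place of the plain text one
```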
Much of what you'll see in this example config is what you'd find commented out by default. However, I noticed that on the rpi3 I have, the `wlan0` interface will turn itself off after some time. I added the line reading `wireless-power off` to prevent the power management system from shutting down the `wlan0` interface, which is meant to act as the link to the external network for our cluster.
It's also important to note that the `write_files` section responsible for creating the `wlan0` configs is where most of your customizations are made, but you'll also have to uncomment the very last line (under the `runcmd` section), which reads `- 'ifup wlan0'`, to actually turn on the wifi.
The book offers the following for the content of `/etc/network/interfaces.d/eth0`:

```
allow-hotplug eth0
iface eth0 inet static
  address 10.0.0.1
  netmask 255.255.255.0
  broadcast 10.0.0.255
  gateway 10.0.0.1
```
and here's what I currently have:

```
allow-hotplug eth0
iface eth0 inet static
  address 10.0.0.1
  netmask 255.255.255.0
  broadcast 10.0.0.255
```
The big deviation here is the removal of the `gateway` directive, which in my case prevented the IP forwarding from allowing the leader to contact external IPs, and the addition of the `pre-up` line, which reloads some saved iptables state (discussed later on).
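For context, a `pre-up` line of the sort I'm describing would sit inside the `eth0` stanza; the save-file path here is my own convention, not a standard:

```shell
# Added to /etc/network/interfaces.d/eth0, inside the eth0 stanza.
# The path is an assumption; point it at wherever you saved the
# output of `sudo iptables-save`.
pre-up iptables-restore < /etc/network/iptables.rules
```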
> The removal of the `gateway` directive is suspect to me. It fixes the cluster leader's internet access, but I wonder if the removal is what breaks internet access for the other nodes that route through the leader.
The book suggests installing the dhcpd service with:

```
$ sudo apt-get install isc-dhcp-server
```

The book mentions being able to restart dhcpd by running `sudo systemctl dhcpd restart`, but in this case the service is named `isc-dhcp-server` rather than `dhcpd`, so the working command is `sudo systemctl restart isc-dhcp-server`.
The book doesn't mention this, but I wanted to be sure that my Pi didn't try to offer leases on `wlan0`, which is managed by my home router. In order to restrict dhcpd to specific interfaces, edit `/etc/default/isc-dhcp-server` and update the `INTERFACES` line to list only the private network interface, like so:

```
INTERFACES="eth0"
```
My `/etc/dhcp/dhcpd.conf` is very similar to the book's suggestion, with a couple of small tweaks:
```
option domain-name cluster;
option domain-name-servers 8.8.8.8, 8.8.4.4;

subnet 10.0.0.0 netmask 255.255.255.0 {
  range 10.0.0.100 10.0.0.254;
  option subnet-mask 255.255.255.0;
  option broadcast-address 10.0.0.255;
  option routers 10.0.0.1;

  # host node-02 {
  #   hardware ethernet XX:XX:XX:XX:XX:XX;
  #   fixed-address 10.0.0.2;
  # }
  # host node-03 {
  #   hardware ethernet XX:XX:XX:XX:XX:XX;
  #   fixed-address 10.0.0.3;
  # }
  # host node-04 {
  #   hardware ethernet XX:XX:XX:XX:XX:XX;
  #   fixed-address 10.0.0.4;
  # }
}

default-lease-time 600;
max-lease-time 7200;
authoritative;
```
My tweaks were mainly that I wanted the dynamic range to exclude 10.0.0.1, which is statically assigned to the cluster leader. I set the dynamic range to be well above the node IPs so that a freshly booted node gets an address I can ssh to in order to learn the MAC address of its wired network interface. Once I learn the MAC address, I can uncomment and update the static lease for the node in question.
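To actually learn a node's MAC address for the static lease, you can either check the leader's lease file or ask the node itself once you've sshed in by its dynamic IP (the lease-file path is the Debian default for isc-dhcp-server):

```shell
# On the cluster leader: leases dhcpd has handed out so far
cat /var/lib/dhcp/dhcpd.leases

# Or, on the node itself: the "link/ether" line is the MAC address
ip link show eth0
```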
> As an aside, the book's recommendation to use `/etc/hosts` to give names to the nodes seems undercut by the lack of advice on setting static leases. Without static leases, the nodes could change IPs after being powered down and back up.
The book also suggests `option domain-name "cluster.home";` but I shortened this to simply `option domain-name cluster;` so that I might reach my nodes by `node-3.cluster`, though I was never actually able to do this. I could only reach the nodes by IP, and in fact, the book recommends adding names for the nodes in `/etc/hosts`, so I'm not sure why it'd make sense to set this in the dhcp config at all.
While configuring the names in `/etc/hosts` is a little brute force, I've been finding it the most reliable. I don't plan on running Bind on the cluster leader, but until that time I'll be using static leases and `/etc/hosts`.
The first thing to do is to edit `/etc/sysctl.conf` and uncomment, or add, the following line:

```
net.ipv4.ip_forward=1
```
You will need to either reboot, or run `sudo sysctl -p`, to have this config change reflected by your network stack.
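You can read the flag back to confirm the change took effect:

```shell
# Both of these should report 1 once forwarding is enabled
sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward
```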
The book then shows some iptables commands, and suggests running them as part of `/etc/rc.local`. Here's mine:
```sh
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT

exit 0
```
Running `sudo iptables -L -n` will show that the rules have loaded after a reboot.
The iptables rules shown in the book allowed the cluster leader to reach external IPs, but the other nodes could not access the internet even while routing through the leader.
After much searching, I found this article on the Ubuntu Help site, and with some slight modifications to the interface names, here's a set of rules that work for me:
```sh
# The following rule needs to be set up on ALL NODES
iptables -P FORWARD ACCEPT

# The rest should only be configured on the cluster leader
iptables -A FORWARD -o wlan0 -i eth0 -s 10.0.0.0/24 -m conntrack --ctstate NEW -j ACCEPT
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -t nat -F POSTROUTING
iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
```
> I do not understand any of this iptables stuff. I don't know why the book's suggestion doesn't work, or why this set of rules from the ubuntu help site does. Your mileage may vary, apparently.
Take special note of the section immediately above this regarding iptables. The following rule should be added to `/etc/rc.local` on all nodes in the cluster, including the leader:

```
iptables -P FORWARD ACCEPT
```
This is noted in the flannel troubleshooting guide, which links to https://docs.docker.com/engine/userguide/networking/default_network/container-communication/#container-communication-between-hosts
Without this rule, your pods will not be able to communicate with each other.
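These rules won't survive a reboot on their own; one way to persist the working set (as an alternative to re-running the raw commands from `/etc/rc.local`) is to snapshot it with `iptables-save` once everything works, then reload it at boot. The save path here is my choice, not a standard:

```shell
# On each node, once the rules are in place and working:
sudo sh -c 'iptables-save > /etc/network/iptables.rules'

# Then reload at boot, e.g. from /etc/rc.local (before `exit 0`):
iptables-restore < /etc/network/iptables.rules
```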
To start, I ran `ssh-keygen` on just the cluster leader (node-01 in my case), as the user that comes pre-configured with the hypriot image (`pirate`). I then ran the following so that a user with this public key can access this host without a password:

```
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
```
After this, I also create a file at `~/.ssh/config` which contains:

```
Host node-*
  User pirate
```
Once you've got all this in place, you can copy the `.ssh` directory around to all your nodes using `scp`. Unfortunately the nice short `node-*` names won't work just yet, so for this we need to use IPs instead:

```
$ scp -r ~/.ssh pirate@10.0.0.2:~/
$ scp -r ~/.ssh pirate@10.0.0.3:~/
$ scp -r ~/.ssh pirate@10.0.0.4:~/
```
You might want to make similar changes to `~/.ssh/config` on your personal computer to avoid having to specify the `pirate` user. I do this, and also add my own `~/.ssh/id_rsa.pub` to the `authorized_keys` file on the cluster leader to make accessing the cluster more convenient.
The hypriot image uses cloud-init to manage the hosts file on each node by default. Since this is the case, I `ssh` into each node by IP, and add the following to the bottom of `/etc/cloud/templates/hosts.debian.tmpl`:
```
10.0.0.1 node-01
10.0.0.2 node-02
10.0.0.3 node-03
10.0.0.4 node-04
```
If you haven't already, you may also want to enter the correct hostname in `/boot/user-data` on each node as you make the rounds.
After rebooting each node, you should now be able to see these entries in `/etc/hosts`.
Additionally, if you completed the `ssh` config steps mentioned above, you should now be able to ssh freely between nodes by name.
The book offers some commands which you may find convenient to wrap up in a small script (so it can be scp'd around to all your nodes). Here's my `install-k8s.sh`:
```sh
#!/bin/sh -e

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" \
  >> /etc/apt/sources.list.d/kubernetes.list

apt-get update
apt-get upgrade -y
apt-get install -y kubelet kubeadm kubectl kubernetes-cni
```
Copy this script to each of the nodes, and invoke it with `sudo`.
On the cluster leader, run the following (which differs slightly from the commands in the book; I guess some flags have changed):

```
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address 10.0.0.1
```
The end of the command output should share a command to run on each of the other nodes to join them to your cluster.
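For reference, the join command printed at the end looks roughly like this; the token and hash are placeholders that `kubeadm init` generates for your cluster, so copy the exact command from your own output:

```shell
# Run on each of the other nodes (values are placeholders)
sudo kubeadm join 10.0.0.1:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```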
For downloading and modifying the flannel and dashboard yamls, I wrote some scripts (just so I'd have a record of what I ran). Again, there are some slight tweaks to update the urls, given the drift of time.
```sh
#!/bin/sh -e
# ./get-flannel.sh > flannel.yaml

curl https://rawgit.com/coreos/flannel/master/Documentation/kube-flannel.yml \
  | sed "s/amd64/arm/g" | sed "s/vxlan/host-gw/g"
```
```sh
#!/bin/sh -e
# ./get-dashboard.sh > kubernetes-dashboard.yaml

CONF_URL=https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

curl -sSL \
  $CONF_URL \
  | sed "s/amd64/arm/g"
```
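Both scripts lean entirely on `sed` rewrites of upstream yaml, so before piping their output into `kubectl apply -f`, it's worth a quick sanity check that the substitution does what you expect (the image line below is just a sample):

```shell
# The arch substitution used by both scripts, applied to a sample line
echo "image: quay.io/coreos/flannel:v0.9.1-amd64" | sed "s/amd64/arm/g"
# → image: quay.io/coreos/flannel:v0.9.1-arm
```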