First create a working service:
apiVersion: v1
kind: Service
metadata:
  name: echoheaders
  labels:
    app: echoheaders
spec:
  # type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: echoheaders
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: echoheaders
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: echoheaders
    spec:
      containers:
      - name: echoheaders
        image: gcr.io/google_containers/echoserver:1.4
        ports:
        - containerPort: 8080
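Assuming the manifest above is saved as echoheaders.yaml (the filename is illustrative), create it and confirm the service actually has the pod behind it:

```shell
# Create the Service and ReplicationController (filename is an assumption)
kubectl create -f echoheaders.yaml

# The service should list exactly one endpoint, the pod's IP:8080
kubectl get endpoints echoheaders
```

If the Endpoints column is empty, the selector doesn't match the pod labels and none of the steps below will behave as described.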
This service has one pod; assume the pod IP is 10.184.2.4/24 and the service VIP is 10.187.253.130. You should be able to kubectl exec into the pod and curl 10.187.253.130:80. To break this setup, turn off hairpin mode and make sure your bridge is not in promiscuous mode (netstat -i shows a P flag next to cbr0 if it is; turn it off with ip link set cbr0 promisc off). You might also have to modify the kubelet settings so it doesn't re-enable hairpin: add hairpin-mode=none to the defaults in /etc/defaults/kubelet and do a sudo service kubelet restart so it picks the flag up. Then disable hairpin on every bridge port:
for f in /sys/devices/virtual/net/*/brport/hairpin_mode; do echo 0 > $f; done
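To confirm the bridge ports really ended up with hairpin off, read the same sysfs files back (a quick sketch; the brport paths only exist on a machine with bridge ports, and the loop prints nothing otherwise):

```shell
# Print the hairpin_mode flag for every bridge port; 0 means hairpin is off
for f in /sys/devices/virtual/net/*/brport/hairpin_mode; do
  [ -e "$f" ] || continue   # skip if this machine has no bridge ports
  echo "$f: $(cat "$f")"
done
```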
At this point you have a broken service.
To fix it, first create the packet-laundering netns.
$ ip netns add k8s_hairpin_workaround
Set up a veth pair and give the host end a link-local IP (the peer name gets a random suffix):
$ R=$RANDOM
$ ip link add k8s_reflector type veth peer name k8s_veth$R
$ ip addr add dev k8s_reflector 169.254.169.169/30
$ ip link set dev k8s_reflector up
Shove one end of it into the hairpin netns and enter it:
$ ip link set k8s_veth$R netns k8s_hairpin_workaround
$ ip netns exec k8s_hairpin_workaround /bin/sh
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: k8s_veth3138: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
# ip link set dev k8s_veth3138 name eth0
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
# ip addr add dev eth0 169.254.169.170/30
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
inet 169.254.169.170/30 scope global eth0
valid_lft forever preferred_lft forever
# curl 216.58.195.238
curl: (7) Failed to connect to 216.58.195.238: Network is unreachable
# ip route list
169.254.169.168/30 dev eth0 proto kernel scope link src 169.254.169.170
# ip route add default via 169.254.169.169
# curl 216.58.195.238
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
Setup iptables rules in the netns to reflect back to source. In the example, 10.184.2.4 is a pod on the node behind a service. Curling the service vip from within the pod will lead to the hairpin situation, because it's the only pod in that service.
# iptables -t nat -A PREROUTING -s 10.184.2.4 -j DNAT --to-destination=10.184.2.4
# tcpdump -i any
... wait
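Before wiring up the host side, it's worth double-checking that the rule landed in the netns nat table (a sketch; needs root):

```shell
# Dump the netns PREROUTING rules; the DNAT back to the pod should be listed
ip netns exec k8s_hairpin_workaround iptables -t nat -S PREROUTING
```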
Now route packets from the hairpin pod on the host into the laundering netns (so they show up in that tcpdump). Note this needs the nat table, since DNAT is only valid there:
$ sudo iptables -t nat -A PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
$ sudo iptables-save | grep PRE
:PREROUTING ACCEPT [32:1720]
:PREROUTING ACCEPT [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
beeps@gke-cluster-1-default-pool-4e4b87ff-jyxe:~$ sudo iptables-save | grep -i kube-ser
:KUBE-SERVICES - [0:0]
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
:KUBE-SERVICES - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
...
The iptables-save output above shows the KUBE-SERVICES jump ahead of our DNAT, so the service rules would still match first. Delete and re-append KUBE-SERVICES to push it behind our rule:
$ sudo iptables -t nat -D PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES; sudo iptables -t nat -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
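To confirm the re-ordering took, dump the nat PREROUTING chain again; the pod-sourced DNAT should now come before the KUBE-SERVICES jump (needs root):

```shell
# Rules are printed in match order; our -s 10.184.2.4/32 DNAT should be first
sudo iptables -t nat -S PREROUTING
```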
Now kubectl exec into the pod:
$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
echoheaders 10.187.253.130 <none> 80/TCP 1d
kubernetes 10.187.240.1 <none> 443/TCP 4d
$ kubectl get po
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 40 1d
echoheaders-ygvxu 1/1 Running 0 1d
$ kubectl exec -it echoheaders-ygvxu /bin/bash
bash-4.3# curl 10.187.253.130
curl: (7) Failed to connect to 10.187.253.130 port 80: Connection refused
The netns tcpdump should show packets:
# tcpdump -i any
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
19:12:06.426058 IP 169.254.169.169.34074 > 169.254.169.170.http: Flags [S], seq 1503079823, win 28400, options [mss 1420,sackOK,TS val 106405470 ecr 0,nop,wscale 7], length 0
19:12:06.426081 IP 169.254.169.170.http > 169.254.169.169.34074: Flags [R.], seq 0, ack 1503079824, win 0, length 0
19:12:06.426420 IP 169.254.169.170.49322 > metadata.google.internal.domain: 30563+ PTR? 170.169.254.169.in-addr.arpa. (46)
To route them back out, we need to make sure the netns masquerades:
# ip netns exec k8s_hairpin_workaround iptables -t nat -A POSTROUTING -j MASQUERADE
and that the host doesn't masquerade, because that would otherwise trigger the hairpin path again:
$ sudo iptables -t nat -I POSTROUTING -o k8s_reflector -j ACCEPT
In the pod:
bash-4.3# curl 10.187.253.130:8080
CLIENT VALUES:
client_address=169.254.169.170
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://10.187.253.130:8080/
SERVER VALUES:
server_version=nginx: 1.9.11 - lua: 10001
HEADERS RECEIVED:
accept=*/*
host=10.187.253.130:8080
user-agent=curl/7.47.1
BODY:
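Once the real hairpin fix is back in place, the workaround can be unwound (a hedged sketch; the -D rule specs must match exactly what was added above, and this needs root):

```shell
# Remove the host-side rules added for the workaround
sudo iptables -t nat -D PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
sudo iptables -t nat -D POSTROUTING -o k8s_reflector -j ACCEPT

# Deleting the host end of the veth pair removes the peer inside the netns too,
# then the now-empty netns can be dropped
sudo ip link del k8s_reflector
sudo ip netns del k8s_hairpin_workaround
```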