First create a working service:
apiVersion: v1
kind: Service
metadata:
  name: echoheaders
  labels:
    app: echoheaders
spec:
  # type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: echoheaders
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: echoheaders
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: echoheaders
    spec:
      containers:
      - name: echoheaders
        image: gcr.io/google_containers/echoserver:1.4
        ports:
        - containerPort: 8080
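Assuming the manifest above is saved as echoheaders.yaml (the filename is illustrative), create it and confirm the service actually has the pod behind it:

```shell
# Create the Service and ReplicationController (filename is an assumption)
kubectl create -f echoheaders.yaml

# The service should list exactly one endpoint, the pod's IP:8080
kubectl get endpoints echoheaders
```

If the Endpoints column is empty, the selector doesn't match the pod labels and none of the steps below will behave as described.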
This service has one pod; assume the pod IP is 10.184.2.4/24 and the service VIP is 10.187.253.130. You should be able to kubectl exec into the pod and curl 10.187.253.130:80. To break this setup, turn off hairpin mode and make sure your bridge is not in promiscuous mode (netstat -i shows a P flag next to cbr0 if it is; turn it off with ip link set cbr0 promisc off). You might also have to modify the kubelet settings so it doesn't re-enable hairpin: add hairpin-mode=none to the defaults in /etc/defaults/kubelet and do a sudo service kubelet restart so it picks the flag up. Then disable hairpin on every bridge port:
for f in /sys/devices/virtual/net/*/brport/hairpin_mode; do echo 0 > $f; done
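To confirm the bridge ports really ended up with hairpin off, read the same sysfs files back (a quick sketch; the brport paths only exist on a machine with bridge ports, and the loop prints nothing otherwise):

```shell
# Print the hairpin_mode flag for every bridge port; 0 means hairpin is off
for f in /sys/devices/virtual/net/*/brport/hairpin_mode; do
  [ -e "$f" ] || continue   # skip if this machine has no bridge ports
  echo "$f: $(cat "$f")"
done
```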
At this point you have a broken service.
To fix it, first create the packet-laundering netns.
$ ip netns add k8s_hairpin_workaround
Set up a veth pair and give the host end a link-local IP (the peer name gets a random suffix):
$ R=$RANDOM
$ ip link add k8s_reflector type veth peer name k8s_veth$R
$ ip addr add dev k8s_reflector 169.254.169.169/30
$ ip link set dev k8s_reflector up
Shove one end of it into the hairpin netns and enter it:
$ ip link set k8s_veth$R netns k8s_hairpin_workaround
$ ip netns exec k8s_hairpin_workaround /bin/sh
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: k8s_veth3138: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
# ip link set dev k8s_veth3138 name eth0
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
# ip addr add dev eth0 169.254.169.170/30
# ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
23: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 12:35:f0:b1:5f:1e brd ff:ff:ff:ff:ff:ff
inet 169.254.169.170/30 scope global eth0
valid_lft forever preferred_lft forever
# curl 216.58.195.238
curl: (7) Failed to connect to 216.58.195.238: Network is unreachable
# ip route list
169.254.169.168/30 dev eth0 proto kernel scope link src 169.254.169.170
# ip route add default via 169.254.169.169
# curl 216.58.195.238
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
Setup iptables rules in the netns to reflect back to source. In the example, 10.184.2.4 is a pod on the node behind a service. Curling the service vip from within the pod will lead to the hairpin situation, because it's the only pod in that service.
# iptables -t nat -A PREROUTING -s 10.184.2.4 -j DNAT --to-destination=10.184.2.4
# tcpdump -i any
... wait
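Before wiring up the host side, it's worth double-checking that the rule landed in the netns nat table (a sketch; needs root):

```shell
# Dump the netns PREROUTING rules; the DNAT back to the pod should be listed
ip netns exec k8s_hairpin_workaround iptables -t nat -S PREROUTING
```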
Now route packets from the hairpin pod on the host into the laundering netns (so they show up in that tcpdump). Note this needs the nat table, since DNAT is only valid there:
$ sudo iptables -t nat -A PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
$ sudo iptables-save | grep PRE
:PREROUTING ACCEPT [32:1720]
:PREROUTING ACCEPT [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
beeps@gke-cluster-1-default-pool-4e4b87ff-jyxe:~$ sudo iptables-save | grep -i kube-ser
:KUBE-SERVICES - [0:0]
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
:KUBE-SERVICES - [0:0]
-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
...
The iptables-save output above shows the KUBE-SERVICES jump ahead of our DNAT, so the service rules would still match first. Delete and re-append KUBE-SERVICES to push it behind our rule:
$ sudo iptables -t nat -D PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES; sudo iptables -t nat -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
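To confirm the re-ordering took, dump the nat PREROUTING chain again; the pod-sourced DNAT should now come before the KUBE-SERVICES jump (needs root):

```shell
# Rules are printed in match order; our -s 10.184.2.4/32 DNAT should be first
sudo iptables -t nat -S PREROUTING
```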
Now kubectl exec into the pod:
$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
echoheaders 10.187.253.130 <none> 80/TCP 1d
kubernetes 10.187.240.1 <none> 443/TCP 4d
$ kubectl get po
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 40 1d
echoheaders-ygvxu 1/1 Running 0 1d
$ kubectl exec -it echoheaders-ygvxu /bin/bash
bash-4.3# curl 10.187.253.130
curl: (7) Failed to connect to 10.187.253.130 port 80: Connection refused
The netns tcpdump should show packets:
# tcpdump -i any
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
19:12:06.426058 IP 169.254.169.169.34074 > 169.254.169.170.http: Flags [S], seq 1503079823, win 28400, options [mss 1420,sackOK,TS val 106405470 ecr 0,nop,wscale 7], length 0
19:12:06.426081 IP 169.254.169.170.http > 169.254.169.169.34074: Flags [R.], seq 0, ack 1503079824, win 0, length 0
19:12:06.426420 IP 169.254.169.170.49322 > metadata.google.internal.domain: 30563+ PTR? 170.169.254.169.in-addr.arpa. (46)
To route them back out, we need to make sure the netns masquerades:
# ip netns exec k8s_hairpin_workaround iptables -t nat -A POSTROUTING -j MASQUERADE
and that the host doesn't masquerade, because that would otherwise trigger the hairpin path again:
$ sudo iptables -t nat -I POSTROUTING -o k8s_reflector -j ACCEPT
In the pod:
bash-4.3# curl 10.187.253.130:8080
CLIENT VALUES:
client_address=169.254.169.170
command=GET
real path=/
query=nil
request_version=1.1
request_uri=http://10.187.253.130:8080/
SERVER VALUES:
server_version=nginx: 1.9.11 - lua: 10001
HEADERS RECEIVED:
accept=*/*
host=10.187.253.130:8080
user-agent=curl/7.47.1
BODY:
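Once the real hairpin fix is back in place, the workaround can be unwound (a hedged sketch; the -D rule specs must match exactly what was added above, and this needs root):

```shell
# Remove the host-side rules added for the workaround
sudo iptables -t nat -D PREROUTING -s 10.184.2.4/32 -p tcp -m tcp -j DNAT --to-destination 169.254.169.170
sudo iptables -t nat -D POSTROUTING -o k8s_reflector -j ACCEPT

# Deleting the host end of the veth pair removes the peer inside the netns too,
# then the now-empty netns can be dropped
sudo ip link del k8s_reflector
sudo ip netns del k8s_hairpin_workaround
```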