When running nested GATS, we occasionally get containers without network connectivity.
It's unclear whether this flakiness is new, and whether it also affects BOSH deployments.
The tests that seem to hit this most are the DNS resolution tests in networking_test.go. However, DNS resolution fails because the container has no connectivity (it cannot ping 8.8.8.8), not because of a problem with resolution itself.
I was able to reproduce this consistently by running GATS with ping -c 3 8.8.8.8 || sleep 10000 in a JustBeforeEach in networking_test.go. When the ping fails, the sleep keeps the container alive so it can be inspected.
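The repro relies on shell short-circuiting: sleep only runs if ping exits non-zero, and that parks the broken container for inspection with runc exec. Below is a minimal Go sketch of the same check. It assumes a hypothetical runFunc hook. In GATS the command would run inside the garden container rather than locally, and that wiring is not shown.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runFunc runs a command and returns a non-nil error on a non-zero exit,
// which is how the shell's || decides whether to run its right-hand side.
type runFunc func(name string, args ...string) error

// probeOrPark mirrors `ping -c 3 8.8.8.8 || sleep 10000`: if the probe fails,
// park() is called so the broken container stays around for inspection.
// It returns true if the probe succeeded.
func probeOrPark(run runFunc, park func()) bool {
	if err := run("ping", "-c", "3", "8.8.8.8"); err != nil {
		park()
		return false
	}
	return true
}

// localRun runs the command on this machine. In GATS the probe would run
// inside the garden container instead (hypothetical wiring, not shown).
func localRun(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

func main() {
	ok := probeOrPark(localRun, func() {
		// The repro used `sleep 10000` here. A real check would block
		// (e.g. with time.Sleep) instead of just printing.
		fmt.Println("no connectivity: leave the container running and inspect it")
	})
	fmt.Println("probe succeeded:", ok)
}
```

Injecting the runner keeps the park-on-failure decision testable without a real container.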
It seems possible to reproduce this with a single container creation (through GATS): the new container is then unable to ping 8.8.8.8.
It's strange that the DNS tests still seem to fail most often, even though every test in this file performs this ping. This might just be bias in what I'm seeing. Alternatively, those tests might often coincide with some other event that makes this flake more likely (suites are randomized, but specs are not).
These tests run in containers in Concourse, so there is some nesting going on here.
Healthy container
Interfaces on host
806: whc6n3t49uas-1@if807 is the container side veth for the Concourse test container. The -1 suffix (as opposed to -0) and the absence of a bridge indicate this.
4: wheb9g66aaof-0@if3 is the host side veth for the container whose network connectivity we are testing.
2: wbrdg-0afe0000 is the bridge that this host side veth is attached to (it is listed as its master below).
root@267bc95a-571c-4178-78cd-399da1e5765a:/tmp/build/e55deab7# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wbrdg-0afe0000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default
link/ether 26:e2:4e:b0:5f:b9 brd ff:ff:ff:ff:ff:ff
4: wheb9g66aaof-0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue master wbrdg-0afe0000 state UP mode DEFAULT group default qlen 1
link/ether 9a:ff:39:a9:66:6a brd ff:ff:ff:ff:ff:ff
806: whc6n3t49uas-1@if807: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default qlen 1
link/ether 76:e5:87:16:ad:fa brd ff:ff:ff:ff:ff:ff
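The roles above can also be read off mechanically. The sketch below parses ip link header lines into index, name, @ifN peer index and master bridge. It then applies the conventions described above: wbrdg- for the bridge, -0 for the host side veth and -1 for the container side veth. These conventions are inferred from this report rather than from garden's documentation. parseLink and role are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// Link holds the fields of an `ip link` header line used in this report.
type Link struct {
	Index  int    // interface index, e.g. 4
	Name   string // interface name without the @ifN suffix
	Peer   int    // peer interface index from "@ifN", or 0 if absent
	Master string // bridge the interface is attached to, if any
}

var (
	headerRe = regexp.MustCompile(`^(\d+): ([^:@\s]+)(?:@if(\d+))?: <[^>]*>`)
	masterRe = regexp.MustCompile(`\smaster (\S+)`)
)

// parseLink parses one `ip link` header line. ok is false for other lines,
// such as the "link/ether ..." lines.
func parseLink(line string) (Link, bool) {
	m := headerRe.FindStringSubmatch(line)
	if m == nil {
		return Link{}, false
	}
	idx, _ := strconv.Atoi(m[1])
	l := Link{Index: idx, Name: m[2]}
	if m[3] != "" {
		l.Peer, _ = strconv.Atoi(m[3])
	}
	if mm := masterRe.FindStringSubmatch(line); mm != nil {
		l.Master = mm[1]
	}
	return l, true
}

// role applies the naming conventions inferred in this report (an
// assumption, not a documented guarantee of garden's naming scheme).
func role(l Link) string {
	switch {
	case strings.HasPrefix(l.Name, "wbrdg-"):
		return "bridge"
	case strings.HasSuffix(l.Name, "-0"):
		return "host-side veth"
	case strings.HasSuffix(l.Name, "-1"):
		return "container-side veth"
	default:
		return "other"
	}
}

func main() {
	for _, line := range []string{
		"2: wbrdg-0afe0000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default",
		"4: wheb9g66aaof-0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue master wbrdg-0afe0000 state UP mode DEFAULT group default qlen 1",
		"806: whc6n3t49uas-1@if807: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default qlen 1",
	} {
		if l, ok := parseLink(line); ok {
			fmt.Printf("%d %s peer=%d master=%q -> %s\n", l.Index, l.Name, l.Peer, l.Master, role(l))
		}
	}
}
```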
Interfaces in container
root@267bc95a-571c-4178-78cd-399da1e5765a:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 29a7fe24-7062-47c2-465f-b43d9515793e /bin/ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wheb9g66aaof-1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1432 qdisc noqueue qlen 1
link/ether ae:44:d5:ae:51:8f brd ff:ff:ff:ff:ff:ff
Pinging 8.8.8.8 from within container
root@267bc95a-571c-4178-78cd-399da1e5765a:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 29a7fe24-7062-47c2-465f-b43d9515793e /bin/ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=51 time=0.421 ms
64 bytes from 8.8.8.8: seq=1 ttl=51 time=0.417 ms
64 bytes from 8.8.8.8: seq=2 ttl=51 time=0.394 ms
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.394/0.410/0.421 ms
tcpdump of host side veth during ping from within container
root@267bc95a-571c-4178-78cd-399da1e5765a:/tmp/build/e55deab7# tcpdump -i wheb9g66aaof-0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wheb9g66aaof-0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:47.138833 ARP, Request who-has 10.254.0.1 tell 10.254.0.2, length 28
09:01:47.138864 ARP, Reply 10.254.0.1 is-at 26:e2:4e:b0:5f:b9 (oui Unknown), length 28
09:01:47.138869 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 0, length 64
09:01:47.139617 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 0, length 64
09:01:48.139034 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 1, length 64
09:01:48.139420 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 1, length 64
09:01:49.139264 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 2, length 64
09:01:49.139607 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 2, length 64
09:01:52.155571 ARP, Request who-has 10.254.0.2 tell 10.254.0.1, length 28
09:01:52.155633 ARP, Reply 10.254.0.2 is-at ae:44:d5:ae:51:8f (oui Unknown), length 28
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel
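To compare this capture with the unhealthy one, it helps to reduce each capture to a few counts. The sketch below tallies ARP and ICMP echo lines by matching the substrings tcpdump prints in its default output. captureSummary and summarize are hypothetical names, and the lines are the healthy capture above.

```go
package main

import (
	"fmt"
	"strings"
)

// captureSummary counts the packet kinds that matter for this report.
type captureSummary struct {
	ARPRequests, ARPReplies, EchoRequests, EchoReplies int
}

// summarize classifies each tcpdump line by the substrings tcpdump prints
// for ARP and ICMP echo traffic. Any other line is ignored.
func summarize(lines []string) captureSummary {
	var s captureSummary
	for _, l := range lines {
		switch {
		case strings.Contains(l, "ARP, Request"):
			s.ARPRequests++
		case strings.Contains(l, "ARP, Reply"):
			s.ARPReplies++
		case strings.Contains(l, "ICMP echo request"):
			s.EchoRequests++
		case strings.Contains(l, "ICMP echo reply"):
			s.EchoReplies++
		}
	}
	return s
}

func main() {
	healthy := []string{
		"09:01:47.138833 ARP, Request who-has 10.254.0.1 tell 10.254.0.2, length 28",
		"09:01:47.138864 ARP, Reply 10.254.0.1 is-at 26:e2:4e:b0:5f:b9 (oui Unknown), length 28",
		"09:01:47.138869 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 0, length 64",
		"09:01:47.139617 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 0, length 64",
		"09:01:48.139034 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 1, length 64",
		"09:01:48.139420 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 1, length 64",
		"09:01:49.139264 IP 10.254.0.2 > google-public-dns-a.google.com: ICMP echo request, id 11520, seq 2, length 64",
		"09:01:49.139607 IP google-public-dns-a.google.com > 10.254.0.2: ICMP echo reply, id 11520, seq 2, length 64",
		"09:01:52.155571 ARP, Request who-has 10.254.0.2 tell 10.254.0.1, length 28",
		"09:01:52.155633 ARP, Reply 10.254.0.2 is-at ae:44:d5:ae:51:8f (oui Unknown), length 28",
	}
	fmt.Printf("%+v\n", summarize(healthy))
	// {ARPRequests:2 ARPReplies:2 EchoRequests:3 EchoReplies:3}
}
```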
Unhealthy container
Interfaces on host
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
851: whc6n3t49ubb-1@if852: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default qlen 1
link/ether d2:03:75:dc:79:b4 brd ff:ff:ff:ff:ff:ff
410: wbrdg-0afe0000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default
link/ether 62:a4:06:cc:02:21 brd ff:ff:ff:ff:ff:ff
412: whecjqqmjdn1-0@if411: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue master wbrdg-0afe0000 state UP mode DEFAULT group default qlen 1
link/ether 3a:d5:0c:df:1a:b1 brd ff:ff:ff:ff:ff:ff
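Both hosts use a bridge named wbrdg-0afe0000. The suffix appears to be the bridge's subnet in hex: 0a fe 00 00 is 10.254.0.0, which matches the 10.254.0.x addresses seen in the captures. The sketch below decodes it, assuming that reading of the name. The prefix length is not encoded in the name, and bridgeSubnet is a hypothetical helper.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// bridgeSubnet decodes the hex suffix of a bridge name such as
// "wbrdg-0afe0000" into an IPv4 address. Assumption (inferred from this
// report): the suffix is the subnet's network address.
func bridgeSubnet(name string) (net.IP, error) {
	suffix := strings.TrimPrefix(name, "wbrdg-")
	b, err := hex.DecodeString(suffix)
	if err != nil {
		return nil, err
	}
	if len(b) != net.IPv4len {
		return nil, fmt.Errorf("expected %d bytes, got %d", net.IPv4len, len(b))
	}
	return net.IP(b), nil
}

func main() {
	ip, err := bridgeSubnet("wbrdg-0afe0000")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // 10.254.0.0
}
```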
Interfaces in container
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
411: whecjqqmjdn1-1@if412: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1432 qdisc noqueue state UP mode DEFAULT group default qlen 1
link/ether b6:bf:da:a4:02:07 brd ff:ff:ff:ff:ff:ff
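In ip link output, the @ifN suffix is the interface index of the veth peer, which lives in the other network namespace. The sketch below checks that the host and container ends point at each other. The indexes are copied by hand from the listings in this report, and vethEnd and pairConsistent are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// vethEnd is one end of a veth pair as shown by `ip link`, e.g.
// "4: wheb9g66aaof-0@if3" -> {Index: 4, Name: "wheb9g66aaof-0", Peer: 3}.
type vethEnd struct {
	Index int
	Name  string
	Peer  int
}

// pairConsistent reports whether the host end (-0) and container end (-1)
// point at each other and share the same name stem. Interface indexes are
// per network namespace, so this only checks that the two views agree.
func pairConsistent(host, ctr vethEnd) bool {
	stem := func(n string) string {
		return strings.TrimSuffix(strings.TrimSuffix(n, "-0"), "-1")
	}
	return host.Peer == ctr.Index &&
		ctr.Peer == host.Index &&
		strings.HasSuffix(host.Name, "-0") &&
		strings.HasSuffix(ctr.Name, "-1") &&
		stem(host.Name) == stem(ctr.Name)
}

func main() {
	healthy := pairConsistent(
		vethEnd{4, "wheb9g66aaof-0", 3},
		vethEnd{3, "wheb9g66aaof-1", 4},
	)
	unhealthy := pairConsistent(
		vethEnd{412, "whecjqqmjdn1-0", 411},
		vethEnd{411, "whecjqqmjdn1-1", 412},
	)
	fmt.Println(healthy, unhealthy) // true true
}
```

Both pairs check out, so the veth plumbing itself looks intact in the unhealthy case too.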
Pinging 8.8.8.8 from within container
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 591924f7-9211-4032-5f7a-65ecd893e346 /bin/ping -c 3 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
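The two ping runs differ most clearly in their summary lines. The sketch below extracts the transmitted and received counts, which could be used to flag this flake when scanning output from many GATS runs. pingStats is a hypothetical helper. The regular expression accepts both the form shown here and the iputils form, which omits the second "packets".

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statsRe matches summary lines like
// "3 packets transmitted, 0 packets received, 100% packet loss"
// and the iputils form "3 packets transmitted, 3 received, ...".
var statsRe = regexp.MustCompile(`(\d+) packets transmitted, (\d+) (?:packets )?received`)

// pingStats extracts the transmitted and received counts from ping output.
func pingStats(output string) (tx, rx int, err error) {
	m := statsRe.FindStringSubmatch(output)
	if m == nil {
		return 0, 0, fmt.Errorf("no ping statistics line found")
	}
	tx, _ = strconv.Atoi(m[1])
	rx, _ = strconv.Atoi(m[2])
	return tx, rx, nil
}

func main() {
	out := `--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss`
	tx, rx, err := pingStats(out)
	if err != nil {
		panic(err)
	}
	// Total loss (nothing received) is the symptom described in this report.
	fmt.Printf("tx=%d rx=%d total-loss=%v\n", tx, rx, tx > 0 && rx == 0)
}
```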
tcpdump of host side veth during ping from within container
This was interesting. Clearly no IP packets are coming through. The command also hung for a while, even after I pressed Ctrl+C to kill it.
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# tcpdump -i whecjqqmjdn1-0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on whecjqqmjdn1-0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C^C09:35:04.288318 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28
1 packet captured
3 packets received by filter
0 packets dropped by kernel
Second tcpdump of host side veth during a second ping from within container
On subsequent pings, these ARP packets were captured immediately, as if only the first ARP lookup was slow. There are still no IP packets. Note that 8047e7ad-d4c2-460b-7c16-7d2187595852 is the hostname of the host.
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# tcpdump -i whecjqqmjdn1-0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on whecjqqmjdn1-0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:35:50.005350 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28
09:35:51.003541 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28
09:35:52.003537 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28
^C
3 packets captured
3 packets received by filter
0 packets dropped by kernel
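In the healthy capture, every ARP request gets a reply. The reply for 10.254.0.1 also carries the bridge's MAC address (26:e2:4e:b0:5f:b9), so 10.254.0.1 appears to be the bridge, i.e. the container's gateway. Here, the requests for it go unanswered. The sketch below lists addresses that were asked for but never answered in a capture. unansweredARP is a hypothetical helper. Running tcpdump with -n keeps addresses numeric instead of resolving them to hostnames.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

var (
	arpReqRe   = regexp.MustCompile(`ARP, Request who-has (\S+) tell`)
	arpReplyRe = regexp.MustCompile(`ARP, Reply (\S+) is-at`)
)

// unansweredARP returns the addresses that appear in an ARP request but
// never in an ARP reply anywhere in the capture, with the request count.
func unansweredARP(lines []string) map[string]int {
	asked := map[string]int{}
	answered := map[string]bool{}
	for _, l := range lines {
		if m := arpReqRe.FindStringSubmatch(l); m != nil {
			asked[m[1]]++
		} else if m := arpReplyRe.FindStringSubmatch(l); m != nil {
			answered[m[1]] = true
		}
	}
	for ip := range answered {
		delete(asked, ip)
	}
	return asked
}

func main() {
	capture := []string{
		"09:35:50.005350 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28",
		"09:35:51.003541 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28",
		"09:35:52.003537 ARP, Request who-has 10.254.0.1 tell 8047e7ad-d4c2-460b-7c16-7d2187595852, length 28",
	}
	missing := unansweredARP(capture)
	ips := make([]string, 0, len(missing))
	for ip := range missing {
		ips = append(ips, ip)
	}
	sort.Strings(ips) // map iteration order is random; sort for stable output
	for _, ip := range ips {
		fmt.Printf("%s: %d request(s), no reply\n", ip, missing[ip])
	}
	// 10.254.0.1: 3 request(s), no reply
}
```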
Other Random Checks
OS Info
This container is Debian but we've seen this in BusyBox too.
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 591924f7-9211-4032-5f7a-65ecd893e346 /bin/cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Pinging the IP address reported by ping
I think this is essentially localhost (the container's own hostname also resolves to 10.254.0.2), but I don't know ping very well.
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 591924f7-9211-4032-5f7a-65ecd893e346 /bin/ping -c 3 10.254.0.2
PING 10.254.0.2 (10.254.0.2): 56 data bytes
64 bytes from 10.254.0.2: icmp_seq=0 ttl=64 time=0.056 ms
64 bytes from 10.254.0.2: icmp_seq=1 ttl=64 time=0.058 ms
64 bytes from 10.254.0.2: icmp_seq=2 ttl=64 time=0.057 ms
--- 10.254.0.2 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.056/0.057/0.058/0.000 ms
Pinging own hostname
root@8047e7ad-d4c2-460b-7c16-7d2187595852:/tmp/build/e55deab7# /tmp/build/e55deab7/gr-release-develop/bin/runc exec 591924f7-9211-4032-5f7a-65ecd893e346 /bin/ping -c 3 591924f7-9211-4032-5f7a-65ecd893e346
PING 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): 56 data bytes
64 bytes from 10.254.0.2: icmp_seq=0 ttl=64 time=0.054 ms
64 bytes from 10.254.0.2: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 10.254.0.2: icmp_seq=2 ttl=64 time=0.064 ms
--- 591924f7-9211-4032-5f7a-65ecd893e346 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.049/0.056/0.064/0.000 ms
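The container's hostname resolves to 10.254.0.2. That means the Destination Host Unreachable lines in the failing ping are reported from the container's own address, not by a router further along the path. This suggests the container's own network stack gave up before any IP packet left it. The sketch below checks this. selfReported is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// unreachableRe captures ADDR on a
// "... bytes from NAME (ADDR): Destination Host Unreachable" line.
var unreachableRe = regexp.MustCompile(`bytes from \S+ \(([^)]+)\): Destination Host Unreachable`)

// selfReported reports whether every Destination Host Unreachable line in
// the output came from ownIP. It returns false if there are no such lines.
func selfReported(output, ownIP string) bool {
	ms := unreachableRe.FindAllStringSubmatch(output, -1)
	if len(ms) == 0 {
		return false
	}
	for _, m := range ms {
		if m[1] != ownIP {
			return false
		}
	}
	return true
}

func main() {
	out := `PING 8.8.8.8 (8.8.8.8): 56 data bytes
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable
92 bytes from 591924f7-9211-4032-5f7a-65ecd893e346 (10.254.0.2): Destination Host Unreachable`
	// 10.254.0.2 is the address the container's own hostname resolves to.
	fmt.Println(selfReported(out, "10.254.0.2")) // true
}
```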
There doesn't seem to be anything unusual about the links when comparing healthy and unhealthy containers.
ARP requests seem to be getting from the container veth to the host veth. Unlike in the healthy capture, however, they never get a reply, and no IP packets make it through.
Can this be reproduced in Ubuntu where I have iptables?
Can this be reproduced by running GATS against a BOSH deployment?