@jakubgs
Created June 22, 2018 09:52
Research into Digital Ocean UDP delivery issues when using Floating IPs.

Info

We've discovered issues with UDP traffic going to our bootnodes via Digital Ocean (DO) Floating IPs. According to DO, a "Floating IP" is:

Floating IP is an IP address that can be instantly moved from one Droplet to another Droplet in the same datacenter.

Terms

  • Anchor IP - Default internal IP given to a Droplet in 10.0.0.0/8 subnets.
  • Droplet IP - Default public IP given to a Droplet on creation.
  • Floating IP - Movable public IP.

Initial Tests

Initially, we identified an issue where statusd peers would throw the following errors when trying to connect to the DO bootnodes:

<-net.timeout 
msg="--- (3) pongTimeout for 436cc6f674928fdc@174.138.105.243:30404: verifyinit -> unknown (ok)"
<-net.timeout 
msg="--- (2) pongTimeout for 5395aab7833f1ecb@206.189.243.57:30404: verifyinit -> unknown (ok)"

Checking with tcpdump indicated that the bootnodes receive the UDP traffic and send back a response:

15:06:51.472000 IP 109.86.198.208.30303 > 10.18.0.37.30404: UDP, length 120
15:06:51.473628 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 174
15:06:51.473934 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 132

Despite that, the issue was still present.
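One detail in the capture above is that the request is addressed to 10.18.0.37 while both replies leave from 174.138.7.182. The following is a minimal Python sketch for spotting that kind of mismatch in tcpdump's one-line output. The function names and the `CAPTURE` constant are hypothetical helpers made up for illustration, and whether the mismatch matters for delivery is not established by this sketch.

```python
import re

# tcpdump's one-line IPv4/UDP summary:
#   "<time> IP <src>.<sport> > <dst>.<dport>: UDP, length <n>"
LINE_RE = re.compile(
    r"^\S+ IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): UDP, length \d+$"
)

# The three lines from the capture above.
CAPTURE = """\
15:06:51.472000 IP 109.86.198.208.30303 > 10.18.0.37.30404: UDP, length 120
15:06:51.473628 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 174
15:06:51.473934 IP 174.138.7.182.30404 > 109.86.198.208.30303: UDP, length 132
"""


def parse_line(line):
    """Return a dict of src/sport/dst/dport, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {"src": d["src"], "sport": int(d["sport"]),
            "dst": d["dst"], "dport": int(d["dport"])}


def mismatched_replies(lines, service_port):
    """Return sorted (request_dst, reply_src) pairs where a reply to the same
    peer leaves from a different address than the request was sent to."""
    packets = [p for p in map(parse_line, lines) if p]
    requests = [p for p in packets if p["dport"] == service_port]
    replies = [p for p in packets if p["sport"] == service_port]
    pairs = set()
    for req in requests:
        for rep in replies:
            same_peer = rep["dst"] == req["src"] and rep["dport"] == req["sport"]
            if same_peer and rep["src"] != req["dst"]:
                pairs.add((req["dst"], rep["src"]))
    return sorted(pairs)
```

Calling `mismatched_replies(CAPTURE.splitlines(), 30404)` on the capture above reports the pair of addresses that differ between request and reply.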

Followup

Tests with netcat showed that some packets would not arrive when the destination was the Floating IP. The test used two hosts and the nc utility. A listening instance was started on one host:

nc -l -u -p 1234

And a sending instance on another:

nc -u ${DROPLET_IP} 1234

Messages were then sent interactively. Three behaviours were identified:

  • When addressing the Droplet IP, all packets were received and replies went back and forth fine.
  • When addressing the Floating IP, only the first packet was received.
  • When addressing the Floating IP, replies sent back over the UDP "connection" were not delivered.

This behaviour can be viewed here: https://asciinema.org/a/Aw9ucl29OtVxnm6gtSrnpMJBN

NOTE: This behaviour was observed in different regions and on different operating systems.
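The interactive netcat test can also be approximated with a small script, which makes it easier to repeat against both addresses. This is a minimal Python sketch and not the tooling used in the research. `echo_server` and `probe` are hypothetical names. The probe uses a connected UDP socket, so, like a connected client, it only accepts replies coming from the exact address it sent to.

```python
import socket
import threading


def echo_server(sock, stop):
    """Echo every datagram back to its sender until `stop` is set."""
    sock.settimeout(0.2)
    while not stop.is_set():
        try:
            data, peer = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, peer)


def probe(host, port, count=5, timeout=1.0):
    """Send `count` numbered datagrams to host:port and return the list of
    sequence numbers that were echoed back within `timeout` seconds each."""
    echoed = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        # connect() restricts received datagrams to this remote address.
        s.connect((host, port))
        for seq in range(count):
            try:
                s.send(str(seq).encode())
                data = s.recv(2048)
            except (socket.timeout, ConnectionRefusedError):
                # No reply in time, or an ICMP error was reported.
                continue
            echoed.append(int(data.decode()))
    return echoed
```

With `echo_server` running on a socket bound to port 1234 on the Droplet, `probe(address, 1234)` could be run from a second host once with the Droplet IP and once with the Floating IP, and the two returned lists compared.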

Deep Dive

Packets

The next step was to identify how packets sent to the two addresses differ on arrival. The packets were examined using tcpdump.


Droplet IP Target

09:16:05.240305 f4:a7:39:d7:8a:7d > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.35520 > 188.166.68.46.search-agent: UDP, length 2
09:16:05.791998 f4:a7:39:d7:8a:7d > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.35520 > 188.166.68.46.search-agent: UDP, length 2

Floating IP Target

09:16:09.904829 00:00:5e:00:01:6e > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.53014 > 10.18.0.26.search-agent: UDP, length 2
09:16:10.418658 00:00:5e:00:01:6e > 8a:05:1e:bd:cb:01, ethertype IPv4 (0x0800), length 60: 167.99.219.77.53014 > 10.18.0.26.search-agent: UDP, length 2

The only three differences (apart from checksums) appear to be:

  • The destination IP
    • Droplet IP used when sending to the Droplet IP
    • Anchor IP used when sending to the Floating IP
  • No GeoIP info for destination
    • The Anchor IP is a private 10.0.0.0/8 address, so no GeoIP info is resolved for it.
  • Source Device MAC Address
    • When sending to the Droplet IP the packet comes from a Juniper device (JuniperN_d7:8a:7d)
    • When sending to the Floating IP the packet comes from a VRRP virtual router MAC (IETF-VRRP-VRID_6e), which identifies a router group and not a hardware vendor
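The address differences listed above can be checked mechanically. The source MAC 00:00:5e:00:01:6e falls in the 00:00:5e:00:01:XX range reserved for IPv4 VRRP virtual routers, where the last octet is the virtual router ID. The minimal Python sketch below classifies the MAC and IP addresses seen in the two captures. `vrrp_vrid` and `describe` are hypothetical helper names.

```python
import ipaddress

# Prefix of the MAC range reserved for IPv4 VRRP virtual routers;
# the final octet carries the virtual router ID (VRID).
VRRP_IPV4_PREFIX = "00:00:5e:00:01:"


def vrrp_vrid(mac):
    """Return the VRID if `mac` is an IPv4 VRRP virtual MAC, else None."""
    mac = mac.lower()
    if len(mac) == 17 and mac.startswith(VRRP_IPV4_PREFIX):
        return int(mac[-2:], 16)
    return None


def describe(src_mac, dst_ip):
    """Summarize what the source MAC and destination IP of a frame tell us."""
    vrid = vrrp_vrid(src_mac)
    return {
        "src_is_vrrp_virtual_mac": vrid is not None,
        "vrid": vrid,
        "dst_is_private": ipaddress.ip_address(dst_ip).is_private,
    }


# Values taken from the two tcpdump captures above.
droplet_target = describe("f4:a7:39:d7:8a:7d", "188.166.68.46")
floating_target = describe("00:00:5e:00:01:6e", "10.18.0.26")
```

For the Floating IP capture this reports a VRRP virtual MAC as the source and a private destination address, while the Droplet IP capture has neither property.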

Socat

Using socat, a different program from netcat, a working "solution" was identified. Running socat with the fork option appears to at least let us receive all packets (although not respond to them). The two runs below show socat first without and then with fork:

 # socat -v - udp4-listen:1234
< 2018/06/22 09:05:36.252411  length=5 from=0 to=4          # delivered
test
test

Without fork, only the first packet was received when addressed to the Floating IP.

 # socat -v - udp4-listen:1234,fork
< 2018/06/22 09:05:36.252411  length=5 from=0 to=4          # delivered
test
test
< 2018/06/22 09:05:36.866158  length=5 from=5 to=9          # delivered
test
test
resp
> 2018/06/22 09:41:41.190379  length=5 from=0 to=4          # NOT delivered
resp

With fork, the packets following the first were also received, but we could not respond.

With higher verbosity the difference was identified. When connecting via the Droplet IP, socat forks a single child on the first packet, and that child handles all later packets and the response:

[root@udp-test-01 ~]# socat -d -d -d - udp4-listen:1234,fork
...
2018/06/22 09:44:25 socat[10714] I setting option "fork" to 1
2018/06/22 09:44:25 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:44:25 socat[10714] N listening on UDP AF=2 0.0.0.0:1234

2018/06/22 09:44:28 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:54879
2018/06/22 09:44:28 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:54879
2018/06/22 09:44:28 socat[10714] N forked off child process 10715
2018/06/22 09:44:28 socat[10714] I close(5)
2018/06/22 09:44:28 socat[10714] I still listening
2018/06/22 09:44:28 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:44:28 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:44:28 socat[10715] I just born: child process 10715
2018/06/22 09:44:28 socat[10715] I resolved and opened all sock addresses
2018/06/22 09:44:28 socat[10715] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:44:28 socat[10715] I transferred 5 bytes from 5 to 1
test
2018/06/22 09:44:30 socat[10715] I transferred 5 bytes from 5 to 1
test
2018/06/22 09:44:30 socat[10715] I transferred 5 bytes from 5 to 1
resp
2018/06/22 09:44:33 socat[10715] I transferred 5 bytes from 0 to 5
resp

But when packets were sent via the Floating IP, socat would fork a new child process for every packet received:

2018/06/22 09:44:34 socat[10715] I transferred 5 bytes from 0 to 5
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10730
2018/06/22 09:46:52 socat[10714] I close(5)
2018/06/22 09:46:52 socat[10714] I still listening
2018/06/22 09:46:52 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:46:52 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:46:52 socat[10730] I just born: child process 10730
2018/06/22 09:46:52 socat[10730] I resolved and opened all sock addresses
2018/06/22 09:46:52 socat[10730] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:46:52 socat[10730] I transferred 5 bytes from 5 to 1
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] I permitting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10731
2018/06/22 09:46:52 socat[10714] I close(5)
2018/06/22 09:46:52 socat[10714] I still listening
2018/06/22 09:46:52 socat[10714] I socket(2, 2, 17) -> 5
2018/06/22 09:46:52 socat[10714] N listening on UDP AF=2 0.0.0.0:1234
2018/06/22 09:46:52 socat[10731] I just born: child process 10731
2018/06/22 09:46:52 socat[10731] I resolved and opened all sock addresses
2018/06/22 09:46:52 socat[10731] N starting data transfer loop with FDs [0,1] and [5,5]
test
2018/06/22 09:46:52 socat[10731] I transferred 5 bytes from 5 to 1
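The pattern in these logs can be summarized by counting how many times each peer is "accepted" and how many transfers each child process handles. This is a minimal Python sketch with a hypothetical `summarize` helper. The regular expressions only cover the log lines shown in this document, and `FLOATING_LOG` is a trimmed copy of the log above.

```python
import re
from collections import Counter

ACCEPT_RE = re.compile(r"socat\[\d+\] N accepting UDP connection from AF=2 (\S+)")
XFER_RE = re.compile(r"socat\[(\d+)\] I transferred \d+ bytes from \d+ to \d+")

# Trimmed copy of the Floating IP log above.
FLOATING_LOG = """\
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10730
2018/06/22 09:46:52 socat[10730] I transferred 5 bytes from 5 to 1
2018/06/22 09:46:52 socat[10714] N accepting UDP connection from AF=2 167.99.219.77:43743
2018/06/22 09:46:52 socat[10714] N forked off child process 10731
2018/06/22 09:46:52 socat[10731] I transferred 5 bytes from 5 to 1
"""


def summarize(log_lines):
    """Return (accepts per peer, transfers per socat process id)."""
    accepts_per_peer = Counter()
    transfers_per_pid = Counter()
    for line in log_lines:
        m = ACCEPT_RE.search(line)
        if m:
            accepts_per_peer[m.group(1)] += 1
            continue
        m = XFER_RE.search(line)
        if m:
            transfers_per_pid[int(m.group(1))] += 1
    return accepts_per_peer, transfers_per_pid


accepts, transfers = summarize(FLOATING_LOG.splitlines())
```

In the Floating IP log the same peer address and port is accepted once per packet and each child handles a single transfer, whereas in the Droplet IP log one accept is followed by several transfers in the same child.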

The cause for this difference can be seen in the code here: xio-listen.c#L279. Using fork does allow for receiving all packets, but it does not solve the issue of not being able to respond to them.
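The "accepting UDP connection" log lines suggest that socat treats each peer as a connection. One common way for a UDP program to do that is to call connect() on its socket. Whether socat does exactly this at the referenced line is an assumption here and was not verified. The minimal Python sketch below, with a hypothetical function name, shows a general property of connected UDP sockets: a socket bound to the wildcard address gets pinned to one concrete local address chosen by the kernel.

```python
import socket


def local_address_after_connect(peer):
    """Bind a UDP socket to the wildcard address, connect() it to `peer`,
    and return the socket's local IP before and after the connect."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("0.0.0.0", 0))
        before = s.getsockname()[0]
        # For UDP, connect() sends nothing; it records the peer and lets the
        # kernel pick the local address used for this peer.
        s.connect(peer)
        after = s.getsockname()[0]
    return before, after


# Loopback example; port 9 is arbitrary because nothing is sent.
before, after = local_address_after_connect(("127.0.0.1", 9))
```

If the address picked this way differs from the one the packets arrive on (for example the Anchor IP), later packets would no longer match the connected socket and replies would leave from a different source address. This is a hypothesis consistent with the logs above, not a confirmed cause.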
