@mahmoud
Last active January 15, 2018 02:29

URLs and IPs

The URL standard builds on IP standards, and so do operating systems. Believe it or not, the two diverge in some surprising ways!

Hyperlink aims to validate the IP addresses that appear as hosts in URLs. URLs support a subset of IP syntax, and they also make affordances for future IP versions beyond IPv6 (no matter how unfathomable that may seem).
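To make the "subset plus future versions" point concrete, here is a minimal sketch that classifies a bracketed URL host against RFC 3986's IP-literal grammar, which allows either an IPv6 address or an IPvFuture form such as [v1.fe80::a]. The helper name classify_ip_literal is hypothetical (it is not Hyperlink's API), and the sketch assumes Python 3.3+ for the stdlib ipaddress module.

```python
import ipaddress
import re

# RFC 3986, section 3.2.2:
#   IP-literal = "[" ( IPv6address / IPvFuture ) "]"
#   IPvFuture  = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# ABNF string literals are case-insensitive, so "v" also matches "V".
_UNRESERVED = r"A-Za-z0-9\-._~"
_SUB_DELIMS = r"!$&'()*+,;="
_IPVFUTURE_RE = re.compile(
    r"\A[vV][0-9A-Fa-f]+\.[" + _UNRESERVED + _SUB_DELIMS + r":]+\Z"
)


def classify_ip_literal(host):
    """Classify a bracketed host such as '[::1]' or '[v1.fe80::a]'.

    Returns 'ipv6' or 'ipvfuture'; raises ValueError otherwise.
    """
    if not (host.startswith("[") and host.endswith("]")):
        raise ValueError("not an IP-literal: %r" % host)
    inner = host[1:-1]
    if _IPVFUTURE_RE.match(inner):
        return "ipvfuture"
    # Pure-Python parser, so the verdict does not depend on the OS.
    # AddressValueError is a subclass of ValueError.
    ipaddress.IPv6Address(inner)
    return "ipv6"
```

Checking IPvFuture first is safe because a valid IPv6 address always starts with a hex digit or a colon, never with "v".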

  • inet_aton() can be used to parse IPv4, but it accepts more than dotted-quad addresses. For example, it accepts '0' and interprets it as '0.0.0.0', a form no browser or URL standard supports. See the inet(3) man page for the full set of accepted inputs. URLs are only meant to represent IPv4 addresses as dotted quads (four decimal integers between 0 and 255, separated by dots).
  • inet_pton(AF_INET, x) is a stricter and more useful way to parse IPv4 addresses, but in Python 2 it is not available on all Unixes, nor on Windows. On Windows under Python 2.7, this can be (and historically has been) worked around with ctypes. There is no fix for other operating systems, because no one seems to know which others lack it.
  • inet_pton(AF_INET6, x) does not support IPv6 zone identifiers (the %eth0 in fe80::1%eth0), which are used for link-local connections, even though other socket functions, such as create_connection, support them.
  • inet_pton(AF_INET6, x) does not always agree between Windows and Linux. For instance, ::2222:3333:4444:5555:6666:7777:8888 is a valid IPv6 address according to the standard's grammar and to inet_pton on Linux, but Windows disagrees and socket.inet_pton() raises a socket.error. (Note that ::3333:4444:5555:6666:7777:8888 is OK on both. There are currently a few hundred test addresses in hyperlink's test package.)
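A strict dotted-quad check does not need inet_aton() or inet_pton() at all. The sketch below encodes RFC 3986's dec-octet rule as a regular expression. It rejects leading zeros because inet_aton() and some other parsers read them as octal. It also prints the platform's inet_aton() verdict alongside its own, and that second column varies by operating system. The helpers is_dotted_quad and aton_accepts are hypothetical names, not part of Hyperlink or the socket module.

```python
import re
import socket

# RFC 3986 dec-octet: 0-9 / 10-99 / 100-199 / 200-249 / 250-255,
# decimal digits only, no leading zeros.
_DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
_DOTTED_QUAD_RE = re.compile(r"\A" + r"\.".join([_DEC_OCTET] * 4) + r"\Z")


def is_dotted_quad(text):
    """Return True only for a strict dotted-quad IPv4 address."""
    return _DOTTED_QUAD_RE.match(text) is not None


def aton_accepts(text):
    """Return True if this platform's socket.inet_aton() accepts text."""
    try:
        socket.inet_aton(text)
    except (OSError, socket.error):
        return False
    return True


# Compare the strict check with the OS parser on a few inputs.
for candidate in ["127.0.0.1", "0", "127.1", "0x7f.0.0.1", "256.0.0.1"]:
    print(candidate, is_dotted_quad(candidate), aton_accepts(candidate))
```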
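For the zone identifier problem, one workaround is to split the zone off before passing the address to inet_pton(). This sketch assumes the host has already been percent-decoded. In a URL, RFC 6874 writes the separator as %25, as in http://[fe80::1%25eth0]/. The names split_zone and validate_ipv6_host are hypothetical.

```python
import socket


def split_zone(host):
    """Split 'fe80::1%eth0' into ('fe80::1', 'eth0'); zone is None if absent."""
    addr, sep, zone = host.partition("%")
    if not sep:
        return host, None
    if not zone:
        raise ValueError("empty zone identifier in %r" % host)
    return addr, zone


def validate_ipv6_host(host):
    """Check an IPv6 host that may carry a zone identifier.

    inet_pton(AF_INET6, ...) rejects the '%zone' suffix, so only the
    address part is passed to it. The zone is returned unchecked; whether
    it names a real interface is left to the code that connects.
    """
    addr, zone = split_zone(host)
    socket.inet_pton(socket.AF_INET6, addr)  # raises OSError if malformed
    return addr, zone


print(validate_ipv6_host("fe80::1%eth0"))  # ('fe80::1', 'eth0')
```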

Python's socket module thinly wraps the operating system's socket facilities, and as you can see above, operating systems don't have a consensus.
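Because the socket module inherits each operating system's parser, one way to get consistent answers is to parse addresses in pure Python. The sketch below compares the stdlib ipaddress module (Python 3.3+; a backport exists for 2.7) with socket.inet_pton. The names parse_ip and parse_ip_os are hypothetical helpers, and choosing the address family by looking for ':' is a simplifying assumption.

```python
import ipaddress
import socket


def parse_ip(text):
    """Pack an IPv4/IPv6 address with the pure-Python ipaddress module.

    The result does not depend on which inet_pton() the OS provides.
    """
    return ipaddress.ip_address(text).packed


def parse_ip_os(text):
    """Pack the same address via the OS, through socket.inet_pton()."""
    if not hasattr(socket, "inet_pton"):
        # e.g. Python 2 on Windows and on some Unixes
        raise NotImplementedError("socket.inet_pton is unavailable")
    family = socket.AF_INET6 if ":" in text else socket.AF_INET
    return socket.inet_pton(family, text)


# Report whether the portable and OS-native parsers agree.
for text in ["127.0.0.1", "::1", "::2222:3333:4444:5555:6666:7777:8888"]:
    portable = parse_ip(text)
    try:
        native = parse_ip_os(text)
    except (OSError, NotImplementedError) as exc:
        native = exc
    print(text, portable == native)
```

The trade-off is that ipaddress gives the same verdict everywhere, but its rules are Python's rather than the OS's. Those rules have also changed across releases; for example, zone identifiers are accepted only from Python 3.9. So it complements testing against socket functions rather than replacing it.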

Our target balance is to support the standard first, while reaching practical compromises that ensure URL.host is a valid argument to socket-module-based connection managers and other networking libraries on all major operating systems.
