The URL standard builds on IP standards, and so do operating systems. Believe it or not, the two diverge in some surprising ways!
Hyperlink aims to do some validation of IP addresses that appear as URL hosts. URLs support a subset of IP syntax, while also making affordances for future versions of IP beyond IPv6 (no matter how unfathomable that may seem).
- inet_aton() can be used to parse IPv4, but it accepts more than just IPs. It accepts '0', which is interpreted as '0.0.0.0', which no browser or URL standard supports. See the inet(3) man page for the full set of acceptable inputs. URLs are only meant to have IPv4 IPs represented as dotted quads (four groups of integers between 0 and 255, separated by dots).
- inet_pton(AF_INET, x) is a more useful and strict way to parse IPv4 IPs, but inet_pton is not available on all unixes in Python 2, nor is it available on Windows. This can be (and has historically been) fixed by using ctypes on Windows py27. (No fix for other operating systems, as no one seems to know what else doesn't support this.)
- inet_pton(AF_INET6, x) does not support IPv6 zone identifiers, used for link-local connections, despite support in other socket functions, such as create_connection.
- inet_pton(AF_INET6, x) does not always agree between Windows and Linux. For instance, ::2222:3333:4444:5555:6666:7777:8888 is a valid IPv6 IP according to the standard's grammar, as well as inet_pton on Linux, but Windows disagrees and socket.inet_pton() raises a socket.error. (Note that ::3333:4444:5555:6666:7777:8888 is OK on both. There are a few hundred test addresses in hyperlink's test package at the moment.)
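To make the first and third points concrete, here is a minimal Python 3 sketch. It is not hyperlink's actual implementation: is_dotted_quad and split_zone are hypothetical helper names invented for this illustration. The sketch assumes the zone identifier is separated by a bare "%", as the socket functions expect, and it accepts leading zeros within a group.

```python
import socket


def is_dotted_quad(text):
    """Return True only for four decimal groups in 0..255 separated by dots.

    Hypothetical helper for illustration; leading zeros are accepted here.
    """
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Require plain ASCII digits: rejects empty groups, signs, and hex.
        if not (part.isascii() and part.isdigit()):
            return False
        if int(part) > 255:
            return False
    return True


def split_zone(host):
    """Split 'fe80::1%eth0' into ('fe80::1', 'eth0'); zone is None if absent.

    Hypothetical helper; assumes a bare '%' separates address and zone.
    """
    address, sep, zone = host.partition("%")
    return address, (zone if sep else None)


# inet_aton() is permissive: a bare "0" parses without error, and
# inet_ntoa() shows which address the OS decided it meant.
print(socket.inet_ntoa(socket.inet_aton("0")))

# The strict check rejects the same input but accepts a real dotted quad.
print(is_dotted_quad("0"), is_dotted_quad("192.0.2.1"))

# Validate only the address part, keeping the zone for the caller.
address, zone = split_zone("fe80::1%eth0")
socket.inet_pton(socket.AF_INET6, address)  # raises OSError if invalid
```

A pure-Python check like is_dotted_quad behaves the same on every platform, which is the appeal when the operating system parsers disagree. The tradeoff is that it says nothing about what the local socket functions will actually accept.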
Python's socket module thinly wraps the operating system's socket facilities, and as you can see above, operating systems don't have a consensus.
The target balance for us is to support the standard first, but reach practical compromises that ensure URL.host will be a valid argument to pass to socket-module-based connection managers and other networking libraries on all major operating systems.
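One way to see the "valid argument for the socket module" side of that balance is to ask the local operating system directly. The following Python 3 sketch is an illustration only, not hyperlink's code: ip_family is a hypothetical helper name, and its answer can differ between platforms for edge-case addresses, as described above.

```python
import socket


def ip_family(host):
    """Return AF_INET or AF_INET6 if this OS parses host as an IP, else None.

    Hypothetical helper for illustration. It reflects what the local
    inet_pton() accepts, which may be narrower than the URL grammar.
    """
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, host)
        except (OSError, ValueError):
            # OSError: not a valid address for this family on this OS.
            # ValueError: e.g. an embedded null character in the string.
            continue
        return family
    return None


for candidate in ("192.0.2.1", "::1", "example.com"):
    print(candidate, ip_family(candidate))
```

A library that puts the standard first would still need its own grammar check. A probe like this only reports whether the current platform's parser agrees.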