@floer32
Last active December 24, 2019 15:30
[Regarding Python 2. In Python 3, just use normal strings, which are always Unicode.] // Quick example of encoding and decoding an internationalized domain name in Python (from Unicode to the Punycode or IDNA codecs and back). Pay attention to Unicode strings versus byte strings.
# INCORRECT! DON'T DO THIS!
>>> x = "www.alliancefrançaise.nu" # This is the problematic line. Forgot to make this a Unicode string.
>>> print x
www.alliancefrançaise.nu
>>> x.encode('punycode')
'www.alliancefranaise.nu-h1a31e'
>>> x.encode('punycode').decode('punycode')
u'www.alliancefran\xc3\xa7aise.nu'
>>> print x.encode('punycode').decode('punycode')
www.alliancefrançaise.nu
>>> print x
www.alliancefrançaise.nu
>>> x == x.encode('punycode').decode('punycode')
/usr/bin/ipython:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
#!/usr/bin/env python
False
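The `\xc3\xa7` in the decoded result above is consistent with the two UTF-8 bytes of `ç` each being treated as a separate character. A minimal Python 3 sketch of that effect, assuming the byte string was UTF-8 and was read one byte per character (Latin-1 is used here only to simulate that reading; the variable names are illustrative):

```python
# Python 3 sketch: simulate treating UTF-8 bytes as one character per byte.
domain = "www.alliancefrançaise.nu"

# In UTF-8, "ç" (U+00E7) is the two bytes 0xC3 0xA7.
utf8_bytes = domain.encode("utf-8")

# Assumption: Latin-1 maps every byte to the code point of the same value,
# which reproduces the u'...\xc3\xa7...' shape seen in the transcript.
misread = utf8_bytes.decode("latin-1")

# Punycode round-trips the misread text faithfully, so the damage survives.
round_tripped = misread.encode("punycode").decode("punycode")

print(round_tripped == misread)  # the round trip itself is lossless
print(round_tripped == domain)   # but it no longer matches the intended name
```

The codec is not at fault here: it faithfully encodes whatever characters it is given, so the input has to be a real Unicode string first.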
# CORRECT FOR PUNYCODE (ALMOST THE BEST):
>>> x = u"www.alliancefrançaise.nu" # The difference! The decoded string must be of Unicode type (note the u prefix).
>>> print x
www.alliancefrançaise.nu
>>> x.encode('punycode')
'www.alliancefranaise.nu-dbc'
>>> x.encode('punycode').decode('punycode')
u'www.alliancefran\xe7aise.nu'
>>> print x.encode('punycode').decode('punycode')
www.alliancefrançaise.nu
>>> x == x.encode('punycode').decode('punycode')
True
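Note that the 'punycode' codec encodes the whole string at once, so the result above is not the form used in DNS. As a rough sketch (Python 3, with a hypothetical helper named `to_ace`, and skipping the nameprep normalization that the real 'idna' codec performs), the DNS form applies Punycode to each dot-separated label and prefixes every encoded label with `xn--`:

```python
# Python 3 sketch: per-label Punycode with the "xn--" prefix.
# to_ace is a hypothetical helper for illustration, not a library function.
# It omits nameprep (case folding, normalization, length checks).

def to_ace(domain):
    labels = []
    for label in domain.split("."):
        if all(ord(ch) < 128 for ch in label):
            # Pure-ASCII labels are left untouched.
            labels.append(label)
        else:
            # Non-ASCII labels are Punycode-encoded and marked with "xn--".
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

domain = "www.alliancefrançaise.nu"
ace = to_ace(domain)
print(ace)
```

For real use, prefer the built-in 'idna' codec over a hand-rolled helper like this one.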
# BEST ('idna' is preferable to 'punycode', see http://en.wikipedia.org/wiki/Punycode and https://docs.python.org/2/library/codecs.html#module-encodings.idna ) :
>>> x = u"www.alliancefrançaise.nu"
>>> print x
www.alliancefrançaise.nu
>>> x.encode('idna')
'www.xn--alliancefranaise-npb.nu'
>>> x.encode('idna').decode('idna')
u'www.alliancefran\xe7aise.nu'
>>> print x.encode('idna').decode('idna')
www.alliancefrançaise.nu
>>> x == x.encode('idna').decode('idna')
True
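For comparison, a minimal sketch of the same round trip in Python 3, where every `str` is already Unicode and the encoded form is a `bytes` object (the variable names are illustrative):

```python
# Python 3 sketch: IDNA round trip with the built-in codec.
domain = "www.alliancefrançaise.nu"  # str is always Unicode in Python 3

encoded = domain.encode("idna")      # bytes, ASCII-only form for DNS
decoded = encoded.decode("idna")     # back to a Unicode str

print(encoded)
print(decoded)
print(decoded == domain)
```

The built-in 'idna' codec implements the older IDNA 2003 rules (RFC 3490); if the newer IDNA 2008 behavior matters for your names, a third-party library may be needed.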
cyberkulebyaka commented Dec 24, 2019

Section two was not updated to reflect the comments from @tuck1s and @icamys:

# CORRECT FOR PUNYCODE (ALMOST THE BEST):
>>> x = u"www.Alliancefrançaise.nu"  # The difference! The Unicode string (decoded) string must be Unicode type