idna package notes:
- If a segment of a host (i.e., something in url.host.split('.')) is
already ASCII, idna doesn't perform its usual checks. For instance,
capital letters are not valid IDNA2008, and by default the package
rejects them rather than lowercasing them.
You'll get something like:
idna.core.InvalidCodepoint: Codepoint U+004B at position 1 ... not allowed
This check and some others can be avoided by passing
uts46=True to encode/decode, which applies the UTS #46 mapping
(including lowercasing) before validation. This allows a more
permissive and convenient interface. So far it seems like the
balanced approach.
However, all of this is bypassed if the string segment contains only
ASCII characters.
Example output:
>>> idna.encode(u'mahmöud.io')
'xn--mahmud-zxa.io'
>>> idna.encode(u'Mahmöud.io')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mahmoud/virtualenvs/hyperlink/local/lib/python2.7/site-packages/idna/core.py", line 355, in encode
    result.append(alabel(label))
  File "/home/mahmoud/virtualenvs/hyperlink/local/lib/python2.7/site-packages/idna/core.py", line 276, in alabel
    check_label(label)
  File "/home/mahmoud/virtualenvs/hyperlink/local/lib/python2.7/site-packages/idna/core.py", line 253, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+004D at position 1 of u'Mahm\xf6ud' not allowed
>>> idna.encode(u'Mahmoud.io')
'Mahmoud.io'
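
A minimal sketch of the uts46=True approach described above, assuming
Python 3 and the third-party idna package are installed. The encode_host
helper is a hypothetical name used only for illustration. Under Python 3,
idna.encode returns bytes, so the sketch decodes the result to str.

```python
import idna  # third-party package: pip install idna


def encode_host(host):
    """Hypothetical helper: encode a host to ASCII, tolerating capitals.

    uts46=True applies the UTS #46 mapping (which lowercases) before
    the IDNA2008 checks run.
    """
    return idna.encode(host, uts46=True).decode("ascii")


print(encode_host("mahmöud.io"))  # xn--mahmud-zxa.io
print(encode_host("Mahmöud.io"))  # xn--mahmud-zxa.io

# Without uts46=True, the capital letter in a non-ASCII segment is rejected.
# InvalidCodepoint is a subclass of idna.IDNAError.
try:
    idna.encode("Mahmöud.io")
except idna.IDNAError as exc:
    print("rejected:", exc)
```

With uts46=True, the capitalized input yields the same result as the
lowercase one instead of raising.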