@flying-sheep
Last active December 17, 2015 00:29
Test for the i18n bug discovered on the KDE
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from PyKDE4.kdecore import i18n
# (label, argument passed to i18n): text and bytes, each with ASCII and non-ASCII content
tests = [
    (' ascii string', 'x'),
    (' ascii bytes ', b'x'),
    ('unicode string', '…'),
    ('unicode bytes ', '…'.encode('utf-8')),
]
for name, test in tests:
    try:
        print('{} success: {!r} got translated to {!r}'.format(name, test, i18n(test)))
    except Exception as e:
        # report the failure instead of aborting, so every case is shown
        print('{} fail: {!r} was not translated due to {!r}'.format(name, test, e))
ascii string success: u'x' got translated to PyQt4.QtCore.QString(u'x')
ascii bytes success: 'x' got translated to PyQt4.QtCore.QString(u'x')
unicode string fail: u'\u2026' was not translated due to TypeError("i18n(): argument 1 has unexpected type 'unicode'",)
unicode bytes success: '\xe2\x80\xa6' got translated to PyQt4.QtCore.QString(u'\u2026')
Note that in Python 2, unicode literals are represented as u'', while bytes literals are represented as ''.
ascii string success: 'x' got translated to 'x'
ascii bytes success: b'x' got translated to 'x'
unicode string fail: '…' was not translated due to UnicodeEncodeError('ascii', '…', 0, 1, 'ordinal not in range(128)')
unicode bytes success: b'\xe2\x80\xa6' got translated to '…'
Note that in Python 3, unicode literals are represented as '', while bytes literals are represented as b''.

As you can see, i18n seems to encode unicode arguments with the ASCII codec, while treating byte strings (and hence the translated result) as UTF-8. In Python 2, if the ASCII encoding fails, the unicode object apparently gets passed on to something expecting bytes, which raises a TypeError; in Python 3, the failed ASCII encoding itself already raises a UnicodeEncodeError.
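
To make the inferred behavior concrete, here is a minimal pure-Python model of the argument conversion that the Python 3 results suggest. It is an assumption reconstructed only from the output above, not PyKDE4's actual code; `model_i18n_arg` is a hypothetical name, and it performs no translation, only the conversion step.

```python
def model_i18n_arg(arg):
    """Hypothetical model of i18n's argument handling under Python 3.

    Inferred only from the test output: bytes are decoded as UTF-8, while
    str is first encoded with the ASCII codec, which raises
    UnicodeEncodeError for any non-ASCII character.
    """
    if isinstance(arg, bytes):
        return arg.decode('utf-8')
    # ASCII-only text survives this step; '\u2026' ('…') does not
    return arg.encode('ascii').decode('utf-8')


tests = [
    ('ascii string', 'x'),
    ('ascii bytes', b'x'),
    ('unicode string', '\u2026'),  # '…'
    ('unicode bytes', '\u2026'.encode('utf-8')),
]
for name, test in tests:
    try:
        print('{} success: {!r} -> {!r}'.format(name, test, model_i18n_arg(test)))
    except UnicodeEncodeError as e:
        print('{} fail: {!r} -> {!r}'.format(name, test, e))
```

Run under Python 3, this reproduces the success/fail pattern of the Python 3 output; it deliberately does not model the Python 2 TypeError.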

So, regardless of Python version, passing bytes objects that contain any UTF-8 works, while strings containing non-ASCII characters fail unless you manually encode them to UTF-8 bytes first.
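
A workaround that follows from this is to always hand i18n UTF-8 bytes. The sketch below assumes nothing beyond the behavior observed above; `to_utf8_bytes`, `safe_i18n` and `fake_i18n` are hypothetical names, and the `translate` parameter stands in for `PyKDE4.kdecore.i18n` so the sketch runs without PyKDE4 installed.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_utf8_bytes(text):
    """Return text as UTF-8 bytes; bytes input is passed through unchanged."""
    if isinstance(text, bytes):
        return text
    return text.encode('utf-8')


def safe_i18n(text, translate):
    """Hypothetical wrapper: call translate (e.g. PyKDE4's i18n) with UTF-8 bytes.

    Per the test output, i18n accepts UTF-8 bytes under both Python 2 and 3
    but rejects unicode arguments containing non-ASCII characters.
    """
    return translate(to_utf8_bytes(text))


# Stand-in for i18n that, like the observed 'unicode bytes' case,
# decodes the UTF-8 bytes it receives. In KDE code, pass i18n itself.
def fake_i18n(arg):
    if not isinstance(arg, bytes):
        raise TypeError('expected bytes, got {!r}'.format(type(arg)))
    return arg.decode('utf-8')


print(repr(safe_i18n('…', fake_i18n)))
```

With real PyKDE4, `safe_i18n('…', i18n)` would hit the "unicode bytes" case, which is the one that succeeded under both Python versions.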

PS: The next best thing to documentation for it is this and, of course, the code.
