localetest.py
#!/usr/bin/env python
# encoding: utf-8
 
"""
localetest.py - tests collation / sort order for various Latin-script locales
 
Correct output:
 
The order for German is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH DIAERESIS
LATIN SMALL LETTER Z
 
The order for British English is:
LATIN SMALL LETTER A
LATIN SMALL LETTER E
LATIN SMALL LETTER Z
 
The order for Polish is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH OGONEK
LATIN SMALL LETTER Z
 
Tests show Polish and German characters with diacritics are sorted incorrectly
after "z" on OS X and maybe other systems.
See also http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
"""
 
import locale
import unicodedata
 
testdata = {
    'en': {'chars': [u'a', u'z', u'e'], 'localestring': 'en_GB.UTF-8', 'lang': 'British English'},
    'de': {'chars': [u'a', u'z', u'ä'], 'localestring': 'de_DE.UTF-8', 'lang': 'German'},
    'pl': {'chars': [u'a', u'z', u'ą'], 'localestring': 'pl_PL.UTF-8', 'lang': 'Polish'},
}
 
for l in testdata:
    # Switch to the locale under test; skip it if it is not installed.
    try:
        locale.setlocale(locale.LC_ALL, testdata[l]['localestring'])
    except locale.Error as e:
        print "Error for %s and locale %s: %s\n" % (l, testdata[l]['localestring'], e)
        continue

    print "The order for %s is:" % testdata[l]['lang']
    # Sort with the C library's locale-aware comparison (Python 2 cmp= argument).
    for item in sorted(testdata[l]['chars'], cmp=locale.strcoll):
        print unicodedata.name(item)
    print "The LC_COLLATE culture and encoding settings were %s." % ', '.join(locale.getlocale(locale.LC_COLLATE))
    print
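
The script above targets Python 2, where `sorted()` still accepts `cmp=`. For anyone repeating the test on Python 3, the sorting step could be sketched roughly as follows. The helper name `collated_names` is hypothetical and not part of the gist; it only changes `LC_COLLATE`, restores the previous setting afterwards, and returns `None` when the requested locale is not installed.

```python
import functools
import locale
import unicodedata


def collated_names(chars, localestring):
    """Return the Unicode names of chars in the collation order of localestring.

    Hypothetical helper (not from the original gist). Returns None if the
    locale cannot be set on this system.
    """
    # Calling setlocale() with no locale argument only queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, localestring)
    except locale.Error:
        return None
    try:
        # Python 3 dropped cmp=, so wrap strcoll with cmp_to_key.
        ordered = sorted(chars, key=functools.cmp_to_key(locale.strcoll))
        return [unicodedata.name(c) for c in ordered]
    finally:
        # Put the process-wide collation setting back as it was.
        locale.setlocale(locale.LC_COLLATE, previous)
```

Calling `collated_names(["a", "z", "ä"], "de_DE.UTF-8")` corresponds to the German case in the script. Whether the result matches the "Correct output" in the docstring still depends on the platform's C library, which is the point of the test.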

OS X 10.6.4, gcc 4.2.1, Python 2.6.5:
Wrong for all locales except English.

Same for me. Works fine on Ubuntu Linux 10.04, Python 2.6.5 built with gcc 4.4.3. Also works fine, with changed locale strings, on Windows XP.
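
Reports like the ones above are easier to compare when the platform details are gathered the same way each time. The following Python 3 sketch is one possible way to do that. The function name `collation_report` and the dictionary keys are assumptions made for illustration, not part of the gist. It records the OS, the Python version, and the compiler Python was built with, then checks whether an accented letter collates before "z" in the given locale.

```python
import locale
import platform


def collation_report(localestring="de_DE.UTF-8", accented="ä"):
    """Summarise the platform and whether `accented` collates before 'z'.

    Hypothetical helper (not from the original gist).
    """
    report = {
        "system": platform.platform(),
        "python": platform.python_version(),
        "compiler": platform.python_compiler(),
        "locale": localestring,
        # None means the locale could not be set, so nothing was tested.
        "accented_before_z": None,
    }
    previous = locale.setlocale(locale.LC_COLLATE)  # query only
    try:
        locale.setlocale(locale.LC_COLLATE, localestring)
    except locale.Error:
        return report
    try:
        # strcoll returns a negative number when the first string sorts first.
        report["accented_before_z"] = locale.strcoll(accented, "z") < 0
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
    return report
```

A result of `False` for `accented_before_z` would correspond to the behaviour described in the docstring, where the accented character lands after "z".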
