@chryss /localetest.py
#!/usr/bin/env python
# encoding: utf-8
"""
localetest.py - tests collation / sort order for various Latin-script locales
Correct output:
The order for German is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH DIAERESIS
LATIN SMALL LETTER Z
The order for British English is:
LATIN SMALL LETTER A
LATIN SMALL LETTER E
LATIN SMALL LETTER Z
The order for Polish is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH OGONEK
LATIN SMALL LETTER Z
Tests show Polish and German characters with diacritics are sorted incorrectly
after "z" on OS X and maybe other systems.
See also http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
"""
import locale
import unicodedata
testdata = {
    'en': {'chars': [u'a', u'z', u'e'], 'localestring': 'en_GB.UTF-8', 'lang': 'British English'},
    'de': {'chars': [u'a', u'z', u'ä'], 'localestring': 'de_DE.UTF-8', 'lang': 'German'},
    'pl': {'chars': [u'a', u'z', u'ą'], 'localestring': 'pl_PL.UTF-8', 'lang': 'Polish'}
}

for l in testdata:
    try:
        locale.setlocale(locale.LC_ALL, testdata[l]['localestring'])
    except locale.Error as e:
        print "Error for %s and locale %s: %s\n" % (l, testdata[l]['localestring'], e)
        continue
    print "The order for %s is:" % testdata[l]['lang']
    for item in sorted(testdata[l]['chars'], cmp=locale.strcoll):
        print unicodedata.name(item)
    print "The LC_COLLATE culture and encoding settings were %s." % ', '.join(locale.getlocale(locale.LC_COLLATE))
    print
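
The script above is written for Python 2 (print statements and `sorted(..., cmp=...)`). Python 3 removed the `cmp` argument, so the same check needs `functools.cmp_to_key`. The following is a minimal sketch of that port, not the author's code; the helper name `sort_for_locale` is hypothetical, and it sets only `LC_COLLATE` rather than `LC_ALL`.

```python
import locale
import unicodedata
from functools import cmp_to_key


def sort_for_locale(chars, locale_string):
    """Return chars sorted with locale.strcoll under locale_string.

    Returns None if the locale cannot be set on this system.
    (Hypothetical helper, not part of the original script.)
    """
    # Calling setlocale without a locale argument queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_string)
    except locale.Error:
        return None
    try:
        # cmp_to_key adapts the two-argument strcoll to a sort key.
        return sorted(chars, key=cmp_to_key(locale.strcoll))
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    tests = [
        ("British English", "en_GB.UTF-8", ["a", "z", "e"]),
        ("German", "de_DE.UTF-8", ["a", "z", "\u00e4"]),
        ("Polish", "pl_PL.UTF-8", ["a", "z", "\u0105"]),
    ]
    for lang, locale_string, chars in tests:
        ordered = sort_for_locale(chars, locale_string)
        if ordered is None:
            print("Locale %s is not available" % locale_string)
            continue
        print("The order for %s is:" % lang)
        for item in ordered:
            print(unicodedata.name(item))
```

Using `key=locale.strxfrm` is a common alternative to `cmp_to_key(locale.strcoll)`. Both rely on the platform's C library, so neither is expected to change the results on a system whose collation tables are wrong.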
@tkopczuk

OS X 10.6.4, gcc 4.2.1, Python 2.6.5:
wrong for every locale except English.
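
Where `locale.strcoll` misbehaves like this, one rough fallback is to ignore the system locale and sort by base letter, using Unicode decomposition. The Python 3 sketch below is only an approximation, not real per-language collation: it happens to reproduce the expected order for the three test sets in the script, but it will not follow language-specific rules in general. For example, it places "ąa" before "az", which is wrong for Polish, where ą is a separate letter after a. The function name `base_letter_key` is hypothetical.

```python
import unicodedata


def base_letter_key(text):
    """Sort key: base letters first, full decomposed form as a tie-breaker.

    Approximation only; it does not implement any locale's real rules.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (diaeresis, ogonek, ...) to get the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)


if __name__ == "__main__":
    for chars in (["a", "z", "e"], ["a", "z", "\u00e4"], ["a", "z", "\u0105"]):
        for item in sorted(chars, key=base_letter_key):
            print(unicodedata.name(item))
        print()
```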

@chryss
Owner

Same for me. Works fine on Ubuntu Linux 10.04, Python 2.6.5 built with gcc 4.4.3. It also works fine on Windows XP once the locale strings are changed.
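
Since locale names differ between platforms, one way to run the same test everywhere is to try several candidate names and use the first that the system accepts. This is a sketch under assumptions: the helper name `set_first_available` is hypothetical, and the Windows-style names in the candidate lists are guesses, because the thread does not say which strings were used on Windows XP.

```python
import locale


def set_first_available(candidates, category=locale.LC_COLLATE):
    """Try each locale name in order; return the first accepted, else None."""
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            continue  # this name is unknown on the current system
        return name
    return None


# The POSIX-style names come from the script above. The Windows-style
# names are assumptions and were not given in the thread.
CANDIDATES = {
    "British English": ["en_GB.UTF-8", "English_United Kingdom.1252"],
    "German": ["de_DE.UTF-8", "German_Germany.1252"],
    "Polish": ["pl_PL.UTF-8", "Polish_Poland.1250"],
}

if __name__ == "__main__":
    for lang, names in CANDIDATES.items():
        chosen = set_first_available(names)
        print("%s: %s" % (lang, chosen if chosen else "no candidate locale available"))
```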