@chryss
Created August 5, 2010 10:19
#!/usr/bin/env python
# encoding: utf-8
"""
localetest.py - tests collation / sort order for various Latin-script locales
Correct output:
The order for German is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH DIAERESIS
LATIN SMALL LETTER Z
The order for British English is:
LATIN SMALL LETTER A
LATIN SMALL LETTER E
LATIN SMALL LETTER Z
The order for Polish is:
LATIN SMALL LETTER A
LATIN SMALL LETTER A WITH OGONEK
LATIN SMALL LETTER Z
Tests show that on OS X (and possibly other systems) the Polish and German
characters with diacritics are incorrectly sorted after "z".
See also http://stackoverflow.com/questions/3412933/python-not-sorting-unicode-properly-strcoll-doesnt-help
"""
import locale
import unicodedata
testdata = {
    'en': {'chars': [u'a', u'z', u'e'], 'localestring': 'en_GB.UTF-8', 'lang': 'British English'},
    'de': {'chars': [u'a', u'z', u'ä'], 'localestring': 'de_DE.UTF-8', 'lang': 'German'},
    'pl': {'chars': [u'a', u'z', u'ą'], 'localestring': 'pl_PL.UTF-8', 'lang': 'Polish'}
}
for l in testdata:
    # Switch to the locale under test; skip it if this system doesn't provide it.
    try:
        locale.setlocale(locale.LC_ALL, testdata[l]['localestring'])
    except locale.Error as e:
        print "Error for %s and locale %s: %s\n" % (l, testdata[l]['localestring'], e)
        continue
    print "The order for %s is:" % testdata[l]['lang']
    # Python 2: sorted() accepts a comparison function via cmp=; strcoll
    # compares according to the current LC_COLLATE setting.
    for item in sorted(testdata[l]['chars'], cmp=locale.strcoll):
        print unicodedata.name(item)
    print "The LC_COLLATE culture and encoding settings were %s." % ', '.join(locale.getlocale(locale.LC_COLLATE))
    print
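The script above is Python 2: it uses `print` statements, and `sorted()` takes a `cmp=` comparison function. Python 3 removed `cmp=`, so a port has to pass a key instead, either `functools.cmp_to_key(locale.strcoll)` or `locale.strxfrm`, which turns a string into a key that compares the way `strcoll` would. Below is a minimal Python 3 sketch of the same test, not the original gist; the helper name `collation_order` is hypothetical, and unlike the original it changes only `LC_COLLATE` and restores the previous setting afterwards.

```python
#!/usr/bin/env python3
"""Hypothetical Python 3 sketch of localetest.py (not the original gist)."""
import locale
import unicodedata

TESTDATA = {
    'en': {'chars': ['a', 'z', 'e'], 'localestring': 'en_GB.UTF-8', 'lang': 'British English'},
    'de': {'chars': ['a', 'z', 'ä'], 'localestring': 'de_DE.UTF-8', 'lang': 'German'},
    'pl': {'chars': ['a', 'z', 'ą'], 'localestring': 'pl_PL.UTF-8', 'lang': 'Polish'},
}


def collation_order(chars, localestring):
    """Return chars sorted by the collation rules of localestring.

    Returns None if the locale is not available. Only LC_COLLATE is
    changed, and the previous LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, localestring)
    except locale.Error:
        return None
    try:
        # strxfrm maps each string to a key that compares like strcoll
        return sorted(chars, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def main():
    for code, data in TESTDATA.items():
        order = collation_order(data['chars'], data['localestring'])
        if order is None:
            print("Error for %s: locale %s is not available\n" % (code, data['localestring']))
            continue
        print("The order for %s is:" % data['lang'])
        for item in order:
            print(unicodedata.name(item))
        print()


if __name__ == '__main__':
    main()
```

On a system whose `de_DE.UTF-8` locale collates correctly, this should list LATIN SMALL LETTER A WITH DIAERESIS between A and Z, as in the expected output in the docstring; if the platform's collation is broken, it will reproduce the same wrong order as the Python 2 script.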
@tkopczuk

tkopczuk commented Aug 5, 2010

OS X 10.6.4, gcc 4.2.1, Python 2.6.5:
wrong for all locales except English.
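The wrong OS X result matches plain code-point order: `ä` is U+00E4 and `ą` is U+0105, both above `z` (U+007A), so sorting by code point puts them last. The English test uses only ASCII letters, where code-point order already gives "a, e, z", which is consistent with English being the only case that passes. The sketch below shows that code-point order and, for contrast, a rough locale-independent workaround: compare first on base letters obtained by stripping combining marks after NFD decomposition (`unicodedata.normalize`), then break ties by code point. The helper names are hypothetical, and this is not real locale collation; it would be wrong for languages such as Swedish, where `ä` sorts after `z`.

```python
import unicodedata


def codepoint_order(chars):
    """Sort by Unicode code point, ignoring any locale."""
    return sorted(chars)


def base_letters(s):
    """Strip combining marks after NFD decomposition, e.g. 'ä' -> 'a', 'ą' -> 'a'."""
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


def accent_insensitive_order(chars):
    """Rough workaround: compare base letters first, then code points to break ties."""
    return sorted(chars, key=lambda s: (base_letters(s), s))


if __name__ == '__main__':
    for chars in (['a', 'z', 'e'], ['a', 'z', 'ä'], ['a', 'z', 'ą']):
        print(codepoint_order(chars), accent_insensitive_order(chars))
```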

@chryss
Author

chryss commented Aug 5, 2010

Same for me. It works fine on Ubuntu Linux 10.04 with Python 2.6.5 built with gcc 4.4.3, and also on Windows XP once the locale strings are changed to names Windows accepts.
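Locale names are platform-specific: POSIX systems use names like `de_DE.UTF-8`, while Windows accepts names such as `German_Germany.1252`. A portable variant of the test could try several candidate names and use the first one `setlocale` accepts. This is a minimal sketch; the helper `set_first_available_locale` is hypothetical, and the candidate names are illustrative examples rather than the ones used in the comment above.

```python
import locale


def set_first_available_locale(candidates, category=locale.LC_COLLATE):
    """Try each locale name in turn; return the first one setlocale accepts, else None."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed or not recognised on this platform
    return None


# Illustrative candidates: POSIX-style names first, then a Windows-style name.
GERMAN_CANDIDATES = ['de_DE.UTF-8', 'de_DE.utf8', 'German_Germany.1252']

if __name__ == '__main__':
    chosen = set_first_available_locale(GERMAN_CANDIDATES)
    if chosen:
        print('Using locale: %s' % chosen)
    else:
        print('No German locale available')
```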
