
TwitterCLDR notes.


Tailoring specs results (diff with ICU)

  1. JA:

    tw:  failures: [["x ", "x"], ["X ", "X"], ["xゞx", "xヽ"]]
    icu: failures: [["x ", "x"], ["X ", "X"]]
    

    The character 'ゞ' (code point 0x309E) is not in NFD form: its normalized version is 0x309D 0x3099. The FCE table, however, has an entry for the non-normalized code point: 309E; [0E 25, 05, 05][, DA 95, 05]. Because all strings are normalized first, we never use this entry. Instead we build the collation elements for this character from the CEs for 0x309D and 0x3099, which are [0E 25, 05, 05] and [, DA 95, 05]. That causes no issue in the default locale, because the results are identical. But when 'ゝ' (code point 0x309D) is tailored from [0E 25, 05, 05] to [0E 29, 05, 05] in the JA locale, we get the wrong collation elements [0E 29, 05, 05][, DA 95, 05] for 'ゞ'.

    This is only one test failure, but in practice there might be more cases like it. The problem is that the FCE table contains non-normalized code points, and because we normalize all strings before collation, we never match those entries. That is a bit unexpected, and I'm not sure how we can fix it.
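
The mismatch described in item 1 can be reproduced with a toy model. This is a minimal sketch, not TwitterCLDR's or ICU's implementation: the dictionaries and the `collation_elements` helper are hypothetical, the collation elements are kept as opaque strings copied from the note above, and the lookup goes one code point at a time with no contraction handling. Only `unicodedata.normalize` is a real library call.

```python
import unicodedata

# Hypothetical toy table holding only the entries quoted in the note above.
DEFAULT_TABLE = {
    "\u309D": ["[0E 25, 05, 05]"],
    "\u3099": ["[, DA 95, 05]"],
    # Entry keyed by the non-normalized code point U+309E.
    "\u309E": ["[0E 25, 05, 05]", "[, DA 95, 05]"],
}

# JA tailoring from the note: only U+309D gets a new collation element.
JA_TAILORING = {"\u309D": ["[0E 29, 05, 05]"]}


def collation_elements(text, table, normalize=True):
    """Concatenate CEs per code point, optionally normalizing to NFD first."""
    if normalize:
        text = unicodedata.normalize("NFD", text)
    elements = []
    for ch in text:
        elements.extend(table[ch])
    return elements


ja_table = {**DEFAULT_TABLE, **JA_TAILORING}

# U+309E decomposes canonically into U+309D U+3099.
decomposed = unicodedata.normalize("NFD", "\u309E")

# Default locale: both lookup paths agree, so nothing looks wrong.
default_normalized = collation_elements("\u309E", DEFAULT_TABLE)
default_direct = collation_elements("\u309E", DEFAULT_TABLE, normalize=False)

# JA locale: normalizing first picks up the tailored CE for U+309D,
# while the entry keyed by U+309E is untouched by the tailoring.
ja_normalized = collation_elements("\u309E", ja_table)
ja_direct = collation_elements("\u309E", ja_table, normalize=False)
```

In this model the two lookup paths agree for the default table and diverge once the tailoring is applied, which is the shape of the JA failure: the normalize-first path yields [0E 29, 05, 05][, DA 95, 05].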

Test failures for all other locales are identical to those of ICU, which might be considered a good result if we think of ICU as the reference implementation.

Hey @KL-7, I've got a few small corrections for this (awesome) writeup:

  1. Under "Summary", #3 JS should be JA.
  2. Under "Summary", #4 should be prefixed with ZH-HANT like the other ones.
  3. The links to the CLDR Trac repo seem to be broken...

Otherwise, this rocks. Thanks!

@camertron, I made the corrections, thanks. The links should be working, though. I believe they have some network issues today, because links from the official site are not opening either.

Uppercase-first sorting for Danish is finished - can you update this gist?

Thanks for mentioning that. I completely removed Danish from the list, because we have only three failures with it now and all of them are identical to the failures of ICU.
