@risacher
Last active December 30, 2015 05:19
Node.js code demonstrating how Unicode strings from the same source can be changed so that they no longer compare equal, and how to renormalize them for comparison.
var unorm = require('unorm');
// These strings appear to be the same, but are NOT!
// String 1 is from a [track][location][path] extracted with Scripting Bridge from iTunes on Mac running an HFS+ filesystem
// String 2 is a path from an HFS+ filesystem mounted on Linux, then stored in an SQLite3 TEXT field.
// The file behind String 1 was copied with rsync from the file behind String 2.
// Both strings were then printed in Terminal.app and pasted into Emacs.
// This demonstrates that two strings that appear the same might not be, even when they originally came from the same place.
// I have no idea if the normalization changed because Linux HFS+ is
// different from Mac HFS+, or if SQLite3 renormalizes Unicode, or if
// iTunes did it, or rsync, or Scripting Bridge.
var string1 = "Jan Lindblad/i en klosterträdgård/01 - Våren.mp3";
var string2 = "Jan Lindblad/i en klosterträdgård/01 - Våren.mp3";
['nfd', 'nfc', 'nfkd', 'nfkc'].forEach(function (normalization) {
  if (unorm[normalization](string1) === string2) {
    console.log('the strings match with string1 %s\'d', normalization);
  } else {
    console.log('strings not match with string1 %s\'d', normalization);
  }
  if (unorm[normalization](string2) === string1) {
    console.log('the strings match with string2 %s\'d', normalization);
  } else {
    console.log('strings not match with string2 %s\'d', normalization);
  }
});
// Result:
// strings not match with string1 nfd'd
// the strings match with string2 nfd'd
// the strings match with string1 nfc'd
// the strings match with string2 nfc'd
// strings not match with string1 nfkd'd
// the strings match with string2 nfkd'd
// the strings match with string1 nfkc'd
// the strings match with string2 nfkc'd
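
The comparison above normalizes only one side at a time, so whether it matches depends on which form the other string happens to be in. A minimal sketch of a more symmetric check follows: normalize both sides to the same form before comparing, and dump the code points to see where two look-alike strings differ. It assumes Node.js 8 or later, where `String.prototype.normalize` and `padStart` are built in, and uses them in place of `unorm`. The helper names `codePoints` and `sameText` are hypothetical, and the sample strings are written with escapes so that no editor or clipboard can silently change their form.

```javascript
// Assumption: Node.js 8+ (built-in String.prototype.normalize and padStart).

// The same visible word written two ways.
const composed = 'V\u00E5ren';     // U+00E5, precomposed "å" (NFC form)
const decomposed = 'Va\u030Aren';  // "a" + U+030A combining ring above (NFD form)

// Hypothetical helper: list a string's code points as U+XXXX.
function codePoints(str) {
  return Array.from(str, function (ch) {
    return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
  }).join(' ');
}

// Hypothetical helper: compare two strings after normalizing BOTH
// to the same form (NFC unless another form is requested).
function sameText(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

console.log(composed === decomposed);         // false
console.log(codePoints(composed));            // U+0056 U+00E5 U+0072 U+0065 U+006E
console.log(codePoints(decomposed));          // U+0056 U+0061 U+030A U+0072 U+0065 U+006E
console.log(sameText(composed, decomposed));  // true
```

Normalizing both operands means the result does not depend on which string came from which source. NFC or NFD is usually enough for path comparison; the compatibility forms (NFKC, NFKD) additionally fold characters such as ligatures and may treat strings as equal that are meant to be distinct.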