

@isaacs
Created February 17, 2012 04:50
{ "inline":
{ "unicode-support-in-js-today":"💩"
, "unicode-support-in-js-someday":"😁" }
, "surrogates":
{ "unicode-support-in-js-today":"\ud83d\udca9"
, "unicode-support-in-js-someday":"\ud83d\ude01" }
}
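For reference, a code point above U+FFFF, such as U+1F4A9, has two quite different 16-bit representations: its UTF-16 surrogate pair, which is how a JavaScript string stores it, and its four UTF-8 bytes packed two per 16-bit unit, which is what you get when UTF-8 bytes are mistaken for UTF-16. The helpers below (`toSurrogatePair`, `toUtf8Bytes` and `hex` are hypothetical names, not from the gist) are a minimal sketch assuming a modern engine (ES2017+, for `padStart` and `String.fromCodePoint`).

```javascript
// Hypothetical helpers (not part of the gist) contrasting two 16-bit views
// of a code point above U+FFFF. Assumes an ES2017+ engine.

// UTF-16: subtract 0x10000, then split the remaining 20 bits into a high
// surrogate (top 10 bits + 0xD800) and a low surrogate (low 10 bits + 0xDC00).
function toSurrogatePair(cp) {
  const v = cp - 0x10000
  return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
}

// UTF-8: the 4-byte form used for code points U+10000..U+10FFFF.
function toUtf8Bytes(cp) {
  return [
    0xF0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3F),
    0x80 | ((cp >> 6) & 0x3F),
    0x80 | (cp & 0x3F),
  ]
}

// Render 16-bit units as JSON-style "\uXXXX" escapes.
function hex(units) {
  return units.map(u => "\\u" + u.toString(16).padStart(4, "0")).join("")
}

const poo = 0x1F4A9
console.log(hex(toSurrogatePair(poo))) // \ud83d\udca9

// Packing the UTF-8 bytes two per 16-bit unit gives a different string.
const b = toUtf8Bytes(poo)
console.log(hex([(b[0] << 8) | b[1], (b[2] << 8) | b[3]])) // \uf09f\u92a9

// The surrogate pair is exactly what the engine stores for the code point.
console.log(String.fromCharCode(...toSurrogatePair(poo)) === String.fromCodePoint(poo)) // true
```

Running the same helpers on 0x1F601 (😁) gives \ud83d\ude01 for the surrogate pair and \uf09f\u9881 for the packed UTF-8 bytes.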
function assert(x) {
if (!x) console.error("assertion failed")
else console.error("assertion passed")
}
{ "use unicode" // opt-in so we don't break the web
var x = "\u1F638" // code point above U+FFFF (outside the BMP)
var y = "😸" // U+1F638 grinning cat face with smiling eyes
assert(x.length === 1) // less important, but ideal
assert(y.length === 1) // less important, but ideal
assert(x === y) // unicode code points should match literals
console.log(x) // <-- should output a smiley, not "ὣ8"
console.log(y) // <-- should output a smiley, not mojibake
assert(JSON.stringify(y) === JSON.stringify(x))
assert(JSON.parse(JSON.stringify(y)) === y)
assert(JSON.parse(JSON.stringify(x)) === x)
assert(x.indexOf(y) === 0)
assert(y.indexOf(x) === 0)
var arr = ["a", "b", "c"]
var axbxc = arr.join(x)
var aybyc = arr.join(y)
assert(axbxc.split(x)[1] === arr[1])
assert(axbxc.split(y)[1] === arr[1])
// etc.
// They're just characters, and just strings.
// No special anything, just treat it like any other character.
}
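The block above is a 2012 proposal, not runnable code. For comparison, ES2015 later added a `\u{...}` escape that takes a full code point, with no opt-in directive needed. A minimal sketch, assuming an ES2015+ engine such as a current Node.js: equality, `indexOf`, `join`/`split` and JSON round-trips behave as the assertions above hope, while `length` still counts UTF-16 code units, and the regexp `u` flag acts as an opt-in switch for code-point-aware matching.

```javascript
// Sketch assuming an ES2015+ engine (e.g. a current Node.js); this is what
// shipped later, not the "use unicode" syntax proposed above.
const x = "\u{1F638}" // code point escape for U+1F638
const y = "😸"         // the same character written as a literal

console.log(x === y)                             // true
console.log(x.indexOf(y) === 0)                  // true
console.log(x.length)                            // 2: still UTF-16 code units
console.log([...x].length)                       // 1: iteration yields code points
console.log(x.codePointAt(0).toString(16))       // 1f638
console.log(JSON.parse(JSON.stringify(x)) === x) // true

// Regexps opt in to code-point matching with the u flag.
console.log(/^.$/u.test(x)) // true: one code point
console.log(/^.$/.test(x))  // false: two code units

// join/split treat the surrogate pair as one unbroken separator.
const arr = ["a", "b", "c"]
console.log(arr.join(x).split(y)[1] === arr[1]) // true
```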
@allenwb

allenwb commented Feb 19, 2012

@mranney @piscisaureus

Please see Gist 1861530

For some reason, I couldn't post it as a comment here.

@isaacs
Author

isaacs commented Feb 21, 2012

It appears that, in node at least, we're being bitten by http://code.google.com/p/v8/issues/detail?id=761. We will work with the V8 team to find the best way to get from UTF-8 bytes into a JavaScript string without arbitrarily trashing non-BMP characters. I apologize for misunderstanding the issue and impugning the good name of JavaScript. (In my defense, it's a particularly complicated issue, and JavaScript's name isn't really all that good ;)
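The V8 problem described above is about the step from UTF-8 bytes to a JavaScript string. As a minimal sketch of what a correct decode looks like, assuming a runtime with the WHATWG `TextDecoder` global (modern browsers, Node.js 11+): four UTF-8 bytes should become one code point, stored as a two-unit surrogate pair, whereas decoding byte by byte produces four unrelated characters.

```javascript
// Sketch assuming a runtime with the WHATWG TextDecoder global (modern
// browsers, Node.js 11+). The bytes are the UTF-8 encoding of U+1F4A9.
const bytes = new Uint8Array([0xF0, 0x9F, 0x92, 0xA9])

// Correct: decode the 4-byte UTF-8 sequence into one code point, which the
// JS string stores as a UTF-16 surrogate pair.
const s = new TextDecoder("utf-8").decode(bytes)
console.log(s === "\u{1F4A9}") // true
console.log(s.charCodeAt(0).toString(16), s.charCodeAt(1).toString(16)) // d83d dca9

// Wrong (for illustration): treating each byte as its own character, as a
// Latin-1 decoder would, yields four unrelated characters.
const latin1 = String.fromCharCode(...bytes)
console.log(latin1.length) // 4
console.log(latin1 === s)  // false
```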

Nevertheless, I think the correct long-term fix is clearly for JavaScript to handle Unicode intelligently (albeit behind big red switches), so I'm very happy to see your proposal.
