Created
February 17, 2012 04:50
{ "inline":
  { "unicode-support-in-js-today":"💩"
  , "unicode-support-in-js-someday":"😁" }
, "surrogates":
  { "unicode-support-in-js-today":"\uf09f\u92a9"
  , "unicode-support-in-js-someday":"\uf09f\u9881" }
}
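For comparison, the escapes under "surrogates" appear to be the UTF-8 bytes of each character (F0 9F 92 A9 and F0 9F 98 81) packed into two 16-bit escapes, not UTF-16 surrogate pairs. A minimal sketch of how the actual surrogate pair for a non-BMP code point can be computed follows; `toSurrogatePair` and `toEscape` are hypothetical helper names, not part of the gist.

```javascript
// Hypothetical helper (not from the gist): split a code point above U+FFFF
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF")
  }
  var offset = codePoint - 0x10000
  var high = 0xD800 + (offset >> 10)   // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF)  // bottom 10 bits of the offset
  return [high, low]
}

// Hypothetical helper: format code units as JSON-style \uXXXX escapes.
function toEscape(units) {
  return units.map(function (unit) {
    return "\\u" + unit.toString(16)
  }).join("")
}

var poo = toSurrogatePair(0x1F4A9)   // U+1F4A9, the "today" character
var grin = toSurrogatePair(0x1F601)  // U+1F601, the "someday" character
console.log(toEscape(poo))
console.log(toEscape(grin))
```

Under these assumptions the two characters come out as `\ud83d\udca9` and `\ud83d\ude01`, which is what a JSON encoder that escapes non-BMP characters would be expected to emit.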
function assert(x) {
  if (!x) console.error("assertion failed")
  else console.error("assertion passed")
}
{ "use unicode" // opt-in so we don't break the web
  var x = "\u1F638" // > 2 byte unicode code point
  var y = "😁" // face with open mouth and smiling eyes
  assert(x.length === 1) // less important, but ideal
  assert(y.length === 1) // less important, but ideal
  assert(x === y) // unicode code points should match literals
  console.log(x) // <-- should output a smiley, not "ὣ8"
  console.log(y) // <-- should output a smiley, not mojibake
  assert(JSON.stringify(y) === JSON.stringify(x))
  assert(JSON.parse(JSON.stringify(y)) === y)
  assert(JSON.parse(JSON.stringify(x)) === x)
  assert(x.indexOf(y) === 0)
  assert(y.indexOf(x) === 0)
  var arr = ["a", "b", "c"]
  var axbxc = arr.join(x)
  var aybyc = arr.join(y)
  assert(axbxc.split(x)[1] === arr[1])
  assert(axbxc.split(y)[1] === arr[1])
  // etc.
  // They're just characters, and just strings.
  // No special anything, just treat it like any other character.
}
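As a point of comparison, here is a rough sketch of how close the wish list above can be approximated without any opt-in, assuming an ES2015 or later engine. The `\u{...}` escape, `Array.from`, and `codePointAt` all postdate this gist, and the `checks` object is only an illustrative name.

```javascript
// Assumes an ES2015+ engine; none of this syntax existed when the gist was written.
var x = "\u{1F601}"     // code point escape for U+1F601
var y = "\ud83d\ude01"  // the same character written as a surrogate pair

var checks = {
  sameString: x === y,                                // escape and pair agree
  lengthInCodeUnits: x.length,                        // still counts UTF-16 code units
  lengthInCodePoints: Array.from(x).length,           // string iteration is per code point
  codePoint: x.codePointAt(0).toString(16),           // reads the whole code point
  jsonRoundTrip: JSON.parse(JSON.stringify(x)) === x,
  splitJoin: ["a", "b", "c"].join(x).split(y)[1] === "b"
}
console.log(checks)
```

The one wish that stays unmet in this sketch is `length === 1`: the string is still two UTF-16 code units long, and only iteration-based counting sees a single character.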
Please see Gist 1861530
For some reason, I couldn't post it as a comment here.
It appears that, in node at least, we're being bitten by http://code.google.com/p/v8/issues/detail?id=761. We will work with v8 to figure out the best way to get from utf8 bytes to a JavaScript string without arbitrarily trashing non-BMP characters. I apologize for misunderstanding the issue and impugning the good name of JavaScript. (In my defense, it's a particularly complicated issue, and JavaScript's name isn't really all that good ;)
Nevertheless, I think that clearly the long-term correct fix is for JavaScript to handle unicode intelligently (albeit with the presence of big red switches), so I'm very happy to see your proposal.
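To make the "utf8 bytes into a JavaScript string" step concrete, here is a minimal sketch that decodes a single four-byte UTF-8 sequence into a code point and then into a string. It is an illustration only, not a description of how v8 or node does it; `decodeFourByteSequence` is a hypothetical helper, and `String.fromCodePoint` assumes an ES2015+ engine.

```javascript
// Hypothetical helper: decode exactly one 4-byte UTF-8 sequence
// (the form used for every non-BMP character) into a code point.
function decodeFourByteSequence(bytes) {
  if (bytes.length !== 4 || (bytes[0] & 0xF8) !== 0xF0) {
    throw new RangeError("expected a 4-byte UTF-8 sequence")
  }
  for (var i = 1; i < 4; i++) {
    if ((bytes[i] & 0xC0) !== 0x80) {
      throw new RangeError("invalid continuation byte at index " + i)
    }
  }
  // 3 payload bits from the lead byte, 6 from each continuation byte.
  return ((bytes[0] & 0x07) << 18) |
         ((bytes[1] & 0x3F) << 12) |
         ((bytes[2] & 0x3F) << 6) |
         (bytes[3] & 0x3F)
}

var bytes = [0xF0, 0x9F, 0x92, 0xA9]         // UTF-8 encoding of U+1F4A9
var codePoint = decodeFourByteSequence(bytes)
var str = String.fromCodePoint(codePoint)    // stored as a surrogate pair
console.log(codePoint.toString(16), str.length, str === "\ud83d\udca9")
```

The point of the sketch is that the non-BMP character survives as two UTF-16 code units in the resulting string instead of being replaced or dropped.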
@izs: ok, that helps -- @allenwb or I will restart, I think with a BRS-per-global, on es-discuss and get it on the next tc39 meeting's agenda.
/be