Skip to content

Instantly share code, notes, and snippets.

@mathiasbynens
Created September 26, 2011 19:50
Show Gist options
  • Save mathiasbynens/1243213 to your computer and use it in GitHub Desktop.
Save mathiasbynens/1243213 to your computer and use it in GitHub Desktop.
Escape all characters in a string using both Unicode and hexadecimal escape sequences
// Ever needed to escape '\n' as '\\n'? This function does that for any character,
// using hex and/or Unicode escape sequences (whichever are shortest).
// Demo: http://mothereff.in/js-escapes
function unicodeEscape(str) {
return str.replace(/[\s\S]/g, function(character) {
var escape = character.charCodeAt().toString(16),
longhand = escape.length > 2;
return '\\' + (longhand ? 'u' : 'x') + ('0000' + escape).slice(longhand ? -4 : -2);
});
}
@adamvleggett
Copy link

If the goal is to do this with minimal code size, the following works well and minifies to ~100 bytes:

function escapeUnicode(str) {
    return str.replace(/[^\0-~]/g, function(ch) {
        return "\\u" + ("000" + ch.charCodeAt().toString(16)).slice(-4);
    });
}

@F1LT3R
Copy link

F1LT3R commented Dec 15, 2016

Fantastic! Thanks for this @mathiasbynens!

@mervick
Copy link

mervick commented Nov 13, 2018

Replace only unicode characters

function escapeUnicode(str) {
  return str.replace(/[\u00A0-\uffff]/gu, function (c) {
    return "\\u" + ("000" + c.charCodeAt().toString(16)).slice(-4)
  });
}

I use this for convert utf8 content of js files to latin1

@rafaelvanat
Copy link

Very interesting work guys, thanks for sharing.
@mervick was especially useful for my use case, any restriction to use it? Thanks!

@mervick
Copy link

mervick commented Dec 19, 2019

@rafaelvanat I used that in my project more then year, and so far there have been no problems

@josephrocca
Copy link

josephrocca commented Jun 18, 2020

@mervick @rafaelvanat If I use that function like this:

escapeUnicode("abc𝔸𝔹ℂ")

Then I get:

abc𝔸𝔹\u2102

The following function fixes this by matching all non-ASCII characters after splitting the string in a "unicode-safe" way (using [...str]). It then splits each Unicode character up into its code-points, and gets the escape code for each (rather than just grabbing the first char code of each Unicode character):

function escapeUnicode(str) {
  return [...str].map(c => /^[\x00-\x7F]$/.test(c) ? c : c.split("").map(a => "\\u" + a.charCodeAt().toString(16).padStart(4, "0")).join("")).join("");
}

This gives the correct result:

abc\ud835\udd38\ud835\udd39\u2102

This seems to work fine in all my tests so far, but if I find any bugs I'll add fixes in this gist. Performance doesn't matter for my use-case, so I haven't benchmarked or optimised it at all.

@mathiasbynens
Copy link
Author

Check out jsesc which solves this problem in a more robust manner.

@josephrocca
Copy link

josephrocca commented Jun 19, 2020

@mathiasbynens It looks great! I did try to use it but unfortunately I'm not up to date with all the browserify/bundling stuff and just need a vanilla JS script (e.g. no use of Buffer) to include in a module import and wasn't able to work out how to do that with jsesc (though I admit I only poked around for a few minutes before deciding to write the function above). Also, out of pure curiosity I'd be interested in cases where the above function fails - I couldn't find any failing cases in my tests.

@mathiasbynens
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment