@maettig
Forked from 140bytes/LICENSE.txt
Created February 7, 2012 03:56
encodeJavaScriptString in 140byt.es

Java and JavaScript (.js) source files can be saved in any character encoding. If one programmer uses UTF-8 and another uses an ISO 8859 encoding, there is a good chance you will end up with something like "Ren�" or "RenÃ©" instead of "René". The most reliable way to avoid such conversion errors is to encode all non-ASCII characters as escape sequences. JavaScript string literals allow both hex (e.g. \xFF) and Unicode escape sequences (e.g. \u0100), while Java has no \x hex escapes, so only Unicode escape sequences can be used there.
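
To illustrate the two escape forms mentioned above, here is a minimal sketch. The variable names are made up for this example. Both literals consist of ASCII characters only, so they survive any re-encoding of the source file.

```javascript
// Both literals denote the same 4-character string ending in U+00E9 (e with acute).
var hex = 'Ren\xE9';       // hex escape: JavaScript only
var unicode = 'Ren\u00E9'; // Unicode escape: the form that Java accepts as well

console.log(hex === unicode);                 // true
console.log(hex.charCodeAt(3).toString(16));  // e9
```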

Click here to see it in action.

Tested with Opera, Firefox and Internet Explorer 8 (insufficient CSS support, but it works).

function f(a, b)
{
return ++b //`b` is a number (including 0) when `replace` calls the function
? '\\' + ( //all escape sequences start with a backslash
(a = a.charCodeAt()) >> 12 //all characters from U+1000 and above
? 'u' //must start with `\u`
: a >> 8 //all characters from U+0100 to U+0FFF
? 'u0' //must start with `\u0`
: 'x' //characters from U+007F to U+00FF can start with `\u00` or `\x`
) + a.toString(16).toUpperCase() //add the upper case hex string (it does not contain leading zeros)
: a.replace(/[^\0-~]/g, f) //else call the function for all non-ASCII characters (all except U+0000 to U+007E)
}
function f(a,b){return++b?'\\'+((a=a.charCodeAt())>>12?'u':a>>8?'u0':'x')+a.toString(16).toUpperCase():a.replace(/[^\0-~]/g,f)}
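
A short usage sketch, assuming the minified function above is pasted unchanged. The input strings are examples chosen here, written with escapes so the snippet itself stays ASCII-only.

```javascript
// Copied verbatim from the minified source above.
function f(a,b){return++b?'\\'+((a=a.charCodeAt())>>12?'u':a>>8?'u0':'x')+a.toString(16).toUpperCase():a.replace(/[^\0-~]/g,f)}

console.log(f('Ren\u00E9'));                  // Ren\xE9
console.log(f('\u0101'));                     // \u0101 (3 hex digits, so `u0` is prepended)
console.log(f('\u263A'));                     // \u263A (4 hex digits, so `u` is prepended)
console.log(f('\u00DCbergr\u00F6\u00DFe'));   // \xDCbergr\xF6\xDFe (a match at offset 0 works too)
console.log(f('plain ASCII'));                // plain ASCII (returned unchanged)
```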
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2012 Thiemo Mättig <http://maettig.com>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "encodeJavaScriptString",
"description": "Converts JavaScript strings to 7 bit US-ASCII.",
"keywords": [
"encode",
"escaping",
"javascript",
"string",
"unicode"
]
}
<!DOCTYPE html>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<h1>
<input id="i0" name="lang" onclick="refresh()" type="radio" checked><label for="i0" title="\u00FF only">Java String Encoder</label><br>
<input id="i1" name="lang" onclick="refresh()" type="radio"><label for="i1" title="\xFF and \u0100">JavaScript String Encoder</label>
</h1>
<textarea onkeyup="refresh()" rows="8" cols="100">// Paste some Java or JavaScript code into this window.
german = "&Uuml;bergr&ouml;&szlig;e";
smilie = "&#x263A;";</textarea>
<pre onclick="select(this)">This encoding utility requires JavaScript.</pre>
<script type="text/javascript">
// 127 bytes
var encodeJavaScriptString = function f(a, b)
{
return ++b //`b` is a number (including 0) when `replace` calls the function
? '\\' + ( //all escape sequences start with a backslash
(a = a.charCodeAt()) >> 12 //all characters from U+1000 and above
? 'u' //must start with `\u`
: a >> 8 //all characters from U+0100 to U+0FFF
? 'u0' //must start with `\u0`
: 'x' //characters from U+007F to U+00FF can start with `\u00` or `\x`
) + a.toString(16).toUpperCase() //add the upper case hex string (it does not contain leading zeros)
: a.replace(/[^\0-~]/g, f) //else call the function for all non-ASCII characters (all except U+0000 to U+007E)
}
// 115 bytes
var encodeJavaString = function e(a, b)
{
return ++b //`b` is a number when `replace` calls the function
? '\\u' + //in Java all escape sequences must start with `\u`
('00' + a.charCodeAt().toString(16)) //build a hex string with at least 4 characters
.slice(-4).toUpperCase() //use the last 4 characters and make them upper case
: a.replace(/[^\0-~]/g, e) //else call the function for all non-ASCII characters (all except U+0000 to U+007E)
}
// 89 bytes
var select = function(a, b)
{
b = document.createRange();
b.selectNode(a);
window.getSelection().addRange(b)
}
var refresh = function()
{
var t = document.getElementsByTagName('TEXTAREA')[0];
var p = document.getElementsByTagName('PRE')[0];
var f = document.getElementById('i1').checked ? encodeJavaScriptString : encodeJavaString;
p.firstChild.data = f(t.value).replace(/\r\n/g, '\n');
}
refresh();
</script>
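
For comparison with the JavaScript encoder, the sketch below runs the Java variant from the test page outside the browser. The function body is copied from the page; the input strings are examples chosen here.

```javascript
// Copied from the test page above: always emits a four-digit \uXXXX sequence.
var encodeJavaString = function e(a, b) {
  return ++b
    ? '\\u' + ('00' + a.charCodeAt().toString(16)).slice(-4).toUpperCase()
    : a.replace(/[^\0-~]/g, e)
};

console.log(encodeJavaString('Ren\u00E9')); // Ren\u00E9
console.log(encodeJavaString('\u0101'));    // \u0101
console.log(encodeJavaString('\u263A'));    // \u263A
```

Because the regular expression only matches U+007F and above, the hex string always has at least two digits, which is why padding with `'00'` is enough to reach four.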

atk commented Feb 7, 2012

@maettig: You can lose one of the + after return if you just want to coerce to Number/NaN.

maettig (author) commented Feb 7, 2012

You are right, but if the regular expression matches the first character in the string, b is 0. That coerces to false. I need something like b >= 0 or b != null and my ++b is a short way to do this.
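
The difference between `+b` and `++b` can be sketched like this. The two helper functions are hypothetical and exist only to show the coercion.

```javascript
// Hypothetical helpers: each returns true if `b` is treated as "called by replace".
function withUnaryPlus(b) { return !!+b; }   // 0 stays 0, which is falsy
function withIncrement(b) { return !!++b; }  // 0 becomes 1, undefined becomes NaN

console.log(withUnaryPlus(undefined)); // false
console.log(withUnaryPlus(0));         // false (wrong: a match at offset 0 is missed)
console.log(withIncrement(undefined)); // false
console.log(withIncrement(0));         // true
console.log(withIncrement(5));         // true
```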


tsaniel commented Feb 8, 2012

It doesn't work with something like \x00 (returns \x0 instead of \x00).

maettig (author) commented Feb 8, 2012

I know. Even if it's possible, such characters should never appear in source code files. What we can do is replace \t with \0 in the regular expression to avoid bad conversions. I'm not sure whether this works in all browsers, though, so I added it to my test suite.
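
If low code points ever had to be escaped as well, one option would be to pad the hex string to a fixed width, similar to what the Java encoder does. The following is only a sketch of that idea, not part of the 140-byte entry, and `escapeCodeUnit` is a hypothetical name.

```javascript
// Hypothetical helper: escapes one UTF-16 code unit with a fixed number of hex digits.
function escapeCodeUnit(code) {
  var hex = code.toString(16).toUpperCase();
  return code < 0x100
    ? '\\x' + ('0' + hex).slice(-2)    // always two digits: \x00 to \xFF
    : '\\u' + ('000' + hex).slice(-4); // always four digits: \u0100 to \uFFFF
}

console.log(escapeCodeUnit(0));      // \x00
console.log(escapeCodeUnit(0xE9));   // \xE9
console.log(escapeCodeUnit(0x101));  // \u0101
console.log(escapeCodeUnit(0x263A)); // \u263A
```

This costs more bytes than excluding U+0000 in the regular expression, which is presumably why the entry takes the shorter route.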

maettig (author) commented Feb 10, 2012

According to my test suite, \0 works in all browsers except for Chrome 9. I did an update.


tsaniel commented Feb 11, 2012

Save 1 byte: charCodeAt(0) -> charCodeAt()

maettig (author) commented Feb 14, 2012

I added this to my test suite and holy cow, it works in literally all browsers. I did an update.


tsaniel commented Feb 14, 2012

You can check the specification for how charCodeAt works: http://es5.github.com/#x15.5.4.5
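
In short, the position argument is converted to an integer, and `undefined` converts to 0. A quick check:

```javascript
var s = 'AB';
console.log(s.charCodeAt());          // 65
console.log(s.charCodeAt(undefined)); // 65
console.log(s.charCodeAt(0));         // 65
console.log(''.charCodeAt());         // NaN (no character at position 0)
```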

maettig (author) commented Feb 15, 2012

I know, but I couldn't believe it. Calling charCodeAt() with no parameter looks so wrong.


tsaniel commented Feb 15, 2012

Nothing in JavaScript looks right actually...
