

@tsaniel
Forked from 140bytes/LICENSE.txt
Created July 16, 2011 14:07
UTF8 encoder

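A simple UTF-8 encoder. It converts a JavaScript string into a string of UTF-8 bytes, one character (code 0-255) per byte. To save bytes, String.fromCharCode is passed in as the second argument.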

function(
  a, // the text
  b, // String.fromCharCode
  c, // placeholder (index)
  d, // placeholder (current code unit)
  e  // placeholder (output)
){
  for (c=e=''; d=a.charCodeAt(c++); ) // get the UTF-16 code unit of the current character; NaN past the end stops the loop (and so does U+0000, since 0 is falsy)
    e += d < 128 ? // U+0000-U+007F
      b(d) : // 0xxxxxxx
      (d < 2048 ? // U+0080-U+07FF
        b(d >> 6 | 192) : // 110xxxxx
        b(d >> 12 | 224, d >> 6 & 63 | 128) // U+0800-U+FFFF: 1110xxxx 10xxxxxx (each surrogate half of a non-BMP character is encoded on its own)
      ) + b(d & 63 | 128); // 10xxxxxx
  return e;
}
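To make the bit patterns concrete, here is a small sketch that runs the annotated function (assigned to the hypothetical name `utf8encode`) and prints each output character as a hex byte through a hypothetical helper `hexBytes`. The expected bytes are the standard UTF-8 encodings of A, é and €.

```javascript
// The annotated 140byt.es function, assigned to a hypothetical name.
var utf8encode = function (a, b, c, d, e) {
  for (c = e = ''; d = a.charCodeAt(c++); )
    e += d < 128 ? b(d) : (d < 2048 ? b(d >> 6 | 192) : b(d >> 12 | 224, d >> 6 & 63 | 128)) + b(d & 63 | 128);
  return e;
};

// Hypothetical helper: list each output character's code as an uppercase hex byte.
function hexBytes(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) out.push(s.charCodeAt(i).toString(16).toUpperCase());
  return out.join(' ');
}

console.log(hexBytes(utf8encode('A', String.fromCharCode)));      // 41        (1 byte)
console.log(hexBytes(utf8encode('\u00e9', String.fromCharCode))); // C3 A9     (é, 2 bytes)
console.log(hexBytes(utf8encode('\u20ac', String.fromCharCode))); // E2 82 AC  (€, 3 bytes)
```

For €, U+20AC: `d >> 12` is 2, giving 0xE2; `d >> 6 & 63` is 2, giving 0x82; `d & 63` is 0x2C, giving 0xAC.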
function(a,b,c,d,e){for(c=0,e="";d=a.charCodeAt(c++);)e+=d<128?b(d):(d<2048?b(d>>6|192):b(d>>12|224,d>>6&63|128))+b(d&63|128);return e}
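One side effect of using `d=a.charCodeAt(c++)` as the loop condition: a code of 0 is falsy, just like the `NaN` returned past the end of the string, so the function stops at the first U+0000 character. A quick check, again using the hypothetical name `utf8encode` for the minified function:

```javascript
// The minified function from this gist, assigned to a hypothetical name.
var utf8encode = function(a,b,c,d,e){for(c=0,e="";d=a.charCodeAt(c++);)e+=d<128?b(d):(d<2048?b(d>>6|192):b(d>>12|224,d>>6&63|128))+b(d&63|128);return e};

// charCodeAt returns 0 for U+0000, which ends the loop early,
// so everything after a NUL character is dropped.
console.log(JSON.stringify(utf8encode('a\u0000b', String.fromCharCode))); // "a"
console.log(JSON.stringify(utf8encode('ab', String.fromCharCode)));       // "ab"
```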
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2011 YOUR_NAME_HERE <YOUR_URL_HERE>
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "utf8encoder",
"description": "A simple UTF8 encoder.",
"keywords": [
"utf8",
"utf-8",
"encode",
"encoder",
"unicode"
]
}
<!DOCTYPE html>
<title>UTF8 encode</title>
<div>Expected value: <b>Normal text</b></div>
<div>Actual value: <b id="ret"></b></div>
<script>
var myFunction = function(a,b,c,d,e){for(c=0,e="";d=a.charCodeAt(c++);)e+=d<128?b(d):(d<2048?b(d>>6|192):b(d>>12|224,d>>6&63|128))+b(d&63|128);return e};
document.getElementById( "ret" ).innerHTML = myFunction('Normal text', String.fromCharCode);
</script>
@atk

atk commented Jul 16, 2011

This is really amazing! Pity that charCodeAt and String.fromCharCode are such byte hogs.

@tsaniel
Author

tsaniel commented Jul 17, 2011

Yes, especially String.fromCharCode. I'm wondering whether we can save 20 bytes so that String.fromCharCode can go inside the function...

@tsaniel
Author

tsaniel commented Jul 17, 2011

It seems that there is a more powerful function...
http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html
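The approach usually quoted from that post leans on built-ins: `encodeURIComponent` percent-encodes the UTF-8 bytes, and the legacy `unescape` turns each `%XX` back into a single character. The sketch below (the name `utf8encodeBuiltin` is hypothetical) compares it with this gist's function. For characters in the Basic Multilingual Plane the output matches; for characters above U+FFFF, which JavaScript stores as two surrogates, the gist's function encodes each surrogate on its own (6 bytes) while the built-in route yields the proper 4-byte sequence.

```javascript
// Built-in route: encodeURIComponent emits %XX escapes of the UTF-8 bytes,
// and unescape (a legacy function, still provided by browsers and Node)
// turns each %XX back into one character with code 0..255.
function utf8encodeBuiltin(text) {
  return unescape(encodeURIComponent(text));
}

// The minified function from this gist, for comparison.
var utf8encode = function(a,b,c,d,e){for(c=0,e="";d=a.charCodeAt(c++);)e+=d<128?b(d):(d<2048?b(d>>6|192):b(d>>12|224,d>>6&63|128))+b(d&63|128);return e};

// Same bytes for characters inside the Basic Multilingual Plane...
console.log(utf8encodeBuiltin('h\u00e9\u20ac') === utf8encode('h\u00e9\u20ac', String.fromCharCode)); // true

// ...but not for U+1F600, stored as the surrogate pair \ud83d\ude00.
var smiley = '\ud83d\ude00';
console.log(utf8encodeBuiltin(smiley).length);               // 4 (F0 9F 98 80)
console.log(utf8encode(smiley, String.fromCharCode).length); // 6 (ED A0 BD ED B8 80)
```

One more trade-off: `encodeURIComponent` throws a `URIError` on a lone surrogate, whereas the gist's function encodes it without complaint.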

@jed

jed commented Jul 19, 2011

  • c=0,e="" => c=e=""
  • perhaps this is ripe for some sort of eval? d>>6 could appear 3 times.
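The annotated version above already uses the first tip. A quick check of why `c=e=""` (6 bytes, versus 8 for `c=0,e=""`) still gives a usable loop counter:

```javascript
// Both variables start as the empty string.
var c, e;
c = e = '';

// Postfix ++ converts '' to the number 0, returns that 0,
// and stores 1 back into c, so the first charCodeAt call reads index 0.
var first = c++;
console.log(first, typeof first); // 0 number
console.log(c);                   // 1

// e stays a string, so e += ... keeps concatenating.
console.log(typeof e);            // string
```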

@jed

jed commented Jul 19, 2011

Also, can you exploit the fact that String.fromCharCode(a,null) === String.fromCharCode(a)?
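Per the ECMAScript specification, `String.fromCharCode` converts every argument to a 16-bit code unit (ToUint16), so `null` becomes 0 and adds a U+0000 character instead of being ignored. A quick check in Node or a browser console:

```javascript
// null is converted to 0, so the result has a second character, U+0000.
var withNull = String.fromCharCode(65, null);

console.log(withNull.length);                      // 2
console.log(withNull.charCodeAt(1));               // 0
console.log(withNull === String.fromCharCode(65)); // false
```

So any trick built on passing `null` would have to tolerate an extra NUL character in the output.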

@tsaniel
Author

tsaniel commented Jul 19, 2011

Thanks for the tips, @jed! I think eval is a good idea, but it doesn't seem to save any bytes with d>>6.
Anyway, that fact is really awesome. I'm still thinking about how to exploit it...
