@xem
Forked from 140bytes/LICENSE.txt
Last active August 29, 2015 14:01

Mini string encoders/decoders for the Web

Here's a list of tiny functions that convert any Unicode string into 28 different formats used on the Web, and perform the reverse operation.

An online converter using those functions can be found here: http://xem.github.io/escape

Please contribute in the comments (or by pull request) if you have a feature idea, a code-golf improvement, or a bug fix.

NB: to be as short as possible, all of these functions assume valid input and convert every character (including ASCII characters and non-ASCII characters that, in some cases, don't strictly need to be converted).
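
As a rough illustration of how the functions are meant to be chained (plain text is first turned into an intermediate array, which is then serialized), here is a minimal sketch. `e4` and `e13` are copied from the list below; the variable names `codePoints` and `entities` are only illustrative.

```javascript
// e4 and e13 are copied verbatim from the list of encoders below.
var e4=function(b,a,c,d){a=[];for(c=0;c<b.length;c++)54==(b[d="charCodeAt"](c)>>10)?(a.push(1024*(b[d](c)-55296)+b[d](c+1)+9216),c++):a.push(b[d](c));return a}
var e13=function(b,a,c){a="";for(c in b)a+="&#"+b[c]+";";return a}

// Step 1: plain text => array of code points ("\u00e9" is "é").
var codePoints = e4("\u00e9!");

// Step 2: array of code points => decimal HTML entities.
// Every character is converted, including the ASCII "!".
var entities = e13(codePoints);
console.log(entities);
```

The same two-step pattern applies to the other formats: pick the encoder that produces the right intermediate array (charCodes, code points or UTF-8 bytes), then pass that array to the serializer.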

/** Encoders **/
// Plain text => array of UTF-16 BE charCodes
e2=function(b,a,c){a=[];for(c in b)a[c]=b.charCodeAt(c);return a}
// Plain text => array of UTF-16 LE charCodes
e3=function(b,a,c,t){a=[];for(c in b)t=b.charCodeAt(c),a[c]=((t&0xff)<<8)+(t>>8);return a}
// Plain text => array of codePoints
e4=function(b,a,c,d){a=[];for(c=0;c<b.length;c++)54==(b[d="charCodeAt"](c)>>10)?(a.push(1024*(b[d](c)-55296)+b[d](c+1)+9216),c++):a.push(b[d](c));return a}
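
For readers who want to follow the arithmetic in `e4`, here is an ungolfed sketch of the same surrogate-pair computation. The name `toCodePoints` is hypothetical and this is not the author's code; like the golfed version, it assumes a high surrogate is always followed by a valid low surrogate.

```javascript
// Hypothetical readable counterpart of e4.
function toCodePoints(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var hi = str.charCodeAt(i);
    // High surrogates span 0xD800..0xDBFF, which is what (hi >> 10) == 54 tests in e4.
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
      var lo = str.charCodeAt(++i);
      // Equivalent to 1024*(hi-55296)+lo+9216 in e4,
      // because 0x10000 - 0xDC00 == 9216.
      out.push((hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000);
    } else {
      out.push(hi);
    }
  }
  return out;
}

console.log(toCodePoints("a\uD83D\uDCA9")); // "a" followed by U+1F4A9
```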
// Plain text => array of UTF-8 bytes
e5=function(b,a,c,d,n){a=[];for(c=0;c<b.length;c++)128>b[d="charCodeAt"](c)?a.push(b[d](c)):(n=b[c],55296==(b[d](c)&64512)&&(n=b.substr(c,2),c++),encodeURI(n).replace(/\w+/g,function(b){a.push(parseInt(b,16))}));return a}
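
`e5` relies on the fact that the URI-encoding functions emit one `%XX` escape per UTF-8 byte. A minimal ungolfed sketch of that trick follows; the name `toUtf8Bytes` is hypothetical, and it uses `encodeURIComponent` rather than `encodeURI`, which is a substitution made here for simplicity. It assumes well-formed input, since lone surrogates make `encodeURIComponent` throw a `URIError`.

```javascript
// Hypothetical readable variant of e5, built on encodeURIComponent.
function toUtf8Bytes(str) {
  var bytes = [];
  var escaped = encodeURIComponent(str);
  for (var i = 0; i < escaped.length; i++) {
    if (escaped[i] === "%") {
      // "%C3" => 0xC3: each escape is exactly one UTF-8 byte.
      bytes.push(parseInt(escaped.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      // Characters left unescaped are plain ASCII, i.e. one byte each.
      bytes.push(escaped.charCodeAt(i));
    }
  }
  return bytes;
}

console.log(toUtf8Bytes("a\u20ac")); // "a" followed by the euro sign
```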
// Array of UTF-16 BE CharCodes => JS string
e6=function(b,a,c){a="";for(c in b)a+="\\u"+(1E3+b[c].toString(16)).slice(-4);return a}
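
The zero-padding in `e6` may look odd at first: `1E3` is the number 1000, and adding a string to it yields `"1000"` followed by the hex digits, so the last four characters are always the padded value. A small sketch of that trick in isolation (the helper names `pad4` and `toJsEscapes` are hypothetical):

```javascript
// Pads a hex string of 1 to 4 digits to exactly 4 digits.
function pad4(hex) {
  // number + string => string concatenation: 1E3 + "e9" gives "1000e9".
  return (1E3 + hex).slice(-4);
}

// Readable counterpart of e6: array of charCodes => "\uXXXX" escapes.
function toJsEscapes(charCodes) {
  return charCodes.map(function (code) {
    return "\\u" + pad4(code.toString(16));
  }).join("");
}

console.log(toJsEscapes([0xe9, 0x20ac]));
```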
// UTF-7
// Ungolfed: http://xem.github.io/escape/utf7.js
utf7=function(e,f,g,a,c,d,b){d=b="";e=e.concat(32);c=String.fromCharCode;for(g in e)a=e[g],32>a||125<a?d+=c(a>>8)+c(a&255):(d&&(b+=c(f)+btoa(d).replace(/=+$/,"")+"-",d=""),b=a==f?b+(c(f)+"-"):b+c(a));return b.slice(0,-1)}
// Array of UTF-16 BE CharCodes => UTF-7
e7=function(b){return utf7(b,0x2B)}
// Array of UTF-16 BE CharCodes => UTF-7 (IMAP)
e8=function(b){return utf7(b,0x26)}
// Array of UTF-16 BE CharCodes => Base64 (UTF-16 BE)
e9=function(b,a,c,s,t){a="";s=String.fromCharCode;for(c in b)t=b[c],a+=s(t>>8)+s(t&0xff);return btoa(a)}
// Array of UTF-16 BE CharCodes => DataURI + Base64 (UTF-16 BE)
e10=function(b){return "data:;charset=utf-16BE;base64,"+e9(b)}
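// Array of UTF-16 LE charCodes (byte-swapped, as returned by e3) => Base64 (UTF-16 LE)
// NB: e11 only delegates to e9, so passing BE charCodes yields BE output
e11=function(b){return e9(b)}
// Array of UTF-16 LE charCodes (byte-swapped, as returned by e3) => DataURI + Base64 (UTF-16 LE)
e12=function(b){return "data:;charset=utf-16LE;base64,"+e11(b)}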
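
UTF-16 BE and UTF-16 LE differ only in the order of the two bytes of each 16-bit unit. The sketch below makes that explicit by producing the raw byte sequence before any Base64 step; `utf16Bytes` and `swapBytes` are hypothetical names, not part of the gist.

```javascript
// Serializes 16-bit charCodes to bytes in the requested byte order.
function utf16Bytes(charCodes, littleEndian) {
  var bytes = [];
  charCodes.forEach(function (code) {
    var hi = code >> 8;
    var lo = code & 0xff;
    if (littleEndian) {
      bytes.push(lo, hi);
    } else {
      bytes.push(hi, lo);
    }
  });
  return bytes;
}

// Same swap as in e3: exchanges the high and low bytes of a 16-bit value.
function swapBytes(code) {
  return ((code & 0xff) << 8) + (code >> 8);
}

console.log(utf16Bytes([0x20ac], false)); // big-endian bytes of the euro sign
console.log(utf16Bytes([0x20ac], true));  // little-endian bytes of the euro sign
```

Serializing a byte-swapped charCode in big-endian order gives the same bytes as serializing the original charCode in little-endian order, which is why a BE serializer can be reused on the output of `e3`.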
// Array of CodePoints => Decimal HTML entities
e13=function(b,a,c){a="";for(c in b)a+="&#"+b[c]+";";return a}
// Array of CodePoints => Hexadecimal HTML entities
e14=function(b,a,c){a="";for(c in b)a+="&#x"+b[c].toString(16)+";";return a}
// Array of CodePoints => CSS selector / font-family
e15=function(b,a,c){a="";for(c in b)a+="\\"+b[c].toString(16)+" ";return a}
// Array of CodePoints => CSS selector / font-family (backslash doubled, e.g. for use inside a JS string literal)
e16=function(b,a,c){a="";for(c in b)a+="\\\\"+b[c].toString(16)+" ";return a}
// Array of CodePoints => CSS unicode-range
e17=function(b,a,c){a="";for(c in b)a+="U+"+b[c].toString(16)+",";return a.slice(0,-1)}
// Array of CodePoints => ES6 string
e18=function(b,a,c){a="";for(c in b)a+="\\u{"+b[c].toString(16)+"}";return a}
// Array of CodePoints => Punycode
// Ungolfed: http://xem.github.io/escape/punycode.js
e19=function(p,d,b,l,v,m,c,n,g,h,q,e,f,r,w,x,y,s,u,t,k){t=String.fromCharCode;k=Math.floor;f=[];d=128;m=72;for(c=b=0;c<(r=p.length);++c)128>(e=p[c])&&f.push(t(e));for((l=v=f.length)&&f.push("-");l<r;){n=1E9;for(c=0;c<r;++c)(e=p[c])>=d&&e<n&&(n=e);b+=(n-d)*(w=l+1);d=n;for(c=0;c<r;++c)if((e=p[c])<d&&++b,e==d){g=b;for(h=36;!(g<(q=h<=m?1:h>=m+26?26:h-m));h+=36)u=q+(y=g-q)%(x=36-q),f.push(t(u+22+75*(26>u))),g=k(y/x);f.push(t(g+22+75*(26>g)));s=0;b=l==v?k(b/700):b>>1;for(b+=k(b/w);455<b;s+=36)b=k(b/ 35);m=k(s+36*b/(b+38));b=0;++l}++b;++d}return f.join("")}
// Array of CodePoints => IDN
e20=function(b){return "xn--"+e19(b)}
// NB: (256+n).toString(16).slice(1) zero-pads each byte to two hex digits (e.g. 10 => "0a")
// Array of UTF-8 bytes => Hexadecimal
e21=function(b,a,c){a="";for(c in b)a+="\\x"+(256+b[c]).toString(16).slice(1);return a}
// Array of UTF-8 bytes => Octal
e22=function(b,a,c){a="";for(c in b)a+="\\"+b[c].toString(8);return a}
// Array of UTF-8 bytes => URL encode
e23=function(b,a,c){a="";for(c in b)a+="%"+(256+b[c]).toString(16).slice(1);return a}
// Array of UTF-8 bytes => Q / Quoted-printable (hex digits must be uppercase)
e24=function(b,a,c){a="";for(c in b)a+="="+(256+b[c]).toString(16).slice(1).toUpperCase();return a}
// Array of UTF-8 bytes => MIME + Q / Quoted-printable
e25=function(b,a,c){a="=?UTF-8?Q?";for(c in b)a+="="+(256+b[c]).toString(16).slice(1).toUpperCase();return a+"?="}
// Array of UTF-8 bytes => Base64
e26=function(b,a,c){a="";for(c in b)a+=String.fromCharCode(b[c]);return btoa(a)}
// Array of UTF-8 bytes => MIME + Base64
e27=function(b,a,c){a="";for(c in b)a+=String.fromCharCode(b[c]);return"=?UTF-8?B?"+btoa(a)+"?="}
// Array of UTF-8 bytes => data-URI + Base64
e28=function(b,a,c){a="";for(c in b)a+=String.fromCharCode(b[c]);return"data:;charset=utf-8;base64,"+btoa(a)}
/** Decoders **/
// Array of UTF-16 BE charCodes => Plain text
d2=function(b,a,c){a="";for(c in b)a+=String.fromCharCode(b[c]);return a}
// Array of UTF-16 LE charCodes => Plain text
d3=function(b,a,c){/* to golf */}
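
Until `d3` is golfed, here is one possible ungolfed sketch of it. It assumes the input is the byte-swapped array that `e3` returns, and simply undoes the swap; the name `fromUtf16LE` is hypothetical.

```javascript
// Hypothetical inverse of e3: byte-swapped charCodes => plain text.
function fromUtf16LE(swapped) {
  var out = "";
  for (var i = 0; i < swapped.length; i++) {
    var t = swapped[i];
    // Swap the two bytes back, then build the character.
    out += String.fromCharCode(((t & 0xff) << 8) + (t >> 8));
  }
  return out;
}

console.log(fromUtf16LE([0x4100, 0xe900])); // "A" followed by "é"
```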
// Array of codePoints => Plain text
d4=function(b,a,c){a="";for(c in b)if(b[c]>0xFFFF){a+=String.fromCharCode(Math.floor((b[c]-0x10000)/0x400)+0xD800);a+=String.fromCharCode((b[c]-0x10000)%0x400+0xDC00)}else{a+=String.fromCharCode(b[c])}return a}
// Array of UTF-8 bytes => Plain text
d5=function(b,a,c){a="";for(c in b)a+="%"+(256+b[c]).toString(16).slice(1);return decodeURIComponent(a)}
// JS string => Array of CharCodes
d=function(b,a,c){/* to golf */}
// Decimal HTML entities => Array of CodePoints
d=function(b,a,c){/* to golf */}
// Hexadecimal HTML entities => Array of CodePoints
d=function(b,a,c){/* to golf */}
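
The two HTML entity decoders above are still stubs. A minimal sketch of what they could look like is shown here, under the assumption that the input only contains well-formed numeric entities such as those produced by `e13` and `e14`. The name `entitiesToCodePoints` is hypothetical, and the sketch deliberately ignores named entities and the numeric overrides defined by the HTML specification.

```javascript
// Hypothetical decoder for decimal (&#233;) and hexadecimal (&#xe9;) entities.
function entitiesToCodePoints(str) {
  var out = [];
  var re = /&#(x?)([0-9a-f]+);/gi;
  var m;
  while ((m = re.exec(str)) !== null) {
    // m[1] is "x" or "X" for hexadecimal entities, "" for decimal ones.
    out.push(parseInt(m[2], m[1] ? 16 : 10));
  }
  return out;
}

console.log(entitiesToCodePoints("&#233;&#x1f4a9;"));
```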
// CSS selector / font-family => Array of CodePoints
d=function(b,a,c){/* to golf */}
// CSS selector / font-family => Array of CodePoints
d=function(b,a,c){/* to golf */}
// CSS unicode-range => Array of CodePoints
d=function(b,a,c){/* to golf */}
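
For the `unicode-range` decoder, a possible sketch is to split on commas and parse the hex part after `U+`. This assumes the comma-separated, single-code-point format that `e17` produces; real `unicode-range` values may also contain intervals and wildcards, which this sketch does not handle. The name `rangeToCodePoints` is hypothetical.

```javascript
// Hypothetical decoder for the output of e17, e.g. "U+e9,U+1f4a9".
function rangeToCodePoints(range) {
  return range.split(",").map(function (item) {
    // Drop the "U+" prefix and read the rest as hexadecimal.
    return parseInt(item.trim().slice(2), 16);
  });
}

console.log(rangeToCodePoints("U+e9,U+1f4a9"));
```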
// ES6 string => Array of CodePoints
d=function(b,a,c){/* to golf */}
// Punycode => Array of CodePoints
d=function(b){/* to golf */}
// IDN => Array of CodePoints
d=function(b){/* to golf */}
// UTF-7 => Array of CodePoints
d=function(b){/* to golf */}
// UTF-7 (IMAP) => Array of CodePoints
d=function(b){/* to golf */}
// Hexadecimal => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// Octal => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// URL encode => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
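
The URL decoder could be sketched as follows, assuming the input is fully percent-encoded as `e23` produces it (every byte written as `%XX`). The name `percentToBytes` is hypothetical; characters outside of `%XX` escapes are simply skipped by this sketch.

```javascript
// Hypothetical decoder: "%c3%a9" => [0xC3, 0xA9].
function percentToBytes(str) {
  var bytes = [];
  // replace() is used only as a convenient way to visit every match.
  str.replace(/%([0-9a-f]{2})/gi, function (match, hex) {
    bytes.push(parseInt(hex, 16));
    return match;
  });
  return bytes;
}

console.log(percentToBytes("%c3%a9%0A"));
```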
// Q / Quoted-printable => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// MIME + Q / Quoted-printable => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// Base64 => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// MIME + Base64 => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
// data-URI + Base64 => Array of UTF-8 bytes
d=function(b,a,c){/* to golf */}
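
The Base64-based decoders can probably share one helper. The sketch below assumes an environment that provides a global `atob` (browsers, and recent Node.js versions), strips an optional `data:...;base64,` prefix, and returns the raw bytes. The name `base64ToBytes` is hypothetical.

```javascript
// Hypothetical decoder for plain Base64 and data-URI + Base64 input.
function base64ToBytes(input) {
  // Drop an optional "data:<mediatype>;base64," prefix.
  var b64 = input.replace(/^data:[^,]*;base64,/, "");
  // atob returns a "binary string": one character per decoded byte.
  var bin = atob(b64);
  var bytes = [];
  for (var i = 0; i < bin.length; i++) {
    bytes.push(bin.charCodeAt(i));
  }
  return bytes;
}

console.log(base64ToBytes("data:;charset=utf-8;base64,w6k=")); // UTF-8 bytes of "é"
```

The resulting array has the same shape as the output of `e5`, so it could be passed on to `d5` to get plain text back.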
@mathiasbynens

This is insane! :) Nice work!

The usual disclaimers for code-golfed snippets apply: in production you’ll probably want to use more robust solutions. Even things like HTML entity encoding/decoding aren’t as simple as they seem.

@xem

xem commented May 26, 2014

Thanks!
Indeed, this is a minimalist approach. But it aims to be as correct as possible.
Also, most of these functions are inspired by your work sir (cf. http://mothereff.in/)
And by that too: http://0xcc.net/jsescape/
Thanks for the link, I'll use it to handle overrides in my HTML decoder!
