@xem
Forked from 140bytes/LICENSE.txt
Last active May 15, 2016 17:31
Challenge: encode / decode 400 (or more) ASCII characters in a single tweet!

Hi,

A tweet can contain 140 UTF-16 characters.

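A UTF-16 character can be composed of two 16-bit surrogates.

A UTF-16 surrogate can be used to store 10 bits: high surrogates span 0xD800 to 0xDBFF and low surrogates span 0xDC00 to 0xDFFF, so each range has 1024 values.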

An ASCII character is 7 bits long.

So a tweet can encode 140 x 2 x 10 = 2800 bits, and 2800 / 7 = 400 plain ASCII characters.
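The arithmetic above can be checked with a few lines of JavaScript. This is only a sketch: the names `HIGH`, `LOW`, `valuesIn` and `bitsFor` are made up for the illustration, and the 140-character limit is taken from the text.

```javascript
// Surrogate ranges defined by UTF-16.
const HIGH = { start: 0xD800, end: 0xDBFF };
const LOW = { start: 0xDC00, end: 0xDFFF };

// Number of distinct values in a range.
const valuesIn = (range) => range.end - range.start + 1;
// For a power-of-two count, the number of bits that many values can store.
const bitsFor = (count) => (count - 1).toString(2).length;

const bitsPerSurrogate = bitsFor(valuesIn(HIGH)); // same result for LOW
const tweetBits = 140 * 2 * bitsPerSurrogate;      // 140 characters, 2 surrogates each
const asciiChars = tweetBits / 7;                  // 7 bits per ASCII character

console.log(bitsPerSurrogate, tweetBits, asciiChars); // 10 2800 400
```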

The challenge is to make an encoder (converting 400 or more ASCII chars into 140 UTF-16 chars) and a decoder (doing the opposite) that can both fit in a tweet.

NB: the encoder and decoder can be packed with this: https://gist.github.com/xem/7086007
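For readers who don't want to follow the link, here is a rough sketch of what such a packer could look like. It is inferred from the unpacking expression used in the packed snippets on this page, not copied from the linked gist. The names `pack` and `unpack` are hypothetical, and the sketch assumes every character code in the source is at most 0xFF.

```javascript
// Hypothetical packer: stores two source characters in one surrogate pair.
function pack(source) {
  // Pad to an even length so every character has a partner.
  if (source.length % 2) source += " ";
  let packed = "";
  for (let i = 0; i < source.length; i += 2) {
    packed += String.fromCharCode(
      0xD800 + source.charCodeAt(i),     // high surrogate carries the first character
      0xDC00 + source.charCodeAt(i + 1)  // low surrogate carries the second
    );
  }
  return packed;
}

// Unpacking: escape() yields "%uD8xx%uDCyy", removing "uD8" and "uDC"
// leaves "%xx%yy", and unescape() turns that back into two characters.
function unpack(packed) {
  return unescape(escape(packed).replace(/uD./g, ""));
}

console.log(unpack(pack("alert(1)"))); // alert(1)
```

Since each surrogate pair counts as one character, the packed source is about half as long as the original, plus the fixed cost of the `eval(unescape(escape("...").replace(/uD./g,'')))` wrapper.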

NB2: non-printable characters 0x00 to 0x1F and 0x7F can be omitted.
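One way to read NB2: if only the 95 printable characters (0x20 to 0x7E) have to be supported, the text can be treated as one big base-95 number instead of a run of 7-bit codes, which leaves room for a bit more than 400 characters. That reading is an assumption, and the snippet below only estimates the capacity; it is not an encoder.

```javascript
const BITS = 2800n;     // capacity of a tweet, from the arithmetic above
const ALPHABET = 95n;   // printable ASCII: 0x20 to 0x7E inclusive

// Largest n such that every n-character text (ALPHABET ** n possibilities)
// still fits in BITS bits (2 ** BITS possibilities).
let n = 0n;
while (ALPHABET ** (n + 1n) <= 2n ** BITS) n++;

console.log(Number(n)); // 426
```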

Have fun!

V2

Encodes 400 ASCII chars.

Encoder: 190 chars minified, 140 chars packed

e=function(e,c,b,d){d=b="";for(c in e)b+=(0+e.charCodeAt(c).toString(2)).slice(-7);for(c=0;b;b=b.slice(10))d+=String.fromCharCode((c++%2?56320:55296)+parseInt(b.substring(0,10),2));return d}

// or

eval(unescape(escape("𩐽𩡵𫡣𭁩𫱮𚁥𛁣𛁢𛁤𚑻𩀽𨠽𘠢𞱦𫱲𚁣𘁩𫠠𩐩𨠫🐨𜀫𩐮𨱨𨑲𠱯𩁥𠑴𚁣𚐮𭁯𤱴𬡩𫡧𚀲𚐩𛡳𫁩𨱥𚀭𝰩𞱦𫱲𚁣🐰𞱢𞱢👢𛡳𫁩𨱥𚀱𜀩𚑤𚰽𤱴𬡩𫡧𛡦𬡯𫑃𪁡𬡃𫱤𩐨𚁣𚰫𙐲🰵𝠳𜠰𞠵𝐲𞐶𚐫𬁡𬡳𩑉𫡴𚁢𛡳𭑢𬱴𬡩𫡧𚀰𛀱𜀩𛀲𚐩𞱲𩑴𭑲𫠠𩁽").replace(/uD./g,'')))

Decoder: 158 chars minified, 124 chars packed

d=function(e,d,b,c){c=b="";for(d=0;400>d;)b=b.slice(7)+e.charCodeAt(d++).toString(2).slice(-10),c+=String.fromCharCode(parseInt(b.substring(0,7),2));return c}

// or

eval(unescape(escape("𩀽𩡵𫡣𭁩𫱮𚁥𛁤𛁢𛁣𚑻𨰽𨠽𘠢𞱦𫱲𚁤🐰𞰴𜀰🡤𞰩𨠽𨠮𬱬𪑣𩐨𝰩𚱥𛡣𪁡𬡃𫱤𩑁𭀨𩀫𚰩𛡴𫱓𭁲𪑮𩰨𜠩𛡳𫁩𨱥𚀭𜐰𚐬𨰫👓𭁲𪑮𩰮𩡲𫱭𠱨𨑲𠱯𩁥𚁰𨑲𬱥𢑮𭀨𨠮𬱵𨡳𭁲𪑮𩰨𜀬𝰩𛀲𚐩𞱲𩑴𭑲𫠠𨱽").replace(/uD./g,'')))

Demo and source code:

http://jsfiddle.net/BD9wP/

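// Encoder
window.e = function(source, i, tmp, result){
  tmp = "";     // bit string, 7 binary digits per input character
  result = "";
  for(i in source){
    // The leading "0" pads 6-bit codes (0x20 to 0x3F) to 7 digits.
    tmp += (0 + source.charCodeAt(i).toString(2)).slice(-7);
  }
  // Take 10 bits at a time, alternating high (0xD800) and low (0xDC00) surrogates.
  for(i = 0; tmp; tmp = tmp.slice(10)){
    result += String.fromCharCode((i++ % 2 ? 0xDC00 : 0xD800) + parseInt(tmp.substring(0, 10), 2));
  }
  return result;
};
// Decoder
window.d = function(source, i, tmp, result){
  tmp = "";     // bits not yet consumed
  result = "";
  // Hard-coded to 400 output characters.
  for(i = 0; i < 400; ){
    // Drop the 7 bits just decoded, then append the low 10 bits of the next code unit.
    // Past the end of source this appends "NaN", which is never read
    // when source holds exactly 400 encoded characters.
    tmp = tmp.slice(7) + source.charCodeAt(i++).toString(2).slice(-10);
    result += String.fromCharCode(parseInt(tmp.substring(0, 7), 2));
  }
  return result;
};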
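For comparison, here is a longer, readable sketch of the same bit-string technique that also handles inputs shorter than 400 characters. It is not the golfed entry: the names `encode` and `decode`, the zero padding, and the explicit `length` argument are assumptions made for this illustration, and the input is assumed to be plain 7-bit ASCII.

```javascript
// Sketch: 7 bits per ASCII character in, 10 bits per surrogate out.
function encode(text) {
  let bits = "";
  for (let i = 0; i < text.length; i++) {
    bits += text.charCodeAt(i).toString(2).padStart(7, "0");
  }
  // Pad with zero bits up to a whole number of surrogate pairs (20 bits each).
  bits = bits.padEnd(Math.ceil(bits.length / 20) * 20, "0");
  let out = "";
  for (let i = 0; i < bits.length; i += 10) {
    const base = (i / 10) % 2 ? 0xDC00 : 0xD800; // alternate high / low surrogate
    out += String.fromCharCode(base + parseInt(bits.slice(i, i + 10), 2));
  }
  return out;
}

// The original length is passed in because the padding bits carry no marker.
function decode(packed, length) {
  let bits = "";
  for (let i = 0; i < packed.length; i++) {
    // Keep only the low 10 bits of each surrogate.
    bits += (packed.charCodeAt(i) & 0x3FF).toString(2).padStart(10, "0");
  }
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(parseInt(bits.slice(i * 7, i * 7 + 7), 2));
  }
  return out;
}

const sample = "The quick brown fox jumps over the lazy dog. ".repeat(9).slice(0, 400);
const tweet = encode(sample);
console.log(Array.from(tweet).length);              // 140
console.log(decode(tweet, sample.length) === sample); // true
```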

subzey commented Nov 22, 2013

I suppose it's impossible to beat the compression ratio; the only option is to make the encoder and decoder smaller.

By the way, why are e= and d= packed in? It's global scope pollution, as far as I understand.

xem commented Nov 22, 2013

Hi :)

Indeed, the code can be smaller. I'm also working on it, and I'll post an update soon...

About the compression ratio, I consider there is currently no compression at all because each ASCII character uses 7 whole bits in the final tweet. I'm sure it could take less than that, by using some Huffman-like or Gzip-like compression algorithm. (I'm working on it too ^^)

And about the global scope pollution, well, all 140byt.es entries leak a function in the global scope, that's not a big deal. It's... mandatory. I just packed e() and d() so that they can both fit in 140 characters. That's not a problem.

If global vars other than d and e had leaked, THAT would have been a problem.

xem commented Nov 23, 2013

Update: I made a new encoder/decoder (140 + 124 characters) that is lighter and much simpler: it uses a string containing the binary representation of the text instead of complex maths.

Still, I'm sure it can be improved ;)
