@sebastien-p
Created June 14, 2011 09:10
Tweet-sized JavaScript implementation of the Lempel–Ziv–Welch universal lossless data compression algorithm.
// Please see http://rosettacode.org/wiki/LZW_compression#JavaScript
// and http://en.wikipedia.org/wiki/Lempel–Ziv–Welch for more information.
function (
a // String to compress and placeholder for 'wc'.
){
for (
var b = a + "Ā", // Append first "illegal" character (charCode === 256).
c = [], // dictionary
d = 0, // dictionary size
e = d, // iterator
f = c, // w
g = c, // result
h; // c
h = b.charAt(e++);
)
c[h] = h.charCodeAt(), // Fill in the dictionary ...
f = 1 + c[a = f + h] ? a : (g[d++] = c[f], c[a] = d + 255, h); // ... and use it to compress data.
return g // Array of compressed data.
}
function(a){for(var b=a+"Ā",c=[],d=0,e=d,f=c,g=c,h;h=b.charAt(e++);)c[h]=h.charCodeAt(),f=1+c[a=f+h]?a:(g[d++]=c[f],c[a]=d+255,h);return g}
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
Version 2, December 2004
Copyright (C) 2011 Sebastien P. https://twitter.com/#!/_sebastienp
Special thanks to @subzey (you rock) and @kbjr !
Everyone is permitted to copy and distribute verbatim or modified
copies of this license document, and changing it is allowed as long
as the name is changed.
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. You just DO WHAT THE FUCK YOU WANT TO.
{
"name": "LZWcompress",
"description": "JavaScript implementation of the Lempel–Ziv–Welch universal lossless data compression algorithm.",
"keywords": [
"LZW",
"lossless",
"data",
"compression"
]
}
<!DOCTYPE html>
<title>Foo</title>
<div>Expected value: <b>84,79,66,69,79,82,78,79,84,256,258,260,265,259,261,263</b></div>
<div>Actual value: <b id="ret"></b></div>
<script>
var LZWcompress = function(a){for(var b=a+"Ā",c=[],d=0,e=d,f=c,g=c,h;h=b.charAt(e++);)c[h]=h.charCodeAt(),f=1+c[a=f+h]?a:(g[d++]=c[f],c[a]=d+255,h);return g}
document.getElementById("ret").innerHTML = LZWcompress("TOBEORNOTTOBEORTOBEORNOT")
</script>

@christopherdebeer


Um, it's totally nitpicking, and everything works great, but in your "test.html" you declare it as "LZWcompress" and later call it as "LZW", resulting in an "undefined" error on the function call.

@sebastien-p

Author

@christopherdebeer : saw that, already corrected, thanks.

@subzey

subzey commented Jun 14, 2011


This function returns strange results when the characters \x00 or \u0100 … \uFFFF are used.

@sebastien-p

Author

@subzey : please, try using this version (http://rosettacode.org/wiki/LZW_compression#JavaScript) and let me know if it still returns those same strange results you're talking about.

@subzey

subzey commented Jun 14, 2011


@sebastien-p, "\0FOOBAR" produces ,256,70,79,79,66,65,82 in both your function and the Rosetta Code one. AFAIK, it should be 0,70,79,79,66,65,82.
Sorry about \u0100…\uFFFF: I was confused by the 256 in the code and forgot that LZW works only with octets.
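
The fix proposed later in this thread suggests where the missing 0 comes from: the dictionary lookup is used directly as a condition, and the code for "\0" is 0, which is falsy. A minimal sketch of that effect, assuming a hypothetical one-entry dictionary (`codes` and `lookup` are illustrative names, not the gist's variables):

```javascript
// Hypothetical one-entry dictionary, for illustration only.
var codes = {};
codes["\0"] = "\0".charCodeAt(0); // the code for "\0" is 0

// A plain truthiness test cannot tell "code 0" from "not in the dictionary".
function lookup(key) {
  return codes[key] ? "found" : "missing";
}

var nulResult = lookup("\0");     // the entry exists, but 0 is falsy
var absentResult = lookup("\0F"); // the entry really is absent (undefined)
```

Both calls report "missing", so a compressor written this way would treat "\0" as a brand-new phrase instead of emitting its code.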

@subzey

subzey commented Jun 14, 2011


We can save 13 bytes by filling the dictionary inside the «main» loop, just before accessing it:
function(a,b,c,d,e,f,g,h){for(b={},c=256,d=0,e=f=[];g=a[d++];)b[g]=g.charCodeAt(0),e=b[h=e+g]?h:(f.push(b[e]),b[h]=c++,g);f.push(b[e]);return f}
144 bytes, 4 bytes to go :)

@subzey

subzey commented Jun 14, 2011


And by the way, using a plain var b=…,c=…,d=…,e=… declaration instead of declaring them as arguments saves an extra 4 bytes:
function(a,f,g,h){for(var b={},c=256,d=0,e=f=[];g=a[d++];)b[g]=g.charCodeAt(0),e=b[h=e+g]?h:(f.push(b[e]),b[h]=c++,g);f.push(b[e]);return f}
140 bytes.

@sebastien-p

Author

@subzey : thanks a lot, what you did was so helpful ! I just changed ... d = 0 ... into ... d = e = [] ... which works exactly the same. Still 7 bytes to go for IE compatibility :)

@kbjr

kbjr commented Jun 14, 2011


You can remove the 0 in .charCodeAt(0), because an omitted argument is undefined and coerces to 0.
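
A quick check of that coercion, as a sketch (the variable names are only illustrative):

```javascript
// charCodeAt converts its argument to an integer; a missing argument becomes 0.
var withIndex = "TOBE".charCodeAt(0);
var withoutIndex = "TOBE".charCodeAt();
var sameCode = withIndex === withoutIndex; // both read the first character, "T"
```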

@sebastien-p

Author

And we also have to correct the "\0FOOBAR" bug (https://gist.github.com/1024553#gistcomment-35550) ...

@sebastien-p

Author

@kbjr : thanks !

@subzey

subzey commented Jun 15, 2011


My suggestion:

function(a){a+='ħ';for(var b={},c=256,d,e,f=d=e=[],g,h;g=a[d++];)b[g]=g.charCodeAt(),e=1+b[h=e+g]?h:(f.push(b[e]),b[h]=c++,g);return f}
Please note that there are 135 chars but 136 bytes due to ħ.
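
The one-byte difference can be checked with a small sketch. It assumes the source is saved as UTF-8 and that a global `TextEncoder` is available (modern browsers and recent Node.js versions); the variable names are illustrative:

```javascript
var sentinel = "\u0127"; // ħ, written as an escape so the file encoding does not matter here
var charCount = sentinel.length;                           // one UTF-16 code unit
var byteCount = new TextEncoder().encode(sentinel).length; // size once encoded as UTF-8
```

With UTF-8, `charCount` is 1 and `byteCount` is 2, which accounts for the extra byte.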

e = 1 + b[h = e + g] ? … : … fixes "\0FOOBAR" bug:
b[h=e+g] is presumably non-negative when it is defined. After adding 1 we get a positive value («true») if the operand was a number, or NaN («false») if it was undefined.
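
A small sketch of that test in isolation, assuming a hypothetical prototype-less dictionary (`table` and `isKnown` are illustrative names, not part of the golfed code):

```javascript
// Hypothetical dictionary with a code of 0 and an ordinary code.
var table = Object.create(null);
table["\0"] = 0;
table["T"] = 84;

// 1 + number is a positive number (truthy); 1 + undefined is NaN (falsy).
function isKnown(key) {
  return 1 + table[key] ? true : false;
}

var knowsNul = isKnown("\0");    // 1 + 0 is 1
var knowsT = isKnown("T");       // 1 + 84 is 85
var knowsOther = isKnown("\0F"); // 1 + undefined is NaN
```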

Then, instead of calling f.push(b[e]) twice, we can append an extra «illegal» char with charCode > 255 to the string being encoded, one that in conjunction with the previous char(s) cannot be in the dictionary. That lets us drop the push of the last char.
'ħ' is picked randomly as a tribute to Max Planck :)
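
To see why the sentinel can replace the final push, here is an un-golfed sketch that runs the same loop both ways. The function `lzwLoop` and its `flushAtEnd` flag are hypothetical names, and the sketch assumes the input itself never contains ħ:

```javascript
// Readable LZW loop; flushAtEnd selects the "explicit final push" variant.
function lzwLoop(text, flushAtEnd) {
  var dict = Object.create(null); // prototype-less map: phrase -> code
  var nextCode = 256;
  var phrase = "";
  var out = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    dict[ch] = ch.charCodeAt(0);  // single-char entries are filled lazily
    var candidate = phrase + ch;
    if (candidate in dict) {
      phrase = candidate;         // keep extending the current phrase
    } else {
      out.push(dict[phrase]);     // emit the longest known phrase
      dict[candidate] = nextCode++;
      phrase = ch;
    }
  }
  if (flushAtEnd && phrase !== "") out.push(dict[phrase]);
  return out;
}

var sample = "TOBEORNOTTOBEORTOBEORNOT";
var viaFinalPush = lzwLoop(sample, true);
var viaSentinel = lzwLoop(sample + "\u0127", false); // ħ forces the last emit
```

Because phrase + ħ can never already be in the dictionary, the loop emits the pending phrase when it reaches the sentinel, and the sentinel itself is left over and never written.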

@sebastien-p

Author

@subzey : you rock !

The only thing left is to make enough room to write g = a.charAt(d++); instead of g = a[d++]; !
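
For context: as far as I know, bracket indexing on strings was only standardized in ES5 and is not supported by older IE versions, which is why charAt matters here. In engines that support both, they differ only past the end of the string, as this sketch shows (variable names are illustrative):

```javascript
var text = "abc";
var insideMatches = text.charAt(1) === text[1]; // same character inside the string
var pastEndCharAt = text.charAt(3);             // empty string
var pastEndBracket = text[3];                   // undefined
// Both past-the-end values are falsy, so either one terminates the for loop.
var bothStopLoop = !pastEndCharAt && !pastEndBracket;
```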

@subzey

subzey commented Jun 15, 2011


function(h){for(var a=h+'ħ',b=[],c=0,d=b,e=b,f=b,g;g=a.charAt(d++);)b[g]=g.charCodeAt(),e=1+b[h=e+g]?h:(f[c++]=b[e],b[h]=c+255,g);return f}
139 chars, 140 bytes, works perfectly well in IE6, IE8, IE9 (didn't test in native IE7)

Optimizations applied:

  • c is initially 0, not 256, and is post-incremented as the output index; new dictionary codes are computed as c+255
  • push changed to index-based assignment: since c starts at 0, the next output index is always equal to c
  • Rearranged variables declaration and «illegal» char appending, stripping one byte
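
For readers who want to follow the golfed code, here is an un-golfed sketch of the same steps. All names are mine, and one thing is deliberately different: the dictionary is a separate prototype-less object, whereas the golfed version reuses a single array for several variables.

```javascript
function lzwCompress(input) {
  var text = input + "\u0127";    // sentinel ħ flushes the last phrase
  var dict = Object.create(null); // phrase -> code (my choice, not the gist's array)
  var out = [];
  var outCount = 0;               // plays the role of c in the golfed code
  var pos = 0;
  var phrase = "";
  var ch, candidate;
  while ((ch = text.charAt(pos++))) { // charAt returns "" past the end
    dict[ch] = ch.charCodeAt();       // lazy single-character entry
    candidate = phrase + ch;
    if (1 + dict[candidate]) {        // NaN (falsy) when the entry is missing
      phrase = candidate;
    } else {
      out[outCount++] = dict[phrase]; // index-based write instead of push
      dict[candidate] = outCount + 255; // first new code is 256
      phrase = ch;
    }
  }
  return out;
}

var demoCodes = lzwCompress("TOBEORNOTTOBEORTOBEORNOT");
```

Each emitted code adds exactly one dictionary entry, which is why the output count alone is enough to derive the next code.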

@sebastien-p

Author

@subzey : Wonderful ! IE7 should not be a problem. I did something like that to say goodbye to push but gave up on it in the end.

@bytespider


Well done guys, this truly is awesome
