@KittyGiraudel
Last active August 29, 2015 14:10

About Sass and Gzip

I am a little familiar with the idea behind Gzip, thanks to this great video from Frédéric Kayser (in French), but there is something I am still not sure about.

Basically, a string gets compressed further every time it is repeated. And unless I'm wrong, the longer the repeated string, the better.
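As a sanity check (a quick Python sketch, not part of the original question), the standard zlib module makes it easy to see how cheaply DEFLATE encodes repetitions of the same string:

```python
import zlib

# A CSS-like rule, repeated a growing number of times. Each extra copy
# adds only a few bytes to the compressed output, because DEFLATE
# replaces it with a back-reference to the previous copy.
rule = b".abc { width: 100%; margin: 0 auto; }\n"
for n in (1, 2, 4, 8):
    data = rule * n
    print(n, len(data), len(zlib.compress(data)))
```

The raw size grows linearly with the number of copies, while the compressed size barely moves.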

Let's talk about Sass for a second. Please consider this mixin:

@mixin center($max-width) {
  width: 100%;
  max-width: $max-width;
  margin: 0 auto;
}

The first line is static. The second line is dynamic since it depends on the $max-width variable. The third line is static. Let's say we used it a couple of times, getting this CSS:

.abc {
  width: 100%;
  max-width: 1170px;
  margin: 0 auto;
}

.def {
  width: 100%;
  max-width: 960px;
  margin: 0 auto;
}

Now, as far as I know, Gzip will be able to find those repeated patterns:

 {
  width: 100%;
  max-width: 

And:

;
  margin: 0 auto;
}

Getting back to my question: would it be better to move the dynamic lines on top (or bottom, whatever) of the mixin to group static lines together and benefit from longer strings?

For instance something like:

@mixin center($max-width) {
  width: 100%;
  margin: 0 auto;
  max-width: $max-width;
}

Let's be clear: I am perfectly aware this will make absolutely no difference whatsoever on the file weight. It's just out of curiosity. ;)
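For the curious, the two orderings are easy to compare directly (a Python sketch using the snippets above; the exact byte counts will vary with the compressor and level):

```python
import gzip

# The two rules from above, once with max-width in the middle and once
# with it moved last. mtime=0 keeps the gzip output deterministic.
middle = (
    ".abc {\n  width: 100%;\n  max-width: 1170px;\n  margin: 0 auto;\n}\n\n"
    ".def {\n  width: 100%;\n  max-width: 960px;\n  margin: 0 auto;\n}\n"
).encode()
last = (
    ".abc {\n  width: 100%;\n  margin: 0 auto;\n  max-width: 1170px;\n}\n\n"
    ".def {\n  width: 100%;\n  margin: 0 auto;\n  max-width: 960px;\n}\n"
).encode()
print(len(gzip.compress(middle, mtime=0)), len(gzip.compress(last, mtime=0)))
```

On inputs this small, the two gzipped sizes end up within a byte or two of each other.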

@timseverien

This is a good question, but incredibly hard to answer. Gzip is based on DEFLATE and LZ77. Both DEFLATE and LZ77 do something called “duplicate string elimination”: when a string occurs a second time, it is replaced by a pointer to the first occurrence. This reduces the number of bytes marginally. But DEFLATE also does “bit reduction” using Huffman coding, meaning that common symbols with long representations are replaced with shorter ones. When decoding, these are switched back to their original representations.

My initial guess is yes: I do think that bundling recurring strings helps, because the matched string can then be replaced by one pointer instead of two. However, there are many factors at play, like file size, the size of the matched string, symbol usage and more.

Note: the second repeated pattern would include the semi-colon of the previous line and the line break.
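The bit-reduction part can be seen in isolation with a small Python sketch: data with a skewed symbol distribution compresses noticeably better than uniform random data, even before long repeated strings come into play (short LZ matches do still contribute a little here):

```python
import random
import zlib

random.seed(0)
# Uniform random bytes: nothing for Huffman coding (or LZ77) to exploit.
uniform = bytes(random.choices(range(256), k=5000))
# Skewed distribution: a handful of byte values dominate, so Huffman
# coding can assign them short codes.
skewed = bytes(random.choices(range(256), weights=[50] * 8 + [1] * 248, k=5000))
print(len(zlib.compress(uniform)), len(zlib.compress(skewed)))
```

The uniform stream is essentially incompressible, while the skewed one shrinks substantially.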

@CrocoDillon

Interestingly, I tried it with some test CSS in http://refresh-sf.com/yui/ and the latter (with max-width last) was 1 byte larger than the former.

Ten selectors including that mixin with different values: 440 bytes compressed, and the gzipped size went up from 134 bytes to 135 bytes.

@CrocoDillon

Tried with some added noise too. http://sassmeister.com/gist/be93d9cab3c157e4c2b0

Both versions are 600 bytes compressed and 252 bytes gzipped: no difference at all.

@aredridel

Also, these algorithms have a block size. Staying within that for repetitions is useful.
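Assuming this refers to DEFLATE's 32 KB sliding window (my reading of the remark), a small Python sketch shows it in action: a repeated string is only matched when the gap back to the previous copy fits within the window:

```python
import zlib

rule = b".abc { width: 100%; margin: 0 auto; }\n"
# Gap well within the 32 KB window vs. a gap beyond it. The filler is a
# run of "x", which itself compresses to almost nothing in both cases.
near = rule + b"x" * 1000 + rule
far = rule + b"x" * 40000 + rule
# Marginal cost of the second copy, with the filler-only cost removed.
cost_near = len(zlib.compress(near)) - len(zlib.compress(rule + b"x" * 1000))
cost_far = len(zlib.compress(far)) - len(zlib.compress(rule + b"x" * 40000))
print(cost_near, cost_far)
```

The nearby copy costs only a few bytes (one back-reference), while the faraway copy has to be re-emitted as literals because DEFLATE can no longer point back to the first occurrence.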

@KittyGiraudel
Author

> Also, these algorithms have a block size. Staying within that for repetitions is useful.

What do you mean @aredridel?

@frkay

frkay commented Nov 21, 2014

Beware of microbenchmarks: measuring the compressed size of a full-blown CSS file will probably give different results. Anyhow, I took CrocoDillon's noisy sample and made two versions of it (one with max-width near the middle, the other with it pushed near the bottom but still above the noise-inducing part). The non-minified version takes 813 bytes; minified, it goes down to 601 bytes.
Compressed with gzip -n -6 (roughly what Apache would do to compress it on the fly), the first version is reduced to 266 bytes, the second to 265 bytes (246 and 244 bytes respectively for the minified ones).
More advanced compressors like zopfli can further reduce the compressed file: here 260, 258, 241 and 239 bytes. defdb can even report the compressed stream length in bits when bytes are not precise enough.
The -z option of gzthermal distinguishes LZ matches from literals: a blue background shows symbols that have been copied from a previous location, an orange background shows stand-alone symbols.
I have rearranged the gzthermal output layout to look a bit more text-friendly (notice that the LineFeed code point is materialized by a square), and here is a side-by-side comparison of the two versions.
gzthermal -z output
The first thing to notice is that "px" is in fact part of the second match.

In the first case we have two medium-sized matches:  {□  width: 100%;□  max-width:  is 32 symbols long, and px;□  margin: 0 auto;□   is 24 symbols long. A closer look with defdb -t even gives us the cost in bits of these two matches:

 [7] 2E .
 [6] 62 b
[13] (31,80)
 [5] 33 3
 [6] 34 4
 [7] 31 1
[13] (24,80)
 [6] 63 c
 [4] 6F o

13 bits in both cases, that's 26 bits overall.

In the second case the size of the matches is less balanced:  {□  width: 100%;□  margin: 0 auto;□  max-width:  is 49 symbols long, whereas px;□   is only 6 symbols long. Again, defdb gives us the cost in bits of these two matches:

 [7] 2E .
 [6] 62 b
[14] (49,80)
 [5] 33 3
 [6] 34 4
 [7] 31 1
[10] (6,80)
 [6] 63 c
 [4] 6F o

14 and 10 bits, that's 24 bits overall (2 bits less than previously): here we effectively see some savings.

Overall, practically everything is linked together in a Deflate stream: a lot of variable-length coding is performed (Huffman coding), and a slight change in the underlying statistics of the alphabet (made not only of literals but also of match lengths) can lead to unexpected results. Therefore, saving a bit or two in a specific place may be less interesting than expected (local savings could turn into global losses).
But hey! It seems to be a good idea to bring together as much static data as possible to foster longer LZ matches. Perhaps you should now try it on real-world CSS files and not just code snippets.
