tfheen/gist:587634

## gistfile1.txt
08:07 < sky> phk: so we cannot actually stitch together multiple gzipped items
08:07 <@phk> sky, you lost me there ?
08:08 < sky> phk: each esi fragment cannot be gzipped separately
08:08 <@phk> it sure can
08:08 < sky> nope, the browsers aren't RFC compliant
08:08 < sky> I tried
08:08 < sky> curl does the right thing
08:08 < sky> the browsers don't
08:09 < victori> so I am guessing you're proposing on the fly compression?
08:09 < sky> my initial approach did that
08:09 < sky> victori: yes
08:10 < sky> phk: so we can store a gzipped result of an ESI document, but then we need to invalidate it when an object is banned or TTL expires
08:11 <@phk> sky, are you saying that the browsers do not grok Z_FULL_FLUSH / Z_FINISH ?
08:12 < sky> phk: I am saying that as soon as you Z_FINISH you can't continue with another gzip stream
08:12 <@phk> Ok, that's silly (of the browsers)
08:12 <@phk> but you can get pretty much the same effect with Z_FULL_FLUSH
08:12 < sky> yes, very, it is a violation of the gzip rfc
08:12 <@phk> takes a bit more work etc.
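
The behaviour sky describes can be illustrated with a small sketch. It assumes Python's standard `gzip` and `zlib` modules stand in for the clients: `gzip.decompress` reads every member of a concatenated gzip file (which the gzip RFC permits), while a single `zlib` decompress object stops at the end of the first member, which is roughly what sky reports the browsers doing. The helper name `first_member_only` and the sample fragments are made up for illustration.

```python
import gzip
import zlib


def first_member_only(blob: bytes):
    """Decode like a client that stops after the first gzip member."""
    d = zlib.decompressobj(31)  # 16 + 15: expect a gzip header and trailer
    out = d.decompress(blob)
    # Anything after the first member's trailer is left over, undecoded.
    return out, d.unused_data


# Two complete gzip streams (each ended with Z_FINISH), simply concatenated.
two_members = gzip.compress(b"<p>header</p>") + gzip.compress(b"<p>body</p>")

everything = gzip.decompress(two_members)        # multi-member aware reader
first, leftover = first_member_only(two_members)  # single-member reader
```

A reader that behaves like `first_member_only` silently drops every fragment after the first, which is why stitching complete gzip streams together is not an option here.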
08:13 < sky> Z_FULL_FLUSH requires a continuous input stream of uncompressed data
08:13 <@phk> no it doesn't
08:13 <@phk> (but they don't tell you :-)
08:14 < sky> you can't Z_FULL_FLUSH one chunk, and then append it to another as far as I can tell and get the browser to read it
08:14 <@phk> Z_FULL_FLUSH is the state you are in, right after the magic-string header.
08:14 < sky> so you are saying compress each object using Z_FULL_FLUSH and then write them one at a time?
08:15 <@phk> well, more than that.
08:15 <@phk> to compress an ESI component:  Gzip it, end with Z_FULL_FLUSH.  Strip the magic byte header.
08:15 <@phk> To deliver an ESI doc:
08:16 < sky> yes
08:16 < sky> send gzip header, plus each component
08:16 <@phk> send magic byte header, send N{ESI COMPONENTS}, send magic byte stop sequence.
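
The scheme phk outlines could look roughly like the sketch below. It is not Varnish code, and the names `GZIP_HEADER`, `compress_fragment` and `stitch` are hypothetical. Two assumptions differ from the wording in the log: each fragment is compressed as raw deflate (`wbits=-15`), so there is no header to strip, and the stitcher still has the uncompressed fragments at hand so it can compute the CRC-32 and length that the gzip trailer requires. A cache that keeps only compressed fragments would instead have to store a CRC and length per fragment and combine them (zlib's C API offers `crc32_combine`; Python's `zlib` module does not expose it).

```python
import gzip
import struct
import zlib

# Fixed 10-byte gzip header: magic, deflate, no flags, mtime 0, xfl 0, OS unknown.
GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"


def compress_fragment(data: bytes) -> bytes:
    """Compress one fragment as raw deflate, ending with Z_FULL_FLUSH.

    The output ends on a byte boundary, has no final-block marker, and
    contains no references to data outside this fragment.
    """
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush(zlib.Z_FULL_FLUSH)


def stitch(fragments) -> bytes:
    """Build one gzip stream from (raw, compressed) fragment pairs."""
    out = [GZIP_HEADER]
    crc = 0
    size = 0
    for raw, compressed in fragments:
        out.append(compressed)
        crc = zlib.crc32(raw, crc)  # running CRC over the uncompressed bytes
        size += len(raw)
    # The "stop sequence": an empty final deflate block, then the gzip trailer.
    out.append(zlib.compressobj(6, zlib.DEFLATED, -15).flush(zlib.Z_FINISH))
    out.append(struct.pack("<II", crc & 0xFFFFFFFF, size & 0xFFFFFFFF))
    return b"".join(out)


parts = [b"<html><body>", b"<div>fragment one</div>", b"</body></html>"]
document = stitch([(p, compress_fragment(p)) for p in parts])
```

Because the result is a single gzip member, a client that only understands one member per response can still decode it. The fragments can be compressed once, cached, and reassembled in any order.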
08:17 < sky> and on top of that, ungzip things from the backend if it is gzipped
08:17 <@phk> well, that takes you into the big kettle of fish
08:19 < sky> do we store both gzipped and ungzipped copies?
08:19 <@phk> that's a VCL decision
08:19 <@phk> it affects storage use and working set size in a BIG way, so VCL has to decide.
08:20 < sky> how? it doesn't have the notion of multiple varys right now
08:20 <@phk> sky, what is the impact from doing delivery time gzip ?
08:20 < sky> phk: I haven't put it in real production yet
08:20 <@phk> ohh, so you want me to commit an untested patch ?  :-)
08:20 < sky> but when we had apache in front of the server doing gzip, it was negligible
08:21 < sky> I tend to run the latest committed version :)
08:21 <@Mithrandir> sky: didn't we find out that machine of yours did like 2GByte/sec of gzip -6 ?
08:22 <@Mithrandir> (it'll depend on data, obviously)
08:22 < sky> or rather, once I commit to trunk, I can easily rebase it to 2.0 branch
08:23 <@phk> so you're saying that simply doing delivery time gzip is feasible CPU wise with no concerns ?
08:23 < sky> yes
08:23 <@phk> ok, then we should do that, because doing the fetch thing is a nightmare.
08:23 < sky> the one big downside is that we now store things uncompressed on disk
08:24 < sky> i suggest the solution to that would be a compressed filesystem like btrfs or zfs
08:24 <@Mithrandir> sky: have you tried btrfs in production?
08:25 <@Mithrandir> doesn't it, like, fall over when you fill the disk and such, or has it gotten better now?
08:25 <@phk> sky, compressed filesystems = NO-NO, that would just mean more compression/decompression load for the CPU
08:25 < sky> phk: which is probably a worthwhile tradeoff considering we are limited on I/O bandwidth and have a ton of CPU
08:27 <@phk> I doubt it.
08:27 <@Mithrandir> that's something we can leave to the sysadmin, though.
08:27 <@phk> absolutely