streaming gzip
require 'rubygems'
require 'em-http-request'
require 'zlib'
# Monkey-patched Gzip Decoder to handle
# Gzip streams.
# This takes advantage of the fact that
# Zlib::GzipReader takes an IO object &
# reads from it as it decompresses.
# It also relies on Zlib only checking for
# nil as the method of determining whether
# it has reached EOF.
# `IO#read(len, buf)` can also denote EOF by returning a string
# shorter than `len`, but Zlib doesn't care about that.
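module EventMachine::HttpDecoders
  class GZip < Base
    # Minimal IO stand-in for Zlib::GzipReader: chunks are appended with #<<,
    # and #read drops whatever it hands out, so consumed input is not kept.
    class LazyStringIO
      def initialize string=""
        @stream = string.b
        @finished = false
      end

      def << string
        @stream << string.b
        self
      end

      # Mark the input as complete so that an empty buffer reads as EOF (nil).
      def finish!
        @finished = true
      end

      # Until finish! is called, an empty buffer returns "" rather than nil,
      # so Zlib keeps waiting for more data instead of reporting EOF.
      def read length=nil,buffer=nil
        return nil if @finished && @stream.empty?
        data = length ? @stream.slice!(0, length) : @stream.slice!(0..-1)
        buffer ? buffer.replace(data) : data
      end

      def size
        @stream.bytesize
      end
    end

    def self.encoding_names
      %w(gzip compressed)
    end

    def decompress(compressed)
      @buf ||= LazyStringIO.new
      @buf << compressed
      # Zlib::GzipReader loads input in 2048 byte chunks
      if @buf.size > 2048
        @gzip ||= Zlib::GzipReader.new @buf
        @gzip.readline # lines are bigger than compressed chunks, so this works
        # you could also use #readpartial, but then you need to tune
        # the max length
        # don't use #read, because it will attempt to read the full file
        # readline uses #gets under the covers, so you could try that too.
      end
    end

    # Called when the response ends: flush whatever is still buffered.
    def finalize
      return nil unless @buf
      @buf.finish!
      return nil if @gzip.nil? && @buf.size.zero?
      @gzip ||= Zlib::GzipReader.new @buf
      @gzip.read
    end
  end
end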

url = "my-streaming-url"
user = "my-user"
password = "my-password"

EventMachine.run do
  http = EventMachine::HttpRequest.new(url).get :head => {
    'Accept-Encoding' => 'gzip',
    'Authorization' => [ user, password ] }

  http.headers do |hash|
    p [:status, http.response_header.status]
    p [:headers, hash]
    if http.response_header.status > 299
      puts 'unsuccessful request'
      EventMachine.stop
    end
  end

  # Decompressed output from the GZip decoder arrives here.
  http.stream do |chunk|
    print chunk
  end

  http.callback do
    p "done"
    EventMachine.stop
  end

  http.errback do
    puts "there was an error"
    p http.error
    p http.response
    EventMachine.stop
  end
end
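To try the LazyStringIO plus GzipReader idea without EventMachine or a live endpoint, here is a minimal standalone sketch. It assumes a synthetic payload (2000 short lines built from SHA-256 hex digests) and a hypothetical network chunk size of 1000 bytes; `THRESHOLD` mirrors the 2048-byte cut-off used in `decompress`. Unlike `decompress`, which returns a single line per incoming chunk, this version keeps calling `gets` while more than the threshold is queued, so decoded lines do not pile up.

```ruby
require 'zlib'
require 'digest'

# Same idea as the gist's LazyStringIO: bytes are dropped once read, and an
# empty buffer yields "" rather than nil, so Zlib never concludes it hit EOF.
class LazyStringIO
  def initialize(string = "")
    @stream = string.b
  end

  def <<(string)
    @stream << string.b
    self
  end

  def read(length = nil, buffer = nil)
    data = length ? @stream.slice!(0, length) : @stream.slice!(0..-1)
    buffer ? buffer.replace(data) : data
  end

  def size
    @stream.bytesize
  end
end

THRESHOLD = 2048 # same cut-off as the gist, matching GzipReader's 2048-byte reads
CHUNK = 1000     # hypothetical size of each network chunk

# Synthetic payload: short lines that do not compress away to nothing.
text = (1..2000).map { |i| "line #{i} #{Digest::SHA256.hexdigest(i.to_s)}\n" }.join
compressed = Zlib.gzip(text)

lazy = LazyStringIO.new
reader = nil
decoded = +""
max_pending = 0

(0...compressed.bytesize).step(CHUNK) do |offset|
  lazy << compressed.byteslice(offset, CHUNK) # one "network chunk" arrives
  max_pending = [max_pending, lazy.size].max
  next unless lazy.size > THRESHOLD

  reader ||= Zlib::GzipReader.new(lazy)
  # Only ask for lines while more than THRESHOLD compressed bytes are queued;
  # asking with too little input could spin, since "" is never treated as EOF.
  while lazy.size > THRESHOLD
    line = reader.gets or break
    decoded << line
  end
end

# All input has arrived, so draining to the end of the gzip stream is safe.
reader ||= Zlib::GzipReader.new(lazy)
decoded << reader.read
reader.finish

puts "decoded #{decoded.lines.size} lines; max queued compressed bytes: #{max_pending}"
```

The loop relies on the same assumption as the gist's comment: that each 2048-byte read of compressed input decodes to at least one complete line. With much longer lines, `gets` could ask for input that has not arrived yet.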

Hmm, interesting. Should @buf be reset after @gzip.readline? Part of the appeal of streaming is that you don't have to buffer a giant blob of data. If you do need the entire blob, you should be explicitly buffering it yourself at that point.


baroquebobcat commented Dec 12, 2011

I don't think we'd want to reset it, as it might contain more chunks that need to be decompressed.
It only needs to buffer enough to keep GzipReader from complaining. On the other hand, after the headers have been processed, you could probably just call readpartial or gets until the buffer is empty, because GzipReader likely maintains enough state to keep decompressing.

That sounds reasonable, assuming it can be made to work. :-)

Is there a specific use case where you're streaming a gzipped file? Surprisingly, I haven't had any bug reports or feature requests around this previously.


baroquebobcat commented Dec 13, 2011

It's not a file; it's a continuous stream of compressed data. That's why you can't decompress it in one shot: the request never really ends.

Sorry, bad choice of language there. That's what I meant. :-)

If you're streaming, the memory bloat associated with buffering the entire response is not an issue for you?


baroquebobcat commented Dec 14, 2011

The LazyStringIO object prevents the whole response from being buffered by dropping the read portions of its buffer.

GzipReader pulls in data on demand provided you call it the right way, so memory usage is limited.
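One way to check that GzipReader pulls compressed input on demand is to give it an IO-like object that counts what it reads. The sketch below uses a hypothetical `CountingIO` wrapper and a synthetic payload; because the wrapper defines `read` but not `readpartial`, GzipReader ends up calling its `read` method, much as it does with the gist's LazyStringIO.

```ruby
require 'zlib'
require 'stringio'
require 'digest'

# Hypothetical wrapper: serves the compressed bytes and counts how many
# GzipReader actually pulls. Only #read is defined (no #readpartial).
class CountingIO
  attr_reader :bytes_read

  def initialize(data)
    @io = StringIO.new(data)
    @bytes_read = 0
  end

  def read(length = nil, buffer = nil)
    chunk = buffer ? @io.read(length, buffer) : @io.read(length)
    @bytes_read += chunk.bytesize if chunk # nil at EOF
    chunk
  end
end

text = (1..2000).map { |i| "record #{i} #{Digest::SHA256.hexdigest(i.to_s)}\n" }.join
compressed = Zlib.gzip(text)

io = CountingIO.new(compressed)
reader = Zlib::GzipReader.new(io)
first_line = reader.gets
pulled_for_first_line = io.bytes_read # bytes consumed to produce one line
rest = reader.read                    # now read everything else
reader.finish                         # close the reader without touching io

puts "compressed bytes:          #{compressed.bytesize}"
puts "pulled for the first line: #{pulled_for_first_line}"
```

The exact count printed for the first line depends on zlib's internal read size, so treat it as illustrative; the point is that it is a small fraction of the whole compressed payload.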

One problem I could see is if chunks are consistently bigger than lines, you'd start queuing up decompressed data that hasn't been passed to callbacks yet.

Maybe you could use readpartial in a loop to pull out everything that's currently decompressed. But readpartial blocks if there's no decompressed data available.

I wish you could just use it like you did w/ Inflate though. It might be fun to try to implement an equivalent API for Gzip.
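For comparison, zlib's inflate API can parse the gzip header and trailer itself when the window-bits argument is `Zlib::MAX_WBITS + 16`, which gives gzip the same push-style interface as `Zlib::Inflate`. This is a swapped-in technique, not what the gist does; `StreamingGunzip` is a hypothetical wrapper name, and the payload and 7-byte chunking are made up for the demonstration.

```ruby
require 'zlib'
require 'digest'

# Hypothetical push-style gunzip built on Zlib::Inflate rather than GzipReader.
# MAX_WBITS + 16 makes zlib expect a gzip header and trailer
# (MAX_WBITS + 32 would auto-detect zlib or gzip).
class StreamingGunzip
  def initialize
    @inflater = Zlib::Inflate.new(Zlib::MAX_WBITS + 16)
  end

  # Feed one compressed chunk; returns whatever output it unlocked (maybe "").
  def <<(chunk)
    @inflater.inflate(chunk)
  end

  # Call once the response has ended.
  def finish
    out = @inflater.finish
    @inflater.close
    out
  end
end

text = (1..200).map { |i| "event #{i} #{Digest::SHA256.hexdigest(i.to_s)}\n" }.join
compressed = Zlib.gzip(text)

gunzip = StreamingGunzip.new
decoded = +""
first_output_at = nil

# Deliver the gzip stream in awkward 7-byte pieces.
(0...compressed.bytesize).step(7) do |offset|
  out = gunzip << compressed.byteslice(offset, 7)
  first_output_at ||= offset unless out.empty?
  decoded << out
end
decoded << gunzip.finish

puts "first output after #{first_output_at} of #{compressed.bytesize} compressed bytes"
```

Because `Zlib::Inflate#inflate` returns whatever it can decode from the input seen so far, this approach needs no minimum-buffer heuristic and never waits on an IO object, so there is no empty-buffer case to spin on.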

Thanks, this helped a lot getting my Powertrack client to interface with Gnip console.


baroquebobcat commented Apr 3, 2012

The monkey-patch bit of this has been cleaned up and merged into em-http-request, which rocks!



eriwen commented Dec 21, 2012

@baroquebobcat Thanks for your work producing this. There is discussion that could use your attention at igrigorik/em-http-request#204
