tar, gzip, and untar files using ruby in memory without tempfiles
Copyright (C) 2011 by Colin MacKenzie IV
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
require 'rubygems'
require 'rubygems/package'
require 'zlib'
require 'fileutils'
require 'stringio'

module Util
  module Tar
    # Creates a tar file in memory recursively
    # from the given path.
    #
    # Returns a StringIO whose underlying String
    # is the contents of the tar file.
    def tar(path)
      tarfile = StringIO.new("")
      Gem::Package::TarWriter.new(tarfile) do |tar|
        Dir[File.join(path, "**/*")].each do |file|
          mode = File.stat(file).mode
          relative_file = file.sub(/^#{Regexp::escape path}\/?/, '')

          if File.directory?(file)
            tar.mkdir relative_file, mode
          else
            tar.add_file relative_file, mode do |tf|
              File.open(file, "rb") { |f| tf.write f.read }
            end
          end
        end
      end

      tarfile.rewind
      tarfile
    end

    # gzips the underlying string in the given StringIO,
    # returning a new StringIO representing the
    # compressed file.
    def gzip(tarfile)
      gz = StringIO.new("")
      z = Zlib::GzipWriter.new(gz)
      z.write tarfile.string
      z.close # this is necessary!

      # z was closed to write the gzip footer, so
      # now we need a new StringIO
      StringIO.new gz.string
    end

    # un-gzips the given IO, returning the
    # decompressed version as a StringIO
    def ungzip(tarfile)
      z = Zlib::GzipReader.new(tarfile)
      unzipped = StringIO.new(z.read)
      z.close
      unzipped
    end

    # untars the given IO into the specified
    # directory
    def untar(io, destination)
      Gem::Package::TarReader.new io do |tar|
        tar.each do |tarfile|
          destination_file = File.join destination, tarfile.full_name

          if tarfile.directory?
            FileUtils.mkdir_p destination_file
          else
            destination_directory = File.dirname(destination_file)
            FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
            File.open destination_file, "wb" do |f|
              f.print tarfile.read
            end
          end
        end
      end
    end
  end
end

### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop") # io is a TAR of files
# gz = gzip(io) # gz is a TGZ
#
# io = ungzip(gz) # io is a TAR
# untar(io, "./untarred") # files are untarred
#
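
The same round trip can be exercised entirely in memory, without touching the filesystem. The sketch below is not part of the gist: the helper names (build_sample_tar, gzip_string, list_entries) and the entry names are hypothetical, and it only lists the entries of a gzipped tar rather than extracting them.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Build a small tar archive in memory (entry names are made up for illustration).
def build_sample_tar
  io = StringIO.new(String.new) # String.new gives a mutable binary buffer
  Gem::Package::TarWriter.new(io) do |tar|
    tar.mkdir 'docs', 0o755
    tar.add_file('docs/hello.txt', 0o644) { |tf| tf.write "hello\n" }
  end
  io.rewind
  io
end

# Compress a String with gzip and return the compressed bytes.
def gzip_string(data)
  out = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(out)
  gz.write data
  gz.close # writes the gzip footer
  out.string
end

# Return the entry names of a gzipped tar held in memory.
def list_entries(tgz_bytes)
  gz = Zlib::GzipReader.new(StringIO.new(tgz_bytes))
  tar_bytes = gz.read
  gz.close

  names = []
  Gem::Package::TarReader.new(StringIO.new(tar_bytes)) do |tar|
    tar.each { |entry| names << entry.full_name }
  end
  names
end

puts list_entries(gzip_string(build_sample_tar.string))
```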

@lukewpatterson lukewpatterson commented Apr 17, 2012

goodness

@rorrego rorrego commented Apr 15, 2013

nice

@rafaelrosafu rafaelrosafu commented May 7, 2013

Just a quick note, you should change the f.print tarfile.read to f.write tarfile.read, otherwise it will save empty files with a nil string inside them.

@penland365 penland365 commented Oct 11, 2013

Much thanks for this, it saved me a few hours.

@robertwatts robertwatts commented Nov 19, 2013

great stuff

@srspnda srspnda commented Jun 26, 2014

Nice stuff; thank you sir.

note Ruby 1.9.3 requires Zlib::GzipReader.open instead of .new

@nleib nleib commented Mar 25, 2015

Can I preserve file permissions and timestamps when tarring them up?
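
On the permissions half of this question: the gist already passes File.stat(file).mode to add_file and mkdir, so the mode is recorded in each tar header; what the gist's untar does not do is apply it on extraction. The sketch below shows the mode surviving a round trip and one way it could be restored. The helper names mode_round_trip and restore_mode are hypothetical, and timestamps are left out because how the writer sets them depends on the RubyGems version.

```ruby
require 'rubygems/package'
require 'stringio'

# Write one entry with an explicit mode, then read the stored mode back.
def mode_round_trip(mode)
  io = StringIO.new(String.new)
  Gem::Package::TarWriter.new(io) do |tar|
    tar.add_file('script.sh', mode) { |tf| tf.write "#!/bin/sh\n" }
  end
  io.rewind

  stored = nil
  Gem::Package::TarReader.new(io) do |tar|
    tar.each { |entry| stored = entry.header.mode }
  end
  stored
end

# Hypothetical helper: apply an entry's permission bits to an extracted file.
# The mask drops file-type bits that File.stat(...).mode may have included.
def restore_mode(entry, destination_file)
  File.chmod(entry.header.mode & 0o7777, destination_file)
end

printf("%o\n", mode_round_trip(0o750))
```

Calling restore_mode right after the File.open block in untar would be one place to hook this in.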

@pghalliday pghalliday commented Aug 11, 2015

Won't this skip any files beginning with a dot? I think the globbing excludes dotfiles by default.
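
Dir[] does skip dotfiles unless asked otherwise. A minimal sketch of a glob that includes them, assuming the File::FNM_DOTMATCH flag is acceptable for the use case (all_entries is a hypothetical helper, not part of the gist). Some Ruby versions also return "." entries under this flag, so they are filtered out explicitly.

```ruby
require 'tmpdir'
require 'fileutils'

# List everything under path, dotfiles included, as paths relative to path.
def all_entries(path)
  Dir.glob(File.join(path, '**', '*'), File::FNM_DOTMATCH)
     .reject { |f| ['.', '..'].include?(File.basename(f)) } # drop "." and ".." entries
     .map { |f| f.sub(/\A#{Regexp.escape(path)}\/?/, '') }
     .reject(&:empty?)
     .sort
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, '.hidden'), 'x')
  File.write(File.join(dir, 'visible'), 'y')
  puts all_entries(dir)
end
```

In the gist's tar method, the Dir[File.join(path, "**/*")] call is the line this would replace.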

@whitehat101 whitehat101 commented Nov 1, 2015

Can symlinks be written to the TarWriter? The current implementation grabs the file the symlink points at, or throws Errno::ENOENT on a broken symlink.

@whitehat101 whitehat101 commented Nov 1, 2015

I was curious, so I benchmarked 74MB (26MB in tgz). On my box, it was only 0.6s slower to do it completely in Ruby space.

require 'benchmark'

# dir is the directory being archived
Benchmark.bm do |x|
  x.report("shell - String:") do
    io = %x{tar cz #{dir}}
  end
  x.report("shell - StringIO:") do
    io = StringIO.new %x{tar cz #{dir}}
  end
  x.report("ruby:") do
    io = gzip tar dir
  end
end

                        user     system      total        real
shell - String:     0.060000   0.040000   2.430000 (  2.111896)
shell - StringIO:   0.060000   0.050000   2.420000 (  2.109480)
ruby:               2.600000   0.090000   2.690000 (  2.712451)


@whitehat101 whitehat101 commented Nov 1, 2015

Upstream RubyGems has added support for symlinks. I wrote a refinement that makes that function available today.

I'm on 2.2.2p95 (2015-04-13 revision 50295), and add_symlink was not yet defined.

module TarWriterAddSymlink
  # Backport Symlink Support
  # https://github.com/rubygems/rubygems/blob/4a778c9c2489745e37bcc2d0a8f12c601a9c517f/lib/rubygems/package/tar_writer.rb#L239-L253
  refine Gem::Package::TarWriter do
    def add_symlink(name, target, mode)
      check_closed

      name, prefix = split_name name

      header = Gem::Package::TarHeader.new(:name => name, :mode => mode,
                                           :size => 0, :typeflag => "2",
                                           :linkname => target,
                                           :prefix => prefix,
                                           :mtime => Time.now).to_s

      @io.write header

      self
    end
  end
end

Include the refinement:

module Util
  module Tar
    unless Gem::Package::TarWriter.public_instance_methods.include? :add_symlink
      using TarWriterAddSymlink
    end
    # ...

Updated tar method:

    def tar(path)
      tarfile = StringIO.new("")
      relative_regexp = /^#{Regexp::escape path}\/?/
      Gem::Package::TarWriter.new(tarfile) do |tar|
        Dir[File.join(path, "**/*")].each do |file|
          stat = File.lstat file # don't follow symlinks
          relative_file = file.sub relative_regexp, ''

          case stat.ftype
            when "file"
              tar.add_file relative_file, stat.mode do |tf|
                File.open(file, "rb") { |f| tf.write f.read }
              end
            when "directory"
              tar.mkdir relative_file, stat.mode
            when "link"
              tar.add_symlink relative_file, File.readlink(file), stat.mode
          end
        end
      end

      # as in the original method: rewind and return the StringIO
      tarfile.rewind
      tarfile
    end

Note: remember that refinements are lexically scoped, so the using TarWriterAddSymlink call must appear in the same module body (or file) as the tar method, before that method is defined.

@smapira smapira commented Apr 9, 2018

Thanks! With the change below it works fine for me.

$ ruby -v

ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin15]

from

tarfile = StringIO.new("")
Gem::Package::TarWriter.new(tarfile) do |tar|

to

tarfile = StringIO.new
Gem::Package::TarWriter.new(tarfile) do |tar|

@schneems schneems commented May 31, 2018

Wanted to mention that if anyone is using this to untar an archive containing symlinks, you'll need to add symlink support to the tar reader side as well. Something like this:

+          if tarfile.symlink?
+            File.symlink(tarfile.header.linkname, destination_file)
+          end

Though when I tried to test this with a real world file, it fails so ¯\_(ツ)_/¯.
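
For reference, here is a fuller sketch of an untar that recreates symlinks. It is not the gist author's code: untar_with_symlinks is a hypothetical name, it assumes a RubyGems version whose TarReader::Entry provides symlink? and file?, and it removes any existing file at the link path before linking, which is a guess at one way such an extraction can fail rather than a diagnosis of the failure mentioned above.

```ruby
require 'rubygems/package'
require 'fileutils'
require 'stringio'

# Extract a tar stream into destination, recreating symlinks as symlinks.
def untar_with_symlinks(io, destination)
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      target = File.join(destination, entry.full_name)

      if entry.directory?
        FileUtils.mkdir_p(target)
      elsif entry.symlink?
        FileUtils.mkdir_p(File.dirname(target))
        FileUtils.rm_f(target) # File.symlink raises if the path already exists
        File.symlink(entry.header.linkname, target)
      elsif entry.file?
        FileUtils.mkdir_p(File.dirname(target))
        # read can return nil for an empty entry, hence to_s
        File.open(target, 'wb') { |f| f.write(entry.read.to_s) }
      end
    end
  end
end
```

Entry names and link targets come straight from the archive, so for untrusted input it would be worth checking that they stay inside the destination directory before writing anything.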

@sadovnik sadovnik commented Mar 28, 2019

Thank you!

@Hasstrup Hasstrup commented May 27, 2019

Thanks!

@socertis socertis commented Jun 29, 2020

thanks!

@vinoth3105 vinoth3105 commented Jan 4, 2021

Hi,

def untar(io, destination)
  Gem::Package::TarReader.new io do |tar|
    tar.each do |tarfile|
      destination_file = File.join destination, tarfile.full_name

      if tarfile.directory?
        FileUtils.mkdir_p destination_file
      else
        destination_directory = File.dirname(destination_file)
        FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
        File.open destination_file, "wb" do |f|
          f.print tarfile.read
        end
      end
    end
  end
end

untar('/etc/td-agent/config/scripts', '/etc/td-agent/config/scripts/content.tar.gz')

I am using the untar method in my project, but I am getting the error below. Can you please help with that?

9: from /usr/local/rvm/rubies/ruby-2.7.0/bin/irb:23:in `<main>'
8: from /usr/local/rvm/rubies/ruby-2.7.0/bin/irb:23:in `load'
7: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/irb-1.2.1/exe/irb:11:in `<top (required)>'
6: from (irb):26
5: from (irb):27:in `rescue in irb_binding'
4: from (irb):5:in `untar'
3: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:24:in `new'
2: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:24:in `new'
1: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:43:in `initialize'
NoMethodError (undefined method `pos' for "/etc/td-agent/config/scripts":String)
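
The NoMethodError arises because untar was handed a String path where it expects an IO, and the two arguments are also swapped: the first argument is an IO over uncompressed tar data, the second is the destination directory. A .tar.gz file on disk therefore has to be opened and un-gzipped first. Below is a self-contained sketch of that sequence; extract_tgz is a hypothetical helper, not part of the gist.

```ruby
require 'rubygems/package'
require 'fileutils'
require 'stringio'
require 'zlib'

# Extract a .tar.gz file on disk into the destination directory.
def extract_tgz(archive_path, destination)
  # Step 1: decompress the file into an in-memory IO holding plain tar data.
  tar_io = nil
  Zlib::GzipReader.open(archive_path) { |gz| tar_io = StringIO.new(gz.read) }

  # Step 2: walk the tar entries, as the gist's untar does.
  Gem::Package::TarReader.new(tar_io) do |tar|
    tar.each do |entry|
      target = File.join(destination, entry.full_name)
      if entry.directory?
        FileUtils.mkdir_p(target)
      elsif entry.file?
        FileUtils.mkdir_p(File.dirname(target))
        File.open(target, 'wb') { |f| f.write(entry.read.to_s) }
      end
    end
  end
end
```

With the paths from the comment above, the call would be extract_tgz('/etc/td-agent/config/scripts/content.tar.gz', '/etc/td-agent/config/scripts'). Using the gist's own methods, the equivalent is to open the archive with File.open(path, "rb") and pass that file to ungzip, then pass the result and the destination directory to untar.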

@ianfixes ianfixes commented May 24, 2021

This gist was a great example to get me started with tar/gzip in Ruby... but this part didn't feel completely idiomatic to me:

    # un-gzips the given IO, returning the
    # decompressed version as a StringIO
    def ungzip(tarfile)
      z = Zlib::GzipReader.new(tarfile)
      unzipped = StringIO.new(z.read)
      z.close
      unzipped
    end

After a lot of digging, it looks like the block-based form of this would be

Zlib::GzipReader.wrap(tarfile) do |z|
  # do something with z, which is a Zlib::GzipReader that automatically closes afterward.
  # or if you really REALLY need a StringIO
  unzipped = StringIO.new(z.read)   # basically trade one type of IO-compatible object for another one
end 
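
Put together as a drop-in method, the block form might look like the sketch below. The name ungzip_with_wrap is hypothetical, chosen so it does not shadow the gist's ungzip. One thing to be aware of: when the block exits, the GzipReader is closed, and that also closes the IO it wraps.

```ruby
require 'stringio'
require 'zlib'

# Block-based variant of ungzip: the reader is closed when the block exits.
def ungzip_with_wrap(io)
  unzipped = nil
  Zlib::GzipReader.wrap(io) { |z| unzipped = StringIO.new(z.read) }
  unzipped
end

# Usage: compress a string in memory, then decompress it again.
compressed = StringIO.new(String.new)
writer = Zlib::GzipWriter.new(compressed)
writer.write 'hello tar'
writer.close

puts ungzip_with_wrap(StringIO.new(compressed.string)).read
```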