@sinisterchipmunk
Last active September 8, 2023 17:57
tar, gzip, and untar files using ruby in memory without tempfiles
Copyright (C) 2011 by Colin MacKenzie IV
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
require 'rubygems'
require 'rubygems/package'
require 'zlib'
require 'fileutils'
require 'stringio'

module Util
  module Tar
    # Creates a tar file in memory recursively
    # from the given path.
    #
    # Returns a StringIO whose underlying String
    # is the contents of the tar file.
    def tar(path)
      tarfile = StringIO.new("")
      Gem::Package::TarWriter.new(tarfile) do |tar|
        Dir[File.join(path, "**/*")].each do |file|
          mode = File.stat(file).mode
          relative_file = file.sub(/^#{Regexp::escape path}\/?/, '')

          if File.directory?(file)
            tar.mkdir relative_file, mode
          else
            tar.add_file relative_file, mode do |tf|
              File.open(file, "rb") { |f| tf.write f.read }
            end
          end
        end
      end

      tarfile.rewind
      tarfile
    end

    # gzips the underlying string in the given StringIO,
    # returning a new StringIO representing the
    # compressed file.
    def gzip(tarfile)
      gz = StringIO.new("")
      z = Zlib::GzipWriter.new(gz)
      z.write tarfile.string
      z.close # this is necessary!

      # z was closed to write the gzip footer, so
      # now we need a new StringIO
      StringIO.new gz.string
    end

    # un-gzips the given IO, returning the
    # decompressed version as a StringIO
    def ungzip(tarfile)
      z = Zlib::GzipReader.new(tarfile)
      unzipped = StringIO.new(z.read)
      z.close
      unzipped
    end

    # untars the given IO into the specified
    # directory
    def untar(io, destination)
      Gem::Package::TarReader.new io do |tar|
        tar.each do |tarfile|
          destination_file = File.join destination, tarfile.full_name

          if tarfile.directory?
            FileUtils.mkdir_p destination_file
          else
            destination_directory = File.dirname(destination_file)
            FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
            File.open destination_file, "wb" do |f|
              f.print tarfile.read
            end
          end
        end
      end
    end
  end
end
### Usage Example: ###
#
# include Util::Tar
#
# io = tar("./Desktop") # io is a TAR of files
# gz = gzip(io) # gz is a TGZ
#
# io = ungzip(gz) # io is a TAR
# untar(io, "./untarred") # files are untarred
#
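
The usage example above reads a directory and writes one back out. For a quick check that touches no files at all, the same TarWriter/GzipWriter calls can be pointed at in-memory strings. This is only a sketch: `targz_from_hash` and `hash_from_targz` are hypothetical helper names that are not part of the gist, and file modes are hard-coded to 0644.

```ruby
require 'rubygems/package'
require 'stringio'
require 'zlib'

# Hypothetical helper (not part of the gist): packs a { name => contents }
# hash into a gzipped tar and returns the compressed bytes as a String.
def targz_from_hash(files)
  tar_io = StringIO.new(String.new) # mutable binary buffer
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |name, contents|
      # add_file_simple needs the size up front; 0o644 is an assumed mode
      tar.add_file_simple(name, 0o644, contents.bytesize) do |entry|
        entry.write(contents)
      end
    end
  end

  gz_io = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(gz_io)
  gz.write(tar_io.string)
  gz.close # writes the gzip footer, as in the gist's gzip method
  gz_io.string
end

# Hypothetical inverse: returns { name => contents } for regular files.
def hash_from_targz(data)
  gz = Zlib::GzipReader.new(StringIO.new(data))
  tar_io = StringIO.new(gz.read)
  gz.close

  files = {}
  Gem::Package::TarReader.new(tar_io) do |tar|
    tar.each do |entry|
      files[entry.full_name] = entry.read if entry.file?
    end
  end
  files
end
```

Calling `hash_from_targz(targz_from_hash('a.txt' => "hello\n"))` should hand back the same names and contents, which makes the pair convenient for unit tests.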
@lukewpatterson

goodness

@rorrego

rorrego commented Apr 15, 2013

nice

@rafaelrosafu

Just a quick note: you should change f.print tarfile.read to f.write tarfile.read; otherwise it will save empty files with a nil string inside them.

@penland365

Much thanks for this, it saved me a few hours.

@robertwatts

great stuff

@srspnda

srspnda commented Jun 26, 2014

Nice stuff; thank you sir.

Note: Ruby 1.9.3 requires Zlib::GzipReader.open instead of .new.

@nleib

nleib commented Mar 25, 2015

Can I preserve file permissions and timestamps when tarring them up?

@pghalliday

Won't this skip any files beginning with "."? I think the globbing excludes dotfiles by default.
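
To illustrate the dotfile point: `Dir[]` with `**/*` does leave out names that start with a dot, while `Dir.glob` accepts `File::FNM_DOTMATCH` to include them. The sketch below uses a hypothetical helper name, `entries_including_dotfiles`, which is not part of the gist. It also filters out any "." and ".." entries, since some Ruby versions return those when the flag is set.

```ruby
# Hypothetical helper (not part of the gist): lists every entry under +path+,
# including dotfiles and the contents of hidden directories.
def entries_including_dotfiles(path)
  Dir.glob(File.join(path, '**', '*'), File::FNM_DOTMATCH)
     .reject { |entry| ['.', '..'].include?(File.basename(entry)) }
     .sort
end
```

Swapping this in for `Dir[File.join(path, "**/*")]` inside `tar` would be one way to pick up files such as `.gitignore`.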

@whitehat101

Can symlinks be written to the TarWriter? The current implementation grabs the file the symlink points at, or raises Errno::ENOENT on a broken symlink.

@whitehat101

I was curious, so I benchmarked 74MB (26MB in tgz). On my box, it was only 0.6s slower to do it completely in Ruby space.

Benchmark.bm do |x|
  x.report("shell - String:") do
    io = %x{tar cz #{dir}}
  end
  x.report("shell - StringIO:") do
    io = StringIO.new %x{tar cz #{dir}}
  end
  x.report("ruby:") do
    io = gzip tar dir
  end
end

                        user     system      total        real
shell - String:     0.060000   0.040000   2.430000 (  2.111896)
shell - StringIO:   0.060000   0.050000   2.420000 (  2.109480)
ruby:               2.600000   0.090000   2.690000 (  2.712451)

@whitehat101

Upstream RubyGems has added support for symlinks. I wrote a refinement that makes that function available today.

I'm on 2.2.2p95 (2015-04-13 revision 50295), and add_symlink was not yet defined.

module TarWriterAddSymlink
  # Backport Symlink Support
  # https://github.com/rubygems/rubygems/blob/4a778c9c2489745e37bcc2d0a8f12c601a9c517f/lib/rubygems/package/tar_writer.rb#L239-L253
  refine Gem::Package::TarWriter do
    def add_symlink(name, target, mode)
      check_closed

      name, prefix = split_name name

      header = Gem::Package::TarHeader.new(:name => name, :mode => mode,
                                           :size => 0, :typeflag => "2",
                                           :linkname => target,
                                           :prefix => prefix,
                                           :mtime => Time.now).to_s

      @io.write header

      self
    end
  end
end

Include the refinement:

module Util
  module Tar
    unless Gem::Package::TarWriter.public_instance_methods.include? :add_symlink
      using TarWriterAddSymlink
    end
    # ...

Updated tar method:

    def tar(path)
      tarfile = StringIO.new("")
      relative_regexp = /^#{Regexp::escape path}\/?/
      Gem::Package::TarWriter.new(tarfile) do |tar|
        Dir[File.join(path, "**/*")].each do |file|
          stat = File.lstat file # don't follow symlinks
          relative_file = file.sub relative_regexp, ''

          case stat.ftype
            when "file"
              tar.add_file relative_file, stat.mode do |tf|
                File.open(file, "rb") { |f| tf.write f.read }
              end
            when "directory"
              tar.mkdir relative_file, stat.mode
            when "link"
              tar.add_symlink relative_file, File.readlink(file), stat.mode
          end
        end
      end

      tarfile.rewind
      tarfile
    end

Note: refinements are lexically scoped, so using TarWriterAddSymlink must appear in the same module body (or file) as the tar method definition, and before it.
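
The scoping rule is easier to see on a toy case. In this sketch every name (`Shout`, `WithRefinement`, `WithoutRefinement`) is made up for illustration. The refined method is visible only in the module body that calls `using`; the other module gets a NoMethodError for the same call.

```ruby
# Illustrative refinement: adds String#shout only where it is activated.
module Shout
  refine String do
    def shout
      upcase + '!'
    end
  end
end

module WithRefinement
  using Shout # active from here to the end of this module body

  def self.call(text)
    text.shout
  end
end

module WithoutRefinement
  # No `using Shout` here, so String#shout is not visible in this scope.
  def self.call(text)
    text.shout
  rescue NoMethodError
    :not_refined
  end
end
```

The same reasoning applies to the gist: a `tar` method defined in a scope without `using TarWriterAddSymlink` would not see the backported `add_symlink`.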

@smapira

smapira commented Apr 9, 2018

Thanks! The change below works fine for me.

$ ruby -v

ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin15]

from

tarfile = StringIO.new("")
Gem::Package::TarWriter.new(tarfile) do |tar|

to

tarfile = StringIO.new
Gem::Package::TarWriter.new(tarfile) do |tar|

@schneems

Wanted to mention that if anyone is using this to untar an archive that contains symlinks, you'll need to add symlink support to the tar reader. Something like this:

+          if tarfile.symlink?
+            File.symlink(tarfile.header.linkname, destination_file)
+          end

Though when I tried to test this with a real world file, it fails so ¯\_(ツ)_/¯.
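
For reference, here is a minimal sketch of an untar variant that recreates symlinks. It assumes a RubyGems version in which `Gem::Package::TarReader::Entry#symlink?` is available, and `untar_with_symlinks` is a hypothetical name, not part of the gist. The branches are ordered so that a symlink entry is never first written out as a regular file. Whether that ordering explains the failure mentioned above is only a guess.

```ruby
require 'rubygems/package'
require 'fileutils'

# Hypothetical variant of the gist's untar (not the author's code).
# Do not run it on untrusted archives: entry names and link targets
# are used as-is, with no path checks.
def untar_with_symlinks(io, destination)
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      target = File.join(destination, entry.full_name)

      if entry.directory?
        FileUtils.mkdir_p(target)
      elsif entry.symlink?
        FileUtils.mkdir_p(File.dirname(target))
        # header.linkname holds the path the link points at
        File.symlink(entry.header.linkname, target)
      elsif entry.file?
        FileUtils.mkdir_p(File.dirname(target))
        File.open(target, 'wb') { |f| f.write(entry.read) }
      end
    end
  end
end
```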

@sadovnik

Thank you!

@Hasstrup

Thanks!

@socertis

thanks!

@vinoth3105

vinoth3105 commented Jan 4, 2021

Hi,

def untar(io, destination)
  Gem::Package::TarReader.new io do |tar|
    tar.each do |tarfile|
      destination_file = File.join destination, tarfile.full_name

      if tarfile.directory?
        FileUtils.mkdir_p destination_file
      else
        destination_directory = File.dirname(destination_file)
        FileUtils.mkdir_p destination_directory unless File.directory?(destination_directory)
        File.open destination_file, "wb" do |f|
          f.print tarfile.read
        end
      end
    end
  end
end

untar('/etc/td-agent/config/scripts', '/etc/td-agent/config/scripts/content.tar.gz')

I am using the untar method in my project, but I am getting the error below. Can you please help with that?

9: from /usr/local/rvm/rubies/ruby-2.7.0/bin/irb:23:in `<main>'
8: from /usr/local/rvm/rubies/ruby-2.7.0/bin/irb:23:in `load'
7: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/gems/2.7.0/gems/irb-1.2.1/exe/irb:11:in `<top (required)>'
6: from (irb):26
5: from (irb):27:in `rescue in irb_binding'
4: from (irb):5:in `untar'
3: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:24:in `new'
2: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:24:in `new'
1: from /usr/local/rvm/rubies/ruby-2.7.0/lib/ruby/2.7.0/rubygems/package/tar_reader.rb:43:in `initialize'
NoMethodError (undefined method `pos' for "/etc/td-agent/config/scripts":String)
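
The NoMethodError above suggests that untar was handed a String where it expects an IO: TarReader calls `pos` on its first argument, and the call passes two paths, with the directory in the IO position. Below is a minimal sketch of one way around this, assuming content.tar.gz really is a gzipped tar. `untar_gz_file` is a hypothetical helper, not part of the gist.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'
require 'fileutils'

# Hypothetical helper: extracts the .tar.gz at +archive_path+ into +destination+.
def untar_gz_file(archive_path, destination)
  # Decompress into a StringIO so TarReader gets an IO-like object
  tar_io = Zlib::GzipReader.open(archive_path) { |gz| StringIO.new(gz.read) }

  Gem::Package::TarReader.new(tar_io) do |tar|
    tar.each do |entry|
      target = File.join(destination, entry.full_name)

      if entry.directory?
        FileUtils.mkdir_p(target)
      elsif entry.file?
        FileUtils.mkdir_p(File.dirname(target))
        File.open(target, 'wb') { |f| f.write(entry.read) }
      end
    end
  end
end

# Archive first, destination second (paths taken from the comment above):
# untar_gz_file('/etc/td-agent/config/scripts/content.tar.gz',
#               '/etc/td-agent/config/scripts')
```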

@ianfixes

This gist was a great example to get me started with tar/gzip in Ruby... but this part didn't feel completely idiomatic to me:

    # un-gzips the given IO, returning the
    # decompressed version as a StringIO
    def ungzip(tarfile)
      z = Zlib::GzipReader.new(tarfile)
      unzipped = StringIO.new(z.read)
      z.close
      unzipped
    end

After a lot of digging, it looks like the block-based form of this would be

Zlib::GzipReader.wrap(tarfile) do |z|
  # do something with z, which is a Zlib::GzipReader that automatically closes afterward.
  # or if you really REALLY need a StringIO
  unzipped = StringIO.new(z.read)   # basically trade one type of IO-compatible object for another one
end 
