@jordansissel
Last active May 28, 2022 06:43
Reading nested tar files.

What?

Sometimes tar files contain other tar files, and it can be handy to read the contents of files several layers deep. Here's how to do it in Ruby and Python 3 with no external dependencies.

Podman save

We can create a layered tar file (a tar file containing other tar files) using docker save or podman save:

% podman save fpm-ubuntu-18.04 -o test.tar
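
If podman or docker is not at hand, a small nested tar for experimenting can be built with Python's standard tarfile module. This is a minimal sketch and not part of the original scripts: the helper names make_tar and write_nested_tar, the member names, and the file contents are all made up for illustration, and a real image archive holds far more than this.

```python
import io
import tarfile


def make_tar(files):
    """Return the bytes of an uncompressed tar archive built from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def write_nested_tar(path):
    """Write a two-level tar (a tar holding another tar) to `path`."""
    # Hypothetical layer contents, chosen only for illustration.
    layer = make_tar({"bin/hello": b"hello\n", "etc/motd": b"welcome\n"})
    outer = make_tar({"layer.tar": layer, "manifest.json": b"[]"})
    with open(path, "wb") as f:
        f.write(outer)
```

Calling write_nested_tar with an unused path gives a small archive with the same outer-tar-contains-inner-tar shape as the podman output.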

Read with Ruby

% ruby read.rb | head
[65528320] fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [1113504] bin/bash
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [34888] bin/bunzip2
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzcat
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzcmp
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [2140] bin/bzdiff
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzegrep
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [4877] bin/bzexe
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzfgrep

Read with Python

% python3 read.py | head
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [1113504] bin/bash
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [34888] bin/bunzip2
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzcat
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzcmp
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [2140] bin/bzdiff
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzegrep
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [4877] bin/bzexe
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [0] bin/bzfgrep
fe869b35292eb00f40676359c02b2c94a01c1315ad91255621a52bc65987d374.tar :: [3642] bin/bzgrep

read.py

import tarfile

tar = tarfile.open("test.tar")
for entry in tar:
    if not entry.name.endswith(".tar"):
        continue
    if entry.size == 0:
        continue

    # Got an inner tar file; open it as a file-like object.
    inner = tar.extractfile(entry)

    # The key here is passing "inner" (an io.BufferedReader) as the kwarg "fileobj=..."
    innertar = tarfile.open(fileobj=inner)
    for innerentry in innertar:
        print("{} :: [{}] {}".format(entry.name, innerentry.size, innerentry.name))
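
The listing above prints names and sizes only. To read the contents of one file inside an inner tar, the same fileobj trick applies. The following is a hedged sketch: read_nested is a hypothetical helper, not part of read.py, and it assumes the inner archive is an uncompressed or otherwise tarfile-readable tar.

```python
import tarfile


def read_nested(outer_path, inner_tar_name, member_name):
    """Return the bytes of member_name stored inside inner_tar_name within outer_path."""
    with tarfile.open(outer_path) as outer:
        # extractfile() raises KeyError for unknown names and returns None
        # for members that are not regular files or links.
        inner_file = outer.extractfile(inner_tar_name)
        if inner_file is None:
            raise ValueError("{} is not a regular file".format(inner_tar_name))
        # Open the inner tar directly from the file-like object.
        with tarfile.open(fileobj=inner_file) as inner:
            member = inner.extractfile(member_name)
            if member is None:
                raise ValueError("{} is not a regular file".format(member_name))
            return member.read()
```

With the archive from the podman example, a call would take test.tar, the name of the inner .tar entry, and a path such as bin/bzdiff.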

read.rb

require "rubygems/package"
require "stringio"

Gem::Package::TarReader.new(File.new("test.tar")).each_entry do |entry|
  puts "[#{entry.size}] #{entry.full_name}"
  next unless entry.full_name =~ /\.tar$/

  # Skip empty "/layer.tar" files
  next if entry.size == 0

  # Ideally, 'entry' would be a Ruby IO for further reading, but... the IO
  # module doesn't really have a means for wrapping IO-like objects.
  data = entry.read

  # Read the "inner" tar file and print the "[size] path"
  Gem::Package::TarReader.new(StringIO.new(data)).each_entry do |layerentry|
    puts "#{entry.full_name} :: [#{layerentry.size}] #{layerentry.full_name}"
  end
end
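
Both scripts handle exactly one level of nesting. For tars nested to arbitrary depth, the Python approach can be made recursive. This is a rough sketch under a few assumptions: walk_tar and list_nested are hypothetical names, every non-empty regular file ending in ".tar" is assumed to be a readable tar (otherwise tarfile raises ReadError), and the archives are opened in seekable (non-streaming) mode.

```python
import tarfile


def walk_tar(fileobj, prefix=()):
    """Yield (path_chain, size) for every entry, descending into inner .tar files."""
    with tarfile.open(fileobj=fileobj) as tar:
        for entry in tar:
            chain = prefix + (entry.name,)
            yield chain, entry.size
            # Same filter as the scripts above: non-empty files ending in ".tar".
            if entry.isfile() and entry.size > 0 and entry.name.endswith(".tar"):
                inner = tar.extractfile(entry)
                yield from walk_tar(inner, chain)


def list_nested(path):
    """Print every entry in the archive at `path`, one line per entry."""
    with open(path, "rb") as f:
        for chain, size in walk_tar(f):
            print("{} [{}]".format(" :: ".join(chain), size))
```

Each yielded chain records the enclosing tar names from the outermost inward, so an entry two layers deep carries three names.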