Use pure C# to extract .tar and .tar.gz files
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace TarExample
{
    public class Tar
    {
        /// <summary>
        /// Extracts a <i>.tar.gz</i> archive to the specified directory.
        /// </summary>
        /// <param name="filename">The <i>.tar.gz</i> to decompress and extract.</param>
        /// <param name="outputDir">Output directory to write the files.</param>
        public static void ExtractTarGz(string filename, string outputDir)
        {
            using (var stream = File.OpenRead(filename))
                ExtractTarGz(stream, outputDir);
        }

        /// <summary>
        /// Extracts a <i>.tar.gz</i> archive stream to the specified directory.
        /// </summary>
        /// <param name="stream">The <i>.tar.gz</i> to decompress and extract.</param>
        /// <param name="outputDir">Output directory to write the files.</param>
        public static void ExtractTarGz(Stream stream, string outputDir)
        {
            // A GZipStream is not seekable, so copy it first to a MemoryStream
            using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
            {
                const int chunk = 4096;
                using (var memStr = new MemoryStream())
                {
                    int read;
                    var buffer = new byte[chunk];
                    // Read until the stream is exhausted: a short read does not mean end of data
                    while ((read = gzip.Read(buffer, 0, chunk)) > 0)
                        memStr.Write(buffer, 0, read);
                    memStr.Seek(0, SeekOrigin.Begin);
                    ExtractTar(memStr, outputDir);
                }
            }
        }

        /// <summary>
        /// Extracts a <c>tar</c> archive to the specified directory.
        /// </summary>
        /// <param name="filename">The <i>.tar</i> to extract.</param>
        /// <param name="outputDir">Output directory to write the files.</param>
        public static void ExtractTar(string filename, string outputDir)
        {
            using (var stream = File.OpenRead(filename))
                ExtractTar(stream, outputDir);
        }

        /// <summary>
        /// Extracts a <c>tar</c> archive to the specified directory.
        /// </summary>
        /// <param name="stream">The <i>.tar</i> to extract.</param>
        /// <param name="outputDir">Output directory to write the files.</param>
        public static void ExtractTar(Stream stream, string outputDir)
        {
            var buffer = new byte[100];
            while (true)
            {
                // Bytes 0-99 of each 512-byte header hold the entry name
                if (stream.Read(buffer, 0, 100) == 0)
                    break; // end of stream without an end-of-archive marker
                var name = Encoding.ASCII.GetString(buffer).Trim('\0');
                if (String.IsNullOrWhiteSpace(name))
                    break;
                // Skip mode, uid and gid (24 bytes) to reach the 12-byte octal size field
                stream.Seek(24, SeekOrigin.Current);
                stream.Read(buffer, 0, 12);
                var size = Convert.ToInt64(Encoding.UTF8.GetString(buffer, 0, 12).Trim('\0').Trim(), 8);
                // Skip the remaining 376 bytes of the header
                stream.Seek(376L, SeekOrigin.Current);
                var output = Path.Combine(outputDir, name);
                if (!Directory.Exists(Path.GetDirectoryName(output)))
                    Directory.CreateDirectory(Path.GetDirectoryName(output));
                // Directory entries (including "./") end with a slash and carry no file data
                if (!name.EndsWith("/", StringComparison.Ordinal))
                {
                    using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
                    {
                        var buf = new byte[size];
                        stream.Read(buf, 0, buf.Length);
                        str.Write(buf, 0, buf.Length);
                    }
                }
                // Entry data is padded so that the next header starts on a 512-byte boundary
                var pos = stream.Position;
                var offset = 512 - (pos % 512);
                if (offset == 512)
                    offset = 0;
                stream.Seek(offset, SeekOrigin.Current);
            }
        }
    }
}

@rickx rickx commented Aug 23, 2019

Hi,
thanks for this code!
I had to change a couple of things to make it work:

  1. var size = Convert.ToInt64(Encoding.ASCII.GetString(buffer, 0, 12).Trim(), 8); threw an error because the string still had the trailing \0, so I added a trim for it:
    var size = Convert.ToInt64(Encoding.UTF8.GetString(buffer, 0, 12).Trim('\0').Trim(), 8); // I changed the encoding as a first attempt, but I guess that's not important
  2. If you use a .tar.gz that has no parent directory (you want the files extracted to the current directory, with no subdirectory), the code throws an error here: using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
    so I added an if to skip the "./" entry:
    if (!name.Equals("./", StringComparison.InvariantCulture)) { using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write)) { ...
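
The size-field fix in point 1 can be isolated in a small helper. This is only a minimal sketch, not part of the gist: the names `TarFields` and `ParseOctal` are made up for illustration, and it assumes the field holds ASCII octal digits padded with NUL bytes or spaces, which is what the two trims handle.

```csharp
using System;
using System.Text;

public static class TarFields // hypothetical helper, not part of the gist
{
    // Parses an octal numeric header field such as the 12-byte size field.
    // Assumes ASCII octal digits padded with NUL bytes and/or spaces.
    public static long ParseOctal(byte[] buffer, int offset, int count)
    {
        var text = Encoding.ASCII.GetString(buffer, offset, count).Trim('\0', ' ');
        // An all-padding field is treated as zero in this sketch.
        return text.Length == 0 ? 0L : Convert.ToInt64(text, 8);
    }
}
```

For example, a field containing `00000001750` followed by a NUL byte parses to 1000 bytes.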

@ForeverZer0 ForeverZer0 (owner, author) commented Aug 23, 2019

@rickx
Good observations, I updated the code to reflect those changes, thank you very much!

I admittedly never tested this code very thoroughly; it was something I threw together for a StackOverflow answer as an alternative to including an entire third-party library in your project just to extract a tarball. I would definitely recommend more robust testing across more scenarios before using it in any production code.

Thanks again for your contribution!

@zhangzhezh zhangzhezh commented Aug 30, 2019

Please Check Line 76

@voxsoftware voxsoftware commented Oct 3, 2019

Hi,

  1. Thanks for this script.
  2. Your .cs file needs a modification at line 83. Instead of this:
if (!name.Equals("./", StringComparison.InvariantCulture))

please use this:

if(!name.EndsWith("/"))

I tested it on many files, and your original line fails because it tries to read directories as files. Please make the change.

I also changed line 72 from:

if (String.IsNullOrWhiteSpace(name))

to:

if(name != null && name!="")

This lets the file work on .NET Framework 2.0 as well.
If you check my forked file, you can copy the changes; I also normalized the tabs/whitespace.
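
For readers targeting older frameworks, the two checks discussed in this comment could be wrapped as below. This is a hedged sketch with hypothetical names (`TarNameChecks`, `IsEndOfArchive`, `IsDirectoryEntry`), not the code from the fork. It assumes that directory entries are written with a trailing slash, which is the usual convention, and it keeps the end-of-archive test true for an empty name so that the `break` in the extraction loop still fires.

```csharp
using System;

public static class TarNameChecks // hypothetical helper, not part of the gist
{
    // True when the header name is empty, i.e. the end-of-archive marker.
    // Uses only APIs that are available in .NET Framework 2.0.
    public static bool IsEndOfArchive(string name)
    {
        return name == null || name.Trim().Length == 0;
    }

    // True when the entry name ends with a slash, the usual marker for a directory.
    public static bool IsDirectoryEntry(string name)
    {
        return name.EndsWith("/", StringComparison.Ordinal);
    }
}
```

In the extraction loop, `if (TarNameChecks.IsEndOfArchive(name)) break;` would stand in for the `String.IsNullOrWhiteSpace` test, and `if (!TarNameChecks.IsDirectoryEntry(name))` would guard the file write.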

@VikyaSurve VikyaSurve commented Oct 31, 2019

I have a tar file with the structure below, and I am not able to parse it. Can anyone please help me with that?

  1. Tar File
    a) Folder
      I) csv1
      II) csv2
      III) csv3 etc.

@sreed4 sreed4 commented Mar 14, 2020

Semicolon missing on end of line 76

@Mpprobst Mpprobst commented May 6, 2020

Do you also know how to compress to a tar.gz? I've been trying to reverse your process to do so but have had no luck.

@ForeverZer0 ForeverZer0 (owner, author) commented May 8, 2020

@Mpprobst
This is a very quick and dirty method of extraction, simply using offsets defined by the spec to grab the few pieces of information needed to extract the file's data from the stream. A full and proper implementation would define some structs, checksums, etc.
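
As a rough illustration of the checksum part mentioned here, the sketch below validates a header block. It is not part of the gist and the names are hypothetical. It assumes the common ustar layout, in which the 8-byte checksum field sits at offset 148 of the 512-byte header and holds, in octal, the sum of all header bytes with the checksum field itself counted as spaces.

```csharp
using System;
using System.Text;

public static class TarChecksum // hypothetical helper, not part of the gist
{
    const int HeaderSize = 512;
    const int ChecksumOffset = 148; // assumed ustar layout
    const int ChecksumLength = 8;

    // Sums all header bytes, counting the checksum field itself as ASCII spaces.
    public static int Compute(byte[] header)
    {
        int sum = 0;
        for (int i = 0; i < HeaderSize; i++)
        {
            bool inChecksum = i >= ChecksumOffset && i < ChecksumOffset + ChecksumLength;
            sum += inChecksum ? (byte)' ' : header[i];
        }
        return sum;
    }

    // Compares the stored octal checksum with the computed one.
    public static bool IsValid(byte[] header)
    {
        var text = Encoding.ASCII.GetString(header, ChecksumOffset, ChecksumLength).Trim('\0', ' ');
        return text.Length > 0 && Convert.ToInt32(text, 8) == Compute(header);
    }
}
```

A reader could call `TarChecksum.IsValid(header)` right after reading each 512-byte header and stop with an error when it returns false.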

The TAR spec is actually fairly simple: it essentially copies all of the input into a single stream, prepending each input with a basic header that always begins on a specific byte boundary in the stream. For any other tool to be able to read the output, you would need a proper implementation that ensures the entire header is valid.
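
To make the layout concrete, here is a minimal sketch of writing a single file entry. It is an illustration under stated assumptions, not a complete implementation: it assumes the ustar header layout, ASCII names of at most 100 bytes, and fixed values for mode, owner and modification time. `TarWriterSketch` and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

public static class TarWriterSketch // hypothetical helper, not part of the gist
{
    // Writes an ASCII string into the header at the given offset.
    static void Put(byte[] header, int offset, string value)
    {
        Encoding.ASCII.GetBytes(value, 0, value.Length, header, offset);
    }

    // Writes one regular-file entry: 512-byte header, data, zero padding to a 512-byte boundary.
    // Assumes an ASCII name of at most 100 bytes and a size that fits in 11 octal digits.
    public static void WriteEntry(Stream tar, string name, byte[] data)
    {
        var header = new byte[512];
        Put(header, 0, name);                                   // name
        Put(header, 100, "0000644");                            // mode (assumed value)
        Put(header, 108, "0000000");                            // uid (assumed value)
        Put(header, 116, "0000000");                            // gid (assumed value)
        Put(header, 124, Convert.ToString(data.LongLength, 8).PadLeft(11, '0')); // size in octal
        Put(header, 136, "00000000000");                        // mtime (assumed: Unix epoch)
        Put(header, 148, "        ");                           // checksum field counts as spaces
        header[156] = (byte)'0';                                // typeflag: regular file
        Put(header, 257, "ustar");                              // magic, NUL-terminated by the zeroed array
        Put(header, 263, "00");                                 // version

        // The checksum is the sum of all header bytes, stored as six octal digits, NUL, space.
        int sum = 0;
        foreach (byte b in header) sum += b;
        Put(header, 148, Convert.ToString(sum, 8).PadLeft(6, '0') + "\0 ");

        tar.Write(header, 0, header.Length);
        tar.Write(data, 0, data.Length);
        int padding = (512 - data.Length % 512) % 512;
        tar.Write(new byte[padding], 0, padding);
    }

    // Marks the end of the archive with two zero-filled 512-byte blocks.
    public static void WriteEnd(Stream tar)
    {
        tar.Write(new byte[1024], 0, 1024);
    }
}
```

Calling `WriteEntry` once per file and `WriteEnd` at the end should give a stream that follows the layout the extraction code above reads; longer names, directories and other entry types would need more work.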

Luckily, the compression side of it, a GZip stream, is already built into .NET, so converting a tar archive into a tar.gz one won't require anything complicated.
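
A minimal sketch of that last step, assuming the tar data is already available as a stream; `TarGzSketch.Compress` is a hypothetical name, not something from the gist.

```csharp
using System.IO;
using System.IO.Compression;

public static class TarGzSketch // hypothetical helper, not part of the gist
{
    // Compresses an existing tar stream into gzip format.
    public static void Compress(Stream tarInput, Stream gzOutput)
    {
        // The third argument (leaveOpen) keeps gzOutput usable after the GZipStream is disposed.
        using (var gzip = new GZipStream(gzOutput, CompressionMode.Compress, true))
        {
            tarInput.CopyTo(gzip);
        }
    }
}
```

With files, this could be called with `File.OpenRead("archive.tar")` as the input and `File.Create("archive.tar.gz")` as the output, each wrapped in a `using` block.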

@Su-s Su-s commented Nov 2, 2020

getting exception in line 40 when trying to decompress large files of 100MB

40 memStr.Write(buffer, 0, read);


@Su-s Su-s commented Nov 3, 2020

issue fixed https://gist.github.com/Su-s/438be493ae692318c73e30367cbc5c2a
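
One way to avoid holding a whole decompressed archive in memory is to read the GZipStream sequentially and skip bytes by reading them instead of seeking. The sketch below illustrates that idea only; it is not the code from the linked gist, and whether memory was the cause of the reported exception is an assumption. It uses the same header offsets as the original code, assumes a non-empty `outputDir`, and all names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamingTar // hypothetical, not the code from the linked fork
{
    // Reads exactly count bytes, or fewer only if the stream ends first.
    static int ReadFully(Stream s, byte[] buf, int count)
    {
        int total = 0, read;
        while (total < count && (read = s.Read(buf, total, count - total)) > 0)
            total += read;
        return total;
    }

    public static void ExtractTarGz(Stream gz, string outputDir)
    {
        using (var tar = new GZipStream(gz, CompressionMode.Decompress))
        {
            var header = new byte[512];
            var chunk = new byte[81920];
            while (ReadFully(tar, header, 512) == 512)
            {
                var name = Encoding.ASCII.GetString(header, 0, 100).Trim('\0');
                if (name.Length == 0) break; // zero block: end of archive
                long size = Convert.ToInt64(
                    Encoding.ASCII.GetString(header, 124, 12).Trim('\0', ' '), 8);
                var output = Path.Combine(outputDir, name);
                if (name.EndsWith("/", StringComparison.Ordinal))
                {
                    // Directory entry: assumed to carry no data.
                    Directory.CreateDirectory(output);
                    continue;
                }
                Directory.CreateDirectory(Path.GetDirectoryName(output));
                using (var file = File.Create(output))
                {
                    // Copy the entry in chunks instead of allocating one buffer per file.
                    long left = size;
                    while (left > 0)
                    {
                        int n = ReadFully(tar, chunk, (int)Math.Min(chunk.Length, left));
                        if (n == 0) throw new EndOfStreamException("Truncated archive");
                        file.Write(chunk, 0, n);
                        left -= n;
                    }
                }
                // Skip the padding up to the next 512-byte boundary by reading it.
                int pad = (int)((512 - size % 512) % 512);
                ReadFully(tar, header, pad);
            }
        }
    }
}
```

Memory use then stays bounded by the chunk size, at the cost of no longer being able to seek within the archive.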

@samuel7cunha samuel7cunha commented Dec 8, 2020

ExtractTar can use fixed-size memory blocks for reading and writing. My tests showed a 45% performance improvement.

const int chunk = 2 * 1024 * 1024; // 2 MB
var fbuf = new byte[chunk];

using (var str = File.Open(output, FileMode.OpenOrCreate, FileAccess.Write))
{
    long fbalance = size; // bytes of this entry still to copy (size is a long)
    int fread, fcount;
    while (true)
    {
        fcount = (fbuf.Length <= fbalance) ? fbuf.Length : (int)fbalance;
        fread = stream.Read(fbuf, 0, fcount);
        if (fread <= 0) break;
        str.Write(fbuf, 0, fread);
        fbalance -= fread;
    }
}

@TheDotSource TheDotSource commented May 20, 2021

Excellent, just what I was looking for. I have wrapped this into a PowerShell function:

https://github.com/TheDotSource/Expand-TarBall/blob/main/Expand-TarBall.ps1
