@chabala
Last active August 10, 2024 00:49
Merge and extract tgz files from Google Takeout

I recently found that some clowny gist was the top result for 'google takeout multiple tgz'. It used two bash scripts to extract all the tgz files and then merge the results together. Don't do that. Use brace expansion, cat the TGZs, and extract:

$ cat takeout-20201023T123551Z-{001..011}.tgz | tar xzivf -

You don't even need to use brace expansion. Globbing sorts the matches lexicographically, and because the part numbers are zero-padded, that is also numeric order:

$ cat takeout-20201023T123551Z-*.tgz | tar xzivf -

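tar has been around forever; it wasn't designed to need custom scripts to deal with multipart archives. Each part is a complete archive with its own end-of-archive marker, and the i flag (--ignore-zeros) tells tar to keep reading past those markers. Since it's extracting the combined archive, there's no 'mess of partial directories' to be merged. It just works, as intended.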
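
To try the approach without a real Takeout export, here is a minimal sketch that builds two throwaway archives in a temporary directory and extracts them in one pass. The part-001.tgz and part-002.tgz names and the file contents are made up for illustration, and the sketch assumes GNU tar.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
mkdir src out
echo one > src/a.txt
echo two > src/b.txt

# Two independent archives standing in for Takeout parts (hypothetical names).
tar czf part-001.tgz -C src a.txt
tar czf part-002.tgz -C src b.txt

# Concatenate and extract in one pass; i skips the end-of-archive
# zero blocks that sit between the parts.
cat part-*.tgz | tar xzif - -C out

ls out
```

If both a.txt and b.txt show up in out, the concatenated stream was read all the way through.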

An additional tip, courtesy of Dmitriy Otstavnov (@bvc3at): if you have pv available, you can track the progress of the extraction:

> pv takeout-* | tar xzif -
 190GiB 2:37:54 [18.9MiB/s] [==============>                                   ] 30% ETA 5:03:49
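
Since pv is not installed everywhere, one option is a small wrapper that uses pv when it is available and falls back to cat otherwise. This is only a sketch; the stream_parts name is hypothetical and not part of any tool.

```shell
# Stream all given files to stdout: through pv if installed, else plain cat.
# pv draws its progress bar on stderr, so stdout carries only the archive data.
stream_parts() {
    if command -v pv >/dev/null 2>&1; then
        pv "$@"
    else
        cat "$@"
    fi
}

# Usage, with file names as in the examples above:
# stream_parts takeout-20201023T123551Z-*.tgz | tar xzif -
```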
@shawnhank

This worked very well for a 6TB Google Photos Takeout data set consisting of 22 50GB TGZ files. Ran it via Ubuntu on Windows (WSL) and the file total matched the export.

Proof positive that tar does indeed work!

@Jake-Je0n

This really worked well. Thanks!

@chrishop

chrishop commented Aug 3, 2022

You're now the top result for "how to join .tgz google takeout"
Anyway it's just what I needed, cheers!

@chabala
Author

chabala commented Aug 3, 2022

> You're now the top result for "how to join .tgz google takeout"

Mission accomplished then. 😆 Had to displace the bad results with better results.

@justinhartman

I mean, I get that you have a better solution for tgz archives, but the original author of the clowny gist created a solution that a) worked with zips and b) worked for whatever use case he had and felt he wanted to share it. I'm not sure it's fair to dismiss his result just because you don't like it. NB: I use your solution; I just don't like the dismissiveness toward someone creating and sharing a solution, no matter how inefficient it appears.

@chabala
Author

chabala commented Aug 12, 2022

  1. The scripts are also pointless for zip files, which have a similar one liner. At the time, zip downloads from takeout were limited to 2GB per file, versus 50GB per tgz file, so using zip was already a poor choice.
  2. If someone makes a group of scripts that replicate the basic function of unix commands, badly, and people find them and use them because they're the best search result, the world becomes a dumber place. That deserves some slight ridicule. The original gist author removed the gist; only forks of it persist.

@ariccio

ariccio commented Aug 27, 2022

Protip for macOS users: use GNU tar (gtar), not the built-in bsdtar. This just gave me a few days of headaches! It also appears that the built-in Archive Utility won't correctly extract these files if they've first been concatenated with cat.

@bvc3at

bvc3at commented Sep 18, 2022

Using pv instead of cat can help tracking progress of extracting archives:

> pv takeout-* | tar xzif -
 190GiB 2:37:54 [18.9MiB/s] [==============>                                   ] 30% ETA 5:03:49

@chabala
Author

chabala commented Sep 18, 2022

This is a useful enhancement, though I note I didn't have pv installed by default in Ubuntu, so it's perhaps less portable. I'll add it regardless.

@sagz

sagz commented Oct 20, 2022

On macOS (Mojave or Ventura+), a small modification is needed:

pv takeout-* | gtar -xzif -

(If you don't have pv, install it with Homebrew: brew install pv. Also, macOS ships bsdtar, which doesn't support the -i (--ignore-zeros) flag, so use GNU tar, installable with brew install gnu-tar.)
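
A quick way to check which tar you would actually be running is to look at its version banner. The sketch below prefers gtar (the name Homebrew's gnu-tar installs under) and otherwise accepts plain tar only if it identifies itself as GNU tar. The TAR variable name is just a convention chosen for this example.

```shell
# Pick a GNU tar: prefer gtar, otherwise accept plain tar only if
# its version banner says "GNU tar".
if command -v gtar >/dev/null 2>&1; then
    TAR=gtar
elif tar --version 2>/dev/null | grep -q 'GNU tar'; then
    TAR=tar
else
    echo "GNU tar not found; on macOS try: brew install gnu-tar" >&2
    TAR=
fi

echo "Using: ${TAR:-none}"

# Usage, once TAR is set:
# pv takeout-* | "$TAR" -xzif -
```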

@chabala
Author

chabala commented Jun 1, 2023

> What's the best option for macOS if I went with the zip option?

Re-request the Takeout with tar.gz. Otherwise, see this Unix & Linux Stack Exchange answer: https://unix.stackexchange.com/a/40565/19124

@paradite

paradite commented Jul 5, 2023

Just a note for those who exported zip on macOS. You can merge the zip files simply with:

unzip \*.zip [Return]

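(The backslash escapes the *, which the shell would otherwise expand itself. Escaped, the pattern is passed through to unzip, which matches it against the zip files and extracts each one.)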

source: https://forums.macrumors.com/threads/unzip-multiple-files-into-one-folder.1645539/

@johannesunana

johannesunana commented Jul 17, 2023

Thank you @chabala and @bvc3at! This worked for me on Windows 10 using Ubuntu on WSL. Glad I saw this as the top result.

@chabala
Author

chabala commented Oct 25, 2023

Gist comments are not a Q&A forum. If you have Unix questions, ask them on Stack Overflow.

@dturner

dturner commented Apr 25, 2024

@paradite Worked perfectly for zip files on macOS - thank you!

@Thinkscape

gnu-tar doesn't seem to support the -i shorthand; it needs the full --ignore-zeros to work.

pv takeout-* | tar --ignore-zeros -xzf -

@chabala
Author

chabala commented May 24, 2024

> gnu-tar doesn't seem to support -i shorthand, needs the full --ignore-zeros to work.

@Thinkscape are you sure you're using GNU tar? Because that's the same complaint made earlier about bsdtar, where installing GNU tar was the solution.

@Thinkscape

> > gnu-tar doesn't seem to support -i shorthand, needs the full --ignore-zeros to work.
>
> @Thinkscape are you sure you're using GNU tar? Because that's the same complaint made earlier about bsdtar, where installing GNU tar was the solution.

My bad. A Homebrew linking mess got me this time ...

@JohnGemstone

Thanks @paradite unzip \*.zip is all we need for zip archives.
