Download the unaligned CelebA ("in the wild") images as a .tgz archive, which is significantly faster to extract than the original .7z.

CelebA dataset, as described here:

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

  • img_celeba.tgz: contains the unaligned "in the wild" images
    • The dataset was originally released as a .7z archive split into 14 parts (.7z.001 ... .7z.014).
    • The problem is that unpacking .7z archives on Linux is not parallelized (https://unix.stackexchange.com/questions/210671/7-zip-slows-down-over-time-on-ubuntu-but-not-windows).
    • It is therefore significantly faster to:
      1. download the files
      2. extract them on Windows
      3. recompress them in a different format
      4. move them to the server and decompress them there with a parallel tool such as pigz
    • To spare others this hassle, I host the .tgz-compressed file here.
    • Use the following commands to download and extract it:
    wget http://datasets.sandrobraun.de/celeba/img_celeba.tgz
    pigz -dc img_celeba.tgz | tar xf -
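The repack-and-extract steps listed above (recompress, then decompress on the server) can be sketched in shell roughly as follows. This is a minimal, hypothetical sketch rather than the author's exact procedure: the function names `pack_dir` and `unpack_archive` and the throwaway demo folder are illustrative, and it assumes a Unix-like shell with `tar` and either `pigz` or plain `gzip` on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: repack an already-extracted image folder as a
# gzip-compressed tarball (.tgz), then unpack it again.
# Uses pigz (parallel gzip) when installed, else falls back to gzip;
# both read and write the same .tgz format.
set -euo pipefail

# Pick the compressor.
if command -v pigz >/dev/null 2>&1; then
  GZ=pigz
else
  GZ=gzip
fi

# pack_dir DIR ARCHIVE: stream DIR through tar and compress it into ARCHIVE.
pack_dir() {
  tar cf - "$1" | "$GZ" -c > "$2"
}

# unpack_archive ARCHIVE DEST: decompress ARCHIVE and extract it into DEST.
unpack_archive() {
  mkdir -p "$2"
  "$GZ" -dc "$1" | tar -C "$2" -xf -
}

# Demo: a throwaway folder stands in for the extracted img_celeba directory.
work=$(mktemp -d)
mkdir -p "$work/img_celeba"
printf 'fake image bytes\n' > "$work/img_celeba/000001.jpg"

(cd "$work" && pack_dir img_celeba img_celeba.tgz)
unpack_archive "$work/img_celeba.tgz" "$work/restored"
```

With pigz installed, running `unpack_archive img_celeba.tgz .` on the server does the same thing as the `pigz -dc img_celeba.tgz | tar xf -` command above; copying the archive to the server (e.g. with scp) is not shown.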