CelebA dataset, as explained here:
- img_celeba.gz: contains unaligned "in the Wild" images
- Originally, the dataset was released as a .7z archive split into 14 subfiles (.7z.001 ... .7z.014).
- The problem is that unpacking .7z archives is not parallelized on Linux (https://unix.stackexchange.com/questions/210671/7-zip-slows-down-over-time-on-ubuntu-but-not-windows).
- It is therefore significantly faster to:
  - download the files,
  - extract them on Windows,
  - compress them again in a different format,
  - move them to the server and decompress them there with a parallelized tool.
- To spare others this hassle, I host the .tgz compressed file here.
- Use the following commands to download and extract:

wget http://datasets.sandrobraun.de/celeba/img_celeba.tgz
pigz -dc img_celeba.tgz | tar xf -
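
The repacking and extraction steps above could be wrapped roughly as follows. This is a minimal sketch, not the script used to build the hosted archive: the helper names `pack_tgz` and `extract_tgz` are hypothetical, and the fallback to plain `gzip` when `pigz` is not installed is an assumption added for convenience.

```shell
#!/bin/sh
# Sketch only: pack_tgz and extract_tgz are hypothetical helper names.

# Pack a directory into a .tgz, using pigz when it is installed
# and falling back to gzip otherwise (assumed fallback).
pack_tgz() {
    src_dir=$1
    archive=$2
    if command -v pigz >/dev/null 2>&1; then
        tar cf - "$src_dir" | pigz > "$archive"
    else
        tar cf - "$src_dir" | gzip > "$archive"
    fi
}

# Extract a .tgz into a destination directory (default: current directory),
# again preferring pigz and falling back to gzip.
extract_tgz() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest"
    if command -v pigz >/dev/null 2>&1; then
        pigz -dc "$archive" | tar xf - -C "$dest"
    else
        gzip -dc "$archive" | tar xf - -C "$dest"
    fi
}
```

After downloading the archive with the `wget` command above, `extract_tgz img_celeba.tgz` would unpack it into the current directory, and `extract_tgz img_celeba.tgz data` into a `data` directory.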