theRealSuperMario/notes.md

## notes.md

      
CelebA dataset, as described here:
http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

img_celeba.gz: contains unaligned "in the Wild" images

Originally, the dataset was released as a .7z archive split into 14 parts (.7z.001 ... .7z.014).
The problem is that unpacking .7z archives is not parallelized on Linux (https://unix.stackexchange.com/questions/210671/7-zip-slows-down-over-time-on-ubuntu-but-not-windows), so extraction there is slow.
It is therefore significantly faster to:

1. Download the files.
2. Extract them on Windows.
3. Compress them again in a different format.
4. Move them to the server and uncompress them there, using a parallelized compression tool.
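
The recompress and uncompress steps above could look roughly like the sketch below. It assumes the images are already extracted into a folder and that `tar` is available. The helper names `compressor`, `pack_dir` and `unpack_archive` are made up for illustration, and the fallback to plain `gzip` is my own addition for machines without `pigz`.

```shell
#!/bin/sh
# Sketch only: helper names and paths are hypothetical.

# Use pigz (parallel gzip) if it is installed, otherwise fall back to gzip.
compressor() {
    if command -v pigz >/dev/null 2>&1; then
        echo pigz
    else
        echo gzip
    fi
}

# Pack an already-extracted directory into a gzip-compressed tarball.
# $1 = directory to pack, $2 = output .tgz file
pack_dir() {
    tar cf - "$1" | "$(compressor)" -c > "$2"
}

# Unpack a .tgz file into the current directory.
# $1 = .tgz file
unpack_archive() {
    "$(compressor)" -dc "$1" | tar xf -
}
```

For example, `pack_dir img_celeba img_celeba.tgz` on the machine holding the extracted images, then `unpack_archive img_celeba.tgz` on the server (the folder name `img_celeba` is an assumption). As far as I know, pigz spreads compression across all cores, while its decompression runs mostly on one thread with helper threads for reading and writing, so the biggest gain is on the compression side.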


To spare others this hassle, I host the .tgz-compressed file here.
Use the following commands to download and extract it:

wget http://datasets.sandrobraun.de/celeba/img_celeba.tgz
pigz -dc img_celeba.tgz | tar xf -
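
After extracting, a quick sanity check is to count the images. The sketch below is only a suggestion: `count_jpgs` is a made-up helper, and it assumes the images are stored as `.jpg` files somewhere below the folder you pass in.

```shell
#!/bin/sh
# Sketch only: count_jpgs is a hypothetical helper, not part of any tool.

# Print the number of .jpg files found anywhere below the given directory.
# $1 = directory to search
count_jpgs() {
    find "$1" -type f -name '*.jpg' | wc -l | tr -d ' '
}
```

Call it as `count_jpgs img_celeba`, where the folder name is an assumption about what the archive unpacks to. The CelebA project page reports 202,599 face images, so a complete extraction should give that count, assuming the archive holds the full set.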