@gvanhorn38
Last active November 11, 2019 19:28
Image pre-processing.

You can install imagemagick with:

sudo apt-get install imagemagick

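We first need to identify images that ImageMagick cannot read. Save the following script as check2.sh and make it executable (chmod +x check2.sh):

#!/bin/bash
# Print the path of an image if ImageMagick cannot read it.

image=$1

# +ping turns off identify's header-only mode so the image data is read too.
if ! identify +ping "$image" &> /dev/null; then
    echo "$image"
fi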

This can be run over the whole directory with GNU parallel (sudo apt-get install parallel):

$ ls images/ | parallel './check2.sh images/{}' > bad_images.txt
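
If GNU parallel is not available, the same scan can be sketched as a plain serial loop. This is only an illustration: the function name list_bad_images is hypothetical, and it assumes the images sit directly in one directory.

```shell
# Hypothetical helper: print every file in directory $1 that identify cannot read.
list_bad_images() {
    for image in "$1"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$image" ] || continue
        # Same check as the script above, run one file at a time.
        if ! identify +ping "$image" > /dev/null 2>&1; then
            printf '%s\n' "$image"
        fi
    done
}
```

It would be called as list_bad_images images > bad_images.txt. It is slower than the parallel version, but it has no extra dependency.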

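We can create a file with the URLs of the images that we should redownload (-I {} replaces the deprecated -i, and grep -F matches the name literally instead of as a regular expression):

$ cat bad_images.txt | xargs -I {} grep -F {} urls.txt > redo_urls.txt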
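
The lookup above runs grep once per bad image. A single-pass variant might look like the sketch below. It assumes that each URL in urls.txt contains the bare file name (without the images/ prefix that the check step prints). The function name select_redo_urls is made up for illustration.

```shell
# Hypothetical helper: print the lines of a URL list that mention any bad image.
# $1: file with one image path per line, $2: file with one URL per line.
select_redo_urls() {
    # Strip the directory part so "images/a.jpg" becomes "a.jpg", then use
    # the names as fixed-string patterns (-F) read from stdin (-f -).
    sed 's|.*/||' "$1" | grep -F -f - "$2"
}
```

It would be called as select_redo_urls bad_images.txt urls.txt > redo_urls.txt. Matching is by substring, so a name such as a.jpg would also select a URL ending in aa.jpg.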

We can then try to redownload the images (see here)

Once the problem images have been removed (e.g. cat bad_images.txt | xargs -n 1 rm) or fixed, we can convert the remaining images to JPEG. The grep pattern is quoted so that the dot is matched literally:

$ ls images | grep '\.jpg$' | parallel -j 10 "mogrify -format jpg images/{}"

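Perhaps we also want to resize them so that they fit within 800x800 pixels (the aspect ratio is preserved, and mogrify overwrites the files in place):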

$ ls images/ | parallel -j 4 "mogrify -layers flatten -resize 800x800 -format jpg images/{}"
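
To make the 800x800 geometry concrete, the following sketch computes the size an image ends up with when it is fitted inside a square box with its aspect ratio preserved. The helper fit_within is hypothetical and uses integer rounding, so it may differ from ImageMagick's own result by a pixel in some cases.

```shell
# Hypothetical helper: print the "width height" that results from fitting
# a $1 x $2 image inside a $3 x $3 box while preserving the aspect ratio.
fit_within() {
    width=$1 height=$2 box=$3
    if [ "$width" -ge "$height" ]; then
        # Landscape or square: the width becomes the box size.
        new_width=$box
        new_height=$(( (height * box + width / 2) / width ))
    else
        # Portrait: the height becomes the box size.
        new_height=$box
        new_width=$(( (width * box + height / 2) / height ))
    fi
    echo "$new_width $new_height"
}

fit_within 1600 1200 800   # prints "800 600"
fit_within 600 900 800     # prints "533 800"
```

Note that a plain 800x800 geometry also enlarges smaller images. Writing it as 800x800\> instead should only shrink images that are larger than the box.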

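We can also use convert, which writes the results to a separate directory instead of overwriting the originals. The output directory must already exist (mkdir -p resized_images), and the output format is taken from the file extension, so -format is not needed:

$ ls images/ | grep '\.jpg$' | parallel -j 6 "convert images/{} -resize 800x800 resized_images/{}"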

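If we want to save the image sizes (file name, height, width), we can do the following:

HEADS UP: sometimes identify ends each record with a newline and sometimes it does not. If the output runs together on one line, add \n to the end of the format string.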

$ ls -U images/ | parallel -j6 'identify -format "%f %h %w" images/{}' > image_sizes.txt

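Or we can use xargs. Here -print0 and -0 keep file names with spaces intact, and -n 100 splits the files into batches so that -P 24 can actually run several identify processes at once:

$ find images -type f -print0 | xargs -0 -n 100 -P 24 identify -format "%d/%f %h %w\n" > image_sizes2.txt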
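
Once the sizes are saved, the file can be filtered with standard tools. As a minimal sketch, the hypothetical helper small_images below lists images whose height or width is below a threshold. It assumes one "name height width" record per line and file names without spaces.

```shell
# Hypothetical helper: list images from a sizes file whose height or width
# is below $2 pixels. Expects "name height width" on each line of file $1.
small_images() {
    awk -v min="$2" '$2 + 0 < min + 0 || $3 + 0 < min + 0 { print $1 }' "$1"
}
```

For example, small_images image_sizes.txt 200 would print the names of all images with a side shorter than 200 pixels.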