@cwvhogue
cwvhogue / plot_image_dist_R.R
Last active December 23, 2015 18:19
Plots the image size distribution after the ImageMagick identify MapReduce output is converted to .csv and loaded into R.
getty_sizes <- read.csv("Getty_Filesizes.csv", header = FALSE)
hist(getty_sizes[,1],breaks=1000, main="Getty Open Image Size at 0.25 Megapixel", xlab="KB")
rug(getty_sizes[,1])
@cwvhogue
cwvhogue / one_machine_reduce.sh
Last active December 23, 2015 18:19
One-machine Unix reduce phase for the ImageMagick identify analysis MapReduce
find . -name '*.id' | xargs -I {} cat {} > mr_identify.txt
@cwvhogue
cwvhogue / one_machine_map.sh
Last active December 23, 2015 18:19
One machine image analysis map phase for MapReduce ImageMagick on Unix command line
find . -name '*.jpg' -exec sh -c 'identify "$1" > "$(basename "$1" .jpg).id"' sh {} ';'
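The one-machine map and reduce phases in this listing can be combined into a single helper. The following is only a sketch under stated assumptions: `local_map_reduce` is a hypothetical name, the map command is passed as arguments, and the per-file results go to a temporary directory rather than the current one. Files with the same basename in different subdirectories would overwrite each other, as in the one-liner above.

```shell
# Hypothetical helper (not part of the original gists):
# run a map command on every *.jpg under DIR, then concatenate the results.
# Usage: local_map_reduce DIR MAP_CMD [ARGS...]
local_map_reduce() {
    dir=$1; shift
    out=$(mktemp -d) || return 1

    # Map phase: one .id result file per input image.
    find "$dir" -name '*.jpg' -exec sh -c '
        out=$1; file=$2; shift 2
        "$@" "$file" > "$out/$(basename "$file" .jpg).id"
    ' sh "$out" {} "$@" ';'

    # Reduce phase: concatenate all per-file results to stdout.
    find "$out" -name '*.id' -exec cat {} +

    rm -rf "$out"
}
```

With ImageMagick installed, `local_map_reduce . identify > mr_identify.txt` would be roughly equivalent to running the map and reduce one-liners in sequence. The order of the output lines follows `find` and is not guaranteed.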
@cwvhogue
cwvhogue / manta_identify_image.sh
Last active December 23, 2015 18:19
Manta MapReduce - extract image filesize, dimensions, filename with ImageMagick identify command:
mfind /$MANTA_USER/public/getty -n "jpg$" | mjob create -w -m 'identify $MANTA_INPUT_FILE' -r cat
@cwvhogue
cwvhogue / getty_muntar.sh
Last active December 23, 2015 18:19
Extract the getty.tar with muntar
echo /$MANTA_USER/stor/getty.tar | mjob create -o -m 'muntar -f $MANTA_INPUT_FILE /$MANTA_USER/public'
@cwvhogue
cwvhogue / curl_misnamed_jpg_tar.sh
Last active December 23, 2015 18:19
Get the WebP image set from Joyent Manta. Note that these are JPEG-encoded files with .webp extensions, for demo purposes.
curl -k https://us-east.manta.joyent.com/mantademo/public/images/getty-open/130812_GettyOpen_500x500_webp.tar > getty_webp.tar
@cwvhogue
cwvhogue / webp_from_jpg
Created September 23, 2013 18:23
ImageMagick convert options to resize a large JPG file to WebP at .25 Megapixel. WORKS ONLY IF libwebp is linked in to convert, otherwise outputs .webp that is really JPEG encoded...
convert 00000201.jpg -colorspace RGB -resize 250000@ -colorspace sRGB -quality 50 -define webp:lossless=true 00000201.webp
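Because a `.webp` file written without libwebp support may really be JPEG-encoded, it can help to check the leading bytes of the file instead of trusting the extension. The sketch below is an illustration, not part of the original gist: `sniff_image_type` is a hypothetical helper, and it relies on `head -c`, which is widely available but not strictly POSIX. It recognizes only the JPEG start-of-image marker (`FF D8 FF`) and the WebP container header (`RIFF`, four size bytes, `WEBP`).

```shell
# Hypothetical helper: report the real encoding of a file from its magic
# bytes, regardless of the file extension.
sniff_image_type() {
    # First 12 bytes as lowercase hex with spaces and newlines removed.
    magic=$(head -c 12 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        ffd8ff*)                  echo jpeg ;;     # JPEG start-of-image marker
        52494646????????57454250) echo webp ;;     # "RIFF" + 4 size bytes + "WEBP"
        *)                        echo unknown ;;
    esac
}

# Example use (commented out so the block has no side effects):
# for f in *.webp; do printf '%s %s\n' "$f" "$(sniff_image_type "$f")"; done
```

Running `convert -list format | grep -i webp` is another way to see whether the local ImageMagick build lists WebP among its supported formats.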
@cwvhogue
cwvhogue / gist:6674659
Created September 23, 2013 18:18
Rename a directory full of *.webp to *.jpg on Unix with find and -exec
find . -name '*.webp' -exec sh -c 'mv "$1" "$(dirname "$1")/$(basename "$1" webp)jpg"' sh {} ';'
@cwvhogue
cwvhogue / ImageMagick_identify_Manta_local_diff_validation
Last active December 21, 2015 18:29
Detailed image data set validation - diff-ing Manta ImageMagick 'identify' output with local copy.
# Start with the MapReduce version of ImageMagick 'identify' output from previous Gist
# With your local image directory (assume these are the good originals) run 'identify' as follows
$ identify *.jpg > master_identify.txt
# to match the MapReduce output, local job specific information needs to be removed.
$ cat master_identify.txt | \
sed 's/\[\(.*\)\]//' | \
sed 's/ \(.\):\(..\).\(...\)//' | \
sed 's/ 0.\(...\)u//' | \
@cwvhogue
cwvhogue / ImageMagick_identify_output_to_R_histogram
Last active December 21, 2015 18:28
Plot distribution of image file sizes using ImageMagick 'identify' command - default output. 1. Reorganize the 'identify' output into a sorted .csv file. 2. Look for and remove broken images, regenerate .csv file with clean image set. 3. Plot histogram of image size distribution with R.
# image_identify.txt is a file with the default output from ImageMagick 'identify'
# run over a set of JPG files locally,
# or the equivalent Manta MapReduce 'identify' output from the previous Gist.
$ identify *.jpg > image_identify.txt
$ head -5 image_identify.txt
00000201.jpg JPEG 3295x5947 3295x5947+0+0 8-bit sRGB 23.99MB 0.010u 0:00.000
00000301.jpg[1] JPEG 4470x3126 4470x3126+0+0 8-bit sRGB 22.15MB 0.000u 0:00.009
00000401.jpg[2] JPEG 3115x4485 3115x4485+0+0 8-bit sRGB 19.41MB 0.000u 0:00.000
00000501.jpg[3] JPEG 3093x4515 3093x4515+0+0 8-bit sRGB 19.39MB 0.000u 0:00.000
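
Step 1 of the description (reorganizing the 'identify' output into a sorted .csv file) could look something like the sketch below. It is not necessarily what the gist itself goes on to do. `identify_to_csv` is a hypothetical helper; it assumes the default output layout shown above, file names without spaces, size tokens ending in B, KB, MB or GB, and decimal multiples (1 MB = 1000 KB), which may not match every ImageMagick version.

```shell
# Hypothetical helper: turn default 'identify' output into "size_kb,filename"
# lines, sorted by size. Reads the named files, or stdin if none are given.
identify_to_csv() {
    awk '
    {
        # Scan for the file-size token (e.g. 23.99MB); its column can vary.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9.]+(B|KB|MB|GB)$/) {
                value = $i; unit = $i
                sub(/[A-Z]+$/, "", value)      # keep the numeric part
                sub(/^[0-9.]+/, "", unit)      # keep the unit suffix
                if (unit == "B")       factor = 0.001
                else if (unit == "KB") factor = 1
                else if (unit == "MB") factor = 1000
                else                   factor = 1000000
                name = $1
                sub(/\[[0-9]+\]$/, "", name)   # drop the [n] scene index
                printf "%.2f,%s\n", value * factor, name
                break
            }
        }
    }' "$@" | sort -t, -k1,1n
}
```

For example, `identify_to_csv image_identify.txt > sizes.csv` would write one line per image with the size in KB in the first column, the column that the R histogram snippet in this listing reads.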