Some machine learning image cleanup tricks
Print images wider or taller than some size (here 512 pixels)
identify -format '%f|%w|%h\n' * | awk -F\| '$2 > 512 || $3 > 512'
856x642-manual-0fc7abf38681f7195e0588643037a7a7.jpg|856|642
856x642-manual-4bcae3ee8401e05c1da869e2ce0f154f.jpg|856|642
856x642-manual-5b9e571e0a4638cee70a92865d97bedd.jpg|856|642
856x642-manual-79f75f23bff3a059da0d9e0b7b1dea3b.jpg|856|642
856x642-manual-aa3799d09661d87ad05519c32a9c1cd7.jpg|856|642
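The threshold can be pulled out into a parameter so the same filter works for other sizes. This is a minimal sketch, assuming input lines in the name|width|height layout shown above. The function name larger_than is an illustrative helper, not part of ImageMagick.

```shell
# larger_than MAX: read "name|width|height" lines on stdin and print
# those whose width or height exceeds MAX (hypothetical helper).
larger_than() {
    awk -F'|' -v max="$1" '$2 + 0 > max + 0 || $3 + 0 > max + 0'
}

# Usage, assuming ImageMagick's identify is installed:
#   identify -format '%f|%w|%h\n' * | larger_than 512
```

The "+ 0" forces a numeric comparison, so a width of 1024 is not compared to 512 as a string.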
Remove files that are too wide or too high from a main list
For example, when you have a large training list file in which each line has the form img.jpg, bbox.txt
identify -format '%f|%w|%h\n' *|awk -F\| ' $2> 512 || $3 > 512 {print $1}' > large-list.txt
grep -v -F -f large-list.txt 1-images.txt
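To see what the grep step does, the sketch below builds two throwaway files in a temporary directory and filters one by the other. The file contents are made up for illustration. -f reads patterns from a file, -F treats each pattern as a fixed string rather than a regular expression, and -v keeps only lines that match none of them.

```shell
# Build a tiny exclusion list and a tiny main list (made-up contents).
tmp=$(mktemp -d)
printf '%s\n' 'big.jpg' > "$tmp/large-list.txt"
printf '%s\n' 'small.jpg, small.txt' 'big.jpg, big.txt' 'tiny.jpg, tiny.txt' \
    > "$tmp/1-images.txt"

# Keep only the lines that contain none of the excluded names.
kept=$(grep -v -F -f "$tmp/large-list.txt" "$tmp/1-images.txt")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

Fixed-string matching is still substring matching, so an entry such as big.jpg would also drop a line mentioning verybig.jpg. With file names that embed a hash, as in the listing earlier, this is unlikely to matter.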
Scale the image and bbox by half
convert image.jpg -resize 50% image.jpg
cat bbox.txt | awk '{printf ("%i %i %i %i %i\n", $1,$2/2+1,$3/2+1,$4/2+1, $5/2+1)}'
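The same awk call can take the scale factor as a parameter instead of a hard-coded 2. This is a sketch, assuming each bbox line holds five whitespace-separated numbers, with the first left unchanged and the other four scaled, as in the command above. scale_bbox is a hypothetical helper name, and the +1 offset is carried over from the original command.

```shell
# scale_bbox FACTOR: scale fields 2-5 of each line by FACTOR, add 1,
# and truncate to integers; field 1 is passed through (hypothetical helper).
scale_bbox() {
    awk -v f="$1" '{
        printf("%d %d %d %d %d\n", $1, $2*f + 1, $3*f + 1, $4*f + 1, $5*f + 1)
    }'
}

# Usage: scale_bbox 0.5 < bbox.txt
```

The factor passed here should match the percentage given to -resize, for example 0.5 for 50%.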