@dgtlmoon
Last active August 19, 2020 21:33

Some machine learning image cleanup tricks

Print images wider or taller than some size (here 512 px)

identify -format '%f|%w|%h\n' * | awk -F'|' '$2 > 512 || $3 > 512'

856x642-manual-0fc7abf38681f7195e0588643037a7a7.jpg|856|642
856x642-manual-4bcae3ee8401e05c1da869e2ce0f154f.jpg|856|642
856x642-manual-5b9e571e0a4638cee70a92865d97bedd.jpg|856|642
856x642-manual-79f75f23bff3a059da0d9e0b7b1dea3b.jpg|856|642
856x642-manual-aa3799d09661d87ad05519c32a9c1cd7.jpg|856|642
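The size limit can be made a parameter instead of a hard-coded 512. The following is only a sketch: `filter_large` is a hypothetical helper name, and it assumes its input is in the same `name|width|height` format that the `identify` command above produces.

```shell
# Hypothetical helper: reads "name|width|height" lines on stdin and prints
# the lines whose width or height exceeds the limit passed as $1.
filter_large() {
  awk -F'|' -v max="$1" '$2 > max || $3 > max'
}

# Possible usage with ImageMagick (assumes `identify` is installed):
#   identify -format '%f|%w|%h\n' *.jpg | filter_large 512
```

Keeping the awk filter separate from `identify` makes it possible to check the filter on a few hand-written lines before running it over a large directory.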

Remove files that are too wide or too tall from a main list

For example, when a large training list such as 1-images.txt has lines of the form img.jpg, bbox.txt:

identify -format '%f|%w|%h\n' * | awk -F'|' '$2 > 512 || $3 > 512 {print $1}' > large-list.txt
grep -v -F -f large-list.txt 1-images.txt
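A small dry run can show what the `grep -v -F -f` step does. This is a sketch with made-up file names in a temporary directory; the `img, bbox` line layout is an assumption based on the description above.

```shell
# Sketch with invented sample data; nothing here touches real training files.
tmp=$(mktemp -d)

# Main list: one "image, bbox file" pair per line (assumed layout).
printf '%s\n' 'a.jpg, a.txt' 'b.jpg, b.txt' 'c.jpg, c.txt' > "$tmp/1-images.txt"

# List of oversized images, as the identify/awk step would write it.
printf '%s\n' 'b.jpg' > "$tmp/large-list.txt"

# -F treats each line of large-list.txt as a fixed string,
# -f reads the patterns from that file, -v keeps only non-matching lines.
grep -v -F -f "$tmp/large-list.txt" "$tmp/1-images.txt" > "$tmp/filtered.txt"
cat "$tmp/filtered.txt"
```

Because `-F` matches substrings, a name such as `b.jpg` would also drop a line for `ab.jpg`; with hash-style file names like the ones in the listing above this is unlikely to matter.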

Scale the image and bbox by half

# convert needs an output file; image-half.jpg is only an example name
convert image.jpg -resize 50% image-half.jpg
awk '{printf("%i %i %i %i %i\n", $1, $2/2+1, $3/2+1, $4/2+1, $5/2+1)}' bbox.txt
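To check the bbox arithmetic on a single line before applying it to a real file, something like the following may help. `halve_bbox` is a hypothetical helper name, and the `class x1 y1 x2 y2` field layout is an assumption; the gist only shows that the first field is kept and the other four are halved with a +1 offset.

```shell
# Hypothetical helper: halve the four box coordinates on each line read from
# stdin, keeping the first field unchanged and the +1 offset from the gist.
# Assumed line layout: "class x1 y1 x2 y2".
halve_bbox() {
  awk '{ printf("%d %d %d %d %d\n", $1, $2/2+1, $3/2+1, $4/2+1, $5/2+1) }'
}

# One sample line; the integer format truncates any fractional part.
printf '3 100 200 300 400\n' | halve_bbox
```

For the sample line this prints `3 51 101 151 201`. Odd coordinates are truncated after the division, so 101 also maps to 51.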