@leonardo-fernandes
Last active June 5, 2016 12:02
Detect plagiarism in Word documents containing images
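# Hash the image geometry (positions and sizes) found in each file,
# then list every file whose hash matches that of another file.
find . -type f -print0 |
while IFS= read -r -d '' file; do
  hash=$(unzip -p "$file" | grep -Eo '(left|right|cx|cy)="[0-9]+"|<xdr:(col|colOff|row|rowOff)>[0-9]+</xdr:(col|colOff|row|rowOff)>' | md5sum)
  # Pass the file name as an argument so a "%" in it is not read as a format directive.
  printf '%s (%s)\n' "$hash" "$file"
done |
awk '{
  # Second sighting of a hash: print the stored first line, then mark the hash as reported.
  if (assoc[$1] && assoc[$1] != 1) { print assoc[$1]; assoc[$1] = 1; }
  # Hash already reported: print every further match.
  if (assoc[$1] && assoc[$1] == 1) { print $0; }
  # First sighting: remember the line in case a match turns up later.
  if (!assoc[$1]) { assoc[$1] = $0; }
}'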
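The final awk stage is the part that decides what gets reported, so it may help to see it in isolation. The following is a minimal sketch, not the gist's own code: the function name `group_duplicates` and the sample hashes and file names are made up for illustration. It assumes input lines in the `hash  - (file)` shape produced by the pipeline above, and prints only the lines whose first field occurs more than once.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original gist): keep only the lines
# whose first field (the hash) is shared with at least one other line.
group_duplicates() {
  awk '{
    count[$1]++
    if (count[$1] == 1) {
      first[$1] = $0                            # first sighting: remember the line
    } else {
      if (count[$1] == 2) { print first[$1] }   # second sighting: emit the first line too
      print $0                                  # emit every later match
    }
  }'
}

# Made-up sample input in the same "hash  - (file)" shape as the pipeline output.
printf '%s\n' \
  'aaa  - (./alice.docx)' \
  'bbb  - (./bob.docx)' \
  'aaa  - (./carol.docx)' | group_duplicates
```

With this sample input, only the two `aaa` lines are printed; `bob.docx` has a hash no other file shares, so it is dropped. Using a separate counter array avoids overloading one array with both a stored line and a "reported" flag, which is a design choice of this sketch and not a claim about the original.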