@mpalet
Last active January 4, 2020 19:44
Remove Unicode diacritics

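macOS (NFD): recursively renames the files and directories below the current directory whose names contain decomposed (NFD) UTF-8 diacritics, replacing each accented character with its plain ASCII transliteration, e.g. À > A. It relies on GNU grep for the -P option (Homebrew installs it as ggrep) and on the macOS iconv, which provides the utf-8-mac encoding for decomposed names.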

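# -depth lists children before their parents, so renaming a directory never breaks
# paths still queued in the pipe; only the last path component is renamed.
find . -depth -print | ggrep -P "[\x{00FF}-\x{FFFF}]" | while IFS= read -r f; do
  dir="$(dirname "$f")"; base="$(basename "$f")"
  # compose NFD into NFC first so the precomposed sed patterns can match, then transliterate the rest
  newbase="$(printf '%s\n' "$base" | iconv -f utf-8-mac -t utf-8 | sed -e 's/À/A/g;s/Á/A/g;s/Ä/A/g;s/à/a/g;s/á/a/g;s/ä/a/g;s/È/E/g;s/É/E/g;s/Ë/E/g;s/è/e/g;s/é/e/g;s/ë/e/g;s/Ì/I/g;s/Í/I/g;s/Ï/I/g;s/ì/i/g;s/í/i/g;s/ï/i/g;s/Ò/O/g;s/Ö/O/g;s/Ó/O/g;s/ò/o/g;s/ó/o/g;s/ö/o/g;s/Ù/U/g;s/Ú/U/g;s/Ü/U/g;s/ù/u/g;s/ú/u/g;s/ü/u/g' | iconv -f utf-8 -t ascii//TRANSLIT)"
  [ -n "$newbase" ] && [ "$base" != "$newbase" ] && mv -nv "$f" "$dir/$newbase"  # -n: never overwrite an existing file
done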
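The macOS variant needs special handling because the same visible À can be stored as two different byte sequences, and sed compares bytes. The following POSIX sh sketch prints both forms in hex; hex_bytes is a hypothetical helper introduced only for this illustration, not part of the gist.

```shell
# Hypothetical helper (not part of the gist): print a string's bytes as hex.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

nfc=$(printf '\303\200')    # À as a single precomposed code point, U+00C0
nfd=$(printf 'A\314\200')   # A followed by U+0300 COMBINING GRAVE ACCENT
hex_bytes "$nfc"; echo      # prints c380
hex_bytes "$nfd"; echo      # prints 41cc80
```

A sed pattern typed as a precomposed À matches only the c3 80 sequence, so a decomposed name has to be composed into NFC (the usual purpose of converting from utf-8-mac with the macOS iconv) before those substitutions can apply.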

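Unix/Linux (NFC): recursively renames the files and directories below the current directory whose names contain precomposed (NFC) UTF-8 diacritics, replacing each accented character with its plain ASCII transliteration, e.g. À > A. On GNU/Linux, GNU grep is usually installed as plain grep rather than ggrep. glibc's iconv transliterates according to the current locale, so run it under a UTF-8 locale; under the C locale accented letters may come out as ?.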

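# -depth renames children before their parents; only the last path component changes.
find . -depth -print | ggrep -P "[\x80-\xFF]" | while IFS= read -r f; do
  dir="$(dirname "$f")"; base="$(basename "$f")"
  newbase="$(printf '%s\n' "$base" | sed -e 's/À/A/g;s/Á/A/g;s/Ä/A/g;s/à/a/g;s/á/a/g;s/ä/a/g;s/È/E/g;s/É/E/g;s/Ë/E/g;s/è/e/g;s/é/e/g;s/ë/e/g;s/Ì/I/g;s/Í/I/g;s/Ï/I/g;s/ì/i/g;s/í/i/g;s/ï/i/g;s/Ò/O/g;s/Ö/O/g;s/Ó/O/g;s/ò/o/g;s/ó/o/g;s/ö/o/g;s/Ù/U/g;s/Ú/U/g;s/Ü/U/g;s/ù/u/g;s/ú/u/g;s/ü/u/g' | iconv -f utf-8 -t ascii//TRANSLIT)"
  [ -n "$newbase" ] && [ "$base" != "$newbase" ] && mv -nv "$f" "$dir/$newbase"  # -n: never overwrite an existing file
done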
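Because both loops rename files in place, it can help to see the result first. A minimal dry-run sketch, assuming a hypothetical preview_renames function and only a four-entry subset of the accent map, that reads a find listing and prints old -> new pairs without calling mv:

```shell
# Hypothetical dry-run helper (not part of the gist): reads paths on stdin and prints
# "old -> new" for every path whose last component would change. Nothing is renamed.
# Only a four-entry subset of the accent map is used here, for illustration.
preview_renames() {
  while IFS= read -r f; do
    dir=$(dirname "$f")
    base=$(basename "$f")
    newbase=$(printf '%s\n' "$base" | sed -e 's/à/a/g;s/é/e/g;s/è/e/g;s/ü/u/g')
    if [ "$base" != "$newbase" ]; then
      printf '%s -> %s\n' "$f" "$dir/$newbase"
    fi
  done
}

# Usage: list paths children-first, e.g.
#   find . -depth -print | preview_renames
printf '%s\n' './a/résumé.pdf' './a' | preview_renames
# prints: ./a/résumé.pdf -> ./a/resume.pdf
```

Swapping the shortened sed call for the full sed and iconv pipeline of either loop turns this into a preview of the real rename.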