@tiagoad
Created May 20, 2017 19:45
Replaces unicode characters with ASCII-safe alternatives
#!/bin/bash
# findreplace
# ===========
# Replaces unicode characters with ASCII-safe alternatives
# Single-letter words á, à and é are replaced with a', 'a and e'
## check commands
# GNU sed is needed for \b and \U; on macOS it is usually installed as gsed
SED=sed
if gsed --version >/dev/null 2>&1 ; then
  SED=gsed
elif ! sed --version >/dev/null 2>&1 ; then
  echo "GNU sed is required." >&2
  exit 1
fi
## usage info
if [[ $# -eq 0 ]] ; then
  cat << EOF
Usage:
    $0 <find arguments>
Example:
    $0 . -name '*.java'
EOF
  exit 0
fi
## regex
read -r -d '' REGEX << EOF
s/\bà\b/\'a/g;
s/\bá\b/a\'/g;
s/\bé\b/e\'/g;
s/[áàâã]/a/g;
s/[éê]/e/g;
s/í/i/g;
s/[óôõ]/o/g;
s/ú/u/g;
s/ç/c/g;
EOF
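# upper case: append a copy of the rules with everything except the sed
# syntax letters s, g and b upper-cased (\U is a GNU sed extension)
REGEX=$REGEX$(echo "$REGEX" | $SED "s/[^sgb]*/\U&/g")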
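## parallel commands, run once per file ({} is replaced by the file path):
## print the name, apply the substitutions in place, then print the lines
## left after dropping those that match the ASCII range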
read -r -d '' COMMANDS << EOF
echo "{}"
$SED -i "$REGEX" "{}"
$SED '/\([\d000-\d127]\)/d' "{}"
EOF
## run
find "$@" -print0 | parallel --will-cite -0 "$COMMANDS"
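
The last sed pass in the script looks like a check for whatever the substitutions left behind. A minimal sketch of a simpler check follows, assuming only POSIX `tr` and `wc`. The helper name `count_non_ascii_bytes` is hypothetical and not part of the gist; it counts bytes, so a two-byte UTF-8 character such as é contributes 2.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# print the number of bytes in file $1 that fall outside the ASCII range.
count_non_ascii_bytes() {
  # LC_ALL=C makes tr work on raw bytes; \000-\177 is the ASCII range in octal.
  # Deleting every ASCII byte leaves only the non-ASCII ones for wc to count.
  # The final tr strips the padding some wc implementations add.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Possible usage after running findreplace on a file:
#   if [ "$(count_non_ascii_bytes Foo.java)" != "0" ]; then
#     echo "Foo.java still contains non-ASCII bytes" >&2
#   fi
```

A result of 0 means the file is pure ASCII; anything else points to a character the replacement rules do not cover.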