@ThomasG77
Last active April 27, 2022 00:05
How to deal with renaming invalid UTF8 characters in directories or files
#!/bin/bash
## Rename files and directories whose names are not valid UTF-8, assuming the names are really encoded in Latin-1 (also called ISO-8859-1)
## Source: http://unix.stackexchange.com/questions/6460/bulk-rename-or-correctly-display-files-with-special-characters
## Usage: chmod +x rename_invalid_characters_linux.sh, then run ./rename_invalid_characters_linux.sh your_dir_where_you_want_to_scan_and_rename
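# Define a filter that prints only the input lines that are not structurally valid UTF-8.
# The pattern checks lead bytes and continuation bytes only; it still accepts the obsolete
# 5- and 6-byte forms and overlong encodings.
grep-invalid-utf8 () {
    perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}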
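# List the paths whose names are not valid UTF-8 (scans the directory given as first argument, or the current directory if none is given)
find "${1:-.}" | grep-invalid-utf8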
# Check that the names really are Latin-1 by previewing the conversion with recode or iconv (substitute your own encoding if it differs)
# find "${1:-.}" | grep-invalid-utf8 | recode latin1..utf8
# find "${1:-.}" | grep-invalid-utf8 | iconv -f latin1 -t utf8
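# Optional preview helper: a minimal sketch, not part of the original script.
# The recode/iconv checks above print only the converted names; this hypothetical
# function (the name preview_latin1_to_utf8 is an assumption) prints each name
# next to its conversion so the mapping can be reviewed before renaming.
# It assumes one name per line on stdin and Latin-1 as the source encoding.
preview_latin1_to_utf8 () {
    while IFS= read -r name; do
        # Show the raw name, then the same bytes reinterpreted as Latin-1
        printf '%s -> ' "$name"
        printf '%s\n' "$name" | iconv -f latin1 -t utf8
    done
}
# Example, kept as a comment so that nothing extra runs:
# find "${1:-.}" | grep-invalid-utf8 | preview_latin1_to_utf8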
# Rename using perl
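# This needs the Perl rename (File::Rename), which reads names from stdin when no files are
# given; the util-linux rename does not accept a Perl expression. Add -n to preview without renaming.
# Full paths are rewritten, so if a directory and its contents both have invalid names,
# the contents may only be renamed on a second run.
find "${1:-.}" | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
$_=encode("utf8", $_)'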
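# Optional check after renaming: a minimal sketch, not part of the original script.
# This hypothetical helper (the name report_non_utf8_names is an assumption) reads
# one name per line on stdin and prints those that iconv refuses to read as UTF-8.
# It is an independent check: iconv applies its own validation, which is typically
# stricter than the pattern in grep-invalid-utf8.
report_non_utf8_names () {
    while IFS= read -r name; do
        # iconv exits non-zero when the input is not valid UTF-8
        printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
            printf '%s\n' "$name"
    done
}
# Example, kept as a comment so that nothing extra runs:
# find "${1:-.}" | report_non_utf8_names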