echo "$STRING" | iconv -t ascii//TRANSLIT | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z |
kissu
commented
May 8, 2019
Best to replace tr A-Z a-z at the end with tr "[:upper:]" "[:lower:]" to support accented characters such as É.
These characters are handled by iconv. I thought that, if they were not, the sed replacement would catch them, but at least in GNU sed 4.8 most of them fall within the a-z range.
╰─➤ echo É | iconv -t ascii//TRANSLIT
E
# not every diacritic is contained in a-z
╰─➤ echo "ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž." | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | tr A-Z a-z
ā-ä-ǟ-ḑ-ē-ī-ļ-ņ-ō-ȯ-ȱ-õ-ȭ-ŗ-š-ț-ū-
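If the a-z range needs to behave predictably regardless of what iconv did earlier, one option is to pin the locale for the sed call. This is only a sketch, and the LC_ALL=C setting is my own assumption rather than something proposed in this thread: in the C locale sed works on bytes, so a-z covers just the ASCII letters and any accented letter that slipped through is replaced. The name ascii_dashes is hypothetical.

```shell
# ascii_dashes is a hypothetical helper name used only for this sketch.
# Assumption: LC_ALL=C makes sed treat input as bytes, so a-z is ASCII only.
ascii_dashes() {
  LC_ALL=C sed -E 's/[^a-zA-Z0-9]+/-/g'
}

# "café au lait", with é written as its UTF-8 bytes (\303\251).
printf 'caf\303\251 au lait\n' | ascii_dashes
# prints: caf-au-lait
```

The trade-off is that the accented letter is dropped instead of transliterated, so this only makes sense as a safety net after iconv, not as a replacement for it.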
It's good to replace multiple sed processes with a single one using multiple -e parameters.
It's good to use [^[:alnum:]] instead of [^a-zA-Z0-9].
It's good to use tr "[:upper:]" "[:lower:]" instead of tr A-Z a-z as a matter of principle when the goal is to lowercase input. Knowing that tr A-Z a-z is good enough requires verifying what comes before in the pipeline and knowing how iconv works. That's added mental burden.
Putting it together:
iconv -t ascii//TRANSLIT | sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' | tr '[:upper:]' '[:lower:]'
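For reuse, the combined pipeline can be wrapped in a small shell function. A minimal sketch, assuming a POSIX shell and an iconv that accepts //TRANSLIT; the function name slugify and passing the string as the first argument are my own choices for illustration.

```shell
# slugify is a hypothetical wrapper name; the pipeline is the combined one above.
slugify() {
  # printf avoids echo's portability quirks with backslashes and leading dashes.
  printf '%s\n' "$1" |
    iconv -t ascii//TRANSLIT |
    sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' |
    tr '[:upper:]' '[:lower:]'
}

slugify 'Hello, World!'
# prints: hello-world
```

How non-ASCII input is transliterated depends on the iconv implementation and the current locale, so it is worth checking the output for accented input on the target system.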