@oneohthree
Last active February 22, 2024 01:53
Quick bash slugify
echo "$STRING" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr A-Z a-z

pjboro commented May 10, 2022

Best to replace tr A-Z a-z at the end with tr "[:upper:]" "[:lower:]" to support accented characters like É, for instance.

These characters are already handled by iconv. I thought that, if they were not, the sed replacement would catch them, but at least in GNU sed 4.8 most of them fall within the a-z range.

╰─➤  echo É | iconv -t ascii//TRANSLIT
E
# not every diacritic is contained in a-z
╰─➤  echo "ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž." | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | tr A-Z a-z
ā-ä-ǟ-ḑ-ē-ī-ļ-ņ-ō-ȯ-ȱ-õ-ȭ-ŗ-š-ț-ū-
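
Whether an accented letter counts as part of a-z depends on the locale's collation rules, so the result above can differ between systems. A minimal sketch, assuming GNU sed (the sample strings are made up for illustration): forcing the C locale makes the range byte-wise, so only plain ASCII letters survive. The second command shows a likely reason for the trailing dash in the output above: inside single quotes with -r/-E, `\|` matches a literal pipe, so the alternation has to be written as a bare `|`.

```shell
# In the C locale, bracket ranges compare raw bytes. The two bytes of "ā"
# fall outside a-z and are replaced together with the comma and the space.
printf 'ā, b\n' | LC_ALL=C sed -E 's/[^a-zA-Z0-9]+/-/g'    # prints: -b

# With extended regexes, a bare | is the alternation operator,
# so both the leading and the trailing dashes are stripped.
printf -- '-a-b-\n' | sed -E 's/^-+|-+$//g'                # prints: a-b
```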

janosgyerik commented Nov 9, 2022

It's good to replace multiple sed processes with a single one using multiple -e parameters.

It's good to use [^[:alnum:]] instead of [^a-zA-Z0-9].

It's good to use tr "[:upper:]" "[:lower:]" instead of tr A-Z a-z as a matter of principle when the goal is to lowercase input. Knowing that tr A-Z a-z is good enough requires checking what comes earlier in the pipeline and knowing how iconv works. That's added mental burden.

Putting it together:

iconv -t ascii//TRANSLIT | sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' | tr '[:upper:]' '[:lower:]'
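
For reuse, the pipeline above could be wrapped in a small function. This is only a sketch: the name `slugify` and the argument-or-stdin handling are assumptions for illustration, not part of the gist, and how iconv transliterates non-ASCII input depends on the iconv implementation and the active locale.

```shell
#!/usr/bin/env bash
# slugify: hypothetical wrapper around the pipeline above.
# Uses its arguments as input, or reads stdin when called without arguments.
slugify() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$*"
  else
    cat
  fi |
    iconv -t ascii//TRANSLIT |
    sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' |
    tr '[:upper:]' '[:lower:]'
}

slugify "Hello, World!"                        # prints: hello-world
printf '  Quick Bash Slugify \n' | slugify     # prints: quick-bash-slugify
```

Both calls use ASCII-only input, so the result does not depend on how iconv transliterates accented characters.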
