oneohthree/quick-slugify.sh

Last active February 22, 2024 01:53

Quick bash slugify

quick-slugify.sh

echo "$STRING" | iconv -t ascii//TRANSLIT | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z

kylemisner commented Mar 15, 2019

When I run

echo "Esperança do vôo do avião" | iconv -t ascii//TRANSLIT | sed -r s/[~\^]+//g | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z

I get the expected output: esperanca-do-voo-do-aviao

So I don't see why sed -E 's/[~\^]+//g' is necessary.

Also, the original code already removes all leading and trailing hyphens with this regex: sed -E 's/^-+|-+$//g'
In English, this regex finds one or more hyphens at the start or one or more hyphens at the end, then removes all matches.
Notice that I removed the escape character (backslash) from the OR. Once the expression is wrapped in single quotes, the pipe no longer needs to be escaped from the shell, and keeping the backslash actually breaks the expression.
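
To illustrate the quoting point, here is a small sketch comparing the two forms on an ASCII-only sample string (the sample input is my own, not from the gist). With -E, a bare | is alternation, while \| is a literal pipe character, so the second command has nothing to match:

```shell
# Quoted, bare |: "leading hyphens OR trailing hyphens" are removed.
printf '%s\n' '--hello-world--' | sed -E 's/^-+|-+$//g'     # prints hello-world

# Quoted, \|: under -E the backslash turns | into a literal character,
# so nothing matches and the input passes through unchanged.
printf '%s\n' '--hello-world--' | sed -E 's/^-+\|-+$//g'    # prints --hello-world--
```

In the unquoted original, the shell consumes the backslash before sed ever sees it, which is why that form worked.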

So this one is redundant: sed -E 's/-$//g'

Removing all starting and trailing dashes is mandatory due to the rules on DNS names. See DNS Syntax Rules

The characters allowed in labels are a subset of the ASCII character set, consisting of characters a through z, A through Z, digits 0 through 9, and hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in case-independent manner.[26] **Labels may not start or end with a hyphen.**
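
If you want to double-check a generated slug against the LDH rule quoted above, something like the following could work. This is only a sketch: is_ldh_label is a hypothetical helper name I made up, and it checks only the character set and the leading/trailing hyphen rule, nothing else about DNS.

```shell
# is_ldh_label: hypothetical helper, not part of the gist.
# Succeeds if $1 is non-empty, contains only letters, digits and hyphens,
# and does not start or end with a hyphen.
# Note: a-z style ranges in shell patterns can follow the current locale;
# this sketch assumes ASCII input such as the output of the slugify pipeline.
is_ldh_label() {
  case $1 in
    '' | -* | *-) return 1 ;;        # empty, or starts/ends with a hyphen
    *[!a-zA-Z0-9-]*) return 1 ;;     # contains a non-LDH character
  esac
  return 0
}

is_ldh_label "esperanca-do-voo-do-aviao" && echo "valid label"
```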

So the more concise version, which also works in zsh and on macOS, is

echo " - -  Esperança do vôo do avião  - - " | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z

Note that this version collapses runs of hyphens into a single hyphen. If you want to keep consecutive hyphens, add the hyphen to the allowed-characters class. I read that a name cannot have hyphens in both the third and fourth positions and verified that for domain names, but I don't know whether that also applies to subdomain labels or elsewhere. This code does not handle that case.

echo " - -  Esperança do--vôo do avião  - - " | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9-]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z

kissu commented May 8, 2019

I would add single quotes (') around the sed expressions for compatibility with zsh.
Thank you very much @gerardo-junior!

cavo789 commented May 11, 2021

Best to replace tr A-Z a-z at the end with tr "[:upper:]" "[:lower:]" to support accented characters such as É, for instance.

pjboro commented May 10, 2022

Best to replace tr A-Z a-z at the end with tr "[:upper:]" "[:lower:]" to support accented characters such as É, for instance.

These characters are handled by iconv. I thought that, if they were not, they would be handled by the sed replacement, but at least in GNU sed 4.8 most of them belong to the a-z range.

$ echo É | iconv -t ascii//TRANSLIT
E

# not every diacritic is contained in a-z
$ echo "ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž." | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | tr A-Z a-z
ā-ä-ǟ-ḑ-ē-ī-ļ-ņ-ō-ȯ-ȱ-õ-ȭ-ŗ-š-ț-ū-
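
If the locale-dependent range is the concern, one possible workaround (my assumption, not something proposed in this thread) is to run sed in the C locale, where a-z covers only the ASCII letters and every byte of a non-ASCII character falls outside the class. A minimal sketch with a made-up sample input:

```shell
# In the C locale the bracket ranges are plain ASCII, so the bytes of
# ā and ä are treated as "not a-zA-Z0-9" and collapsed into the hyphen run.
printf 'aā, äb\n' | LC_ALL=C sed -E 's/[^a-zA-Z0-9]+/-/g'    # prints a-b
```

This only affects characters that reach sed untransliterated; in the full pipeline iconv has normally converted them to ASCII already.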

janosgyerik commented Nov 9, 2022 (edited)

It's good to replace multiple sed processes with a single one using multiple -e parameters.

It's good to use [^[:alnum:]] instead of [^a-zA-Z0-9].

It's good to use tr "[:upper:]" "[:lower:]" instead of tr A-Z a-z as a matter of principle for the goal of lowercasing input. To know that tr A-Z a-z is good enough requires verifying what comes before in the pipeline, and knowing how iconv works. That's added mental burden.

Putting it together:

iconv -t ascii//TRANSLIT | sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' | tr '[:upper:]' '[:lower:]'
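
For convenience, the pipeline above could be wrapped in a small function. This is just a sketch: the name slugify and the "arguments or stdin" behaviour are my own choices, not part of the gist. Transliteration of accented input depends on the iconv implementation, so the usage example sticks to ASCII.

```shell
# slugify: hypothetical wrapper around the pipeline above.
# Uses its arguments as input if any are given, otherwise reads stdin.
slugify() {
  if [ "$#" -gt 0 ]; then
    printf '%s\n' "$*"
  else
    cat
  fi |
    iconv -t ascii//TRANSLIT |
    sed -E -e 's/[^[:alnum:]]+/-/g' -e 's/^-+|-+$//g' |
    tr '[:upper:]' '[:lower:]'
}

slugify ' - -  Hello, World 2024  - - '    # prints hello-world-2024
printf 'Foo Bar\n' | slugify               # prints foo-bar
```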
