Quick bash slugify
echo "$STRING" | iconv -t ascii//TRANSLIT | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z
@tylerkerr

commented Jun 17, 2015

This works in OS X if you replace both instances of sed -r with sed -E

@ScoreUnder

commented Jun 29, 2015

A quick comment on "slugify" since it's probably not that common a term:

It converts a string into a string of lowercase letters, digits and hyphens intended for use in things like URLs or package names.

@sebastianwebber

commented Oct 8, 2016

Works fine on CentOS 7.

thanks!

@fonini

commented Apr 5, 2017

Thanks!
I had to include another sed, because some accents didn't get removed.

echo "Esperança do vôo do avião" | iconv -t ascii//TRANSLIT | sed -r s/[~\^]+//g | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z

The expected output is: esperanca-do-voo-do-aviao
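
Whether the extra sed is needed seems to depend on the iconv implementation: some render an accent as a separate ASCII mark (such as ~ or ^) next to the base letter instead of dropping it. Below is a minimal sketch of the stripping step in isolation. The input string is a hand-written stand-in for that kind of iconv output (an assumption, not captured from a real run), and strip_marks is a hypothetical helper name.

```shell
# Hypothetical helper: remove leftover accent marks (~ and ^) before slugifying.
strip_marks() {
  sed -E 's/[~^]+//g'
}

# Stand-in for iconv output that kept the marks (assumed, not a real capture).
printf '%s\n' 'Esperanca do v^oo do avi~ao' \
  | strip_marks \
  | sed -E 's/[^a-zA-Z0-9]+/-/g' \
  | sed -E 's/^-+|-+$//g' \
  | tr 'A-Z' 'a-z'
# prints: esperanca-do-voo-do-aviao
```

The marks have to be stripped before the non-alphanumeric replacement runs; otherwise v^oo would come out as v-oo.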

@gerardo-junior

commented May 28, 2018

I would add ' for compatibility with zsh

echo "Esperança do vôo do avião" | iconv -t ascii//TRANSLIT | sed -r 's/[~\^]+//g' | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | tr A-Z a-z
@stefancrowe

commented Jan 16, 2019

Added sed -E 's/-$//g' to remove any trailing dashes from the end of the string (as it's unusual to end with a dash).

echo "Esperança do vôo do avião" | iconv -t ascii//TRANSLIT | sed -r 's/[~\^]+//g' | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+\|-+$//g' | sed -r 's/-$//g' | tr A-Z a-z
@diraol

commented Feb 17, 2019

Summing up the comments, with zsh and OS X compatibility, and removing leading and trailing dashes (-) from the slugified string, I got the following:

echo " - -  Esperança do vôo do avião  - - " | iconv -t ascii//TRANSLIT | sed -E 's/[~\^]+//g' | sed -E 's/[^a-zA-Z0-9]+/-/g' | sed -E 's/^-+\|-+$//g' | sed -E 's/^-+//g' | sed -E 's/-+$//g' | tr A-Z a-z
@kylemisner

commented Mar 15, 2019

When I run

echo "Esperança do vôo do avião" | iconv -t ascii//TRANSLIT | sed -r s/[~\^]+//g | sed -r s/[^a-zA-Z0-9]+/-/g | sed -r s/^-+\|-+$//g | tr A-Z a-z

I get the expected output: esperanca-do-voo-do-aviao

So I don't see why sed -E 's/[~\^]+//g' is necessary.

Also, the original code already handles removal of all leading and trailing hyphens with this regex: sed -E 's/^-+|-+$//g'
In English, this regex finds one or more hyphens at the start or one or more hyphens at the end, then removes all matches.
Notice that I removed the escape character (backslash) from the OR. Once the expression is wrapped in single quotes, the pipe no longer needs to be escaped from the shell, and with -E (or -r) an escaped \| is treated as a literal pipe character, so the alternation stops working.

So this one is redundant: sed -E 's/-$//g'
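
The effect of the backslash can be checked on an already-hyphenated string. This is a small sketch, assuming a sed that accepts -E (GNU or BSD); the sample input is made up for illustration.

```shell
# With -E, an escaped \| is a literal pipe character, so nothing is trimmed here.
printf '%s\n' '--a-b--' | sed -E 's/^-+\|-+$//g'
# prints: --a-b--

# An unescaped | is alternation: leading and trailing runs of hyphens are removed.
printf '%s\n' '--a-b--' | sed -E 's/^-+|-+$//g'
# prints: a-b
```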

Removing all leading and trailing dashes is mandatory if the slug will be used in a DNS name, because of the rules on DNS labels. See DNS Syntax Rules:

The characters allowed in labels are a subset of the ASCII character set, consisting of characters a through z, A through Z, digits 0 through 9, and hyphen. This rule is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in case-independent manner.[26] **Labels may not start or end with a hyphen.** 

So a more concise version that works in zsh and on OS X is

echo " - -  Esperança do vôo do avião  - - " | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z
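
For repeated use, the pipeline above could be wrapped in a function. This is a sketch, not a definitive implementation: slugify is a hypothetical name, and the -f utf-8 flag is an added assumption that the input is UTF-8 (the one-liner above leaves the input encoding to the locale). The sample input is ASCII-only because how accented characters are transliterated depends on the iconv implementation and locale.

```shell
# Hypothetical wrapper around the pipeline above; takes the string as $1.
slugify() {
  printf '%s\n' "$1" \
    | iconv -f utf-8 -t ascii//TRANSLIT \
    | sed -E 's/[^a-zA-Z0-9]+/-/g' \
    | sed -E 's/^-+|-+$//g' \
    | tr 'A-Z' 'a-z'
}

slugify ' - -  Hello, World!  - - '
# prints: hello-world
```

Using printf instead of echo avoids surprises with strings that start with a dash or contain backslashes.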

Note that this code collapses runs of hyphens into a single hyphen, so it won't work if keeping multiple consecutive hyphens is desirable. To keep them, just add the hyphen to the allowed-characters regex. I read that hyphens cannot appear in both the third and fourth positions, and I verified that for domain names, but I don't know whether it applies to the subdomain parts of a domain name or elsewhere. This code does not handle that case.

echo " - -  Esperança do--vôo do avião  - - " | iconv -t ascii//TRANSLIT | sed -E 's/[^a-zA-Z0-9-]+/-/g' | sed -E 's/^-+|-+$//g' | tr A-Z a-z
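
The difference between the two character classes is easiest to see side by side. This is a minimal sketch on a made-up input. The last part is a hypothetical helper (has_hyphens_at_3_and_4 is not from the thread) that only detects hyphens in both the third and fourth positions; whether such a slug must be rejected is left open, as noted above.

```shell
# Hyphen excluded from the allowed set: runs of hyphens collapse to one.
printf '%s\n' 'do--voo do' | sed -E 's/[^a-zA-Z0-9]+/-/g'
# prints: do-voo-do

# Hyphen included in the allowed set: existing hyphens are left alone.
printf '%s\n' 'do--voo do' | sed -E 's/[^a-zA-Z0-9-]+/-/g'
# prints: do--voo-do

# Hypothetical check: succeeds when characters 3 and 4 are both hyphens.
has_hyphens_at_3_and_4() {
  case "$1" in
    ??--*) return 0 ;;
    *)     return 1 ;;
  esac
}

has_hyphens_at_3_and_4 'do--voo-do' && echo 'hyphens at positions 3 and 4'
# prints: hyphens at positions 3 and 4
```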
@kissu

commented May 8, 2019

I would add ' for compatibility with zsh
Thank you very much @gerardo-junior !
