@eek
Last active August 4, 2021 14:23
Vanilla JavaScript Slugify + Accent removal - Just another JavaScript Slugifier with an extra line for Accent Removal
function slugify(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')                // separate accents from their base letters
    .replace(/[\u0300-\u036f]/g, '') // remove all separated accents
    .replace(/\s+/g, '-')            // replace each run of whitespace with '-'
    .replace(/&/g, '-and-')          // replace & with '-and-'
    .replace(/[^\w\-]+/g, '')        // remove all non-word chars
    .replace(/--+/g, '-');           // replace multiple '-' with single '-'
}
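
A few calls show what the chain produces. This is only a sketch: the function body is copied from the gist so the snippet runs on its own, and the sample strings are made up for illustration.

```javascript
// Copied from the gist above so this snippet is self-contained.
function slugify(text) {
  return text.toString().toLowerCase().trim()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/\s+/g, '-')
    .replace(/&/g, '-and-')
    .replace(/[^\w\-]+/g, '')
    .replace(/--+/g, '-');
}

// Illustrative inputs (not from the gist).
console.log(slugify('  Crème Brûlée & Café  ')); // creme-brulee-and-cafe
console.log(slugify('Hello, World!'));           // hello-world
console.log(slugify('- hi -'));                  // -hi-
```

Note from the last call that leading and trailing hyphens are not trimmed; only surrounding whitespace is.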
@exside

exside commented Oct 30, 2017

Won't work in IE... because (as expected) it doesn't support normalize(). Otherwise nice to know =)
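
One way to keep the function from throwing where normalize() is missing is a feature check. The sketch below assumes that skipping accent removal is an acceptable fallback; the helper name stripAccents is hypothetical and not part of the gist.

```javascript
// Hypothetical helper: removes accents only when String.prototype.normalize exists.
function stripAccents(text) {
  if (typeof String.prototype.normalize !== 'function') {
    // Assumption: without normalize() (e.g. Internet Explorer) return the text unchanged.
    return text;
  }
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(stripAccents('café'));
```

With this fallback, accented letters are left in place on such browsers, so the later non-word-character replacement in slugify would drop them instead of converting them. Loading a normalize() polyfill is the alternative if that is not acceptable.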

@rowild

rowild commented Aug 4, 2021

Why replace(/\-\-+/g, '-')? Wouldn't replace(/-+/g, '-') do it?

@eek
Author

eek commented Aug 4, 2021

@rowild

Why replace(/\-\-+/g, '-')? Wouldn't replace(/-+/g, '-') do it?

It would, but that also means replacing every single '-' with itself.

/\-\-+/g only matches runs of two or more hyphens (e.g. '--').

It wouldn't really matter in most cases. Performance-wise, though, if your string has no repeated hyphens and only a few single hyphens after the previous replacements, matching single hyphens as well would be slower. See https://jsben.ch/7v4OT for an example with only single hyphens, and https://jsben.ch/GxYWA for an example with fewer repeated hyphens than single ones. But I guess it's negligible in almost all real-world scenarios.
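
The difference between the two patterns can be checked directly. This is a small sketch with made-up sample strings; it shows that the results are identical and only the number of matches differs, and it makes no claim about timing.

```javascript
const collapseRuns = (s) => s.replace(/--+/g, '-'); // matches only runs of 2+ hyphens
const collapseAny = (s) => s.replace(/-+/g, '-');   // also matches single hyphens

// Illustrative inputs (not from the thread).
const samples = ['a-b', 'a--b', 'a---b-c', 'no hyphens', '--'];
for (const s of samples) {
  // Both patterns always yield the same string.
  console.log(JSON.stringify(s), collapseRuns(s) === collapseAny(s)); // true for every sample
}

// What differs is how many matches (and replacements) happen.
console.log('a-b-c'.match(/--+/g)); // null
console.log('a-b-c'.match(/-+/g));  // [ '-', '-' ]
```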

@rowild

rowild commented Aug 4, 2021

@eek
Very interesting, and makes total sense! Thanks for your explanation.
However, ESLint (in VS Code) complains about "useless escape characters". It would prefer /--+/g, which actually works fine. Is there also a reason for those backslashes? Maybe a historical one (e.g. IE)?

@eek
Author

eek commented Aug 4, 2021

@rowild - Can't really remember, I've removed the escape characters now.
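
For reference, a quick check that the escaped and unescaped forms behave the same. Outside a character class a hyphen has no special meaning in a (non-unicode-mode) regular expression, so the backslashes change nothing. The input string is made up for illustration.

```javascript
const escaped = /\-\-+/g; // original form with escapes
const plain = /--+/g;     // form preferred by the linter

const input = 'a--b---c-d'; // illustrative input
console.log(input.replace(escaped, '-')); // a-b-c-d
console.log(input.replace(plain, '-'));   // a-b-c-d
```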

@rowild

rowild commented Aug 4, 2021

@eek
Cool! Thanks again!
Now, one more thing I observe is that NFD does not actually take care of umlauts (ä -> ae...) or ß (ß -> ss), and neither does [\u0300-\u036f]. Both are still quite a riddle to me.
Is it possible to deduce which "languages" your script supports? I didn't test Hungarian, Turkish, or any of the Nordic languages like Icelandic... should they theoretically be transcribed correctly according to their "locale"?

I therefore added a snippet for special characters to my own script, which I found here:
https://gist.github.com/mathewbyrne/1280286#gistcomment-3753527

Is this "wrong" when using normalize?
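
The idea of a special-character step in front of normalize can be sketched as follows. The map here is a hypothetical, deliberately tiny German-only table, not the one from the linked comment, and transliterate is an illustrative name. The two steps do not conflict as long as the map runs before the accents are stripped; afterwards 'ä' has already become 'a' and the map has nothing left to match.

```javascript
// Hypothetical mini-map (German only), written with escapes to avoid encoding surprises.
const UMLAUT_MAP = {
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00fc': 'ue', // ü
  '\u00df': 'ss', // ß
};
const UMLAUT_RE = new RegExp('[' + Object.keys(UMLAUT_MAP).join('') + ']', 'g');

function transliterate(text) {
  return text.toString().toLowerCase()
    .normalize('NFC') // compose first so 'a' + U+0308 is also seen as 'ä'
    .replace(UMLAUT_RE, (ch) => UMLAUT_MAP[ch]);
}

// Map first, then strip whatever accents remain.
const stripAfterMap = (s) =>
  transliterate(s).normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(stripAfterMap('Größe für Äpfel')); // groesse fuer aepfel
console.log(stripAfterMap('Café Müller'));     // cafe mueller
```

The result can then go through the remaining slugify replacements unchanged.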

@eek
Author

eek commented Aug 4, 2021

NFD takes care of the diacritics that Unicode can split off as separate combining marks, so generally anything that sits above or below the character.

It doesn't change the base letter of the word. So Äpfel has the diacritic removed and becomes Apfel. Türkçe becomes Turkce. It doesn't change ä to ae; it just removes the diacritics that are above and below the character. I've used it mainly for French and Romanian to generate URLs from titles, where the word written without diacritics is otherwise exactly the same: mămăligă is written mamaliga.

So yeah, ß doesn't get converted to anything, because it doesn't have any upper or lower accents.
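
The behaviour described above can be observed directly. This sketch uses the thread's own examples plus 'ø', which is an added illustration of a letter with no canonical decomposition; the helper name strip is hypothetical.

```javascript
const marks = /[\u0300-\u036f]/g;
const strip = (s) => s.normalize('NFD').replace(marks, '');

// NFD splits 'ä' into a base letter plus a combining mark.
console.log('\u00e4'.normalize('NFD').length); // 2

console.log(strip('Äpfel'));    // Apfel
console.log(strip('Türkçe'));   // Turkce
console.log(strip('mămăligă')); // mamaliga

// Letters without a decomposition pass through NFD unchanged...
console.log(strip('ß')); // ß
console.log(strip('ø')); // ø

// ...and are then dropped by the later non-word-character step of slugify.
console.log('straße'.replace(/[^\w\-]+/g, '')); // strae
```

So for languages that rely on such letters, the slug loses characters rather than transliterating them unless they are mapped beforehand.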

@rowild

rowild commented Aug 4, 2021

It finally clicked, and I believe I now understand what NFD does! Thank you very much for your efforts, @eek! :-)
