function string_to_slug (str) {
  str = str.replace(/^\s+|\s+$/g, ''); // trim
  str = str.toLowerCase();
  // remove accents, swap ñ for n, etc
  var from = "àáäâèéëêìíïîòóöôùúüûñç·/_,:;";
  var to = "aaaaeeeeiiiioooouuuunc------";
  for (var i=0, l=from.length ; i<l ; i++) {
    str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));
  }
  str = str.replace(/[^a-z0-9 -]/g, '') // remove invalid chars
    .replace(/\s+/g, '-') // collapse whitespace and replace by -
    .replace(/-+/g, '-'); // collapse dashes
  return str;
}
Today you can just yarn add limax
then:
import slugify from "limax"
const burgundy = slugify('i ♥ lamp'); // i-love-lamp
@juanlanus's answer worked for me; I shortened it a bit further:
const slugify = text =>
text
.toString()
.normalize('NFD')
.replace(/[\u0300-\u036f]/g, '')
.toLowerCase()
.trim()
.replace(/\s+/g, '-')
.replace(/[^\w-]+/g, '')
.replace(/--+/g, '-')
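A few worked inputs for the chain above, as a minimal sketch. The sample strings are made up for illustration and are not from the thread. Note that `\w` also matches `_`, so underscores survive.

```javascript
// Same chain as above, repeated so the sketch runs on its own.
const slugify = text =>
  text
    .toString()
    .normalize('NFD')                 // split accented letters into base + accent
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accents
    .toLowerCase()
    .trim()
    .replace(/\s+/g, '-')             // whitespace runs become one dash
    .replace(/[^\w-]+/g, '')          // keep only ASCII word chars and dashes
    .replace(/--+/g, '-');            // collapse repeated dashes

console.log(slugify('Cr\u00e8me Br\u00fbl\u00e9e')); // creme-brulee
console.log(slugify('snake_case name'));             // snake_case-name
console.log(slugify('a - b'));                       // a-b
```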
Jeez, too many implementations.
And what if we implement this?
String.prototype.slugify = function (separator = "-") {
return this
.toString()
.normalize('NFD') // split each accented letter into its base letter and the accent
.replace(/[\u0300-\u036f]/g, '') // remove all previously split accents
.toLowerCase()
.trim()
.replace(/[^a-z0-9 ]/g, '') // remove all chars not letters, numbers and spaces (to be replaced)
.replace(/\s+/g, separator);
};
Ex:
"Exportação de _ peças - avícolas para Você".slugify() // exportacao-de-pecas-avicolas-para-voce
let text = "Café é o combustível para programação!!"
text.slugify("_") // cafe_e_o_combustivel_para_programacao
If I did, I could slugify a string with ease, but that happens in only one place in my programs, and I would have to remember that the String object carries this addition in the other 999 lines of code where I use strings for other purposes.
I would do what Eduarso says if I were slugifying everywhere in my program.
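One way around that concern, as a sketch: keep the same chain in a plain function that takes the separator as a second argument, so `String.prototype` stays untouched. The name `slugifyWith` is hypothetical and only used for this example.

```javascript
// Hypothetical standalone variant of the prototype method above.
function slugifyWith(text, separator = '-') {
  return String(text)
    .normalize('NFD')                 // split accented letters into base + accent
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accents
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9 ]/g, '')       // keep letters, digits and spaces only
    .replace(/\s+/g, separator);      // whitespace runs become the separator
}

console.log(slugifyWith('Caf\u00e9 \u00e9 o combust\u00edvel para programa\u00e7\u00e3o!!', '_'));
// cafe_e_o_combustivel_para_programacao
```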
Nice!!!
I wrote it in TypeScript, without the separator option. It looks like this:
export const slugify = (...args: (string | number)[]): string => {
const value = args.join(' ')
return value
.normalize('NFD') // split each accented letter into its base letter and the accent
.replace(/[\u0300-\u036f]/g, '') // remove all previously split accents
.toLowerCase()
.trim()
.replace(/[^a-z0-9 ]/g, '') // remove all chars not letters, numbers and spaces (to be replaced)
.replace(/\s+/g, '-') // separator
}
https://gist.github.com/max10rogerio/c67c5d2d7a3ce714c4bc0c114a3ddc6e
Thanks for this!
awesome.
awesome
Nice!!
Complete version with @felipeftrindade suggestion:
function string_to_slug (str) {
  str = str.replace(/^\s+|\s+$/g, ''); // trim
  str = str.toLowerCase();
  // remove accents, swap ñ for n, etc
  var from = "àáãäâèéëêìíïîòóöôùúüûñç·/_,:;";
  var to = "aaaaaeeeeiiiioooouuuunc------";
  for (var i=0, l=from.length ; i<l ; i++) {
    str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));
  }
  str = str.replace(/[^a-z0-9 -]/g, '') // remove invalid chars
    .replace(/\s+/g, '-') // collapse whitespace and replace by -
    .replace(/-+/g, '-'); // collapse dashes
  return str;
}
Thanks for this idea
There are a lot of slugify implementations on the internet right now. What caught my attention is that, after so many years, all the custom versions keep copying the same wrong patterns.
In this part:
// remove accents, swap ñ for n, etc
var from = "àáãäâèéëêìíïîòóöôùúüûñç·/_,:;";
var to = "aaaaaeeeeiiiioooouuuunc------";
Why convert anything to '-' here in a costly loop, when all non-alphanumerics are taken care of later anyway?
The right version would be:
// remove accents, swap ñ for n, etc
var from = "àáãäâèéëêìíïîòóöôùúüûñç";
var to = "aaaaaeeeeiiiioooouuuunc";
In the part that cleans non-alphanumerics:
str = str.replace(/[^a-z0-9 -]/g, '') // remove invalid chars
.replace(/\s+/g, '-') // collapse whitespace and replace by -
.replace(/-+/g, '-'); // collapse dashes
1st - it joins words together if they are separated only by invalid chars
2nd - you take care of ' ' and '-' twice
A better way to fix it:
str = str.replace(/[^a-z0-9]/g, '-') // remove all except alphanumerics & replace all with '-'
.replace(/-+/g, '-'); // collapse dashes
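Put together, the suggestion could look like the sketch below. The function name `stringToSlug` and the final step that trims a leading or trailing `-` are additions made for this example. Without that last step, input that starts or ends with punctuation would keep a dash at that end.

```javascript
// Sketch of the single-pass cleanup described above.
function stringToSlug(str) {
  str = String(str).trim().toLowerCase();
  // accents only; punctuation is handled by the regex below
  const from = '\u00e0\u00e1\u00e3\u00e4\u00e2\u00e8\u00e9\u00eb\u00ea\u00ec\u00ed\u00ef\u00ee\u00f2\u00f3\u00f6\u00f4\u00f9\u00fa\u00fc\u00fb\u00f1\u00e7';
  const to = 'aaaaaeeeeiiiioooouuuunc';
  for (let i = 0; i < from.length; i++) {
    str = str.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));
  }
  return str
    .replace(/[^a-z0-9]/g, '-') // every non-alphanumeric becomes '-'
    .replace(/-+/g, '-')        // collapse dashes
    .replace(/^-|-$/g, '');     // addition: drop a leading or trailing dash
}

console.log(stringToSlug('rock&roll'));      // rock-roll
console.log(stringToSlug('Hello, World!'));  // hello-world
```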
@Oleg-Imanilov:
You are right if you need to support IE.
Otherwise the "normalize" approach is preferable because it does not depend on the developer providing two synchronized, lengthy character strings:
...
.normalize( 'NFD' ) // split each accented letter into its base letter and its accent
.replace( /[\u0300-\u036f]/g, '' ) // remove all previously split accents
...
This is a native JS Unicode function.
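What the two steps do to a single character can be checked directly. This is a small sketch and the variable names are only illustrative.

```javascript
const composed = '\u00e9';                     // "é" stored as one code point
const decomposed = composed.normalize('NFD');  // "e" followed by U+0301 (combining acute)

console.log(composed.length);                  // 1
console.log(decomposed.length);                // 2
console.log(decomposed === 'e\u0301');         // true
console.log(decomposed.replace(/[\u0300-\u036f]/g, '')); // e
```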
Great!!!
And what if we move trim() down and add "-" to the characters that are kept?
"five-year-old ?".slugify()
//result: "five-year-old" instead of "fiveyearold-"
String.prototype.slugify = function (separator = "-") {
return this
.toString()
.normalize('NFD') // split each accented letter into its base letter and the accent
.replace(/[\u0300-\u036f]/g, '') // remove all previously split accents
.toLowerCase()
.replace(/[^a-z0-9 -]/g, '') // remove all chars that are not letters, numbers, spaces or hyphens
.trim()
.replace(/\s+/g, separator);
};
Ewo
something similar to Django's slugify:
export default function (str: string) {
return str
.normalize('NFKD')
.toLowerCase()
.replace(/[^\w\s-]/g, '')
.trim()
.replace(/[-\s]+/g, '-');
};
and tests:
describe('test slugify', () => {
test('test1', () => {
expect(slugify(' Jack & Jill like numbers 1,2,3 and 4 and silly characters ?%.$!/'))
.toEqual('jack-jill-like-numbers-123-and-4-and-silly-characters');
});
test('test2', () => {
expect(slugify("Un \xe9l\xe9phant \xe0 l'or\xe9e du bois"))
.toEqual('un-elephant-a-loree-du-bois');
});
});
@amranwebdeveloper
I'm building a website where the slug sometimes contains Chinese characters, and I want to keep them.
1 - Create a file named cleanSlug.js
2 - Inside the file, copy and paste the code below
/**
 * * This function creates a slug that is friendly to use in your web application
 * * Compatible with Chinese characters
 * ! Chinese characters are kept without any modification
 * @param slug
 * @returns cleaned slug
 */
export function cleanSlug(slug) {
  slug = slug.replace(/^\s+|\s+$/g, '');
  slug = slug.toLowerCase();
  const from = 'àáäâèéëêìíïîòóöôùúüûñç·/_,:;';
  const to = 'aaaaeeeeiiiioooouuuunc------';
  for (let i = 0, l = from.length; i < l; i++) {
    slug = slug.replace(new RegExp(from.charAt(i), 'g'), to.charAt(i));
  }
  slug = slug
    .normalize('NFD')
    // remove everything that is not a letter, a number, a space, a hyphen or a Chinese character
    // (the original pattern /[^a-z0-9 -]^[\u4e00-\u9fa5]/ could never match, so nothing was removed)
    .replace(/[^a-z0-9 \-\u4e00-\u9fa5]/g, '')
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-')
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens
  return slug;
}
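An alternative sketch for keeping Chinese characters, assuming a runtime that supports Unicode property escapes (ES2018 and later). `\p{Script=Han}` covers Han ideographs beyond the `\u4e00-\u9fa5` range. The name `slugifyKeepHan` is hypothetical and not part of the code above.

```javascript
// Hypothetical variant: normalize-based accent removal, Han characters kept as-is.
function slugifyKeepHan(text) {
  return String(text)
    .normalize('NFD')                          // split accented letters into base + accent
    .replace(/[\u0300-\u036f]/g, '')           // drop the combining accents
    .toLowerCase()
    .replace(/[^a-z0-9\p{Script=Han} -]/gu, '') // keep ASCII alphanumerics, Han, spaces, hyphens
    .trim()
    .replace(/[\s-]+/g, '-')                   // runs of spaces/hyphens become one hyphen
    .replace(/^-+|-+$/g, '');                  // trim hyphens at both ends
}

console.log(slugifyKeepHan('Hello \u4e16\u754c!'));          // hello-世界
console.log(slugifyKeepHan('Caf\u00e9 \u4e2d\u6587 menu?')); // cafe-中文-menu
```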
Everybody here is missing the -
in .replace(/[^a-z0-9 ]/g, '')
=> .replace(/[^a-z0-9 -]/g, '')
just a quick edit with some more characters.
export default (str) => { str = String(str).toString(); str = str.replace(/^\s+|\s+$/g, ""); // trim str = str.toLowerCase(); // remove accents, swap ñ for n, etc const swaps = { '0': ['°', '₀', '۰', '0'], '1': ['¹', '₁', '۱', '1'], '2': ['²', '₂', '۲', '2'], '3': ['³', '₃', '۳', '3'], '4': ['⁴', '₄', '۴', '٤', '4'], '5': ['⁵', '₅', '۵', '٥', '5'], '6': ['⁶', '₆', '۶', '٦', '6'], '7': ['⁷', '₇', '۷', '7'], '8': ['⁸', '₈', '۸', '8'], '9': ['⁹', '₉', '۹', '9'], 'a': ['à', 'á', 'ả', 'ã', 'ạ', 'ă', 'ắ', 'ằ', 'ẳ', 'ẵ', 'ặ', 'â', 'ấ', 'ầ', 'ẩ', 'ẫ', 'ậ', 'ā', 'ą', 'å', 'α', 'ά', 'ἀ', 'ἁ', 'ἂ', 'ἃ', 'ἄ', 'ἅ', 'ἆ', 'ἇ', 'ᾀ', 'ᾁ', 'ᾂ', 'ᾃ', 'ᾄ', 'ᾅ', 'ᾆ', 'ᾇ', 'ὰ', 'ά', 'ᾰ', 'ᾱ', 'ᾲ', 'ᾳ', 'ᾴ', 'ᾶ', 'ᾷ', 'а', 'أ', 'အ', 'ာ', 'ါ', 'ǻ', 'ǎ', 'ª', 'ა', 'अ', 'ا', 'a', 'ä'], 'b': ['б', 'β', 'ب', 'ဗ', 'ბ', 'b'], 'c': ['ç', 'ć', 'č', 'ĉ', 'ċ', 'c'], 'd': ['ď', 'ð', 'đ', 'ƌ', 'ȡ', 'ɖ', 'ɗ', 'ᵭ', 'ᶁ', 'ᶑ', 'д', 'δ', 'د', 'ض', 'ဍ', 'ဒ', 'დ', 'd'], 'e': ['é', 'è', 'ẻ', 'ẽ', 'ẹ', 'ê', 'ế', 'ề', 'ể', 'ễ', 'ệ', 'ë', 'ē', 'ę', 'ě', 'ĕ', 'ė', 'ε', 'έ', 'ἐ', 'ἑ', 'ἒ', 'ἓ', 'ἔ', 'ἕ', 'ὲ', 'έ', 'е', 'ё', 'э', 'є', 'ə', 'ဧ', 'ေ', 'ဲ', 'ე', 'ए', 'إ', 'ئ', 'e'], 'f': ['ф', 'φ', 'ف', 'ƒ', 'ფ', 'f'], 'g': ['ĝ', 'ğ', 'ġ', 'ģ', 'г', 'ґ', 'γ', 'ဂ', 'გ', 'گ', 'g'], 'h': ['ĥ', 'ħ', 'η', 'ή', 'ح', 'ه', 'ဟ', 'ှ', 'ჰ', 'h'], 'i': ['í', 'ì', 'ỉ', 'ĩ', 'ị', 'î', 'ï', 'ī', 'ĭ', 'į', 'ı', 'ι', 'ί', 'ϊ', 'ΐ', 'ἰ', 'ἱ', 'ἲ', 'ἳ', 'ἴ', 'ἵ', 'ἶ', 'ἷ', 'ὶ', 'ί', 'ῐ', 'ῑ', 'ῒ', 'ΐ', 'ῖ', 'ῗ', 'і', 'ї', 'и', 'ဣ', 'ိ', 'ီ', 'ည်', 'ǐ', 'ი', 'इ', 'ی', 'i'], 'j': ['ĵ', 'ј', 'Ј', 'ჯ', 'ج', 'j'], 'k': ['ķ', 'ĸ', 'к', 'κ', 'Ķ', 'ق', 'ك', 'က', 'კ', 'ქ', 'ک', 'k'], 'l': ['ł', 'ľ', 'ĺ', 'ļ', 'ŀ', 'л', 'λ', 'ل', 'လ', 'ლ', 'l'], 'm': ['м', 'μ', 'م', 'မ', 'მ', 'm'], 'n': ['ñ', 'ń', 'ň', 'ņ', 'ʼn', 'ŋ', 'ν', 'н', 'ن', 'န', 'ნ', 'n'], 'o': ['ó', 'ò', 'ỏ', 'õ', 'ọ', 'ô', 'ố', 'ồ', 'ổ', 'ỗ', 'ộ', 'ơ', 'ớ', 'ờ', 'ở', 'ỡ', 'ợ', 'ø', 'ō', 'ő', 'ŏ', 'ο', 'ὀ', 'ὁ', 'ὂ', 'ὃ', 'ὄ', 'ὅ', 'ὸ', 'ό', 'о', 'و', 'θ', 'ို', 'ǒ', 'ǿ', 'º', 'ო', 
'ओ', 'o', 'ö'], 'p': ['п', 'π', 'ပ', 'პ', 'پ', 'p'], 'q': ['ყ', 'q'], 'r': ['ŕ', 'ř', 'ŗ', 'р', 'ρ', 'ر', 'რ', 'r'], 's': ['ś', 'š', 'ş', 'с', 'σ', 'ș', 'ς', 'س', 'ص', 'စ', 'ſ', 'ს', 's'], 't': ['ť', 'ţ', 'т', 'τ', 'ț', 'ت', 'ط', 'ဋ', 'တ', 'ŧ', 'თ', 'ტ', 't'], 'u': ['ú', 'ù', 'ủ', 'ũ', 'ụ', 'ư', 'ứ', 'ừ', 'ử', 'ữ', 'ự', 'û', 'ū', 'ů', 'ű', 'ŭ', 'ų', 'µ', 'у', 'ဉ', 'ု', 'ူ', 'ǔ', 'ǖ', 'ǘ', 'ǚ', 'ǜ', 'უ', 'उ', 'u', 'ў', 'ü'], 'v': ['в', 'ვ', 'ϐ', 'v'], 'w': ['ŵ', 'ω', 'ώ', 'ဝ', 'ွ', 'w'], 'x': ['χ', 'ξ', 'x'], 'y': ['ý', 'ỳ', 'ỷ', 'ỹ', 'ỵ', 'ÿ', 'ŷ', 'й', 'ы', 'υ', 'ϋ', 'ύ', 'ΰ', 'ي', 'ယ', 'y'], 'z': ['ź', 'ž', 'ż', 'з', 'ζ', 'ز', 'ဇ', 'ზ', 'z'], 'aa': ['ع', 'आ', 'آ'], 'ae': ['æ', 'ǽ'], 'ai': ['ऐ'], 'ch': ['ч', 'ჩ', 'ჭ', 'چ'], 'dj': ['ђ', 'đ'], 'dz': ['џ', 'ძ'], 'ei': ['ऍ'], 'gh': ['غ', 'ღ'], 'ii': ['ई'], 'ij': ['ij'], 'kh': ['х', 'خ', 'ხ'], 'lj': ['љ'], 'nj': ['њ'], 'oe': ['ö', 'œ', 'ؤ'], 'oi': ['ऑ'], 'oii': ['ऒ'], 'ps': ['ψ'], 'sh': ['ш', 'შ', 'ش'], 'shch': ['щ'], 'ss': ['ß'], 'sx': ['ŝ'], 'th': ['þ', 'ϑ', 'ث', 'ذ', 'ظ'], 'ts': ['ц', 'ც', 'წ'], 'ue': ['ü'], 'uu': ['ऊ'], 'ya': ['я'], 'yu': ['ю'], 'zh': ['ж', 'ჟ', 'ژ'], '(c)': ['©'], 'A': ['Á', 'À', 'Ả', 'Ã', 'Ạ', 'Ă', 'Ắ', 'Ằ', 'Ẳ', 'Ẵ', 'Ặ', 'Â', 'Ấ', 'Ầ', 'Ẩ', 'Ẫ', 'Ậ', 'Å', 'Ā', 'Ą', 'Α', 'Ά', 'Ἀ', 'Ἁ', 'Ἂ', 'Ἃ', 'Ἄ', 'Ἅ', 'Ἆ', 'Ἇ', 'ᾈ', 'ᾉ', 'ᾊ', 'ᾋ', 'ᾌ', 'ᾍ', 'ᾎ', 'ᾏ', 'Ᾰ', 'Ᾱ', 'Ὰ', 'Ά', 'ᾼ', 'А', 'Ǻ', 'Ǎ', 'A', 'Ä'], 'B': ['Б', 'Β', 'ब', 'B'], 'C': ['Ç', 'Ć', 'Č', 'Ĉ', 'Ċ', 'C'], 'D': ['Ď', 'Ð', 'Đ', 'Ɖ', 'Ɗ', 'Ƌ', 'ᴅ', 'ᴆ', 'Д', 'Δ', 'D'], 'E': ['É', 'È', 'Ẻ', 'Ẽ', 'Ẹ', 'Ê', 'Ế', 'Ề', 'Ể', 'Ễ', 'Ệ', 'Ë', 'Ē', 'Ę', 'Ě', 'Ĕ', 'Ė', 'Ε', 'Έ', 'Ἐ', 'Ἑ', 'Ἒ', 'Ἓ', 'Ἔ', 'Ἕ', 'Έ', 'Ὲ', 'Е', 'Ё', 'Э', 'Є', 'Ə', 'E'], 'F': ['Ф', 'Φ', 'F'], 'G': ['Ğ', 'Ġ', 'Ģ', 'Г', 'Ґ', 'Γ', 'G'], 'H': ['Η', 'Ή', 'Ħ', 'H'], 'I': ['Í', 'Ì', 'Ỉ', 'Ĩ', 'Ị', 'Î', 'Ï', 'Ī', 'Ĭ', 'Į', 'İ', 'Ι', 'Ί', 'Ϊ', 'Ἰ', 'Ἱ', 'Ἳ', 'Ἴ', 'Ἵ', 'Ἶ', 'Ἷ', 'Ῐ', 'Ῑ', 'Ὶ', 'Ί', 'И', 'І', 'Ї', 'Ǐ', 'ϒ', 'I'], 'J': ['J'], 'K': ['К', 'Κ', 'K'], 'L': ['Ĺ', 'Ł', 
'Л', 'Λ', 'Ļ', 'Ľ', 'Ŀ', 'ल', 'L'], 'M': ['М', 'Μ', 'M'], 'N': ['Ń', 'Ñ', 'Ň', 'Ņ', 'Ŋ', 'Н', 'Ν', 'N'], 'O': ['Ó', 'Ò', 'Ỏ', 'Õ', 'Ọ', 'Ô', 'Ố', 'Ồ', 'Ổ', 'Ỗ', 'Ộ', 'Ơ', 'Ớ', 'Ờ', 'Ở', 'Ỡ', 'Ợ', 'Ø', 'Ō', 'Ő', 'Ŏ', 'Ο', 'Ό', 'Ὀ', 'Ὁ', 'Ὂ', 'Ὃ', 'Ὄ', 'Ὅ', 'Ὸ', 'Ό', 'О', 'Θ', 'Ө', 'Ǒ', 'Ǿ', 'O', 'Ö'], 'P': ['П', 'Π', 'P'], 'Q': ['Q'], 'R': ['Ř', 'Ŕ', 'Р', 'Ρ', 'Ŗ', 'R'], 'S': ['Ş', 'Ŝ', 'Ș', 'Š', 'Ś', 'С', 'Σ', 'S'], 'T': ['Ť', 'Ţ', 'Ŧ', 'Ț', 'Т', 'Τ', 'T'], 'U': ['Ú', 'Ù', 'Ủ', 'Ũ', 'Ụ', 'Ư', 'Ứ', 'Ừ', 'Ử', 'Ữ', 'Ự', 'Û', 'Ū', 'Ů', 'Ű', 'Ŭ', 'Ų', 'У', 'Ǔ', 'Ǖ', 'Ǘ', 'Ǚ', 'Ǜ', 'U', 'Ў', 'Ü'], 'V': ['В', 'V'], 'W': ['Ω', 'Ώ', 'Ŵ', 'W'], 'X': ['Χ', 'Ξ', 'X'], 'Y': ['Ý', 'Ỳ', 'Ỷ', 'Ỹ', 'Ỵ', 'Ÿ', 'Ῠ', 'Ῡ', 'Ὺ', 'Ύ', 'Ы', 'Й', 'Υ', 'Ϋ', 'Ŷ', 'Y'], 'Z': ['Ź', 'Ž', 'Ż', 'З', 'Ζ', 'Z'], 'AE': ['Æ', 'Ǽ'], 'Ch': ['Ч'], 'Dj': ['Ђ'], 'Dz': ['Џ'], 'Gx': ['Ĝ'], 'Hx': ['Ĥ'], 'Ij': ['IJ'], 'Jx': ['Ĵ'], 'Kh': ['Х'], 'Lj': ['Љ'], 'Nj': ['Њ'], 'Oe': ['Œ'], 'Ps': ['Ψ'], 'Sh': ['Ш'], 'Shch': ['Щ'], 'Ss': ['ẞ'], 'Th': ['Þ'], 'Ts': ['Ц'], 'Ya': ['Я'], 'Yu': ['Ю'], 'Zh': ['Ж'], }; Object.keys(swaps).forEach((swap) => { swaps[swap].forEach(s => { str = str.replace(new RegExp(s, "g"), swap); }) }); return str .replace(/[^a-z0-9 -]/g, "") // remove invalid chars .replace(/\s+/g, "-") // collapse whitespace and replace by - .replace(/-+/g, "-") // collapse dashes .replace(/^-+/, "") // trim - from start of text .replace(/-+$/, ""); };
this is too much
@tauseedzaman:
Isn't the whole swaps thing achieved by the two lines below?
.normalize( 'NFD' ) // split each accented letter into its base letter and the accent
.replace( /[\u0300-\u036f]/g, '' ) // remove all previously split accents
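Only partly, as the sketch below suggests. Letters that carry a diacritic decompose under NFD and lose the accent, but characters such as ß and ø, or letters from other scripts such as Cyrillic, have no decomposition into an ASCII base letter. A table or a library is still needed if those must be transliterated. The helper name `stripAccents` is made up for this example.

```javascript
// The two lines in question, wrapped in a helper for testing.
const stripAccents = s => s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

console.log(stripAccents('cr\u00e8me'));          // creme
console.log(stripAccents('stra\u00dfe'));         // straße (unchanged)
console.log(stripAccents('s\u00f8ster'));         // søster (unchanged)
console.log(stripAccents('\u043c\u0438\u0440'));  // мир (Cyrillic, unchanged)
```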
thanks guys
My old but still valid version, with explanations:
https://gist.github.com/codeguy/6684588?permalink_comment_id=3243980#gistcomment-3243980
More info about normalize()...
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
The normalize() method returns the Unicode Normalization Form of a given string.
normalize('NFD') splits an accented character into its base character and the applied accent.
Then .replace( /[\u0300-\u036f]/g, '' ) removes all the accents, which are grouped in that code range.
I think it works with all current and future accented characters.
edit 01/17/2021
My new version
function slugify(text) {
  return text
    .toString() // Cast to string (optional)
    .normalize('NFKD') // The normalize() using NFKD method returns the Unicode Normalization Form of a given string.
    .toLowerCase() // Convert the string to lowercase letters
    .trim() // Remove whitespace from both sides of a string (optional)
    .replace(/\s+/g, '-') // Replace spaces with -
    .replace(/[^\w\-]+/g, '') // Remove all non-word chars
    .replace(/\-\-+/g, '-'); // Replace multiple - with single -
}
NFKD is probably better than NFD. Any feedback is welcome.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
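The difference between the two forms shows up on compatibility characters such as ligatures and superscripts. NFD leaves them alone, while NFKD replaces them with their plain equivalents. A short sketch, with sample characters picked only for illustration:

```javascript
const ligature = '\ufb01';    // "ﬁ" (single code point)
const superscript = '\u00b2'; // "²"
const accented = '\u00e9';    // "é"

console.log(ligature.normalize('NFD') === ligature);     // true: NFD keeps the ligature
console.log(ligature.normalize('NFKD'));                 // fi
console.log(superscript.normalize('NFKD'));              // 2
console.log(accented.normalize('NFKD') === 'e\u0301');   // true: accents still split as in NFD
```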
I broke this:
slugify('Jurassic Park III, 2001 - ★★★')
You get a trailing '-'
jurassic-park-iii-2001-
Maybe add this?
.replace(/\-$/g, ''); // Remove trailing -
Hey guys, how about converting kanji, like Japanese words, for the slug?
Do you want to keep them, or do you want to translate them to English?
Added that in, and also a line to change underscores to hyphens. It may not be perfect, but it's good for my uses!
const slugify = (text) => {
return text
.toString() // Cast to string (optional)
.normalize('NFKD') // The normalize() using NFKD method returns the Unicode Normalization Form of a given string.
.toLowerCase() // Convert the string to lowercase letters
.trim() // Remove whitespace from both sides of a string (optional)
.replace(/\s+/g, '-') // Replace spaces with -
.replace(/[^\w\-]+/g, '') // Remove all non-word chars
.replace(/\_/g,'-') // Replace _ with -
.replace(/\-\-+/g, '-') // Replace multiple - with single -
.replace(/\-$/g, ''); // Remove trailing -
}
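The trailing-dash fix has a mirror case: a title that starts with a stripped symbol leaves a leading '-'. The sketch below trims both ends in one step. The name `slugifyTrimmed` is hypothetical, and the rest of the chain follows the version above.

```javascript
// Same chain as above, with one final step that trims dashes at both ends.
const slugifyTrimmed = (text) =>
  text
    .toString()
    .normalize('NFKD')
    .toLowerCase()
    .trim()
    .replace(/\s+/g, '-')      // replace whitespace runs with -
    .replace(/[^\w-]+/g, '')   // remove all non-word chars
    .replace(/_/g, '-')        // replace _ with -
    .replace(/--+/g, '-')      // replace multiple - with single -
    .replace(/^-+|-+$/g, '');  // remove leading and trailing -

console.log(slugifyTrimmed('\u2605\u2605\u2605 Jurassic Park III, 2001 - \u2605\u2605\u2605'));
// jurassic-park-iii-2001
```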
@torma616
AFAIK your version, which looks good, is missing the following line below the normalize one:
replace( /[\u0300-\u036f]/g, '' )
The normalize() function splits each accented character in two: the base character, and its accent.
The subsequent replace() line deletes all the accents, which all sit in the \u0300-\u036f range (the Unicode Combining Diacritical Marks block).
Removing the accents requires these two steps.
I got the info for my slugify version from June 6, 2020 from the MDN docs:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
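A quick check, as a sketch: in a chain that later strips everything outside `\w` and `-`, the combining marks are removed by that step as well, because `\w` without the `u` flag matches only ASCII letters, digits and `_`. The explicit `replace` matters when the later filter keeps non-ASCII characters.

```javascript
const decomposed = 'Caf\u00e9'.normalize('NFKD'); // "Cafe" followed by U+0301

console.log(decomposed.length);                          // 5
console.log(decomposed.replace(/[^\w-]+/g, ''));         // Cafe
console.log(decomposed.replace(/[\u0300-\u036f]/g, '')); // Cafe
console.log(/\w/.test('\u0301'));                        // false
```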
This version includes accent removal for the whole Unicode range.
The normalize function (standard in JS) separates accented letters from their accents. The replace step replaces all the accents with nothing, leaving only the base letters. Based on @thierryc's version, not yet tested: