Remove accents and symbols not compatible with Latin base alphabet
/*
This works by converting the text to Unicode's decomposed form (NFD), in which
accents become separate combining characters. A regex then selects the
characters we want to keep (printable ASCII plus a few control characters
such as tab and newline), and the matches are joined back together.
Some characters won't work with this, such as 'ø', because Unicode does not
decompose it into an 'o' plus a slash accent.
*/
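function asciiFriendlyText (text) {
  // match() returns null when nothing matches (e.g. '' or '日本'), so fall back to an empty list.
  const matches = text.normalize("NFD").match(/[\u0009-\u0014\u0020-\u007E]+/g) || [];
  return matches.join('');
}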
// French
console.log(asciiFriendlyText('éléphant'));
console.log(asciiFriendlyText('Je suis un élève'));
// Vietnamese
console.log(asciiFriendlyText('ruộng'));
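// Unsupported: these letters have no canonical decomposition, so they are
// dropped rather than mapped to a base letter. They would require different logic.
console.log(asciiFriendlyText('Æ, Ø, ß')); // ", , "
// Partially supported: the accents are stripped, but 'Đ' is dropped for the same reason.
console.log(asciiFriendlyText('Đà Nẵng, Quảng Nam, Quảng Ngãi, Bình Định, Phú Yên, Nha Trang'));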
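
The "different logic" needed for letters such as 'Æ', 'Ø', 'ß' and 'Đ' could be a small lookup table applied after decomposition. The sketch below is one possible approach, not part of the original snippet: the names `FALLBACKS` and `asciiFriendlyTextWithFallbacks` are hypothetical, and the replacements chosen (for example 'ß' to 'ss') are assumptions that may not suit every language.

```javascript
// Hypothetical fallback table for letters that NFD does not decompose.
// The chosen replacements are assumptions; extend or change them as needed.
const FALLBACKS = new Map([
  ['Æ', 'AE'], ['æ', 'ae'],
  ['Ø', 'O'], ['ø', 'o'],
  ['Đ', 'D'], ['đ', 'd'],
  ['ß', 'ss'],
]);

function asciiFriendlyTextWithFallbacks (text) {
  // Decompose first, so that combining accents are already split off
  // before the table lookup.
  const decomposed = text.normalize('NFD');
  // Swap each non-decomposable letter for its ASCII fallback.
  const replaced = Array.from(decomposed, (ch) =>
    FALLBACKS.has(ch) ? FALLBACKS.get(ch) : ch
  ).join('');
  // Keep the same character ranges as asciiFriendlyText; match() returns
  // null when nothing matches, so fall back to an empty list.
  const matches = replaced.match(/[\u0009-\u0014\u0020-\u007E]+/g) || [];
  return matches.join('');
}

console.log(asciiFriendlyTextWithFallbacks('Æ, Ø, ß'));
console.log(asciiFriendlyTextWithFallbacks('Đà Nẵng, Bình Định'));
```

Running the lookup after `normalize('NFD')` means a letter that decomposes into one of the table's keys plus an accent is also covered.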