Skip to content

Instantly share code, notes, and snippets.

@akora
Created February 28, 2015 11:18
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save akora/51b2933a2554776d7144 to your computer and use it in GitHub Desktop.
Save akora/51b2933a2554776d7144 to your computer and use it in GitHub Desktop.
Removes all diacritics from strings (e.g. names) in a Google spreadsheet
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"ö","o"),"ü","u"),"ó","o"),"ő","o"),"ú","u"),"é","e"),"á","a"),"ű","u"),"í","i"),"Ö","O"),"Ü","U"),"Ó","O"),"Ő","O"),"Ú","U"),"É","E"),"Á","A"),"Ű","U"),"Í","I")
@NoSubstitute
Copy link

@tukusejssirs - this was exactly what I was looking for, but I was wondering how I can add some more replacements, specifically if I can add a replacement of 'space' with 'dash?

I tried just adding a new row in the list of replacements, but it did nothing. No errors either, though. It just seems to not see the space, and therefore doesn't replace it. The end of the list looks like this for me now.

'ẞ': 'SS',
' ': '-'

Would I instead have to adjust something in the .replace regex further down?

I understand that a space of course isn't a diacritic, but when creating email addresses, the source content often have names with spaces. People with double first names or last names, which I need to replace with a -.

@tukusejssirs
Copy link

@NoSubstitute, I believe you talk about removeDiacritics() function.

The reason why it does not work is the regex defined on L44. It contains a list of characters that should not be replaced. It contains \s which matches [ \t\r\n\f] (aka whitespace characters), specifically a space, a tab, a carriage return, a line feed, or a form feed (src).

Also it contains a mistake: ,-. matches any character between a comma and a full stop / dot. There are no characters between these two, thus it matches only , or ., however, I wanted to also match -. A hyphen (when matched literally) should always go to the end of the list.

I have updated the script in my GitLab repository, however, I moved it to a new one and (as the function is a simple JS function without any dependency on Google APIs) I moved it to js/ folder.

I think I want to keep \s without replacing it. Now, when you copy the updated function, all you need to do is:

  1. append ' ': '-' (as you did) to replacements (before L34);
  2. comment out (prepend // ) L41.

Anyway, I updated the function in gSheets script, however, as I didn’t remove \s from charsToKeep, it won’t work for your case as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment