Skip to content

Instantly share code, notes, and snippets.

@akora
Created February 28, 2015 11:18
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save akora/51b2933a2554776d7144 to your computer and use it in GitHub Desktop.
Save akora/51b2933a2554776d7144 to your computer and use it in GitHub Desktop.
Removes all diacritics from strings (e.g. names) in a Google spreadsheet
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"ö","o"),"ü","u"),"ó","o"),"ő","o"),"ú","u"),"é","e"),"á","a"),"ű","u"),"í","i"),"Ö","O"),"Ü","U"),"Ó","O"),"Ő","O"),"Ú","U"),"É","E"),"Á","A"),"Ű","U"),"Í","I")
@tukusejssirs
Copy link

@MarkusM1985, thanks for the heads-up.

I have updated the Google Sheets (both the sheet and the function) with the following 17 letters with diacritics that I have missed:

  • ÆAe;
  • ÜU;
  • łl;
  • Œ̀Oe;
  • Œ́Oe;
  • Œ̂Oe;
  • Œ̃Oe;
  • Œ̨̃Oe;
  • Œ̄Oe;
  • Œ̄́Oe;
  • Œ̄̃Oe;
  • Œ̄̆Oe;
  • Œ̣Oe;
  • Œ̨Oe;
  • Œ̯Oe;
  • ąa;
  • ĄA.

@tukusejssirs
Copy link

I have just noticed an issue with replacement that it was replaced with a. This issue is already found in the comments of @luffy5k (here) and @jaspercuvelier (here), however, I made was the ultimate source (here).

I have fixed this issue in the Google Sheets (incl. the scripts) and my repo.

I’m so sorry if I made some of your texts corrupt. 😢

@tukusejssirs
Copy link

Anyway, I am thinking about adding a new option to the scripts. Some might prefer to replace Ð with Dj and ä with ae, others with D and a respectively. I won’t give an ETA, but it might be a nice feature to have.

@mattparkins
Copy link

Hello, there seems to be a bug in the script in that it only seems to remove the first of a repeated diacritic, for instance, Pépé becomes Pepé

Sorry!

@tukusejssirs
Copy link

tukusejssirs commented Jul 23, 2021

@mattparkins, I presume you don’t talk about my script in Google Sheets or my repo, as it works as expected. Could you tried that instead please? 😉

Update: No, the GAS function does not work with Pépé. I think I know the issue. But the substitute() function works as expected.

@mattparkins
Copy link

Apologies, perhaps I'm calling the function incorrectly - I did copy and paste it from your repo directly.

@tukusejssirs
Copy link

@mattparkins, I have fixed the issue in removeDiacritics.gs, both in Google Sheets and in my repo. Try now. 😉

@mattparkins
Copy link

Perfect, that works for me now, thanks!

@tukusejssirs
Copy link

I have just found some letters I missed, most of them with a breve, some with a combining dot, and some other characters. I have updated removeDiacritics.gs, both in Google Sheets and in my repo. Below is a list of the additions.

  • ĂA;
  • ĔE;
  • ĕe;
  • ĞG;
  • ĬI;
  • ĭi;
  • ŏo;
  • ŎO;
  • o;
  • O;
  • ŎO;
  • ŏo;
  • ŬU;
  • ŭu;
  • A;
  • á̱a;
  • Á̱A;
  • ǻa;
  • ǺA;
  • ue;
  • a;
  • SS.

Then I noticed that some letters are not replaced properly, so I looked for a solution. I found this SO question where I read all the answers and rewrote removeDiacritics.gs function from scratch based on the information gathered. It should be much faster now.

I am considering to add some more ligatures (like fi, , ) and also an option to add support for German-like replacements (ÄAe, ÜUe), but no ETA.

@fborgosano
Copy link

Hi people! What I have to do to add a substitution? For example, to convert "ø" in "o"?

Also, when I run the script, I receive the error:"Error TypeError: Cannot read property 'normalize' of undefined removeDiacritics @ removeDiacritics.gs:37"

Sorry, I'm not a developer, I understand just something and I like it but I need it for my work. Thanks in advance!

@tukusejssirs
Copy link

Hi people! What I have to do to add a substitution? For example, to convert "ø" in "o"?

@fborgosano, I added ø and Ø to the replacements. Thanks for the heads up!

Also, when I run the script, I receive the error:"Error TypeError: Cannot read property 'normalize' of undefined removeDiacritics @ removeDiacritics.gs:37"

Where do you run removeDiacritics() function? It might require V8 engine (Google Sheets), therefore if you run it in the old(er) engine, you need to create a new spreadsheet (e.g. using https://sheets.new), copy-paste every sheet, every setting, each and every script (etc) from the old file to the new file. However, if it is not in the engine, I need more info to be able to help you.

@fborgosano
Copy link

Hi! Thanks for the answer. Let me check, I did some mess while trying to modify 🤣
Anyway, to just copy and paste everything was the first thing I did. I copied the code you spoke about here: https://gist.github.com/akora/51b2933a2554776d7144#gistcomment-3791364

@fborgosano
Copy link

Where do you run removeDiacritics() function? It might require V8 engine (Google Sheets), therefore if you run it in the old(er) engine, you need to create a new spreadsheet (e.g. using https://sheets.new), copy-paste every sheet, every setting, each and every script (etc) from the old file to the new file. However, if it is not in the engine, I need more info to be able to help you.

Yes, Chrome V8 runtime is active

@tukusejssirs
Copy link

tukusejssirs commented Sep 18, 2021

I have just tested it in a new spreadsheet under a different Google account and it works. If you want to use removeDiacritics() function:

  1. Create a new spreadsheet.
  2. Copy-paste removeDiacritics() function definition from my spreasheet to yours (ToolsScripts, click on removeDiacritics.gs in the sidebar). Don’t forget to save it. 😉
  3. In your spreadsheet, enter =removeDiacritics("text with diacritics") or =removeDiacritics(A2) (if the text is in A2) into any cell.

@fborgosano
Copy link

fborgosano commented Sep 18, 2021

Can =removeDiacritics(A2) work also on a range? E.g. =removeDiacritics(A2:B100)?

@fborgosano
Copy link

I tried by creating a new sheet and follow your instructions but I still have that error :-(

@tukusejssirs
Copy link

tukusejssirs commented Sep 18, 2021

Can =removeDiacritics(A2) work also on a range? E.g. =removeDiacritics(A2:B100)?

No, it can’t. Now I see how did you get that TypeError: text.normalize is not a function (line 39) error.

In what use case would that be useful? If you want to merge multiple cells, you could use CONCATENATE(A1:A3) to combine the range(s) without separater (i.e. a + b = ab; see this). Or you could use join(" ", A1:A3) to do the same, but add a separator/delimitor (the first argument is the separator). Then you can add either of these functions into removeDiacritics() function and you are good to go. 😉

@fborgosano
Copy link

fborgosano commented Sep 18, 2021

I must send a newsletter to a list of contacts. Many of them are from Sweden, Norway, Denmark and Germany. When I upload the list on the mailing list software (Sendblaster 4), there all of those nordic characters coming up bad formatted. I need to clean them up or in the subject line and in the body of the email they'll see things like "ä", ü and so on. I even tried to see if I could fix it with OpenOffice but no way.

I have to convert all those present into the columns "first name", "last name" and "company name", that's why I need such script. I thought it was something easy, I was even sure that there was already an addon on Google Sheets... it seems it is much harder than I thought :-)

@tukusejssirs
Copy link

tukusejssirs commented Sep 18, 2021

I think now you have all info you need. 😉

Let’s suppose your table has the following columns:

  • A: first name;
  • B: last name;
  • C: company name;
  • D: formatted name (removeDiacritics));

If you want the {first} {last} ({company}) format, you would add the following into D2 (first row is a header I suppose):

=removeDiacritics($a2 & " " & $b2 & " (" & $c2 & ")")

Then populate it into all cells of the D column either by using the + sign in the bottom-right cell corner or copy the whole cell (not just the characters), then select the D column (without the supposed header) and paste it.

@fborgosano
Copy link

fborgosano commented Sep 18, 2021

Like this?
https://ibb.co/2nzyvHB

(Download it if you need it, it expires in one hour)

@fborgosano
Copy link

fborgosano commented Sep 18, 2021

But is this the correct code for the script page?

/**

  • Removes all diacritics from a string and replace some letters without diacritics considered as letters with diacritics or ligatures

  • Note: This function normalises the characters, therefore if you use some characters that are combined with non-diacritics, it will be returned decomposed.

  • This function is based on many answers from https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript

    • Tested with 'ÁàÀĂăâÂǎǍåÅäÄãÃȧȦąĄāĀạAʾá̱Á̱ǻǺᵫẚẠæÆǽǼḅḄćĆĉĈčČçÇďĎḋḊḑḐđĐḍḌéÉèÈêÊěĚëËẽẼĔĕėĖȩȨęĘēĒẹẸḟḞǵǴĞğĝĜǧǦġĠģĢḡḠĥĤȟȞḣḢḩḨḥḤíÍìÌîÎǐǏïÏĩĨİīĪĬĭịỊỊ̣ıĵĴȷḱḰǩǨķĶḳḲĺĹľĽļĻŁłḷḶŀĿḿḾṁṀṃṂńŃǹǸňŇñÑṅṄņŅṇṆóÓŏŎọỌòÒôÔǒǑöÖőŐõÕȯȮǫǪōŌŎŏœŒœ́Œ́œ̀Œ̀œ̂Œ̂œ̃Œ̃œ̨Œ̨œ̨̃Œ̨̃œ̄Œ̄œ̄́Œ̄́œ̄̆Œ̄̆œ̄̃Œ̄̃œ̯Œ̯œ̣Œ̣ṕṔṗṖřŘṙṘŗŖṛṚśŚŝŜšŠṡṠşŞṣṢșȘßẞťŤṫṪţŢṭṬțȚúÚùÙûÛǔǓůŮüÜǘǗǜǛǚǙűŰũŨųŲūŪụỤŬŭṽṼṿṾẃẂẁẀŵŴW̊ẘẅẄẉẈẍẌẋẊýÝỳỲŷŶY̊ẙỹỸẏẎȳȲỵỴźŹẑẐžŽżŻẓẒ'
  • @param {string} text Text with diacritics

  • @return {string} Text with diacritics removed

  • @customfunction
    */
    function removeDiacritics(text) {
    let replacements = {
    'ẚ': 'a',
    'æ': 'ae',
    'Æ': 'Ae',
    'đ': 'd',
    'Đ': 'D',
    'ı': 'i',
    'ȷ': 'j',
    'ŀ': 'l',
    'Ŀ': 'L',
    'Ł': 'L',
    'ł': 'l',
    'œ': 'oe',
    'Œ': 'Oe',
    'ᵫ': 'ue',
    'ß': 'ss',
    'ẞ': 'SS',
    'ø': 'o'
    }

    return text
    // Normalise to NFD Unicode normal form to decompose combined characters into the combination of simple ones
    // src: https://stackoverflow.com/a/37511463/3408342
    .normalize('NFD')
    // Remove all diacritics
    .replace(/\p{Diacritic}/gu, '')
    // Replace letters without diacritics (ligatures, etc)
    // src: https://stackoverflow.com/a/22513545/3408342
    .replace(/[^A-Za-z0-9\s!"#$%&\'\(\)\*\+\,-.\/:;<=>?@\[\\\]^_`\{\|\}~£¬]/g, (letter) => {
    return replacements[letter] || letter
    })
    }

@fborgosano
Copy link

I don't know. It says "Error Unknown function: 'removeDiacritics'."
I think my brain needs some rest. I'll try again tomorrow. In the meantime, thanks for your help!
If you want, I just give you the access to the sheet so you can show me directly there what should I do and what I did wrong. Eventually, send me your email in pvt at my address (see profile).
Goodnight @tukusejssirs!

@tukusejssirs
Copy link

tukusejssirs commented Sep 18, 2021

Like this?
https://ibb.co/2nzyvHB

It seems correct, but you should put it in D2 and populate it into D3:D. But is should still work even in D1. #NAME? suggests Google Sheets did not recognise removeDiacritics() as a valid function. It seems like you duplicated my spreadsheet into your account, therefore the function(s) should be there. That said, you have probably deleted gas_functions sheet where I test/demonstrate removeDiacritics() function. Could you check if it works there? You probably should duplicate the current version of the spreadsheet, because there I demonstrate in Sheet6 sheet that =removeDiacritics($a2 & " " & $b2 & " (" & $c2 & ")") works as expected, therefore there is something else that comes into the way.

But is this the correct code for the script page?

Yes, although it is missing capital Ø. It should work though.

@tukusejssirs
Copy link

tukusejssirs commented Sep 18, 2021

If you start typing in a cell =removedia, do Google Sheets suggest you to use REMOVEDIACRITICS? I should. If it does not, it did not recognise the function definition. Could you check ToolsScripts if the function name is removeDiacritics? It seems from your last comment that is should. You could also replace it with the version from my GitLab repo.


Update: Update repository link.

@NoSubstitute
Copy link

@tukusejssirs - this was exactly what I was looking for, but I was wondering how I can add some more replacements, specifically if I can add a replacement of 'space' with 'dash?

I tried just adding a new row in the list of replacements, but it did nothing. No errors either, though. It just seems to not see the space, and therefore doesn't replace it. The end of the list looks like this for me now.

'ẞ': 'SS',
' ': '-'

Would I instead have to adjust something in the .replace regex further down?

I understand that a space of course isn't a diacritic, but when creating email addresses, the source content often have names with spaces. People with double first names or last names, which I need to replace with a -.

@tukusejssirs
Copy link

@NoSubstitute, I believe you talk about removeDiacritics() function.

The reason why it does not work is the regex defined on L44. It contains a list of characters that should not be replaced. It contains \s which matches [ \t\r\n\f] (aka whitespace characters), specifically a space, a tab, a carriage return, a line feed, or a form feed (src).

Also it contains a mistake: ,-. matches any character between a comma and a full stop / dot. There are no characters between these two, thus it matches only , or ., however, I wanted to also match -. A hyphen (when matched literally) should always go to the end of the list.

I have updated the script in my GitLab repository, however, I moved it to a new one and (as the function is a simple JS function without any dependency on Google APIs) I moved it to js/ folder.

I think I want to keep \s without replacing it. Now, when you copy the updated function, all you need to do is:

  1. append ' ': '-' (as you did) to replacements (before L34);
  2. comment out (prepend // ) L41.

Anyway, I updated the function in gSheets script, however, as I didn’t remove \s from charsToKeep, it won’t work for your case as is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment