@rponte
Last active April 10, 2024 23:01
Removing accents and special characters in Java: StringUtils.java and StringUtilsTest.java
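package br.com.triadworks.rponte.util;

import java.text.Normalizer;

public class StringUtils {

    /**
     * Removes all accents from the string, replacing accented characters with
     * their plain, unaccented ASCII equivalents. Any other non-ASCII character
     * is removed as well.
     */
    public static String unaccent(String src) {
        return Normalizer
                // NFD splits each precomposed character (e.g. "é") into a base
                // character followed by combining marks (e.g. "e" + U+0301)
                .normalize(src, Normalizer.Form.NFD)
                // Deletes every non-ASCII character: the combining marks, and also
                // any letter NFD cannot decompose (e.g. "ł", "Đ", "ß")
                .replaceAll("[^\\p{ASCII}]", "");
    }
}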
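A possible variant of unaccent, sketched under the assumption that a null input should simply pass through (the original lets Normalizer.normalize throw a NullPointerException). It also compiles the regex once, since String.replaceAll compiles its pattern on every call. The class name StringUtilsPrecompiled is hypothetical; for non-null input the result is the same as the original.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class StringUtilsPrecompiled {

    // Compiled once; String.replaceAll would recompile the same regex on every call
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    /**
     * Same transformation as StringUtils.unaccent, but returns null for null
     * input instead of throwing a NullPointerException.
     */
    public static String unaccent(String src) {
        if (src == null) {
            return null;
        }
        // NFD splits precomposed letters into base letter + combining mark(s)
        String decomposed = Normalizer.normalize(src, Normalizer.Form.NFD);
        // Drop every non-ASCII char: the marks, plus letters with no decomposition
        return NON_ASCII.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // Input is "Atualizacao Diaria" with c-cedilla, a-tilde and a-acute
        System.out.println(unaccent("Atualiza\u00e7\u00e3o Di\u00e1ria")); // Atualizacao Diaria
    }
}
```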
package br.com.triadworks.rponte.util;

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class StringUtilsTest {

    private static final String accents = "È,É,Ê,Ë,Û,Ù,Ï,Î,À,Â,Ô,è,é,ê,ë,û,ù,ï,î,à,â,ô,Ç,ç,Ã,ã,Õ,õ";
    private static final String expected = "E,E,E,E,U,U,I,I,A,A,O,e,e,e,e,u,u,i,i,a,a,o,C,c,A,a,O,o";

    private static final String accents2 = "çÇáéíóúýÁÉÍÓÚÝàèìòùÀÈÌÒÙãõñäëïöüÿÄËÏÖÜÃÕÑâêîôûÂÊÎÔÛ";
    private static final String expected2 = "cCaeiouyAEIOUYaeiouAEIOUaonaeiouyAEIOUAONaeiouAEIOU";

    private static final String accents3 = "Gisele Bündchen da Conceição e Silva foi batizada assim em homenagem à sua conterrânea de Horizontina, RS.";
    private static final String expected3 = "Gisele Bundchen da Conceicao e Silva foi batizada assim em homenagem a sua conterranea de Horizontina, RS.";

    private static final String accents4 = "/Users/rponte/arquivos-portalfcm/Eletron/Atualização_Diária-1.23.40.exe";
    private static final String expected4 = "/Users/rponte/arquivos-portalfcm/Eletron/Atualizacao_Diaria-1.23.40.exe";

    @Test
    public void replacingAllAccents() {
        assertEquals(expected, StringUtils.unaccent(accents));
        assertEquals(expected2, StringUtils.unaccent(accents2));
        assertEquals(expected3, StringUtils.unaccent(accents3));
        assertEquals(expected4, StringUtils.unaccent(accents4));
    }
}
@magnoleal

Awesome!

@LukaszGrabowski1

Doesn't work with the Polish letter "ł".
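What happens with "ł": NFD has no decomposition for it, so the [^\p{ASCII}] filter deletes the letter outright instead of turning it into "l". A sketch comparing the original filter with one that removes only combining marks (\p{M}); the class and method names are made up for this illustration.

```java
import java.text.Normalizer;

public class MarkStrippingDemo {

    // Original approach: after NFD, delete every non-ASCII code point
    public static String stripNonAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    // Alternative: after NFD, delete only combining marks (Unicode category M)
    public static String stripMarksOnly(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // The Polish city name Lodz: L-stroke (U+0141), o-acute, d, z-acute
        String city = "\u0141\u00f3d\u017a";
        // Returns "odz": L-stroke has no decomposition, so it is deleted
        System.out.println(stripNonAscii(city));
        // Returns L-stroke + "odz": the letter survives, only the accents go
        System.out.println(stripMarksOnly(city));
    }
}
```

Neither variant yields "Lodz": stripping only the marks keeps the word readable, but the output is no longer pure ASCII.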

@algra4

algra4 commented May 25, 2018

Thanks

@daitvd1997

Đ doesn't work.

@alexandre1202

Working properly.
Congrats!

@fanblater

Perfect!

@amosqfigueira

Works beautifully!

@savitoh

savitoh commented Jan 25, 2020

Very good. Thanks.

@Talles71

It works, thank you!

@adolfobrunno

Very good. Thank you.

@vinnyparker

It helped me a lot!!! Much appreciated.

@acaciomartins

Locally it works perfectly for me, but when I deploy it to WebSphere 8.5 it insists on converting Ç to A.
E.g.: CONSOLAÇÂO becomes CONSOLAAO

Could it be WebSphere's encoding?
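One plausible cause, offered as an assumption since the server configuration isn't known: if UTF-8 text is decoded as ISO-8859-1 somewhere before unaccent is called, the two bytes of "Ç" (0xC3 0x87) turn into "Ã" plus an invisible control character, and unaccent then reduces that pair to "A". The sketch below reproduces this; MojibakeDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class MojibakeDemo {

    // Same logic as StringUtils.unaccent
    public static String unaccent(String src) {
        return Normalizer.normalize(src, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // UTF-8 bytes of C-cedilla (U+00C7) are 0xC3 0x87
        byte[] utf8 = "\u00c7".getBytes(StandardCharsets.UTF_8);

        // Correct decoding: C-cedilla -> NFD "C" + U+0327 -> "C"
        String ok = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(unaccent(ok)); // C

        // Wrong decoding (assumed ISO-8859-1): 0xC3 -> A-tilde, 0x87 -> control char U+0087.
        // A-tilde decomposes to "A" + U+0303, and both non-ASCII chars are stripped.
        String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(unaccent(garbled)); // A
    }
}
```

If this is the cause, the fix belongs where the bytes are decoded (for example, by naming UTF-8 explicitly), not in unaccent itself.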

@felixkrautschuk

Unfortunately, this removes "ß" from the string
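Letters such as "ß", "ł" and "Đ" have no canonical decomposition, so NFD never splits them into an ASCII base letter plus marks; they need an explicit transliteration step before the ASCII filter. A minimal sketch, assuming a small hand-written table (TransliteratingUnaccent and its table are hypothetical, not part of the gist):

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

public class TransliteratingUnaccent {

    // Hypothetical, deliberately small table for letters NFD does not decompose.
    // Extend it for the alphabets you actually need to support.
    private static final Map<String, String> NO_DECOMPOSITION = new LinkedHashMap<>();
    static {
        NO_DECOMPOSITION.put("\u00df", "ss"); // sharp s
        NO_DECOMPOSITION.put("\u0141", "L");  // L with stroke
        NO_DECOMPOSITION.put("\u0142", "l");  // l with stroke
        NO_DECOMPOSITION.put("\u0110", "D");  // D with stroke
        NO_DECOMPOSITION.put("\u0111", "d");  // d with stroke
    }

    public static String unaccent(String src) {
        String s = src;
        // Transliterate first, so these letters are not lost by the ASCII filter below
        for (Map.Entry<String, String> e : NO_DECOMPOSITION.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(unaccent("Stra\u00dfe"));            // Strasse
        System.out.println(unaccent("\u0141\u00f3d\u017a"));    // Lodz
        System.out.println(unaccent("\u0110\u00e0 N\u1eb5ng")); // Da Nang
    }
}
```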

@ferdez

ferdez commented Apr 24, 2021

+1

@rponte
Author

rponte commented Oct 25, 2021

@rponte
Author

rponte commented Feb 18, 2022

Alexandre Aquiles's explanation of how the code above works: https://twitter.com/alex_aquiles/status/1494397659431542784?s=21
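The core idea behind the code can be checked directly: NFD turns a precomposed letter into a base letter plus a combining mark, and only the mark falls outside ASCII. A small demo (NfdDemo is an illustrative name):

```java
import java.text.Normalizer;

public class NfdDemo {

    // Formats each code point of s as U+XXXX, separated by spaces
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9"; // e-acute as a single code point
        String decomposed = Normalizer.normalize(precomposed, Normalizer.Form.NFD);

        System.out.println(codePoints(precomposed)); // U+00E9
        System.out.println(codePoints(decomposed));  // U+0065 U+0301 ("e" + combining acute)

        // U+0301 is outside the ASCII range (0x00-0x7F), so [^\p{ASCII}] removes it
        System.out.println(decomposed.replaceAll("[^\\p{ASCII}]", "")); // e
    }
}
```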

@Linkit123

Đ character is not working

@anielsonrf

Thank you very much.
