Created August 22, 2016 09:37
By Tarrasch (gist e8a9dd8057b9f25353097ce8b9e44ae5)
Vietnamese diacritics
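import java.text.Normalizer;

public class VietnameseDiacritics {
    // Combining marks used in Vietnamese: dot below, grave, acute, tilde, hook above, circumflex,
    // breve and horn. đ (U+0111) has no canonical decomposition, so it adds no combining mark.
    private static final String VIETNAMESE_DIACRITICS = allVietnameseCombiningDiacriticalMarks();

    public static String allVietnameseCombiningDiacriticalMarks() {
        String s = Normalizer.normalize("ạàáãảâăư", Normalizer.Form.NFD);
        // Keep only the combining marks, dropping the base letters.
        return s.replaceAll("[^\\p{InCombiningDiacriticalMarks}]", "");
    }

    public static String removeNonVietnameseDiacritics(String input) {
        // Decompose so every diacritic becomes a separate combining mark.
        String s = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop combining marks that are not in the Vietnamese set (e.g. the diaeresis in "ü").
        s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}&&[^" + VIETNAMESE_DIACRITICS + "]]", "");
        // Recompose. NFC, unlike NFKC, leaves compatibility characters such as ligatures intact.
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }
}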
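The mark set above is computed by decomposing a few sample letters under NFD. To see which combining mark each letter contributes, and to confirm that đ contributes none, a small inspector like the following may help. It is a sketch: the class `DiacriticInspector` and the method `combiningMarks` are hypothetical names. Instead of the regex, it checks membership in the Combining Diacritical Marks block (U+0300..U+036F) with `Character.UnicodeBlock`.

```java
import java.text.Normalizer;

// Hypothetical helper: prints the combining marks each letter decomposes into under NFD.
public class DiacriticInspector {
    // Returns the NFD combining marks of `letter` as space-separated "U+XXXX" code points.
    static String combiningMarks(String letter) {
        String nfd = Normalizer.normalize(letter, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        for (char c : nfd.toCharArray()) {
            // Only characters in the U+0300..U+036F block count as combining marks here.
            if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(String.format("U+%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] letters = {"ạ", "à", "á", "ã", "ả", "â", "ă", "ư", "ệ", "đ"};
        for (String letter : letters) {
            String marks = combiningMarks(letter);
            System.out.println(letter + " -> " + (marks.isEmpty() ? "(none)" : marks));
        }
    }
}
```

For example, "ệ" should yield two marks, `U+0323 U+0302` (dot below, then circumflex, in canonical order). "đ" should print `(none)`, because U+0111 has no canonical decomposition.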