@JosephCatrambone
Last active August 29, 2015 14:04
Doing a "cast" from Unicode strings to look-alike ASCII characters
# coding=utf-8
import string

# Accented/special characters and their closest ASCII look-alikes.
# The two strings are parallel: sourcemap[i] maps to destmap[i].
sourcemap = u"àáâãäåÀÁÂÃÄÅèéêëÈÉÊËìíîïÌÍÎÏòóôõöÒÓÔÕÖùúûüÙÚÛÜýÿÝñÑ¿¡"
destmap = "aaaaaaAAAAAAeeeeEEEEiiiiIIIIoooooOOOOOuuuuUUUUyyYnN?!"

# Characters that are already plain ASCII and pass through unchanged.
passthrough = set(string.ascii_letters + string.digits + string.punctuation + ' ')


def scrub(s, replacement_char=''):
    """Return a new string with accented characters replaced by the closest ASCII character available.

    Characters that are neither plain ASCII nor listed in sourcemap are replaced by
    replacement_char (dropped by default). This is conceptually similar to multiple
    passes with maketrans + translate.
    """
    scrubbed = []
    for letter in s:
        if letter in passthrough:
            scrubbed.append(letter)
        elif letter in sourcemap:
            scrubbed.append(destmap[sourcemap.index(letter)])
        else:
            scrubbed.append(replacement_char)
    return ''.join(scrubbed)
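The docstring compares scrub() to multiple passes with maketrans + translate. A minimal sketch of that idea follows, assuming Python 3, where str.maketrans and str.translate operate on Unicode text. The name scrub_translate is hypothetical and not part of the original gist; the mapping tables are copied from above so the sketch runs on its own.

```python
import string

# Same parallel tables as in the gist (repeated so this sketch is self-contained).
SOURCEMAP = "àáâãäåÀÁÂÃÄÅèéêëÈÉÊËìíîïÌÍÎÏòóôõöÒÓÔÕÖùúûüÙÚÛÜýÿÝñÑ¿¡"
DESTMAP = "aaaaaaAAAAAAeeeeEEEEiiiiIIIIoooooOOOOOuuuuUUUUyyYnN?!"

# Characters kept unchanged, mirroring the passthrough test in scrub().
_ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")

# One-to-one accent table built once; str.maketrans(x, y) requires len(x) == len(y).
_ACCENT_TABLE = str.maketrans(SOURCEMAP, DESTMAP)


def scrub_translate(s, replacement_char=""):
    """Hypothetical translate-based variant of scrub() (Python 3 only)."""
    # Pass 1: map each listed accented character to its ASCII look-alike.
    mapped = s.translate(_ACCENT_TABLE)
    # Pass 2: replace whatever is still outside the allowed ASCII set.
    # A translate table cannot say "everything not in this set", so this
    # pass builds a small per-call table from the leftover characters.
    leftovers = {ord(c): replacement_char for c in set(mapped) if c not in _ALLOWED}
    return mapped.translate(leftovers)


print(scrub_translate("Señor, ¿qué tal?"))  # Senor, ?que tal?
print(scrub_translate("Ærø", "?"))          # ?r?
```

Because the table lookups happen inside str.translate, this variant should avoid the per-character linear sourcemap.index() search, while producing the same output as scrub() for the same inputs.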