@JosephCatrambone JosephCatrambone/scrub.py
Last active Aug 29, 2015

Doing a "cast" from Unicode strings to look-alike ASCII characters
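# coding=utf-8
import string

# Accented characters and, at the same index in destmap, their closest ASCII look-alikes.
sourcemap = u"àáâãäåÀÁÂÃÄÅèéêëÈÉÊËìíîïÌÍÎÏòóôõöÒÓÔÕÖùúûüÙÚÛÜýÿÝñÑ¿¡"
destmap = "aaaaaaAAAAAAeeeeEEEEiiiiIIIIoooooOOOOOuuuuUUUUyyYnN?!"

def scrub(s, replacement_char=''):
    """Returns a new string with accented characters replaced by the closest ASCII character available.
    Any other character that is not an ASCII letter, digit, punctuation mark, or space
    (including tabs and newlines) is replaced by replacement_char, which drops it by default.
    This is conceptually similar to multiple passes with maketrans+translate."""
    scrubbed_sentence = ""
    for letter in s:
        # Plain ASCII passes through unchanged.
        if letter in string.ascii_letters or letter in string.digits or letter in string.punctuation or letter == ' ':
            scrubbed_sentence += str(letter)
        # Known accented characters map to the look-alike at the same index.
        elif letter in sourcemap:
            scrubbed_sentence += destmap[sourcemap.index(letter)]
        else:
            scrubbed_sentence += replacement_char
    return scrubbed_sentence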
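The docstring compares scrub to maketrans+translate. A minimal sketch of that alternative follows, assuming Python 3 (where str.maketrans and str.translate are built in). The names scrub_translate, TRANSLATION_TABLE, and KEEP are hypothetical and are not part of the gist; the two maps are copied from above so the block runs on its own.

```python
# coding=utf-8
# Sketch only: scrub_translate, TRANSLATION_TABLE and KEEP are illustrative
# names, not part of the original gist. Assumes Python 3.
import string

sourcemap = "àáâãäåÀÁÂÃÄÅèéêëÈÉÊËìíîïÌÍÎÏòóôõöÒÓÔÕÖùúûüÙÚÛÜýÿÝñÑ¿¡"
destmap = "aaaaaaAAAAAAeeeeEEEEiiiiIIIIoooooOOOOOuuuuUUUUyyYnN?!"

# str.maketrans pairs each character in sourcemap with the character
# at the same index in destmap (both strings must have equal length).
TRANSLATION_TABLE = str.maketrans(sourcemap, destmap)

# Characters that are allowed through unchanged, as in scrub.
KEEP = set(string.ascii_letters + string.digits + string.punctuation + " ")

def scrub_translate(s, replacement_char=""):
    # First pass: swap accented characters for their ASCII look-alikes.
    translated = s.translate(TRANSLATION_TABLE)
    # Second pass: replace anything still outside the allowed set.
    return "".join(c if c in KEEP else replacement_char for c in translated)

print(scrub_translate("¿Dónde está el niño?"))  # prints "?Donde esta el nino?"
```

Building the table once at import time avoids the sourcemap.index lookup for every character, which may matter on long inputs. As with scrub, tabs and newlines are not in the allowed set, so they are replaced by replacement_char.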