@minichiello
Created September 27, 2009 15:31
Python: Remove Accents (Remover Acentos)
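#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from unicodedata import normalize


def remover_acentos(txt, codif='utf-8'):
    ''' Return a copy of a str with accented characters replaced
    by their unaccented equivalents.
    Accepts str or bytes (bytes are decoded with codif first).
    WARNING: non-ASCII graphic characters that are not alphanumeric,
    such as bullets, dashes, curly quotes, etc., are simply removed!
    >>> remover_acentos('[ACENTUAÇÃO] ç: áàãâä! éèêë? íìĩîï, óòõôö; úùũûü.')
    '[ACENTUACAO] c: aaaaa! eeee? iiiii, ooooo; uuuuu.'
    '''
    # Only bytes need decoding; calling .decode() on text that is
    # already Unicode is what raised UnicodeEncodeError under Python 2.
    if isinstance(txt, bytes):
        txt = txt.decode(codif)
    # NFKD splits e.g. 'á' into 'a' + a combining accent; encoding to
    # ASCII with 'ignore' drops the accents (and every other non-ASCII
    # character), and the final decode returns a str instead of bytes.
    return normalize('NFKD', txt).encode('ASCII', 'ignore').decode('ASCII')


if __name__ == '__main__':
    from doctest import testmod
    testmod()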
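The function relies on NFKD decomposition: each accented letter becomes a base letter followed by one or more combining marks, and the ASCII encode with 'ignore' throws the marks away together with everything else outside ASCII. If you would rather keep symbols such as bullets and dashes (the docstring warns that they are deleted), one possible variation, sketched below, removes only the combining marks by testing `unicodedata.combining`. This is a different technique from the gist's ASCII-encode trick, and the name `strip_marks` is hypothetical.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove combining marks (accents) but keep other characters.

    Hypothetical helper: unlike the ASCII-encode approach, symbols such as
    the en dash (U+2013) or the bullet (U+2022) survive. Letters with no
    decomposition (e.g. 'ø', 'ß') are left unchanged, while NFKD still
    folds compatibility characters (e.g. the 'ﬁ' ligature becomes 'fi').
    """
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    kept = (ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose what remains so the result is in NFC form.
    return unicodedata.normalize('NFC', ''.join(kept))


print(strip_marks('Ação – café • São Paulo'))  # Acao – cafe • Sao Paulo
```

The trade-off: the output is no longer guaranteed to be pure ASCII, so this variation suits display or search normalization rather than systems that strictly require ASCII.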
@hectorusma

File "mypath/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 4: ordinal not in range(128)
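This error comes from the Python 2 version of the gist, which calls `txt.decode(codif)` unconditionally. When `txt` is already a `unicode` object, Python 2 first encodes it implicitly with the default ASCII codec, and that fails on U+2013 (EN DASH). Even once that is fixed, the ASCII 'ignore' step would silently delete the en dash. The following is a minimal Python 3 sketch, assuming you want common typographic punctuation mapped to ASCII lookalikes instead of dropped; the `PUNCT_MAP` table and the `to_ascii` name are hypothetical, illustrative choices.

```python
import unicodedata

# Hypothetical table: ASCII stand-ins for a few typographic characters
# that the ASCII 'ignore' step would otherwise delete outright.
PUNCT_MAP = str.maketrans({
    '\u2013': '-',   # EN DASH
    '\u2014': '-',   # EM DASH
    '\u2018': "'",   # LEFT SINGLE QUOTATION MARK
    '\u2019': "'",   # RIGHT SINGLE QUOTATION MARK
    '\u201c': '"',   # LEFT DOUBLE QUOTATION MARK
    '\u201d': '"',   # RIGHT DOUBLE QUOTATION MARK
    '\u2022': '*',   # BULLET
})


def to_ascii(txt, codif='utf-8'):
    """Strip accents and return pure ASCII text (Python 3 sketch).

    Accepts str or bytes; only bytes are decoded with `codif`, so text
    that is already Unicode is never passed to .decode().
    """
    if isinstance(txt, bytes):
        txt = txt.decode(codif)
    # Map known punctuation first, then decompose and drop non-ASCII.
    txt = txt.translate(PUNCT_MAP)
    decomposed = unicodedata.normalize('NFKD', txt)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Página 1 \u2013 \u201cintrodução\u201d'))  # Pagina 1 - "introducao"
print(to_ascii('Olá'.encode('utf-8')))                     # Ola
```

Characters missing from the table still disappear, so the map would need to grow to match whatever text you actually process.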

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment