@jmcelroy5
Last active August 29, 2015 14:08
Working with Unicode in Python
def omg_unicode(some_input):
    """
    Decode input immediately, work internally with unicode, and encode at the end.
    Remember: bytes in --> unicode everywhere --> bytes out
    """
    # Decode your input (convert from <type 'str'> to <type 'unicode'>).
    # Note: 'ignore' silently drops any bytes that are not valid UTF-8.
    unicode_string = some_input.decode('utf8', 'ignore')
    # Unicode all the things! (i.e. prefix text literals with u and refer to
    # non-ASCII characters by their Unicode code point)
    string1 = u"Arabic: \u0687, Tibetan: \u0fbf, Greek: \u03C0 (mmm pie)"
    string2 = u"Monkey emoji: \U0001f648 \U0001F649 \U0001F64A "
    string3 = u"This string is all ascii letters, but I'm unicoding it anyway."
    # Use u"\n" so that every piece being joined is already unicode.
    output = unicode_string + u"\n" + string1 + u"\n" + string2 + u"\n" + string3
    # At the end, call .encode() on text you want to send to the outside world.
    byte_string = output.encode('utf8')
    print byte_string
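
The function above is written for Python 2, where `str` holds bytes and `unicode` holds text. As a rough sketch of the same "bytes in --> unicode everywhere --> bytes out" pattern in Python 3, where the two types are named `bytes` and `str`, something like the following could work. The name `unicode_sandwich` and the choice of `errors='replace'` are illustrative assumptions, not part of the original gist.

```python
def unicode_sandwich(raw_bytes):
    """Python 3 sketch: bytes in --> text (str) everywhere --> bytes out.

    The function name is hypothetical; it is not part of the original gist.
    """
    # Decode at the input boundary: bytes -> str.
    # Assumption: errors='replace' is used here so that invalid bytes become
    # U+FFFD instead of being dropped silently, as 'ignore' would do.
    text = raw_bytes.decode('utf-8', errors='replace')

    # Work with text internally. In Python 3 every string literal is already
    # Unicode, so no u prefix is needed.
    output = text + "\n" + "Greek: \u03c0"

    # Encode at the output boundary: str -> bytes.
    return output.encode('utf-8')
```

Returning the encoded bytes instead of printing them leaves it to the caller to decide where they go, for example a file opened in binary mode or a socket.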