Last active
August 29, 2015 14:08
Working with Unicode in Python
# Python 2 (uses the print statement and the str/unicode split)
def omg_unicode(some_input):
    """
    Decode input immediately, work internally with unicode, encode at the end.
    Remember: bytes in --> unicode everywhere --> bytes out
    """
    # Decode your input (convert from <type 'str'> to <type 'unicode'>).
    # Note: 'ignore' silently drops any bytes that are not valid UTF-8.
    unicode_string = some_input.decode('utf8', 'ignore')

    # Unicode all the things! (i.e. prefix literals with u and refer to
    # non-ASCII characters by their Unicode code point)
    string1 = u"Arabic: \u0687, Tibetan: \u0fbf, Greek: \u03C0 (mmm pie)"
    string2 = u"Monkey emoji: \U0001f648 \U0001F649 \U0001F64A "
    string3 = u"This string is all ascii letters, but I'm unicoding it anyway."
    output = unicode_string + u"\n" + string1 + u"\n" + string2 + u"\n" + string3

    # At the end, call .encode() on text you want to send to the outside world.
    byte_string = output.encode('utf8')
    print byte_string
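The snippet above is Python 2. As a rough sketch of the same "bytes in --> unicode everywhere --> bytes out" idea in Python 3, where `str` is already Unicode and `bytes` is the separate byte type, something like the following should work. The function name `omg_unicode_py3` is made up for illustration and is not part of the original gist, and it returns the encoded bytes instead of printing them.

```python
def omg_unicode_py3(some_input: bytes) -> bytes:
    """Bytes in --> str (Unicode) everywhere --> bytes out (Python 3 sketch)."""
    # Decode at the boundary. errors="ignore" silently drops bytes that are
    # not valid UTF-8; errors="replace" or "strict" are the usual alternatives.
    text = some_input.decode("utf-8", errors="ignore")

    # In Python 3 every plain string literal is Unicode, so no u prefix is needed.
    greek = "Greek: \u03c0"
    monkeys = "Monkey emoji: \U0001f648 \U0001f649 \U0001f64a"
    output = "\n".join([text, greek, monkeys])

    # Encode only when handing text to the outside world.
    return output.encode("utf-8")
```

Calling `omg_unicode_py3(b"caf\xc3\xa9")` returns a `bytes` object, which the caller can write to a file or socket as-is. Returning bytes rather than printing keeps the encode step explicit, which is the point of the pattern.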