@jjmalina
Created December 21, 2012 16:38
Strings in Python. Credit to @lsemel
"""
STRINGS ARE BYTES IN SOME ENCODING. UNICODE IS NUMBERS.

* A byte string carries no record of its encoding. The same bytes could be ASCII,
UTF-8, or something else, and there is no reliable way to tell by inspecting them.
* When Python 2 implicitly converts a string to unicode, it assumes ASCII and
raises UnicodeDecodeError on any byte outside the ASCII range, for example when
the string holds UTF-8 that you just encoded (by using smart_str, for instance).
* You should rarely need to encode into strings yourself. Django takes care of
providing Unicode objects and of encoding them appropriately whenever
it outputs anything (to the response or to the database).
* Think of a string as a byte array, and of Unicode as some sort of internal object (say, a linked list) that
you can't input or output without decoding or encoding.
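
To make the first point concrete, here is a small sketch (not part of the
original notes) that decodes the same two bytes with two different codecs.
The variable names are made up for illustration, and the b'' and u'' prefixes
are an assumption chosen so the snippet runs on both Python 2.6+ and Python 3.3+.

```python
raw = b'\xc3\xbf'  # two bytes with no record of how they were encoded

as_utf8 = raw.decode('utf-8')      # one character: U+00FF
as_latin1 = raw.decode('latin-1')  # two characters: U+00C3 and U+00BF

# Both decodes succeed, so the bytes alone cannot say which one is right.
assert as_utf8 == u'\u00ff'
assert as_latin1 == u'\u00c3\u00bf'
assert len(as_utf8) == 1 and len(as_latin1) == 2
```

Only something outside the bytes (a header, a file format, a convention) can
settle which decoding was intended.
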
>>> from django.utils.encoding import smart_str
>>> 'a'+'b' # Two bytes, assumed to be ASCII
'ab'
>>> u'a'+u'b' # Two unicode characters
u'ab'
>>> 'a'+u'b' # The byte 'a' is converted to Unicode, under the assumption it represents ASCII
u'ab'
>>> smart_str('a') # Does nothing
'a'
>>> smart_str(u'a') # The Unicode character 'a' encoded as bytes
'a'
>>> smart_str(u'\u00ff') # Another unicode character, encoded as bytes
'\xc3\xbf'
>>> smart_str(u'\u00ff') + 'aa' # Concatenating two sets of two bytes each
'\xc3\xbfaa'
>>> smart_str(u'\u00ff') + u'aa' # Tries to convert those bytes to Unicode, assuming they are ASCII
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> smart_str(u'\u00ff').decode('utf-8') + u'aa' # Tell Python those bytes are not ASCII, but are UTF-8
u'\xffaa'
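
The session above suggests a simple discipline: decode bytes once on the way
in, work in Unicode, and encode once on the way out. A minimal sketch of that
pattern follows. to_unicode and to_bytes are hypothetical helpers written for
this note (they are not Django APIs), and UTF-8 is an assumed default encoding.

```python
def to_unicode(value, encoding='utf-8'):
    # Hypothetical helper: decode bytes, pass Unicode through untouched.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    # Hypothetical helper: encode Unicode, pass bytes through untouched.
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


incoming = b'\xc3\xbf'               # bytes from a file, socket, etc.
text = to_unicode(incoming) + u'aa'  # all work happens in Unicode
outgoing = to_bytes(text)            # encode once, at the boundary

assert text == u'\xffaa'
assert outgoing == b'\xc3\xbfaa'
```

Because the decode names its codec explicitly, Python never has to fall back
on the ASCII assumption that caused the UnicodeDecodeError above.
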
"""