@cdosborn
Last active February 17, 2018 04:07
Puzzling behavior of Python's repr and __repr__
If you open up a Python 2 shell and run the following code:

    class Foo:
        def __repr__(self):
            return u'\xe0'

    repr(Foo())
The last line raises the following exception:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 0: ordinal not in range(128)
Foo is a contrived example. Below is an example derived from apache-libcloud:

    def __repr__(self):
        return '<Node: uuid=%s, name=%s ...>' % (self.uuid, self.name)
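Seems harmless. However, this too raises the exception if self.name is a
unicode string containing non-ASCII characters: formatting a byte string with a
unicode argument produces a unicode result, which repr then fails to encode as
ASCII.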
I was puzzled when I reproduced the issue with repr(Foo()). I assumed that
repr just returned the result of calling __repr__() on the Foo instance.
For example, the following works:

    > repr(u'\xe0')
    "u'\\xe0'"

But...

    > repr(Foo())
    UnicodeEncodeError...
Then I went googling for the /actual/ behavior of repr. Someone on the
internet said that in Python 2, repr requires __repr__ to return an ASCII
string. Sure enough, the Python 2 docs say of __repr__:

"The return value must be a string object."

In Python 2, a "string object" means str (a byte string), not unicode.
This just made me all the more curious. Next goal: actually go look at the
Python implementation of repr.
Much googling followed. How do I know which Python implementation I'm using?
Where are the builtins stored in CPython? After some searching I discovered
that repr takes the output of an object's __repr__ and, if it is a unicode
string, tries to encode it to ASCII!
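To make that conversion concrete, here is a rough Python 3 emulation of what CPython 2.7's repr() does with the value __repr__ returns (in CPython the logic lives in PyObject_Repr in Objects/object.c). It is a sketch under stated assumptions: Python 3 `str` plays the role of Python 2 `unicode`, `bytes` plays the role of Python 2 `str`, and the default encoding is ASCII, Python 2's usual default. The function name `py2_style_repr`, its `default_encoding` parameter, and the `Ascii` class are hypothetical.

```python
# Rough emulation of how CPython 2.7's repr() post-processes __repr__'s result.
# Assumptions (hypothetical mapping): Python 3 str ~ Python 2 unicode,
# Python 3 bytes ~ Python 2 str, default encoding = ASCII.

def py2_style_repr(obj, default_encoding="ascii"):
    """Call __repr__ via the type, then convert the result like Python 2 does."""
    result = type(obj).__repr__(obj)
    if isinstance(result, str):
        # Python 2 encodes a unicode result with the default encoding and
        # strict error handling, so any non-ASCII character raises here.
        result = result.encode(default_encoding)
    if not isinstance(result, bytes):
        # Anything that is neither unicode nor a byte string is rejected.
        raise TypeError(
            "__repr__ returned non-string (type %s)" % type(result).__name__
        )
    return result


class Foo:
    def __repr__(self):
        return "\xe0"  # same character as u'\xe0' in the example above


class Ascii:
    def __repr__(self):
        return "plain ascii"  # encodes to ASCII without trouble


if __name__ == "__main__":
    print(py2_style_repr(Ascii()))  # prints b'plain ascii'
    try:
        py2_style_repr(Foo())
    except UnicodeEncodeError as exc:
        print("UnicodeEncodeError:", exc)
```

The key point the sketch captures is that repr does not simply hand back whatever __repr__ returned: a unicode result goes through an encoding step, and that step is where the error comes from.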
TLDR:
In Python 2, the repr builtin expects __repr__ to return an ASCII string (a str,
or a unicode string that can be encoded as ASCII). That's because repr converts
a unicode result of X.__repr__() to ASCII, which either succeeds or raises the
UnicodeEncodeError above.
Reasons I think this quirk matters:
1) __repr__ is the fallback used when objects are converted to strings.
str(Foo()) uses __repr__ if __str__ is not defined, and fails in the same way
as repr(Foo()).
2) It's really easy to write __repr__ methods that fail unexpectedly. As in the
apache-libcloud example above, including a user-provided field like self.name
makes it likely that a non-ASCII character will eventually show up, causing
the ASCII encoding to fail.
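One possible way to avoid the failure (a suggestion, not something prescribed above) is to have __repr__ escape non-ASCII characters itself, so the value it returns is always pure ASCII. The `Node` class below is a hypothetical stand-in for the apache-libcloud example. The `'backslashreplace'` error handler and `%r` formatting are standard, but the output format is an illustrative choice. The code is written so it can run on Python 2 and 3; the checks only exercise Python 3 behavior.

```python
class Node:
    """Hypothetical stand-in for the apache-libcloud Node class."""

    def __init__(self, uuid, name):
        self.uuid = uuid
        self.name = name

    def __repr__(self):
        # %r quotes the name; on Python 2 it also escapes non-ASCII unicode.
        text = '<Node: uuid=%s, name=%r>' % (self.uuid, self.name)
        # Escape any remaining non-ASCII characters (e.g. \xe9) so the
        # returned text is pure ASCII and survives Python 2's repr().
        return text.encode('ascii', 'backslashreplace').decode('ascii')


if __name__ == "__main__":
    print(repr(Node("1234", "caf\xe9")))  # prints <Node: uuid=1234, name='caf\xe9'>
```

Using %r also makes the quoting of the name explicit, so an empty or whitespace-only name stays visible in the output.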