
@codeforkjeff
Created January 25, 2016 22:40
possible encoding problem with ipfs-api
```python
# Python 2 (print statements; str is a byte string)
# run this with LC_ALL and LANG env vars set to "en_US.utf8"
import ipfsApi

utf8_filename = u"clich\xe9.txt".encode('utf8')
with open(utf8_filename, "w") as f:
    f.write("this is just a test")

c = ipfsApi.Client('127.0.0.1', 5001)
response = c.add(utf8_filename)

# note the unicode strings in the response
print response
print utf8_filename
print response['Name']

# why is this False?
# it also displays a warning: UnicodeWarning: Unicode unequal comparison
# failed to convert both arguments to Unicode - interpreting them as being unequal
print response['Name'] == utf8_filename

# prints True: why is this necessary?
print response['Name'].encode('latin-1') == utf8_filename
print "done."
```

deltab commented Jan 26, 2016

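utf8_filename is just a byte string: Python doesn't remember where it came from or what the bytes meant. So when it's asked to compare it with a Unicode string, it has to pick an encoding to convert the bytes into Unicode, and Python 2 falls back on its default encoding, ASCII. The bytes \xc3\xa9 aren't valid ASCII, so the conversion fails; Python displays the UnicodeWarning, treats the two values as unequal and returns False from the comparison.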

In the second comparison you avoid the problem by explicitly converting the Unicode string into bytes (using Latin-1), and there's no problem comparing byte strings.
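For comparison, here is a minimal Python 3 sketch (not part of the original gist) of the same situation. Python 3 never converts implicitly: comparing str with bytes simply returns False (a BytesWarning appears only when Python is run with the -b flag), so you choose the encoding yourself and compare text with text or bytes with bytes. The name variable stands in for a correctly decoded response['Name']; it is an assumption, not real ipfs-api output.

```python
# Python 3 sketch: the filename as bytes on disk vs. the name as text.
utf8_filename = "clich\xe9.txt".encode("utf-8")  # b'clich\xc3\xa9.txt'
name = "clich\xe9.txt"  # assumed: what a correctly behaving API would return

# str vs bytes: no implicit decoding, always unequal.
same_without_conversion = name == utf8_filename
# Decode the bytes with the encoding we know they use, then compare text.
same_as_text = name == utf8_filename.decode("utf-8")
# Or encode the text with that same encoding and compare bytes.
same_as_bytes = name.encode("utf-8") == utf8_filename

print(same_without_conversion)  # False
print(same_as_text)  # True
print(same_as_bytes)  # True
```

Either direction works, as long as the encoding named in the conversion is the one the bytes were actually written in.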

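They should not, however, compare equal: UTF-8 and Latin-1 encode every non-ASCII character in Latin-1's repertoire to different byte sequences (é is the single byte \xe9 in Latin-1 but the two bytes \xc3\xa9 in UTF-8). It would seem that response['Name'] has been corrupted somewhere, by the bytes being decoded using Latin-1 instead of UTF-8, making it u'clich\xc3\xa9.txt' instead of u'clich\xe9.txt'. Encoding that corrupted string back to Latin-1 reproduces the original UTF-8 bytes exactly, which is why the second comparison prints True.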
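The suspected corruption can be reproduced without an IPFS daemon. This minimal Python 3 sketch assumes the bug is exactly the one described above (UTF-8 bytes decoded as Latin-1); repair_latin1_mojibake is a hypothetical helper for illustration, not part of ipfs-api.

```python
utf8_filename = "clich\xe9.txt".encode("utf-8")  # b'clich\xc3\xa9.txt'

# Simulate the suspected bug: the UTF-8 bytes are decoded as Latin-1,
# turning the one character 'é' into the two characters 'Ã' and '©'.
corrupted = utf8_filename.decode("latin-1")
print(corrupted == "clich\xc3\xa9.txt")  # True

# The *correct* name encoded as Latin-1 does not match the UTF-8 bytes...
print("clich\xe9.txt".encode("latin-1") == utf8_filename)  # False
# ...but the corrupted name does, which is why encode('latin-1') "worked".
print(corrupted.encode("latin-1") == utf8_filename)  # True


def repair_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: undo a UTF-8 -> Latin-1 mis-decode."""
    # Latin-1 maps code points 0-255 back to the same byte values,
    # recovering the original bytes, which are then decoded properly.
    return text.encode("latin-1").decode("utf-8")


print(repair_latin1_mojibake(corrupted) == "clich\xe9.txt")  # True
```

The repair is only safe when you know the text went through exactly this mis-decoding: it raises UnicodeEncodeError for characters outside Latin-1 and UnicodeDecodeError if the recovered bytes are not valid UTF-8. Fixing the decode step where the response is parsed is the better solution.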
