@ivanyu
Last active December 16, 2015 17:58
import codecs
import csv
from StringIO import StringIO


class UnicodeWriter(object):
    """A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding. "f" must accept byte
    strings (e.g. a file opened in "wb" mode).

    In Python 2, the standard `csv` module can't handle Unicode input,
    but we can "cheat" it. First, we encode each field into a plain UTF-8
    byte string and let `csv` write the row into a memory buffer
    (`StringIO`). Then we decode the CSV data back into Unicode, encode it
    into the target encoding and write it to the target file.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8"):
        self.buffer = StringIO()
        self.writer = csv.writer(self.buffer, dialect=dialect)
        self.target_stream = f
        # An incremental encoder keeps state between rows, so encodings
        # with a BOM (e.g. UTF-16) emit it only once per file.
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        # Row elements are unicode strings that may contain non-ASCII
        # characters. Encode them into UTF-8 (unicode -> byte string).
        encoded_row = [s.encode("utf-8") for s in row]
        # Write the encoded row with the standard CSV writer.
        self.writer.writerow(encoded_row)
        # A valid CSV row is now in memory. Get it ...
        data = self.buffer.getvalue()
        # ... and convert it back into Unicode.
        data = data.decode("utf-8")
        # Encode the row into the target encoding and write it
        # into the target file.
        self.target_stream.write(self.encoder.encode(data))
        # Empty the buffer: rewind, then truncate.
        self.buffer.seek(0)
        self.buffer.truncate(0)
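For comparison, here is a minimal sketch of the same buffer-then-encode idea ported to Python 3, where `csv` already accepts `str`, so the UTF-8 round trip disappears. The class name `UnicodeWriter3` is hypothetical, and it assumes `f` is a binary stream such as `io.BytesIO` or a file opened in `"wb"` mode. Two details matter: `io.StringIO.truncate(0)` does not move the stream position, so the buffer is rewound with `seek(0)` first, and a `codecs` incremental encoder (as in the Python 2 `UnicodeWriter` recipe from the `csv` documentation) keeps a UTF-16 BOM from being repeated on every row.

```python
import codecs
import csv
import io


class UnicodeWriter3:
    """Hypothetical Python 3 port: write CSV rows to binary stream `f` in `encoding`."""

    def __init__(self, f, dialect=csv.excel, encoding="utf-8"):
        self.buffer = io.StringIO()  # csv writes str rows into this buffer
        self.writer = csv.writer(self.buffer, dialect=dialect)
        self.target_stream = f
        # Stateful encoder: a BOM (if the encoding has one) is emitted only once.
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        # Python 3's csv handles str natively: no UTF-8 round trip needed.
        self.writer.writerow(row)
        data = self.buffer.getvalue()
        # Encode the finished row into the target encoding and write the bytes.
        self.target_stream.write(self.encoder.encode(data))
        # io.StringIO.truncate() keeps the current position, so rewind first.
        self.buffer.seek(0)
        self.buffer.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


if __name__ == "__main__":
    out = io.BytesIO()
    writer = UnicodeWriter3(out, encoding="cp1251")
    writer.writerows([["имя", "город"], ["Иван", "Москва, Россия"]])
    print(repr(out.getvalue().decode("cp1251")))
```

The field containing a comma is quoted by `csv` as usual; only the final bytes depend on the chosen encoding.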
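In Python 3 the workaround is usually unnecessary: `csv.writer` accepts `str` directly, so opening the target file in text mode with the desired `encoding` (and `newline=""`, as the `csv` documentation recommends) does the same job. A minimal sketch; the helper names `write_csv`/`read_csv` and the file name `people.csv` are hypothetical:

```python
import csv
import os
import tempfile


def write_csv(path, rows, encoding="utf-8"):
    """Write `rows` (lists of str) to `path` as CSV in the given encoding."""
    # newline="" hands line-ending control to the csv module.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path, encoding="utf-8"):
    """Read the CSV file back as a list of rows (lists of str)."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    rows = [["имя", "город"], ["Иван", "Москва, Россия"]]
    path = os.path.join(tempfile.mkdtemp(), "people.csv")
    write_csv(path, rows, encoding="cp1251")
    print(read_csv(path, encoding="cp1251") == rows)  # True
```

The text layer of `open()` does the encoding here, which is exactly the step the Python 2 class has to perform by hand.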