@ptwobrussell
ptwobrussell / MTSW2E Example 6-3 Improvements
Last active December 26, 2016 23:08
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with improvements toward getting the code to work seamlessly on mailboxes exported from Google Takeout.
"""
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with the following changes:
* Extra debugging information is written to sys.stderr to help isolate any problematic content
that may be encountered.
* A (hopeful) fix to a blasted UnicodeEncodeError in cleanContent() that may be triggered from
quopri.decodestring attempting to decode an already decoded Unicode value.
* The JSONification in jsonifyMessage now ignores any content that's not text. MIME-encoded content
such as images, PDFs, and other non-text data, which is not useful for textual analysis without
significant additional work, is no longer carried forward into the JSON for import into MongoDB.
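The text-only filtering described in the last bullet can be sketched roughly as follows. This is not the gist's jsonifyMessage, which is not shown here; it is a minimal Python 3 illustration using the standard `email` package, and the helper name `text_parts` and the sample message are hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_parts(msg):
    """Return only the text/* parts of a message as JSON-friendly dicts.

    Hypothetical helper: containers (multipart/*) and binary parts such as
    images or PDFs are skipped rather than carried into the output.
    """
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images, PDFs, etc.
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        # Assumption: fall back to utf-8 when the part declares no charset.
        charset = part.get_content_charset() or "utf-8"
        parts.append({
            "mimetype": part.get_content_type(),
            "content": payload.decode(charset, errors="replace"),
        })
    return parts


# Build a small sample message with one text part and one binary part.
sample = MIMEMultipart()
sample.attach(MIMEText("caf\u00e9 notes", "plain", "utf-8"))
sample.attach(MIMEApplication(b"%PDF-1.4 fake bytes", "pdf"))

parts = text_parts(sample)
```

Only the `text/plain` part survives; the PDF attachment is dropped before anything would be serialized to JSON.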
@ptwobrussell
ptwobrussell / gist:1877506
Last active February 10, 2019 11:48
Some analysis of capturing/redirecting UTF-8 output with Python 2
# -*- coding: utf-8 -*-
# Studying this script might be helpful in understanding why UnicodeDecode errors
# sometimes happen when trying to capture utf-8 output to files with Python 2 even
# though the output prints to your (utf-8 capable) terminal.
# Note that the first line of this file is an encoding declaration (see PEP 263), not a
# Byte Order Mark (BOM). It tells Python to treat this source file as utf-8 (i.e. comments and
# string values may be utf-8)
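The failure mode described in the comments above can be approximated in Python 3 as a rough analogue. In Python 2, `sys.stdout.encoding` is `None` when output is piped or redirected, so the ascii default applies. The sketch below simulates that by writing through a stream with an explicit encoding; the helper name `write_with_encoding` is hypothetical.

```python
import io


def write_with_encoding(text, encoding):
    """Write text through a stream that uses `encoding`; return the bytes written."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


snowman = "\u2603"

# A utf-8 capable stream (like a utf-8 terminal) accepts the character.
utf8_bytes = write_with_encoding(snowman, "utf-8")

# An ascii stream (like redirected stdout in Python 2) cannot encode it.
try:
    write_with_encoding(snowman, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Treating already-encoded utf-8 bytes as ascii fails in the other direction.
try:
    utf8_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The same text succeeds or fails depending only on the encoding attached to the output stream, which is why a script can print fine to a terminal and still raise once its output is captured to a file.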