"""
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with the following changes:
* Extra debugging information is written to sys.stderr to help isolate any problematic content
that may be encountered.
* A (hopeful) fix for a blasted UnicodeEncodeError in cleanContent() that may be triggered when
quopri.decodestring attempts to decode an already decoded Unicode value.
* The JSONification in jsonifyMessage now ignores any content that's not text. MIME-encoded content
such as images, PDFs, and other non-text data, which is not useful for textual analysis without
significant additional work, is no longer carried forward into the JSON for import into MongoDB.
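A minimal sketch of the last two changes, written for Python 3. The names safe_qp_decode and
text_parts are hypothetical and are not the functions used in this script. Quoted-printable
decoding is applied only to bytes, so a value that is already a decoded str passes through
unchanged. When walking a message, only text/* MIME parts are kept.

```python
import quopri
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def safe_qp_decode(value):
    # Hypothetical helper: decode quoted-printable bytes, but return a str
    # unchanged, since it is assumed to have been decoded already.
    if isinstance(value, bytes):
        return quopri.decodestring(value).decode("utf-8", errors="replace")
    return value


def text_parts(msg):
    # Hypothetical helper: walk every MIME part and keep only text/* payloads;
    # images, PDFs and other non-text attachments are skipped.
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes the Content-Transfer-Encoding (base64, quoted-printable).
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


msg = MIMEMultipart()
msg.attach(MIMEText("Hello, world", "plain", "utf-8"))
msg.attach(MIMEApplication(b"%PDF-1.4 fake", _subtype="pdf"))

print(text_parts(msg))               # ['Hello, world']
print(safe_qp_decode(b"x=3Dy"))      # x=y
print(safe_qp_decode("x=y"))         # x=y (str left as-is)
```

Because get_payload(decode=True) already removes base64 or quoted-printable transfer encoding,
an explicit quopri call is only needed for values that arrive as raw encoded bytes.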
# -*- coding: utf-8 -*-
# Studying this script might help explain why Unicode errors (UnicodeEncodeError and
# UnicodeDecodeError) sometimes occur when capturing utf-8 output to a file with Python 2,
# even though the same output prints fine to your (utf-8 capable) terminal.
# Note that the first line of this file is an encoding declaration (see PEP 263), which
# is a directive that tells Python to treat this source file as utf-8 (i.e. comments and
# string literals may contain utf-8 characters).
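# A minimal sketch of the failure mode described above. It is written for Python 3 rather
# than Python 2 (an assumption), and write_with is a hypothetical helper. The sketch wraps an
# in-memory byte buffer in a text layer to mimic redirected output. Python 2 falls back to the
# ASCII codec when stdout is not a terminal. With that codec, writing a non-ASCII character
# raises UnicodeEncodeError, while the same write succeeds with utf-8.

```python
import io

text = "caf\u00e9"  # 'caf' followed by U+00E9 (e with acute accent)


def write_with(encoding):
    # Hypothetical helper: simulate a redirected stdout by wrapping a byte
    # buffer in a text layer that uses the given encoding.
    buf = io.BytesIO()
    out = io.TextIOWrapper(buf, encoding=encoding)
    out.write(text)
    out.flush()
    return buf.getvalue()


try:
    write_with("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ordinal not in range(128)

print(write_with("utf-8"))  # b'caf\xc3\xa9'
```

# The usual remedy is to encode explicitly. In Python 2, call .encode('utf-8') on unicode
# values before printing them. In Python 3, open output files with encoding='utf-8' or call
# sys.stdout.reconfigure(encoding='utf-8') (Python 3.7+).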