Decoding emails in Python, e.g. for Gmail and the imapclient library
# Python 2 code: it relies on the print statement and the unicode() builtin.
import email


def get_decoded_email_body(message_body):
    """ Decode email body.

    We try to get text/plain, but if there is not one then fall back to text/html.
    Parts that declare no character set are returned transfer-decoded but otherwise untouched.

    :param message_body: Raw 7-bit message body input e.g. from imaplib. Double encoded in quoted-printable and latin-1
    :return: Message body as a UTF-8 encoded byte string
    """
    msg = email.message_from_string(message_body)

    # Start from None (not ""), otherwise the text/html fallback below is never reached
    text = None
    if msg.is_multipart():
        html = None
        for part in msg.get_payload():
            # Debug output: content type and charset of each top-level part
            print "%s, %s" % (part.get_content_type(), part.get_content_charset())

            if part.get_content_charset() is None:
                # We cannot know the character set, so return decoded "something"
                text = part.get_payload(decode=True)
                continue

            charset = part.get_content_charset()

            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('utf8', 'replace')

            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('utf8', 'replace')

        if text is not None:
            return text.strip()
        elif html is not None:
            return html.strip()
        else:
            # Neither a text/plain nor a text/html part was found
            return ""
    else:
        text = unicode(msg.get_payload(decode=True), msg.get_content_charset(), 'ignore').encode('utf8', 'replace')
        return text.strip()
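The function above is Python 2 only. For Python 3, a rough equivalent might look like the sketch below. It assumes the raw message arrives as bytes (as it typically does from IMAP), and the names `get_decoded_email_body_py3`, `_decode_part` and the `fallback_charset` parameter are hypothetical, not part of the original gist. Unlike the original, it returns `str` rather than UTF-8 bytes, and it decodes parts without a declared charset using the fallback instead of returning them raw.

```python
import email


def _decode_part(part, fallback_charset):
    """Decode one non-multipart part to str, undoing base64/quoted-printable."""
    payload = part.get_payload(decode=True)  # bytes, or None for containers
    if payload is None:
        return None
    # Assumption: use the fallback when the part declares no charset
    charset = part.get_content_charset() or fallback_charset
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name unknown to Python
        return payload.decode(fallback_charset, errors="replace")


def get_decoded_email_body_py3(raw_message, fallback_charset="utf-8"):
    """Return text/plain as str, else text/html, else an empty string."""
    msg = email.message_from_bytes(raw_message)
    if not msg.is_multipart():
        return (_decode_part(msg, fallback_charset) or "").strip()

    text = None
    html = None
    for part in msg.get_payload():  # top-level parts only, like the original
        content_type = part.get_content_type()
        if content_type == "text/plain" and text is None:
            text = _decode_part(part, fallback_charset)
        elif content_type == "text/html" and html is None:
            html = _decode_part(part, fallback_charset)

    body = text if text is not None else html
    return (body or "").strip()
```

Like the original, this only looks at top-level parts, so a body nested inside another multipart container is not found.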
Schizo commented Oct 5, 2014
Worked perfectly! Thanks.
xarg commented Apr 8, 2015
Since parts of the email can contain other parts the correct way to iterate over parts is using
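The comment above is cut off before it names the method. One standard-library way to visit nested parts is `Message.walk()`; whether that is what the commenter had in mind is an assumption. The sketch below uses a made-up nested message (`NESTED`) and a hypothetical helper `iter_leaf_parts` to show the difference from iterating over `get_payload()`.

```python
import email

# Made-up example: multipart/mixed wrapping a multipart/alternative body
# plus one attachment.
NESTED = "\r\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="outer"',
    "",
    "--outer",
    'Content-Type: multipart/alternative; boundary="inner"',
    "",
    "--inner",
    'Content-Type: text/plain; charset="utf-8"',
    "",
    "plain body",
    "--inner",
    'Content-Type: text/html; charset="utf-8"',
    "",
    "<p>html body</p>",
    "--inner--",
    "",
    "--outer",
    "Content-Type: application/octet-stream",
    'Content-Disposition: attachment; filename="notes.bin"',
    "",
    "data",
    "--outer--",
    "",
]).encode("ascii")


def iter_leaf_parts(msg):
    """Yield every non-container part, however deeply it is nested."""
    for part in msg.walk():  # depth-first, includes the message itself
        if not part.is_multipart():
            yield part


msg = email.message_from_bytes(NESTED)
top_level_types = [p.get_content_type() for p in msg.get_payload()]
leaf_types = [p.get_content_type() for p in iter_leaf_parts(msg)]
```

Here `top_level_types` only sees the `multipart/alternative` container and the attachment, while `leaf_types` also reaches the `text/plain` and `text/html` parts inside the container.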
TonyFrancis commented Aug 3, 2015
How can this be modified to get the attachment filename?
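One possible approach, sketched for Python 3: each part exposes `get_filename()`, which reads the filename from the Content-Disposition header (falling back to the `name` parameter of Content-Type). The helper name `get_attachment_filenames` is hypothetical, and passing the result through `decode_header`/`make_header` is an extra step assumed here to handle names sent as RFC 2047 encoded words.

```python
import email
from email.header import decode_header, make_header


def get_attachment_filenames(msg):
    """Return the filenames of all parts that declare one."""
    names = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            # Turn "=?utf-8?b?...?=" style names into readable text
            names.append(str(make_header(decode_header(filename))))
    return names
```

It takes an already parsed message, e.g. the result of `email.message_from_bytes(raw_bytes)`.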
robert-dzikowski commented Jun 6, 2017
How can I decode an email that I got using the Gmail API? I am getting the error "TypeError: initial_value must be str or None, not dict".
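That error suggests the whole API response (a dict) was passed where a string is expected. A minimal sketch of one way around it, assuming the message was fetched with `format='raw'` so that the response dict carries the RFC 2822 message as a base64url string under the `raw` key; the helper name `message_from_gmail_raw` is hypothetical.

```python
import base64
import email


def message_from_gmail_raw(resource):
    """Parse a Gmail API message resource fetched with format='raw' (assumed)."""
    raw = resource["raw"]
    # Restore any stripped base64 padding before decoding
    raw += "=" * (-len(raw) % 4)
    raw_bytes = base64.urlsafe_b64decode(raw)
    return email.message_from_bytes(raw_bytes)
```

The returned object is a regular `email.message.Message`, so the part-handling logic from the gist applies to it. Responses fetched with other `format` values have a different structure and are not covered by this sketch.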
ransom4real commented Mar 11, 2018
Saved someone from losing all their hair. Thank you for this. Very useful!
mihircomp commented Jun 13, 2018
Very Very Very useful..
ghost commented Jun 15, 2018
What up
JiachengHe commented Jul 5, 2018
This is super useful! I finally found the solution here. Thank you sosososo much.
rahulgour25 commented Oct 12, 2018
The signature image does not show up in either the text/html or the text/plain section. How can I get it?
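Images are usually not inside the text parts at all. They tend to travel as separate `image/*` parts, often referenced from the HTML through a `cid:` URL that matches the part's Content-ID header. A minimal Python 3 sketch under that assumption follows; the helper name `get_inline_images` and the `image-N` fallback key are hypothetical.

```python
import email


def get_inline_images(msg):
    """Map Content-ID (or filename) to the decoded bytes of each image part."""
    images = {}
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # Content-ID values are normally wrapped in angle brackets
        content_id = (part.get("Content-ID") or "").strip().strip("<>")
        key = content_id or part.get_filename() or "image-%d" % len(images)
        images[key] = part.get_payload(decode=True)
    return images
```

An `<img src="cid:...">` tag in the HTML body can then be matched against the keys of the returned dict.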
ghost commented Apr 19, 2014
I get this error: get_decoded_email_body() takes exactly 1 argument (2 given).
Right now I am passing this:
data[0][1].decode("utf-8").encode('latin-1')
which looks like
--047d7b41791d4238e204f751445d--