
Decoding emails in Python, e.g. for Gmail and the imapclient library
```python
import email


def get_decoded_email_body(message_body):
    """ Decode email body.
    If a part declares no character set, fall back to UTF-8 with replacement characters.
    We try to get text/plain, but if there is not one then fall back to text/html.
    :param message_body: Raw message e.g. from imaplib (bytes), or the same message as a str
    :return: Message body as a unicode string
    """
    # imaplib returns bytes in Python 3, so accept both bytes and str
    if isinstance(message_body, bytes):
        msg = email.message_from_bytes(message_body)
    else:
        msg = email.message_from_string(message_body)

    def decode_part(part):
        # decode=True undoes the transfer encoding (quoted-printable, base64)
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, "replace")
        except LookupError:
            # Unknown charset name: latin-1 maps every byte, so it never fails
            return payload.decode("latin-1")

    if not msg.is_multipart():
        return decode_part(msg).strip()

    text = None
    html = None
    # get_payload() yields top-level parts only; nested multiparts are not searched
    for part in msg.get_payload():
        if part.get_content_type() == "text/plain":
            text = decode_part(part)
        elif part.get_content_type() == "text/html":
            html = decode_part(part)

    if text is not None:
        return text.strip()
    if html is not None:
        return html.strip()
    return ""
```

ghost commented Apr 19, 2014

I get this error: get_decoded_email_body() takes exactly 1 argument (2 given)

Right now I am passing this:
data[0][1].decode("utf-8").encode('latin-1')

which looks like:

--047d7b41791d4238e204f751445d
Content-Type: text/plain; charset=UTF-8

some content...
...
...

--047d7b41791d4238e204f751445d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

some html content

--047d7b41791d4238e204f751445d--
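In Python 2, "takes exactly 1 argument (2 given)" means the function received two positional arguments; one common way to hit it by accident is pasting the function into a class, where calling it as a method also passes self. Separately, in Python 3 the decode/encode round trip should not be needed, because email.message_from_bytes can parse the raw bytes that imaplib returns. The sketch below assumes a message shaped like the sample above; the boundary is taken from the comment, while the top-level headers and the quoted-printable HTML body are invented for illustration.

```python
import email

# Raw bytes shaped like data[0][1] from imaplib's fetch(num, "(RFC822)");
# the headers and body text are made up for illustration
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="047d7b41791d4238e204f751445d"\r\n'
    b"\r\n"
    b"--047d7b41791d4238e204f751445d\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"some content...\r\n"
    b"\r\n"
    b"--047d7b41791d4238e204f751445d\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>caf=C3=A9</p>\r\n"
    b"\r\n"
    b"--047d7b41791d4238e204f751445d--\r\n"
)

# Parse the bytes directly: no .decode("utf-8").encode("latin-1") round trip
msg = email.message_from_bytes(raw)
bodies = {}
for part in msg.get_payload():
    # decode=True undoes the quoted-printable transfer encoding, giving bytes
    payload = part.get_payload(decode=True)
    bodies[part.get_content_type()] = payload.decode(part.get_content_charset() or "utf-8").strip()

print(bodies)
```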


Schizo commented Oct 5, 2014

Worked perfectly! Thanks.


xarg commented Apr 8, 2015

Since parts of an email can themselves contain other parts, the correct way to iterate over them is msg.walk() (recursive) rather than msg.get_payload() (top level only).
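A minimal sketch of the walk()-based approach, assuming Python 3's email package. collect_text_parts is a hypothetical helper name, and the test message (text nested one level down inside multipart/alternative, inside multipart/mixed) is built only to show the difference between the two iteration styles.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def collect_text_parts(msg):
    """Return {content_type: decoded_text} for the first text/plain and text/html leaves."""
    found = {}
    # walk() yields msg itself, then every subpart depth-first, including nested ones
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html") and ctype not in found:
            payload = part.get_payload(decode=True) or b""
            found[ctype] = payload.decode(part.get_content_charset() or "utf-8", "replace")
    return found


# A multipart/mixed message whose text lives inside a nested multipart/alternative
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("plain body", "plain", "utf-8"))
alternative.attach(MIMEText("<b>html body</b>", "html", "utf-8"))
outer = MIMEMultipart("mixed")
outer.attach(alternative)

print([p.get_content_type() for p in outer.get_payload()])  # top level only
print(collect_text_parts(outer))
```

Here get_payload() on the outer message only sees the multipart/alternative container, while walk() reaches both text leaves.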


TonyFrancis commented Aug 3, 2015

How can this be modified to get an attachment's filename?
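One possible answer, sketched for Python 3: Message.get_filename() returns the filename parameter of a part (or None), so walking the message and keeping the parts that have one yields the attachments. The helper name attachments and the fake PNG payload are hypothetical, used only to make the example self-contained.

```python
from email.message import EmailMessage


def attachments(msg):
    """Return a list of (filename, data) for every part that carries a filename."""
    found = []
    # walk() also descends into nested multipart containers
    for part in msg.walk():
        filename = part.get_filename()  # None when the part has no filename parameter
        if filename:
            found.append((filename, part.get_payload(decode=True)))
    return found


# Illustrative message: a text body plus one (fake) PNG attachment
msg = EmailMessage()
msg.set_content("See attached.")
msg.add_attachment(b"not really a png", maintype="image", subtype="png", filename="logo.png")

print(attachments(msg))
```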


robert-dzikowski commented Jun 6, 2017

How can I decode an email that I got using the Gmail API? I am getting the error "TypeError: initial_value must be str or None, not dict".
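That TypeError comes from io.StringIO, which email.message_from_string uses internally, so it suggests the whole API response dict was passed in instead of the message text. With users.messages.get and format="raw", the Gmail API returns the full RFC 2822 message base64url-encoded in the response's "raw" field. A minimal sketch, assuming that response shape; message_from_gmail_raw is a hypothetical helper and fake_response stands in for a real API result.

```python
import base64
import email


def message_from_gmail_raw(response):
    """Build an email.message.Message from a messages.get(format="raw") response dict."""
    raw = response["raw"]  # base64url-encoded RFC 2822 message
    # Re-add any stripped "=" padding before decoding (a no-op if already padded)
    raw += "=" * (-len(raw) % 4)
    return email.message_from_bytes(base64.urlsafe_b64decode(raw))


# Hypothetical response dict standing in for the API call's result
fake_response = {
    "id": "example-id",
    "raw": base64.urlsafe_b64encode(b"Subject: hello\r\n\r\nbody text\r\n").decode("ascii"),
}
msg = message_from_gmail_raw(fake_response)
print(msg["Subject"], msg.get_payload().strip())
```

The resulting bytes are the same kind of raw message that imaplib returns, so they can then be decoded the same way.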


ransom4real commented Mar 11, 2018

Saved someone from losing all their hair. Thank you for this. Very useful!


mihircomp commented Jun 13, 2018

Very, very, very useful.
Thank you!


ghost commented Jun 15, 2018

What up


JiachengHe commented Jul 5, 2018

This is super useful! I finally found the solution here. Thank you sosososo much.


rahulgour25 commented Oct 12, 2018

The signature image does not appear in either the text/html or the text/plain section. How can I get it?
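Signature images are often sent as separate inline image parts (for example inside multipart/related) that the HTML references with a cid: URL, so they never show up in the text/plain or text/html payloads themselves. Assuming that structure, a sketch in Python 3 could collect image parts by their Content-ID header; inline_images is a hypothetical helper, and the message and image bytes below are fabricated for illustration.

```python
from email.message import EmailMessage


def inline_images(msg):
    """Return {content_id: image_bytes} for image parts that carry a Content-ID."""
    images = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if part.get_content_maintype() == "image" and cid:
            # Strip the <> so the key matches the cid: reference used in the HTML
            images[cid.strip("<>")] = part.get_payload(decode=True)
    return images


msg = EmailMessage()
msg.set_content("Regards")
msg.add_alternative('<p>Regards</p><img src="cid:sig@example">', subtype="html")
# Turn the HTML part into multipart/related and attach the signature image to it
msg.get_payload()[1].add_related(b"fake-png-bytes", maintype="image", subtype="png", cid="<sig@example>")

print(inline_images(msg))
```

Some mail clients instead link the signature image from a remote URL, in which case it only appears as an img src in the HTML.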
