Decoding emails in Python e.g. for GMail and imapclient lib
import email


def get_decoded_email_body(message_body):
    """Decode email body.

    Fall back to UTF-8 if a part does not declare a character set.
    We try to get text/plain, but if there is not one then fall back to text/html.

    :param message_body: Raw 7-bit message input (str or bytes) e.g. from imaplib. Double encoded in quoted-printable and latin-1
    :return: Message body as unicode string
    """
    if isinstance(message_body, bytes):
        msg = email.message_from_bytes(message_body)
    else:
        msg = email.message_from_string(message_body)

    if not msg.is_multipart():
        # Single-part message: undo the transfer encoding, then the charset
        charset = msg.get_content_charset() or "utf-8"
        return msg.get_payload(decode=True).decode(charset, "replace").strip()

    text = None
    html = None
    for part in msg.get_payload():
        payload = part.get_payload(decode=True)
        if payload is None:
            # Nested multipart container; only top-level parts are inspected here
            continue
        # We cannot know the character set if the header is missing, so assume UTF-8
        charset = part.get_content_charset() or "utf-8"
        if part.get_content_type() == "text/plain":
            text = payload.decode(charset, "replace")
        elif part.get_content_type() == "text/html":
            html = payload.decode(charset, "replace")

    # Prefer plain text; fall back to HTML, then to an empty string
    if text is not None:
        return text.strip()
    if html is not None:
        return html.strip()
    return ""

ghost commented Apr 19, 2014

I get this error: get_decoded_email_body() takes exactly 1 argument (2 given).

Right now I am passing this:
data[0][1].decode("utf-8").encode('latin-1')

which looks like

--047d7b41791d4238e204f751445d
Content-Type: text/plain; charset=UTF-8

some content...
...
...

--047d7b41791d4238e204f751445d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

some html content

--047d7b41791d4238e204f751445d--
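
A note on the input above: the TypeError is raised at the call site, before any parsing happens. It means the function received one more argument than it accepts, which often happens when it is defined inside a class without a self parameter. Separately, if the string really starts at the first boundary line as in the sample, the parser never sees the top-level Content-Type header that names the boundary, so it treats everything as a single text/plain body. Below is a minimal Python 3 sketch of that difference. The outer headers in HEADERS are hypothetical, reconstructed only from the boundary string in the sample.

```python
import email

# Body as in the sample above: it starts directly at the first boundary line.
BODY = (
    "--047d7b41791d4238e204f751445d\r\n"
    "Content-Type: text/plain; charset=UTF-8\r\n"
    "\r\n"
    "some content...\r\n"
    "--047d7b41791d4238e204f751445d\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "<p>some html content</p>\r\n"
    "--047d7b41791d4238e204f751445d--\r\n"
)

# Hypothetical top-level headers (an assumption, not taken from the original mail).
HEADERS = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="047d7b41791d4238e204f751445d"\r\n'
    "\r\n"
)

body_only = email.message_from_string(BODY)
full = email.message_from_string(HEADERS + BODY)

# Without the outer headers the parser cannot split the parts.
print(body_only.is_multipart())
# With them, both alternatives are available as sub-parts.
print(full.is_multipart())
print([part.get_content_type() for part in full.get_payload()])
```

With imaplib, fetching the whole message (for example with the RFC822 item) rather than only the body text keeps those headers in the data.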

Schizo commented Oct 5, 2014

Worked perfectly! thanks

xarg commented Apr 8, 2015

Since parts of the email can contain other parts the correct way to iterate over parts is using msg.walk() - (recursive) instead of msg.get_payload() - (top level only).
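
A small Python 3 sketch of the difference, using a made-up nested message (multipart/mixed wrapping multipart/alternative). The helper name first_text and the UTF-8 fallback are assumptions for illustration, not part of the gist.

```python
import email

# Hypothetical nested message: multipart/mixed wrapping multipart/alternative.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="outer"\r\n'
    "\r\n"
    "--outer\r\n"
    'Content-Type: multipart/alternative; boundary="inner"\r\n'
    "\r\n"
    "--inner\r\n"
    "Content-Type: text/plain; charset=UTF-8\r\n"
    "\r\n"
    "plain body\r\n"
    "--inner\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "\r\n"
    "<p>html body</p>\r\n"
    "--inner--\r\n"
    "--outer--\r\n"
)

msg = email.message_from_string(RAW)

# Top level only: the single child is the nested multipart container.
top_level = [part.get_content_type() for part in msg.get_payload()]

# walk() yields the message itself, then every descendant, depth first.
walked = [part.get_content_type() for part in msg.walk()]


def first_text(message, content_type):
    """Return the first part of the given content type as str, or None."""
    for part in message.walk():
        if part.get_content_type() == content_type:
            payload = part.get_payload(decode=True)  # bytes
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            return payload.decode(charset, "replace").strip()
    return None


print(top_level)
print(walked)
print(first_text(msg, "text/plain"))
```

Iterating over msg.get_payload() alone would never reach the text/plain part here, because it sits one level down.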

TonyFrancis commented Aug 3, 2015

How can I modify this to get the attachment filename?
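
One possible approach in Python 3, sketched under the assumption that the attachment carries a Content-Disposition: attachment header: walk the parts and read part.get_filename(). The sample message and the helper name list_attachments are hypothetical and exist only to keep the example self-contained.

```python
import email
from email.message import EmailMessage

# Build a hypothetical message with one attachment.
outgoing = EmailMessage()
outgoing["Subject"] = "report"
outgoing.set_content("see attached")
outgoing.add_attachment(
    b"col1,col2\n1,2\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

# Parse it back the same way a fetched message would be parsed.
msg = email.message_from_bytes(outgoing.as_bytes())


def list_attachments(message):
    """Return (filename, payload_bytes) for every part marked as an attachment."""
    found = []
    for part in message.walk():
        if part.get_content_disposition() == "attachment":
            # get_payload(decode=True) undoes base64 or quoted-printable
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found


for name, data in list_attachments(msg):
    print(name, len(data))
```

Some mail clients omit the disposition header, in which case checking part.get_filename() is not None may be a more forgiving test.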

robert-dzikowski commented Jun 6, 2017

How can I decode an email that I got using the Gmail API? I am getting the error "TypeError: initial_value must be str or None, not dict"
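
That error suggests the API response dict is being passed straight to the parser. A minimal Python 3 sketch, assuming the message was requested with format="raw" so that the response has a base64url-encoded "raw" field: decode that field first, then parse the bytes. The helper name message_from_gmail_raw and the fake_response dict are stand-ins for illustration; no API call is made here.

```python
import base64
import email


def message_from_gmail_raw(response):
    """Parse a Gmail API message resource that was fetched with format="raw"."""
    raw = response["raw"]
    # Add padding defensively in case the encoded string was stripped of "=".
    raw += "=" * (-len(raw) % 4)
    return email.message_from_bytes(base64.urlsafe_b64decode(raw))


# Stand-in for the dict the API returns; a real response has more keys.
rfc822 = (
    b"Subject: hello\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"hi there\r\n"
)
fake_response = {
    "id": "hypothetical-id",
    "raw": base64.urlsafe_b64encode(rfc822).decode("ascii"),
}

msg = message_from_gmail_raw(fake_response)
body = msg.get_payload(decode=True).decode(msg.get_content_charset() or "utf-8")

print(msg["Subject"])
print(body.strip())
```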

ransom4real commented Mar 11, 2018

Saved someone from losing all their hair. Thank you for this. Very useful!

mihircomp commented Jun 13, 2018

Very Very Very useful..
Thank you

ghost commented Jun 15, 2018

What up

JiachengHe commented Jul 5, 2018

This is super useful! I finally found the solution here. Thank you sosososo much.

rahulgour25 commented Oct 12, 2018

The signature image does not appear in either the text/html or the text/plain section. How can I get it?
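
Inline images usually travel as separate image/* parts that the HTML references through a Content-ID, so they are not inside the text parts at all. A Python 3 sketch under that assumption follows. The sample message, the fake image bytes, and the helper name inline_images are hypothetical.

```python
import email
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message: an HTML body that references an inline image by Content-ID.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"not a real image"
outgoing = MIMEMultipart("related")
outgoing.attach(MIMEText('<p>Regards</p><img src="cid:signature">', "html", "utf-8"))
image = MIMEImage(FAKE_PNG, _subtype="png")
image.add_header("Content-ID", "<signature>")
image.add_header("Content-Disposition", "inline", filename="signature.png")
outgoing.attach(image)

# Parse it back the same way a fetched message would be parsed.
msg = email.message_from_bytes(outgoing.as_bytes())


def inline_images(message):
    """Map Content-ID (without angle brackets) to the decoded image bytes."""
    images = {}
    for part in message.walk():
        if part.get_content_maintype() == "image":
            cid = (part.get("Content-ID") or "").strip("<>")
            images[cid] = part.get_payload(decode=True)
    return images


for cid, data in inline_images(msg).items():
    print(cid, len(data))
```

The keys can then be matched against the cid: references in the HTML body.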
