
@miohtama
Created April 15, 2013 15:56
Decoding emails in Python, e.g. for Gmail and the imapclient lib
import email


def get_decoded_email_body(message_body):
    """ Decode email body.
    If a part has no charset header, its transfer-decoded bytes are used as-is,
    since the character set cannot be known.
    We try to get text/plain, but if there is not one then fall back to text/html.
    :param message_body: Raw 7-bit message body input e.g. from imaplib. Double encoded in quoted-printable and latin-1
    :return: Message body as a UTF-8 encoded str (a Python 2 byte string)
    """
    msg = email.message_from_string(message_body)
    if msg.is_multipart():
        text = None
        html = None
        raw = None
        # get_payload() returns top-level parts only; nested parts are not visited
        for part in msg.get_payload():
            # print "%s, %s" % (part.get_content_type(), part.get_content_charset())
            charset = part.get_content_charset()
            if charset is None:
                # We cannot know the character set, so keep the decoded "something"
                # as a last resort (multipart containers decode to None)
                if raw is None:
                    raw = part.get_payload(decode=True)
                continue
            if part.get_content_type() == 'text/plain':
                text = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('utf8', 'replace')
            if part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(decode=True), str(charset), "ignore").encode('utf8', 'replace')
        if text is not None:
            return text.strip()
        if html is not None:
            return html.strip()
        return (raw or "").strip()
    else:
        # No charset header: fall back to latin-1, which never fails to decode
        charset = msg.get_content_charset('latin-1')
        text = unicode(msg.get_payload(decode=True), charset, 'ignore').encode('utf8', 'replace')
        return text.strip()

ghost commented Apr 19, 2014

I get this error: get_decoded_email_body() takes exactly 1 argument (2 given)

Right now I am passing this:
data[0][1].decode("utf-8").encode('latin-1')

which looks like:

--047d7b41791d4238e204f751445d
Content-Type: text/plain; charset=UTF-8

some content...
...
...

--047d7b41791d4238e204f751445d
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

some html content

--047d7b41791d4238e204f751445d--
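The error above means two positional arguments reached a function that accepts only one. In Python 2, a common cause is pasting the function into a class without adding `self`. The snippet shown is also only the body of a multipart message. The `email` parser splits it into parts only when it can see the top-level `Content-Type` header with its `boundary` parameter, which the full `data[0][1]` from an `imaplib` `RFC822` fetch normally includes. The following minimal Python 3 sketch reconstructs that header, which is an assumption since it is not shown in the comment. It passes the raw bytes straight to `email.message_from_bytes()` without the `.decode().encode()` round trip:

```python
import email

# Reconstructed sample: the two parts from the comment plus an assumed
# top-level header that declares the boundary.
RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="047d7b41791d4238e204f751445d"\r\n'
    b"\r\n"
    b"--047d7b41791d4238e204f751445d\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"some content...\r\n"
    b"\r\n"
    b"--047d7b41791d4238e204f751445d\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"<p>some html content</p>\r\n"
    b"\r\n"
    b"--047d7b41791d4238e204f751445d--\r\n"
)

# In Python 3, imaplib hands back bytes, so they can be parsed directly
msg = email.message_from_bytes(RAW)
print(msg.is_multipart())
for part in msg.get_payload():
    # get_content_charset() returns the charset lower-cased
    print(part.get_content_type(), part.get_content_charset())
```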


Schizo commented Oct 5, 2014

Worked perfectly! Thanks.


xarg commented Apr 8, 2015

Since parts of an email can contain other parts, the correct way to iterate over them is msg.walk() (recursive) instead of msg.get_payload() (top level only).
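A minimal Python 3 sketch of that suggestion follows. The helper name `find_text_bodies` and the UTF-8 fallback for parts without a charset are assumptions, not part of the gist. The demo nests the text parts one level down, as in a typical multipart/mixed mail with attachments, where `get_payload()` alone never reaches them:

```python
import email
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def find_text_bodies(msg: Message) -> dict:
    """Map 'text/plain' / 'text/html' to the first decoded body of that type."""
    found = {}
    for part in msg.walk():  # yields msg itself, then every sub-part, depth first
        if part.is_multipart():
            continue  # containers carry no body of their own
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html") or ctype in found:
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"  # assumed default
        try:
            found[ctype] = payload.decode(charset, "replace")
        except LookupError:  # charset name Python does not recognise
            found[ctype] = payload.decode("latin-1")
    return found


# Demo: multipart/mixed -> multipart/alternative -> text parts
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("hello", "plain", "utf-8"))
alt.attach(MIMEText("<b>hello</b>", "html", "utf-8"))
outer = MIMEMultipart("mixed")
outer.attach(alt)
parsed = email.message_from_bytes(outer.as_bytes())

print([p.get_content_type() for p in parsed.get_payload()])  # top level only
print(find_text_bodies(parsed))
```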

@TonyFrancis

How can this be modified to get the attachment filename?
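One possible approach, sketched in Python 3 under the assumption that "attachment" means any part that declares a filename, is to walk all parts and call `Message.get_filename()`. The helper name and the demo file `report.pdf` are hypothetical:

```python
import email
from email.message import Message
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def attachment_filenames(msg: Message) -> list:
    """Return the filename of every part that declares one."""
    names = []
    for part in msg.walk():
        # get_filename() reads the filename parameter of Content-Disposition and
        # falls back to the name parameter of Content-Type; None if neither exists
        filename = part.get_filename()
        if filename:
            names.append(filename)
    return names


# Demo message with one hypothetical attachment
demo = MIMEMultipart()
demo.attach(MIMEText("see attached", "plain", "utf-8"))
attachment = MIMEApplication(b"not really a pdf", _subtype="pdf")
attachment.add_header("Content-Disposition", "attachment", filename="report.pdf")
demo.attach(attachment)

parsed = email.message_from_bytes(demo.as_bytes())
print(attachment_filenames(parsed))
```

For the same parts, `part.get_payload(decode=True)` returns the attachment's bytes if you also need the content.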

@robert-dzikowski

How can I decode an email that I got using the Gmail API? I am getting the error "TypeError: initial_value must be str or None, not dict".
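That `TypeError` is what `io.StringIO` raises when it is given a dict. `email.message_from_string()` wraps its argument in a `StringIO`, so the Gmail API response dict itself is probably being passed in. The following Python 3 sketch assumes the message was fetched with `format='raw'`, e.g. `service.users().messages().get(userId='me', id=msg_id, format='raw').execute()`. In that case, the dict's `raw` field holds the whole message as a base64url string, which can be decoded and parsed as bytes. The helper name and the hand-built response in the demo are hypothetical:

```python
import base64
import email
from email.message import Message


def message_from_gmail_raw(api_message: dict) -> Message:
    """Parse a Gmail API message resource that was fetched with format='raw'."""
    raw = api_message["raw"]  # base64url-encoded RFC 2822 message
    # Add back any '=' padding that may be missing (a no-op if already padded)
    raw_bytes = base64.urlsafe_b64decode(raw + "=" * (-len(raw) % 4))
    return email.message_from_bytes(raw_bytes)


# Demo with a stand-in for the API response (padding stripped on purpose)
sample = (b"Subject: hi\r\n"
          b"Content-Type: text/plain; charset=utf-8\r\n"
          b"\r\n"
          b"hello from gmail\r\n")
fake_resource = {"id": "hypothetical-id",
                 "raw": base64.urlsafe_b64encode(sample).decode("ascii").rstrip("=")}
msg = message_from_gmail_raw(fake_resource)
print(msg["Subject"], msg.get_payload(decode=True))
```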

@ransom4real

Saved someone from losing all their hair. Thank you for this. Very useful!

@mihircomp

Very, very, very useful.
Thank you


ghost commented Jun 15, 2018

What up

@jiachenghe666

This is super useful! I finally found the solution here. Thank you sosososo much.


rahulgour25 commented Oct 12, 2018

The signature image does not appear in either the text/html or the text/plain section. How can I get it?
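Inline images such as signature logos are usually not inside the text parts. They typically travel as separate `image/*` parts, often inside a multipart/related container, and the HTML refers to them with `cid:` URLs that match their `Content-ID` header. The following Python 3 sketch assumes that layout and collects those parts. The helper name, the fallback keys, and the `sig@example` id in the demo are hypothetical:

```python
import email
from email.message import Message
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def inline_images(msg: Message) -> dict:
    """Map each image part's Content-ID (without <>) to its decoded bytes."""
    images = {}
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # Content-ID looks like "<id>"; the HTML refers to it as "cid:id"
        cid = (part.get("Content-ID") or "").strip().strip("<>")
        key = cid or part.get_filename() or "image-%d" % len(images)
        images[key] = part.get_payload(decode=True)
    return images


# Demo: HTML signature that references an inline PNG by Content-ID
related = MIMEMultipart("related")
related.attach(MIMEText('<p>Regards</p><img src="cid:sig@example">', "html", "utf-8"))
logo = MIMEImage(b"\x89PNG\r\n\x1a\nfake", _subtype="png")
logo.add_header("Content-ID", "<sig@example>")
related.attach(logo)

parsed = email.message_from_bytes(related.as_bytes())
print(list(inline_images(parsed)))
```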


JoeyL6 commented Feb 25, 2019

This is really helpful! Thanks a bunch! Here are a couple of tips for Python 3 users who run into type errors:

line 15: email.message_from_string --> email.message_from_bytes
lines 32 & 35: unicode() --> str()
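Applying both tips gives roughly the following Python 3 sketch. Two changes are assumptions beyond the tips. First, it returns `str` rather than UTF-8 bytes. Second, it skips top-level parts without a charset instead of returning their raw bytes. It still only looks at top-level parts, like the original:

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def get_decoded_email_body(message_body: bytes) -> str:
    """Python 3 sketch: text/plain preferred, text/html as fallback, returned as str."""
    msg = email.message_from_bytes(message_body)  # tip 1: parse bytes, not str
    if not msg.is_multipart():
        charset = msg.get_content_charset() or "latin-1"  # assumed fallback
        payload = msg.get_payload(decode=True) or b""
        return str(payload, charset, "ignore").strip()  # tip 2: str() instead of unicode()

    text = html = None
    for part in msg.get_payload():  # top-level parts only, as in the original
        charset = part.get_content_charset()
        if charset is None:
            continue  # unknown character set: skipped in this sketch
        body = str(part.get_payload(decode=True) or b"", charset, "ignore")
        if part.get_content_type() == "text/plain":
            text = body
        elif part.get_content_type() == "text/html":
            html = body
    return (text if text is not None else html or "").strip()


# Demo 1: single-part, quoted-printable UTF-8 body
single = (b"Content-Type: text/plain; charset=utf-8\r\n"
          b"Content-Transfer-Encoding: quoted-printable\r\n"
          b"\r\n"
          b"caf=C3=A9\r\n")
print(get_decoded_email_body(single))

# Demo 2: multipart with only an HTML part, so the text/html fallback is used
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("<b>only html</b>", "html", "utf-8"))
print(get_decoded_email_body(alt.as_bytes()))
```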


Cosqui commented May 17, 2019

Thanks for the contribution :) It helped me.
