@stefansundin
Last active September 27, 2022 18:54
Extract attachments from emails that Gmail doesn't allow you to download. This is dumb. Please use Python >= 3.4.
#!/usr/bin/env python3
# Get your files that Gmail blocks. Warning message:
# "Anti-virus warning - 1 attachment contains a virus or blocked file. Downloading this attachment is disabled."
# Based on: https://spapas.github.io/2014/10/23/retrieve-gmail-blocked-attachments/
# Instructions:
# Go to your emails, click the arrow button in the top right, "Show original", then "Download Original".
# Move the files to the same directory as this program, then run it.
import sys
import os
import email
import mailbox
from email import policy
from email.parser import BytesParser

# To automatically rename conflicting files, change this to True (it adds a numeric prefix to the filename, see get_new_filename below):
automatic_rename = False

def get_new_filename(old_fn):
    # Try "1-name", "2-name", ... until we find a name that does not exist yet.
    i = 1
    while True:
        new_fn = "%d-%s" % (i, old_fn)
        if not os.path.isfile(new_fn):
            return new_fn
        i += 1

def save_attachments(msg):
    # Write each top-level attachment of msg to the current directory.
    print("Subject: %s" % msg['Subject'])
    if not msg.is_multipart():
        # A single-part message has a plain string payload and no attachments to extract.
        print()
        return
    for pl in msg.get_payload():
        # basename() drops any directory components, so a crafted filename cannot write outside this directory.
        fn = os.path.basename(pl.get_filename() or '')
        if not fn:
            continue
        print("Found %s" % fn)
        if os.path.isfile(fn):
            print("The file '%s' already exists!" % fn)
            if automatic_rename:
                fn = get_new_filename(fn)
                print("Changed the filename to: %s" % fn)
            else:
                print("Press enter to overwrite.")
                input()
        with open(fn, 'wb') as out:
            out.write(pl.get_payload(decode=True))
    print()

if __name__ == '__main__':
    if sys.version_info < (3, 4):
        sys.exit("Please use Python 3.4 or newer.")
    if len(sys.argv) < 2:
        files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith(('.txt', '.eml', '.mbox'))]
        if not files:
            sys.exit("Please download the emails and put them in this directory with a .txt, .eml, or .mbox extension.")
        print("Here are the files in the current directory with a .txt, .eml, or .mbox extension:")
        print('\n'.join(files))
        print()
        print("Press enter to extract attachments from these files.")
        input()
    else:
        files = sys.argv[1:]
    for f in files:
        print("Processing %s" % f)
        if f.endswith('.mbox'):
            messages = mailbox.mbox(f, factory=BytesParser(policy=policy.default).parse, create=False)
            print("%d messages." % len(messages))
            print()
            for msg in messages:
                save_attachments(msg)
        else:
            # Read raw bytes so emails that are not in the locale's default text encoding still parse.
            with open(f, 'rb') as fp:
                msg = email.message_from_binary_file(fp, policy=policy.default)
            save_attachments(msg)
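The loop above only looks at the top-level parts returned by `msg.get_payload()`, so an attachment nested one level deeper (for example inside a `multipart/related` part within `multipart/mixed`) would be skipped. Below is a minimal sketch of a variant that uses the standard `Message.walk()` method instead. `find_attachments` is a hypothetical helper name, not part of the script, and the sample message is made up.

```python
import email
from email import policy


def find_attachments(msg):
    """Return (filename, bytes) pairs for every leaf part that has a filename."""
    found = []
    # walk() yields the message itself and then every nested sub-part, depth first,
    # so attachments inside e.g. multipart/related are reached too.
    for part in msg.walk():
        fn = part.get_filename()
        if fn and not part.is_multipart():
            # decode=True undoes the base64 / quoted-printable transfer encoding.
            found.append((fn, part.get_payload(decode=True)))
    return found


# Made-up sample: the attachment sits one level down, inside multipart/related.
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/mixed; boundary=OUTER\r\n"
    b"\r\n"
    b"--OUTER\r\n"
    b"Content-Type: multipart/related; boundary=INNER\r\n"
    b"\r\n"
    b"--INNER\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"hello\r\n"
    b"--INNER\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: attachment; filename="inner.bin"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"aGk=\r\n"
    b"--INNER--\r\n"
    b"--OUTER--\r\n"
)
msg = email.message_from_bytes(raw, policy=policy.default)
print(find_attachments(msg))  # [('inner.bin', b'hi')]
```

For this message, the top-level loop sees only the `multipart/related` part, which has no filename. Each pair that `walk()` returns could then be written out the same way the script already does.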
@ssivanatarajan

Wow, works perfectly! Recovered my RAR file. Thanks

@thiagoh

thiagoh commented Mar 10, 2016

Man!! It works perfectly!! Thank u!!

@ethanpayne

Thanks!
Also, you can download the attachment using another client, such as Mail.app.

@zrod

zrod commented Mar 24, 2016

Great, worked just fine, thanks!

@huxingyi

huxingyi commented Apr 13, 2016

Thanks, but I found that it fails to extract attachments with Chinese filenames, like this:

Content-Type: application/octet-stream; 
    name="=?GB2312?B?c2RoKDUuMTcuMjMuNDbQ3tX9z9C80rP2wazM9LrzxuTL/M/QvNK15sXG?=
    =?GB2312?B?tO3O8ykucmFy?="
Content-Disposition: attachment; 
    filename="=?GB2312?B?c2RoKDUuMTcuMjMuNDbQ3tX9z9C80rP2wazM9LrzxuTL/M/QvNK15sXG?=
    =?GB2312?B?tO3O8ykucmFy?="

The following change fixes this:

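        try:
            with open(fn, 'wb') as out:
                out.write(pl.get_payload(decode=True))
        except (OSError, ValueError):
            # The OS rejected the filename (e.g. it was still RFC 2047-encoded); fall back to a fixed name.
            fn = "test.rar"
            print("Invalid filename, writing to %s instead" % fn)
            with open(fn, 'wb') as out:
                out.write(pl.get_payload(decode=True))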

Hope this trick helps anyone else who runs into the same issue.

@stefansundin (Author)

@huxingyi I updated the code so it should decode filenames properly now. Give it a try.
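The script parses with `policy=policy.default`. Under that policy, Python's email package decodes RFC 2047 encoded words in headers, and it appears to do so even when a mail client places them inside a quoted `filename="..."` value, as in huxingyi's headers. Below is a minimal sketch for checking this. The `报告.rar` filename and the sample message are made-up stand-ins, not huxingyi's actual email.

```python
import base64
from email import policy
from email.parser import BytesParser

name = "报告.rar"  # hypothetical filename, stands in for the one in the comment
# RFC 2047 "B" (base64) encoded word in GB2312, the same shape as in huxingyi's headers.
encoded = "=?GB2312?B?" + base64.b64encode(name.encode("gb2312")).decode("ascii") + "?="

raw_lines = [
    "MIME-Version: 1.0",
    "Content-Type: multipart/mixed; boundary=XYZ",
    "",
    "--XYZ",
    'Content-Type: application/octet-stream; name="%s"' % encoded,
    'Content-Disposition: attachment; filename="%s"' % encoded,
    "Content-Transfer-Encoding: base64",
    "",
    "aGk=",
    "--XYZ--",
    "",
]
raw = "\r\n".join(raw_lines).encode("ascii")

msg = BytesParser(policy=policy.default).parsebytes(raw)
part = msg.get_payload()[0]
print(part.get_filename())
```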

@nielsaust

This is so awesome. I thank you!

@jeanadam

Still working great! thanks!

@kavunshiva

Dope! This worked like a charm. Yayyy, Python modules for everything, and thanks @stefansundin

@UdayKanike

UdayKanike commented Mar 7, 2017

Hi,

I am getting the following error. Can you please help me?

udays-MacBook-Pro:mystuff udaykumarkanike$ python extract-attachments.py 0.txt
Files: 0.txt
()
Processing 0.txt
Subject: None
Traceback (most recent call last):
  File "extract-attachments.py", line 27, in <module>
    fn_header = pl.get_filename()
AttributeError: 'str' object has no attribute 'get_filename'
udays-MacBook-Pro:mystuff udaykumarkanike$ 

Thanks
Uday

@stefansundin (Author)

Hi @UdayKanike. I suspect your email is formatted in a way I have not encountered before. msg.get_payload() seems to have returned a string.

Try adding this (use the first line to see where to put it):

    for pl in msg.get_payload():
      if not isinstance(pl, email.message.Message):
        print("Unexpected payload")
        continue
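`get_payload()` returns a list of sub-messages only when the message is multipart. For a plain single-part email it returns the body as a string, and iterating over that string yields single characters. That is what triggers the `'str' object has no attribute 'get_filename'` error above. Below is a minimal sketch of checking `is_multipart()` up front instead. `parts_with_filenames` is a hypothetical helper name, and both sample messages are made up.

```python
import email
from email import policy


def parts_with_filenames(msg):
    # A single-part message carries its body as a str payload: no sub-parts to scan.
    if not msg.is_multipart():
        return []
    return [part.get_filename() for part in msg.get_payload() if part.get_filename()]


plain = email.message_from_bytes(b"Subject: hi\r\n\r\njust text\r\n", policy=policy.default)
multi = email.message_from_bytes(
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: multipart/mixed; boundary=B\r\n"
    b"\r\n"
    b"--B\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"body\r\n"
    b"--B\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b'Content-Disposition: attachment; filename="report.zip"\r\n'
    b"\r\n"
    b"data\r\n"
    b"--B--\r\n",
    policy=policy.default,
)
print(parts_with_filenames(plain))  # []
print(parts_with_filenames(multi))  # ['report.zip']
```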

@devlifealways

Worked perfectly, thanks (y)

@me-suzy

me-suzy commented May 17, 2017

EOF problem in Notepad; see this screenshot:

https://snag.gy/GFLYU6.jpg

@stefansundin (Author)

@me-suzy it appears you are using Python 2.7. Please run this script with Python 3.

@tejasshah93

Works perfectly! Thanks 👍

@NadeemBaloch

Where should I place that particular file?

@stefansundin (Author)

@NadeemBaloch Place the .py file and your emails (.txt files) in the same directory. Then run the Python script.

@wdlv

wdlv commented Sep 20, 2017

Thanks a lot! It works like a charm!

@guimeira

For all the Python noobs like me out there: do not save the script as email.py.
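The reason is that a file named `email.py` next to the script shadows the standard library's `email` package. When you run a script, its own directory comes first on `sys.path`, so `import email` finds that file instead of the package. Below is a quick sketch for checking which module Python actually imported. It relies on the fact that a package has a `__path__` attribute and a plain `.py` module does not.

```python
import email

# The standard-library email is a package, so it has a __path__ attribute;
# a stray email.py module imported in its place does not.
print(email.__file__)
print("standard library email package:", hasattr(email, "__path__"))
```

If the printed path points at your own `email.py`, rename that file. Also delete any `__pycache__/email.*.pyc` left next to it.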

@tolew1

tolew1 commented Oct 8, 2017

Genius, this is great and works perfectly. Google, in its omnipotence, now blocks common extensions like JS inside zip files and doesn't provide an override option. Time to find another email provider.

@riavalon

riavalon commented Nov 21, 2017

Really terrific, thanks for this! I had some files from an old personal website that I saved in my email a loooong time ago and couldn't download them because of this "antivirus" thing. This script worked perfectly, though.

@bekharsky

Works like a charm. Thank you!

@TonyStark

No words, man.
Thank you very, very much :)

@ZenMagus

this worked awesome -- thank you -- I'm currently studying Python so am super happy with this.

@dan400man

I was finally able to retrieve my 3-year-old attachment by using Gmail's "Save to Drive", as mentioned above. Interestingly, I couldn't download it from the Drive page in Firefox (it just opened a blank page with a cryptic URL). I tried "sharing" it, sending a link to the file to my work Outlook email. Outlook opens IE here at work to navigate to the link and, after the obligatory advice to move to Chrome, I was able to download the file, an encrypted .7z file. It was all source code. At first, I couldn't figure out why I chose to encrypt the archive, since it was just source code. But then I saw that some of the file extensions were .CMD, which I seem to recall Gmail screaming about and not allowing.

7-Zip allows the file names to be encrypted, so that's how I got around that. Apparently, Google has decided that any archive whose file names are encrypted must be malware designed by Russian, Chinese, or North Korean hackers.

@heiniovason

This script saved my ass. Thanks a bunch!

@akhileshdarjee

Awesome, it works :)

@s-steephan

Working fine. Thanks!!

@Sneidon

Sneidon commented Jun 18, 2019

Thanks, the code works. If you are the one who sent the email, you can also save the email and open it using Outlook, etc.

@stefansundin (Author)

stefansundin commented Sep 27, 2022

Made some updates:

  • Gmail now downloads these files as .eml and not .txt, so the script now discovers these files as well.
  • Simpler parsing by using policy.default. Unicode in subject lines should decode properly now.
  • Support for .mbox format. This is the format that Google Takeout gives you the emails in. (I used https://gist.github.com/benwattsjones/060ad83efd2b3afc8b229d41f9b246c4 as a reference.)
  • Automatic renaming of conflicting files (you have to opt in by setting automatic_rename = True).
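Regarding the `policy.default` change: with that policy, Python's email package decodes RFC 2047 encoded words in headers such as `Subject`. The legacy `compat32` policy, which is still the default for `email.message_from_bytes`, returns the raw encoded text instead. Below is a minimal comparison on a made-up subject line.

```python
import email
from email import policy

raw = b"Subject: =?utf-8?q?caf=C3=A9?=\r\n\r\nbody\r\n"

legacy = email.message_from_bytes(raw)  # compat32 policy
modern = email.message_from_bytes(raw, policy=policy.default)

print(legacy["Subject"])  # =?utf-8?q?caf=C3=A9?=
print(modern["Subject"])  # café
```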

Enjoy!

Anyone still using this? :)
