@stefansundin
Last active September 27, 2022 18:54
Extract attachments from emails that Gmail doesn't allow you to download. This is dumb. Please use Python >= 3.4.
#!/usr/bin/env python3
# Get your files that Gmail blocks. Warning message:
# "Anti-virus warning - 1 attachment contains a virus or blocked file. Downloading this attachment is disabled."
# Based on: https://spapas.github.io/2014/10/23/retrieve-gmail-blocked-attachments/
# Instructions:
# Go to your emails, click the arrow button in the top right, "Show original", then "Download Original".
# Move the files to the same directory as this program, then run it.
import sys
import os
import email
import mailbox
from email import policy
from email.parser import BytesParser
# To automatically rename conflicting files, change this to True: (it adds a prefix to the filename, see get_new_filename below)
automatic_rename = False
def get_new_filename(old_fn):
    i = 1
    while True:
        new_fn = "%d-%s" % (i, old_fn)
        if not os.path.isfile(new_fn):
            return new_fn
        i += 1
if __name__ == '__main__':
    if sys.version_info[0] < 3:
        print("Please use Python 3.")
        sys.exit()
    if len(sys.argv) < 2:
        files = [f for f in os.listdir('.') if os.path.isfile(f) and f.endswith(('.txt', '.eml', '.mbox'))]
        if len(files) == 0:
            print("Please download the emails and put them in this directory with a .txt, .eml, or .mbox extension.")
            sys.exit()
        print("Here are the files in the current directory with .txt, .eml, or .mbox extension:")
        print('\n'.join(files))
        print()
        print("Press enter to extract attachments from these files.")
        input()
    else:
        files = sys.argv[1:]
    for f in files:
        print("Processing %s" % f)
        if f.endswith('.mbox'):
            messages = mailbox.mbox(f, factory=BytesParser(policy=policy.default).parse, create=False)
            print("%d messages." % len(messages))
            print()
            for msg in messages:
                print("Subject: %s" % msg['Subject'])
                for pl in msg.get_payload():
                    if isinstance(pl, str):
                        # A non-multipart message has a str payload, so iterating it
                        # yields single characters; skip them.
                        continue
                    fn = pl.get_filename()
                    if fn:
                        print("Found %s" % fn)
                        if os.path.isfile(fn):
                            print("The file '%s' already exists!" % fn)
                            if automatic_rename:
                                fn = get_new_filename(fn)
                                print("Changed the filename to: %s" % fn)
                            else:
                                print("Press enter to overwrite.")
                                input()
                        # Use a context manager so the file is closed right away.
                        with open(fn, 'wb') as out:
                            out.write(pl.get_payload(decode=True))
                print()
        else:
            # Read the raw bytes so parsing does not depend on the locale's text encoding.
            with open(f, 'rb') as fp:
                msg = email.message_from_binary_file(fp, policy=policy.default)
            print("Subject: %s" % msg['Subject'])
            for pl in msg.get_payload():
                if isinstance(pl, str):
                    # Same as above: a non-multipart message has no attachment parts.
                    continue
                fn = pl.get_filename()
                if fn:
                    print("Found %s" % fn)
                    if os.path.isfile(fn):
                        print("The file '%s' already exists!" % fn)
                        if automatic_rename:
                            fn = get_new_filename(fn)
                            print("Changed the filename to: %s" % fn)
                        else:
                            print("Press enter to overwrite.")
                            input()
                    with open(fn, 'wb') as out:
                        out.write(pl.get_payload(decode=True))
            print()
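
The script only looks at the top-level parts of each message, and it has to skip string payloads by hand. As a possible alternative (not the author's method), the sketch below uses `Message.walk()` to visit every part at any nesting depth. The helper name `find_attachments`, the demo message, and the `run.cmd` filename are all made up for illustration.

```python
import os
from email import message_from_bytes, policy
from email.message import EmailMessage


def find_attachments(msg):
    """Yield (filename, payload_bytes) for every leaf part that has a filename."""
    for part in msg.walk():
        if part.is_multipart():
            # Containers have no decodable payload of their own; walk() descends into them.
            continue
        fn = part.get_filename()
        if fn:
            # Drop directory components (for the host OS's separator) before using the name.
            yield os.path.basename(fn), part.get_payload(decode=True)


# Hypothetical demo message: a text body plus one attachment with a "blocked" extension.
demo = EmailMessage()
demo['Subject'] = 'demo'
demo.set_content('body text')
demo.add_attachment(b'echo hi\r\n', maintype='application',
                    subtype='octet-stream', filename='run.cmd')

# Round-trip through bytes, the same way a downloaded .eml file would be parsed.
parsed = message_from_bytes(demo.as_bytes(), policy=policy.default)
attachments = list(find_attachments(parsed))
for name, data in attachments:
    print(name, len(data))
```

Because `walk()` also yields the message itself, a non-multipart message is handled without a special case: it simply has no part with a filename.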
@bekharsky

Works like a charm. Thank you!

@TonyStark

no words man
very very thank you :)

@ZenMagus

this worked awesome -- thank you -- I'm currently studying Python so am super happy with this.

@dan400man

I was finally able to retrieve my 3-year-old attachment by using Gmail's "Save to Drive", as mentioned above. Interestingly, I couldn't download it from the Drive page in Firefox (it just opened a blank page with a cryptic URL). I tried "sharing" it by sending a link to the file to my work Outlook email. Outlook opens IE here at work to navigate to the link and, after the obligatory advice to move to Chrome, I was able to download the file, an encrypted .7z file. It was all source code. At first, I couldn't figure out why I had chosen to encrypt the archive, since it was just source code. But then I saw that some of the file extensions were .CMD, which I seem to recall Gmail screaming about and not allowing.

7zip allows the file names to be encrypted, so that's how I got around that. Apparently, Google has decided that any archive whose file names are encrypted must be malware designed by Russian, Chinese, or North Korean hackers.

@heiniovason

This script saved my ass. Thanks a bunch!

@akhileshdarjee

Awesome, it works :)

@s-steephan

Working fine. Thanks!!

@Sneidon

Sneidon commented Jun 18, 2019

Thanks, the code works. If you are the one who sent the email, you can save the email and open it using Outlook, etc.

@stefansundin
Author

stefansundin commented Sep 27, 2022

Made some updates:

  • Gmail now downloads these files as .eml and not .txt, so the script now discovers these files as well.
  • Simpler parsing by using policy.default. Unicode in subject lines should decode properly now.
  • Support for .mbox format. This is the format that Google Takeout gives you the emails in. (I used https://gist.github.com/benwattsjones/060ad83efd2b3afc8b229d41f9b246c4 as a reference.)
  • Automatic renaming of conflicting files (you have to opt in by setting automatic_rename = True).
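
To illustrate the `policy.default` point, here is a minimal sketch that parses the same raw message twice, once with the legacy default policy (`compat32`) and once with `policy.default`. The sample message and its encoded subject are made up for this example.

```python
from email import message_from_bytes, policy

# Hypothetical raw message whose subject is an RFC 2047 encoded word for "Café".
raw = (b"Subject: =?utf-8?q?Caf=C3=A9?=\r\n"
       b"\r\n"
       b"body\r\n")

# Without an explicit policy the parser uses compat32 and leaves the header encoded.
legacy = message_from_bytes(raw)
# With policy.default the encoded word is decoded when the header is read.
modern = message_from_bytes(raw, policy=policy.default)

legacy_subject = legacy['Subject']
modern_subject = str(modern['Subject'])
print(legacy_subject)
print(modern_subject)
```

With `compat32`, getting a readable subject takes an extra decoding step (for example via `email.header.decode_header`), which is the kind of code `policy.default` makes unnecessary.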

Enjoy!

Anyone still using this? :)
