A Python script for extracting email addresses from text files. You can pass it multiple files. It prints the email addresses to stdout, one address per line. For ease of use, remove the .py extension and place it in a directory on your $PATH (e.g. /usr/local/bin/) to run it like a built-in command.
#!/usr/bin/env python
#
# Extracts email addresses from one or more plain text files.
#
# Notes:
# - Does not save to file (pipe the output to a file if you want it saved).
# - Does not check for duplicates (which can easily be done in the terminal).
#
# (c) 2013 Dennis Ideler <ideler.dennis@gmail.com>

from optparse import OptionParser
import os.path
import re

# Raw strings keep the backslashes in the pattern intact.
regex = re.compile((r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))

def file_to_str(filename):
    """Returns the contents of filename as a string."""
    with open(filename) as f:
        return f.read().lower()  # Case is lowered to prevent regex mismatches.

def get_emails(s):
    """Returns an iterator of matched emails found in string s."""
    # Skipping matches that start with '//' because the regular expression
    # mistakenly matches patterns like 'http://foo@bar.com' as '//foo@bar.com'.
    return (email[0] for email in re.findall(regex, s) if not email[0].startswith('//'))

if __name__ == '__main__':
    parser = OptionParser(usage="Usage: python %prog [FILE]...")
    # No options added yet. Add them here if you ever need them.
    options, args = parser.parse_args()

    if not args:
        parser.print_usage()
        exit(1)

    for arg in args:
        if os.path.isfile(arg):
            for email in get_emails(file_to_str(arg)):
                print(email)  # Parenthesized form works in Python 2 and 3.
        else:
            print('"{}" is not a file.'.format(arg))
            parser.print_usage()

parable commented May 2, 2013

@dideler
This is great, especially as I'm new to Python and not very good at writing regex. Say I had a file containing lots of emails: not addresses, but actual email messages with HTML markup and lots of other content.
But all I want is the domain -> "From:" email address of each email ...

Owner

dideler commented May 10, 2013

@parable I'm not exactly sure what you're asking. Can you rephrase your question?

If you have an email address like someone@example.com, do you just want the example.com part?
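
For the case described in the question above, taking only the domain part of an address could be sketched like this. It is a minimal illustration and not part of the script; the helper name `domain_of` is hypothetical.

```python
def domain_of(address):
    """Return the text after the last '@', or None if there is no usable '@'.

    Hypothetical helper for illustration; it is not part of the gist's script.
    """
    local, sep, domain = address.rpartition('@')
    return domain if sep and domain else None


print(domain_of('someone@example.com'))  # example.com
```

It could be applied to each address yielded by `get_emails`.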

diek commented Jun 4, 2014

works very nicely

@MarioMey

Thanks!

sp3234 commented Aug 31, 2015

Can I extract the From fields and their corresponding To fields with this code?
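
The script only pulls addresses out of free text and does not know about message headers. For a single message stored as text, Python's standard `email` package can read the From and To headers. The following is a rough sketch that assumes one well-formed message per string; the helper name `from_and_to` and the sample message are made up for illustration.

```python
import email
from email.utils import getaddresses

# Made-up sample message for illustration.
raw = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>, carol@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)


def from_and_to(raw_message):
    """Return (senders, recipients) as lists of bare addresses (hypothetical helper)."""
    msg = email.message_from_string(raw_message)
    # get_all returns every header with that name; getaddresses splits them
    # into (display name, address) pairs.
    senders = [addr for _, addr in getaddresses(msg.get_all('From', []))]
    recipients = [addr for _, addr in getaddresses(msg.get_all('To', []))]
    return senders, recipients


print(from_and_to(raw))
```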

remoharsono commented Nov 2, 2015

Works nicely, thanks :)

gauthamzz commented Mar 3, 2016

You saved me a lot of stupid work.

DerekChia commented May 2, 2016

This is genius, thanks!

Holly-L commented Jun 16, 2016

@dideler
Great work here. Thank you! Just wondering why you didn't use \w (the metacharacter for word characters) in the regex instead of [a-z0-9]?
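
The author did not answer this in the thread, so the following is only a possible reason: `\w` matches more than `[a-z0-9]`. It also accepts the underscore and uppercase letters, and for `str` patterns in Python 3 it accepts non-ASCII word characters. A small sketch of the difference, assuming Python 3:

```python
import re

# [a-z0-9] matches only lowercase ASCII letters and digits.
narrow = re.compile(r"[a-z0-9]+")
# \w also matches '_', uppercase letters and, for str patterns in Python 3,
# non-ASCII word characters such as 'é'.
wide = re.compile(r"\w+")

sample = "café_shop"

print(narrow.findall(sample))  # ['caf', 'shop']
print(wide.findall(sample))    # ['café_shop']
```

Because the script uses these classes for the domain part as well, swapping in `\w` would also let underscores through there.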

pierangelo1982 commented Jul 7, 2016

so useful... thanks!

mathiasvanderbrempt commented Jan 8, 2017

I can't really make out where it pulls the txt file in. Which variables define the file that is converted to a string? The terminal just returns: Usage: python email.py [FILE]...

baxeico commented Mar 7, 2017

Save the script in a file named get_emails.py, then run chmod +x get_emails.py. Then use it like this to remove duplicate email addresses:

./get_emails.py file_to_parse.txt | sort | uniq
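
If you would rather de-duplicate inside Python than in the shell, one option is to track the addresses already seen in a set. This is a minimal sketch with a hypothetical helper named `unique`; unlike `sort | uniq`, it keeps first-seen order instead of sorting.

```python
def unique(items):
    """Yield each item once, in first-seen order (hypothetical helper)."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


emails = ['a@example.com', 'b@example.com', 'a@example.com']
print(list(unique(emails)))  # ['a@example.com', 'b@example.com']
```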

fredericpierron commented May 2, 2017

If the email is preceded by a ' as in 'john@domain.com', the leading quote is not trimmed.
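
The quote is kept because `'` is one of the characters the script's regex allows in the local part of an address. One possible workaround, sketched below, is to strip leading apostrophes from each match. The function name `get_emails_unquoted` is hypothetical, and the trade-off is that an address that really does begin with an apostrophe would be altered.

```python
import re

# Same pattern as in the script, written with raw strings.
regex = re.compile(r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                   r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                   r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)")


def get_emails_unquoted(s):
    """Like get_emails, but drops leading apostrophes (hypothetical variant)."""
    for match in regex.findall(s):
        email = match[0].lstrip("'")
        if email and not email.startswith('//'):
            yield email


text = "contact: 'john@domain.com', or jane@domain.com"
print(list(get_emails_unquoted(text)))  # ['john@domain.com', 'jane@domain.com']
```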

Chandraguptha commented Jun 1, 2017

Hello Dideler,

I am looking for an email data extractor that works:

From a selected folder
Between selected dates
Only on the data exchanged between selected From and To addresses

Kindly share a script for this, so I can use it in a Google spreadsheet to track that mail data for my daily use.

@kurianbenoy

Thank You

wrystal commented Oct 6, 2017

There are a small number of wrong matches, such as:
online at www.amazon.com
A crude way to fix this is:

pattern = re.compile(r"([a-z0-9!#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                     r"{|}~-]+)*(@)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.))+[a-z0-9]"
                     r"(?:[a-z0-9-]*[a-z0-9])?)|([a-z0-9!#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                     r"{|}~-]+)*(\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\sdot\s))+[a-z0-9]"
                     r"(?:[a-z0-9-]*[a-z0-9])?)", re.S)

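One thing to watch with wrystal's combined pattern: the alternation changes the capture-group numbering, so the `email[0]` lookup in `get_emails` would be empty for "at ... dot" matches. A possible alternative, sketched here, is to compile the two forms separately and take the whole match. The names `LOCAL`, `LABEL`, `plain`, `spelled` and `find_emails` are hypothetical.

```python
import re

# Building blocks that follow the character classes used in the script.
LOCAL = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# Plain form: user@example.com
plain = re.compile(LOCAL + r"@(?:" + LABEL + r"\.)+" + LABEL)
# Spelled-out form: user at example dot com ("at" and "dot" must appear together).
spelled = re.compile(LOCAL + r"\sat\s(?:" + LABEL + r"\sdot\s)+" + LABEL)


def find_emails(s):
    """Return all matches of either form, plain matches first (hypothetical helper)."""
    return [m.group(0) for pattern in (plain, spelled) for m in pattern.finditer(s)]


text = "online at www.amazon.com, mail bob@example.com or amy at example dot org"
print(find_emails(text))  # ['bob@example.com', 'amy at example dot org']
```

With this split, "online at www.amazon.com" is no longer reported, because the spelled-out form requires "dot" rather than a literal period.
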
samayia commented Nov 23, 2017

Emails within placeholders should be removed.

e.g. placeholder="your@email.com"

Otherwise the output will generally contain a mix of dummy and wanted values.
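
One way to approach this, sketched under the assumption that the input is HTML and the dummy addresses sit in `placeholder` attributes, is to remove those attributes before running the extraction. The helper name `strip_placeholders` is hypothetical, and a regex is only a crude stand-in for a real HTML parser.

```python
import re

# Matches an HTML attribute such as placeholder="your@email.com" (either quote style).
PLACEHOLDER = re.compile(r"""placeholder\s*=\s*(["']).*?\1""", re.IGNORECASE)


def strip_placeholders(html):
    """Remove placeholder attributes so their dummy addresses are not extracted."""
    return PLACEHOLDER.sub("", html)


html = '<input type="email" placeholder="your@email.com"> Write to real@example.com'
print(strip_placeholders(html))
```

The cleaned string can then be passed to `get_emails` as usual.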

glunardi commented Dec 12, 2017

Thanks a bunch, you just saved me 30 minutes! Merci beaucoup!

@futzlarson

Awesome.

siafsadki commented Mar 12, 2018

so useful... thanks bro :)

Sreevalli535 commented May 21, 2018

Can I extract the From fields and their corresponding To fields with this code?
