The program below takes one or more plain text files as input. It works with both Python 2 and Python 3.
Let's say we have two files that may contain email addresses:
- file_a.txt
foo bar
ok ideler.dennis@gmail.com sup
hey...user+123@example.com,wyd
hello world!
RESCHEDULE 2'OCLOCK WITH JEFF@AMAZON.COM FOR TOMORROW@3pm
- file_b.html
<html>
<body>
<ul>
<li><span class=pl-c>Dennis Ideler <ideler.dennis@gmail.com></span></li>
<li><span class=pl-c>Jane Doe <jdoe@example.com></span></li>
</ul>
</body>
</html>
To extract the email addresses, download the Python program and run it from the command line, passing our files as arguments.
$ python extract_emails_from_text.py file_a.txt file_b.html
ideler.dennis@gmail.com
user+123@example.com
jeff@amazon.com
ideler.dennis@gmail.com
jdoe@example.com
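The output above can be approximated with a single regular expression. The sketch below is not the author's extract_emails_from_text.py. The pattern, the extract_emails helper, and the lowercasing step are assumptions, inferred from the sample output: JEFF@AMAZON.COM comes out as jeff@amazon.com, and TOMORROW@3pm is skipped.

```python
import re
import sys

# Hypothetical pattern (an assumption, not necessarily what the real script uses):
# local part = runs of allowed characters separated by single dots,
# domain = one or more dot-terminated labels followed by a final label.
ATOM = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+"
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
EMAIL_RE = re.compile(ATOM + r"(?:\." + ATOM + r")*@(?:" + LABEL + r"\.)+" + LABEL)


def extract_emails(text):
    """Yield email addresses found in text, lowercased, in order of appearance."""
    for match in EMAIL_RE.finditer(text.lower()):
        yield match.group(0)


def main(paths):
    # Print every address found in each file, one per line, duplicates included.
    for path in paths:
        with open(path) as f:
            for email in extract_emails(f.read()):
                print(email)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choosing and run on file_a.txt and file_b.html, this sketch should print the same five lines as above. Because the local part cannot contain consecutive dots, "hey...user+123@example.com" yields only user+123@example.com. Because the domain needs at least one dot, "TOMORROW@3pm" is ignored.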
Voilà, it prints every email address it found. Let's also sort the addresses alphabetically and remove the duplicates (uniq only drops adjacent duplicates, which is why sort comes first).
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq
ideler.dennis@gmail.com
jdoe@example.com
jeff@amazon.com
user+123@example.com
Looks good! Now let's save the results to a file.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq > emails.txt
P.S. The sorting and deduplicating commands above are specific to shells on UNIX-like systems (e.g. Linux or macOS). On Windows, you can use PowerShell instead, for example:
python extract_emails_from_text.py file_a.txt file_b.html | sort -unique
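If neither shell pipeline is convenient, the deduplicate-and-sort step can also be done in Python itself. This is a minimal sketch: unique_sorted is a hypothetical helper, not part of the script, and the sample list is just the five addresses printed by the first command. Python orders strings by code point, so results may differ from a locale-aware sort for mixed-case or accented input.

```python
def unique_sorted(addresses):
    """Drop duplicates and sort alphabetically, like `sort | uniq` (code-point order)."""
    return sorted(set(addresses))


# The five addresses printed by the first command, duplicates included.
found = [
    "ideler.dennis@gmail.com",
    "user+123@example.com",
    "jeff@amazon.com",
    "ideler.dennis@gmail.com",
    "jdoe@example.com",
]

for address in unique_sorted(found):
    print(address)
```

This prints the same four lines as the sort | uniq pipeline.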
@parable I'm not exactly sure what you're asking. Can you rephrase your question?
If you have an email address like someone@example.com, do you just want the example.com part?
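If the domain is what you're after, one simple approach is to split each address on its last @. This is a sketch only. The email_domain helper is a hypothetical name and is not part of extract_emails_from_text.py.

```python
def email_domain(address):
    """Return the part after the last '@' (the domain), or None if there is no '@'."""
    local, sep, domain = address.rpartition("@")
    return domain if sep else None


print(email_domain("someone@example.com"))  # → example.com
```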