The program below can take one or more plain text files as input. It works with Python 2 and Python 3.
Let's say we have two files that may contain email addresses:
- file_a.txt
foo bar
ok ideler.dennis@gmail.com sup
hey...user+123@example.com,wyd
hello world!
RESCHEDULE 2'OCLOCK WITH JEFF@AMAZON.COM FOR TOMORROW@3pm
- file_b.html
<html>
<body>
<ul>
<li>Dennis Ideler <ideler.dennis@gmail.com></li>
<li>Jane Doe <jdoe@example.com></li>
</ul>
</body>
</html>
To extract the email addresses, download the Python program and execute it on the command line with our files as input.
$ python extract_emails_from_text.py file_a.txt file_b.html
ideler.dennis@gmail.com
user+123@example.com
jeff@amazon.com
ideler.dennis@gmail.com
jdoe@example.com
Voilà, it prints every email address it found. Let's also remove the duplicates and sort the email addresses alphabetically.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq
ideler.dennis@gmail.com
jdoe@example.com
jeff@amazon.com
user+123@example.com
Looks good! Now let's save the results to a file.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq > emails.txt
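If you're curious how this kind of extraction can work, here is a minimal sketch. It is not the actual program: the pattern `EMAIL_RE` and the helper names `find_emails` and `main` are assumptions made for illustration, and the pattern is deliberately simple, so it will miss some valid addresses and may accept a few invalid ones.

```python
import re
import sys

# Hypothetical, simplified pattern (an assumption, not the real script's regex).
EMAIL_RE = re.compile(
    r"[a-z0-9_%+-]+(?:\.[a-z0-9_%+-]+)*"  # local part: atoms joined by single dots
    r"@"
    r"(?:[a-z0-9-]+\.)+[a-z]{2,}",        # domain: needs at least one dot
    re.IGNORECASE,
)


def find_emails(text):
    """Return address-like matches in text, lowercased, in order of appearance."""
    return [match.lower() for match in EMAIL_RE.findall(text)]


def main(paths):
    """Print every match found in each file, one address per line."""
    for path in paths:
        with open(path) as handle:
            for email in find_emails(handle.read()):
                print(email)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Requiring a dot in the domain is why `TOMORROW@3pm` in file_a.txt is skipped, and lowercasing each match is why `JEFF@AMAZON.COM` comes out as `jeff@amazon.com`.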
P.S. The above commands for sorting and deduplicating are specific to shells on a UNIX-like machine (e.g. Linux or macOS). If you're using Windows, you can use PowerShell. For example:
python extract_emails_from_text.py file_a.txt file_b.html | sort -unique
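If you'd rather not depend on the shell at all, the same deduplicating and sorting can be done in Python. A minimal sketch, where `unique_sorted` is a hypothetical helper name (not part of the program) and the list is the output from the example run:

```python
def unique_sorted(emails):
    """Drop duplicates, then sort the remaining addresses alphabetically."""
    return sorted(set(emails))


# Addresses in the order the example run printed them.
found = [
    "ideler.dennis@gmail.com",
    "user+123@example.com",
    "jeff@amazon.com",
    "ideler.dennis@gmail.com",
    "jdoe@example.com",
]

for email in unique_sorted(found):
    print(email)
```

This prints the same four addresses as the `sort | uniq` pipeline. Python sorts by code point rather than by locale, so the order could differ from the shell's for mixed-case input.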
I have a problem.
![py](https://private-user-images.githubusercontent.com/47021247/287546953-14b3a0fc-175c-4a61-87e8-67507e64fef1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE2NjgzOTgsIm5iZiI6MTcyMTY2ODA5OCwicGF0aCI6Ii80NzAyMTI0Ny8yODc1NDY5NTMtMTRiM2EwZmMtMTc1Yy00YTYxLTg3ZTgtNjc1MDdlNjRmZWYxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIyVDE3MDgxOFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU4YmI5OGVhZWE1MzlkM2YyMjcwMDcwNTM4YWUxNmNlMzkxODM2ZTBjMGEyNzEwMjBjMmFjOTI3NjRkZjU2NGYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.6u9fYG4Ec9wEMg8PBAICOGXuzch2nY9m9ssLi6Qynyk)