@k3muri84
Created September 14, 2016 12:23
Python script to extract emails from a text file
import re
fileInput = 'file.htm'
fileOutput = 'emaillist-'+fileInput+'.txt'
# read the whole input file; the with-block closes the handle afterwards
with open(fileInput) as f:
    content = f.read()
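# email regex: also accepts obfuscated " at " / " dot " separators
# raw strings keep the backslashes intact; IGNORECASE also matches uppercase addresses
regex = re.compile(
    r"([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)",
    re.IGNORECASE)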
# set makes them unique
results = set(regex.findall(content))
emails = ""
count = len(results)
for x in results:
    # findall returns one tuple per match; index 0 is the full address
    emails += str(x[0]) + "\n"
print("Reading " + fileInput + ":\n------------------------------------------")
print(emails)
print("unique user: " + str(count))
print("------------------------------------------")
# function to write file
def writefile():
    # the with-block closes the output file even if the write fails
    with open(fileOutput, 'w') as f:
        f.write(emails)
    print("File written: " + fileOutput)
writefile()
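The script takes x[0] from every result because the pattern contains three capturing groups, so findall returns a tuple per match instead of a plain string. The following is a minimal sketch of that behaviour, not part of the original gist: the helper name extract_emails and the sample text are made up for illustration, and it assumes case-insensitive matching is wanted.

```python
import re

# Same pattern as the script above, written with raw strings.
# Group 1 is the whole address, group 2 the "@" or " at " separator,
# group 3 the last "." or " dot " separator before the final label.
EMAIL_RE = re.compile(
    r"([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"(@|\sat\s)"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+"
    r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)",
    re.IGNORECASE,  # assumption: uppercase addresses should match too
)


def extract_emails(text):
    """Hypothetical helper: return the unique addresses in text, sorted."""
    # findall yields one tuple per match; index 0 is the full address.
    return sorted({match[0] for match in EMAIL_RE.findall(text)})


# Made-up sample text with one duplicate and one obfuscated address.
sample = "Contact alice@example.com or bob at example dot org. Again: alice@example.com"
print(extract_emails(sample))
```

Obfuscated addresses are returned exactly as written ("bob at example dot org"), not normalised. The " at " alternative can also match ordinary prose such as "look at example.com", so the output may need a manual check.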