@mjbommar mjbommar/ Secret
Created Nov 27, 2009

"""
@date Nov 27, 2009
@author: Michael Bommarito
This is provided for purely academic purposes.
"""
import glob, re, datetime, dateutil.parser
reAddress = re.compile('([^ <,]+@[^ >,]+)', re.IGNORECASE)
def parseEmail(buffer):
    """
    Parse the email text.
    Return the sender, recipients, and date.
    """
    # Read in the email and properly split it into lines.
    lines = buffer.strip().splitlines()

    # Parse the From address with a regular expression
    # and raise an exception if the regular expression fails.
    # Note: 'email' below is the module-level loop variable that holds
    # the name of the file currently being processed.
    try:
        senderAddress = reAddress.findall(lines[0])[0].lower()
    except IndexError:
        raise Exception("%s has invalid From: header." % email)

    # Now try to parse recipient addresses from the To:, Cc:, and Bcc:
    # lines. Start from the second line and iterate until the header
    # ends, which is denoted by an empty line.
    recipientAddress = set()
    emailDate = None  # stays None if the header has no Date: line
    for i in range(1, len(lines)):
        # Clean up the line, just in case.
        leftToken = lines[i].strip().lower()

        # This means the header has ended and we should stop looking for addresses.
        if len(leftToken) == 0:
            break

        # If this is the date line, parse that.
        # Note that FOIA/mail/0853426848.txt has a goofy NZ timestamp with
        # +-1300. I changed this to +1300.
        if leftToken.startswith('date:'):
            dateString = lines[i].split(':', 1)[1]
            # Drop any trailing "(TZ)" comment and the commas before parsing.
            dateString = dateString.split('(')[0].replace(',', '').strip()
            try:
                emailDate = dateutil.parser.parse(dateString)
            except (ValueError, OverflowError):
                raise Exception("bad datestring detected - %s" % (dateString))

        # Now if the header indicates recipients, parse the addresses from it.
        if leftToken.startswith('to:') or leftToken.startswith('cc:') or leftToken.startswith('bcc:'):
            lineAddress = [address.lower() for address in reAddress.findall(lines[i])]
            recipientAddress.update(lineAddress)

    # Drop empty entries and return the recipients in sorted order.
    recipientAddress = [address.strip() for address in recipientAddress if len(address.strip()) > 0]
    return (senderAddress, sorted(recipientAddress), emailDate)

for email in glob.glob('FOIA/mail/*.txt'):
    # Parse the email into the addresses
    with open(email) as emailFile:
        senderAddress, recipientAddress, emailDate = parseEmail(emailFile.read())

    # Check to make sure someone didn't send the email without an actual address.
    # I have gone through and manually corrected these files by finding the matching
    # address from elsewhere in the database or from the Internet.
    # Mike Hulme needs compliant mail client!!!
    #   * FOIA/mail/0837197800.txt: Neil Loader <>
    #   * FOIA/mail/0973374325.txt: barker <>
    #   * FOIA/mail/0973374325.txt: vira <>
    #   * FOIA/mail/0942448792.txt: wigley <>
    #   * FOIA/mail/0957536665.txt: t.d.davies <>
    #   * FOIA/mail/0986486371.txt: s.torok <>
    #   * FOIA/mail/1006983600.txt: All BCC? Removed
    #   * FOIA/mail/1021757151.txt: s.torok <>
    #   * FOIA/mail/1048799107.txt: All BCC? Removed
    #   * FOIA/mail/1164120712.txt: Malcolm Hughes <>
    if len(recipientAddress) == 0:
        raise Exception("%s has no recipient - something is wrong." % (email))

    # One CSV row per email: file name, date, sender, then every recipient.
    print("%s,%s,%s,%s" % (email, emailDate, senderAddress, ','.join(recipientAddress)))
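The header walk in the script can be tried without the FOIA/mail files. What follows is a minimal Python 3 sketch, not the author's code. It reuses the same address pattern, but the sample message, the name `parse_headers`, and the constant names are hypothetical and made up for illustration. Two things differ from the script and are assumptions of this sketch: the date is parsed with the standard library's `email.utils.parsedate_to_datetime` rather than `dateutil`, and the `+-1300` offset is normalized in code, whereas the author corrected the file by hand.

```python
import re
from email.utils import parsedate_to_datetime

# Same pattern as the script's reAddress.
RE_ADDRESS = re.compile(r'([^ <,]+@[^ >,]+)', re.IGNORECASE)

# Hypothetical message, invented for this example (not taken from FOIA/mail).
SAMPLE = """From: Alice Example <Alice@Example.org>
To: bob@example.com, Carol <carol@example.net>
Cc: dave@example.com
Date: Fri, 27 Nov 2009 10:00:00 +-1300 (NZDT)
Subject: test

Body text mentioning eve@example.com, which must not be picked up.
"""


def parse_headers(text):
    """Return (sender, sorted recipients, date) from the header block."""
    lines = text.strip().splitlines()
    sender = RE_ADDRESS.findall(lines[0])[0].lower()
    recipients = set()
    sent = None
    for line in lines[1:]:
        token = line.strip().lower()
        if not token:
            # An empty line ends the header; the body is never scanned.
            break
        if token.startswith('date:'):
            # Drop the trailing "(TZ)" comment, then repair the "+-" offset.
            raw = line.split(':', 1)[1].split('(')[0].strip()
            sent = parsedate_to_datetime(raw.replace('+-', '+'))
        elif token.startswith(('to:', 'cc:', 'bcc:')):
            recipients.update(a.lower() for a in RE_ADDRESS.findall(line))
    return sender, sorted(recipients), sent


sender, recipients, sent = parse_headers(SAMPLE)
print(sender)
print(recipients)
print(sent)
```

Because the loop stops at the first empty line, the address in the body is ignored, and the angle brackets around `Alice@Example.org` are excluded by the character classes in the pattern rather than by a separate clean-up step.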