@philandstuff
Last active June 17, 2021 16:00
Validate emails against GOV.UK Notify's validation algorithm

GOV.UK Notify email validator

This script takes a CSV file containing email addresses from a dirty data source and splits it into two CSVs: one containing the rows with valid email addresses, the other containing the rows with invalid ones.

How to use

First, clone the alphagov/notifications-utils repository. Add this script to the root, and make it executable:

chmod +x validate-emails.py

Set up a virtual environment and install dependencies:

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Then run it on your CSV file:

./validate-emails.py ./path/to/your.csv

By default, it checks a column named Email. If your email column has a different name, specify it like this:

./validate-emails.py ./path/to/your.csv --column="email-address"
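
If you are unsure what the column is called, a short sketch like the following can list the headings first. The helper name `column_headings` is hypothetical and not part of the script; it relies only on the standard library's `csv.DictReader`, which is also what the script uses to read the header row.

```python
import csv
import io


def column_headings(csvfile):
    """Return the header row of an open CSV file as a list of strings."""
    # DictReader reads the first row when fieldnames is first accessed;
    # it is None for an empty file, hence the fallback to an empty list.
    return list(csv.DictReader(csvfile).fieldnames or [])


# Example with an in-memory file (an assumption for illustration);
# for a real file, pass open('your.csv', newline='') instead.
sample = io.StringIO("Name,email-address\nAda,ada@example.com\n")
headings = column_headings(sample)
print(headings)
```

Whichever heading holds the addresses is the value to pass to `--column`.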

If the input file is called foo.csv, valid records will be output into foo_clean.csv, and invalid records will go into foo_rejected.csv.
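
The naming rule can be sketched on its own as follows. This is an illustration of the rule described above for an input whose name ends in `.csv`, not the script's own code; the helper name `output_names` is hypothetical.

```python
from pathlib import Path


def output_names(filename):
    """Return (clean, rejected) output paths for an input CSV path."""
    path = Path(filename)
    # path.stem is the file name without its final extension,
    # so 'foo.csv' has the stem 'foo'.
    clean = path.with_name(path.stem + "_clean.csv")
    rejected = path.with_name(path.stem + "_rejected.csv")
    return str(clean), str(rejected)


clean, rejected = output_names("foo.csv")
print(clean)
print(rejected)
```

Both output files land in the same directory as the input file.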

#!/usr/bin/env python
import argparse
import csv
import os

from notifications_utils.recipients import validate_email_address, InvalidEmailError


def validate_emails():
    parser = argparse.ArgumentParser(
        description='Split a CSV into rows with valid and invalid email addresses.')
    parser.add_argument('filename', type=str,
                        help='filename of the csv file to parse')
    parser.add_argument('--column', default='Email',
                        help='CSV column heading of the email address column')
    args = parser.parse_args()

    # Derive the output names from the base name, so foo.csv gives
    # foo_clean.csv and foo_rejected.csv. Only the extension is touched,
    # so the input file can never be opened for writing by mistake.
    base, _ = os.path.splitext(args.filename)
    outfilename = base + '_clean.csv'
    rejectfilename = base + '_rejected.csv'

    # The csv module documentation recommends opening files with newline=''.
    with open(args.filename, newline='') as csvfile, \
            open(outfilename, 'w', newline='') as outfile, \
            open(rejectfilename, 'w', newline='') as rejectfile:
        csvreader = csv.DictReader(csvfile)
        csvwriter = csv.DictWriter(outfile, csvreader.fieldnames)
        rejectwriter = csv.DictWriter(rejectfile, csvreader.fieldnames)
        csvwriter.writeheader()
        rejectwriter.writeheader()
        for row in csvreader:
            email = row[args.column]
            try:
                # Raises InvalidEmailError if the address fails validation.
                validate_email_address(email)
                csvwriter.writerow(row)
            except InvalidEmailError:
                rejectwriter.writerow(row)


if __name__ == "__main__":
    validate_emails()
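
To try the row-splitting logic without installing notifications-utils, the loop can be sketched with a stand-in validator. Everything here is an assumption for illustration: `looks_like_email` is a deliberately crude check and is not GOV.UK Notify's validation algorithm, and `split_rows` is a hypothetical helper that is not part of the script.

```python
import csv
import io


def looks_like_email(value):
    """Crude stand-in check: exactly one '@' with text on both sides."""
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and bool(domain) and "@" not in domain


def split_rows(csvfile, column="Email", is_valid=looks_like_email):
    """Return (valid_rows, rejected_rows) read from an open CSV file."""
    valid, rejected = [], []
    for row in csv.DictReader(csvfile):
        # Route each whole row by the result of the check on one column.
        (valid if is_valid(row[column]) else rejected).append(row)
    return valid, rejected


# In-memory sample data, made up for this sketch.
sample = io.StringIO("Name,Email\nAda,ada@example.com\nBob,not-an-email\n")
valid, rejected = split_rows(sample)
print(len(valid), len(rejected))
```

Passing a different `is_valid` callable swaps in another check without changing the loop.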