@CharlieSu
Created November 17, 2010 18:16
require 'digest/md5'
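# Pattern for an RFC 822 style addr-spec, assembled from the grammar's
# building blocks. The fragments use byte escapes such as \x80-\xff, so the
# regexp is compiled with the /n (ASCII-8BIT) flag; Ruby 1.9 and later
# reject those escapes in a UTF-8 regexp.
EmailAddress = begin
  qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]'
  dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]'
  atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-' +
    '\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+'
  quoted_pair = '\\x5c[\\x00-\\x7f]'
  domain_literal = "\\x5b(?:#{dtext}|#{quoted_pair})*\\x5d"
  quoted_string = "\\x22(?:#{qtext}|#{quoted_pair})*\\x22"
  domain_ref = atom
  sub_domain = "(?:#{domain_ref}|#{domain_literal})"
  word = "(?:#{atom}|#{quoted_string})"
  domain = "#{sub_domain}(?:\\x2e#{sub_domain})*"
  local_part = "#{word}(?:\\x2e#{word})*"
  addr_spec = "#{local_part}\\x40#{domain}"
  # The last expression is the value of the begin block.
  /\A#{addr_spec}\z/n
end

# Print the MD5 hex digest of every comma-separated field in google.csv that
# looks like an email address. Fields are lowercased and stripped first.
# The file is read as raw bytes so each field matches the binary regexp's
# encoding; File.foreach also closes the file when the block finishes.
File.foreach('google.csv', encoding: 'ASCII-8BIT') do |line|
  line.split(',').each do |field|
    normalized_field = field.downcase.strip
    puts Digest::MD5.hexdigest(normalized_field) if normalized_field.match?(EmailAddress)
  end
end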