@nigeljonez
Created July 18, 2019 00:36
#!/usr/bin/env ruby
# -*- encoding : utf-8 -*-
# Copied code from mysociety/alaveteli, so LICENSE = https://github.com/mysociety/alaveteli/blob/develop/LICENSE.txt
# We want to avoid loading rails unless we need it, so we start by just loading the
# config file ourselves.
$alaveteli_dir = File.expand_path(File.join(File.dirname(__FILE__), '..'))
$:.push(File.join($alaveteli_dir, "commonlib", "rblib"))
load 'config.rb'
$:.push(File.join($alaveteli_dir, "lib"))
$:.push(File.join($alaveteli_dir, "lib", "mail_handler"))
load 'configuration.rb'
MySociety::Config.set_file(File.join($alaveteli_dir, 'config', 'general'), true)
MySociety::Config.load_default
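require 'optparse'

filename = ""
OptionParser.new do |opts|
  opts.banner = "Usage: #{File.basename($0)} [options]"
  opts.on("-fNAME", "--filename=FILENAME", "Filename to process") do |fi|
    filename = fi
  end
end.parse!
# Fail early with a readable message instead of an error from reading "".
abort "No file given; pass --filename=FILENAME" if filename.empty?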
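# Read the PDF as raw bytes; File.binread also closes the file handle.
pdf_file = File.binread(filename)
# Ask pdftk to uncompress the PDF streams so their contents can be searched.
uncompressed_text = AlaveteliExternalCommand.run("pdftk", "-", "output", "-", "uncompress", :stdin_string => pdf_file)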
if uncompressed_text.blank?
  # pdftk produced no output, so there is nothing to compare.
  puts "returns 'text'"
else
  # Pass 1: mask plain-text email addresses, keeping only "@" and ".".
  text = uncompressed_text.dup
  text.gsub!(MySociety::Validate.email_find_regexp) do |email|
    email.gsub(/[^@.]/, 'x')
  end
  if text != uncompressed_text
    puts "Email Regexp Change"
  end

  # Pass 2: start again from the original and look for UCS-2 encoded emails.
  text = uncompressed_text.dup
  # Dropping NUL bytes makes UCS-2 encoded ASCII text match the ASCII regexp.
  ascii_chars = text.gsub(/\0/, "")
  emails = ascii_chars.scan(MySociety::Validate.email_find_regexp)
  # Convert back to UCS-2, making a mask at the same time
  emails.map! do |email|
    # We want the ASCII representation of UCS-2
    [email[0].encode('UTF-16LE').force_encoding('US-ASCII'),
     email[0].gsub(/[^@.]/, 'x').encode('UTF-16LE').force_encoding('US-ASCII')]
  end
  # Now search and replace the UCS-2 email with the UCS-2 mask
  emails.each do |email, mask|
    text.gsub!(email, mask)
  end
  if text != uncompressed_text
    puts "Email Regexp Change #2"
  end
end
puts "All done"
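
The second pass of the script (strip NUL bytes, find emails, re-encode each as UTF-16LE, then replace) can be tried without pdftk or the Alaveteli libraries. The following is a minimal sketch under stated assumptions, not the Alaveteli implementation: `EMAIL_RE` is a simplified, hypothetical stand-in for `MySociety::Validate.email_find_regexp`, and `mask_email` and `mask_emails_in_binary` are names invented for this illustration.

```ruby
# Hypothetical, simplified stand-in for MySociety::Validate.email_find_regexp.
# Like the script's usage implies, it has one capture group.
EMAIL_RE = /([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})/

# Replace every character except "@" and "." with "x" (same rule as the script).
def mask_email(email)
  email.gsub(/[^@.]/, 'x')
end

# Mask emails stored either as plain ASCII or as UTF-16LE ("UCS-2") bytes.
def mask_emails_in_binary(data)
  text = data.b # work on a binary copy so byte-level replacement is safe
  # Removing NUL bytes turns "b\0o\0b\0" into "bob", so the ASCII regexp matches.
  ascii = text.delete("\0")
  ascii.scan(EMAIL_RE).flatten.uniq.each do |email|
    email = email.force_encoding(Encoding::US_ASCII) # matches are ASCII-only
    mask = mask_email(email)
    pairs = [[email, mask],
             [email.encode('UTF-16LE'), mask.encode('UTF-16LE')]]
    pairs.each do |from, to|
      # Block form avoids backslash interpretation in the replacement string.
      text.gsub!(from.b) { to.b }
    end
  end
  text
end

# A binary blob with an email embedded as UTF-16LE, as it may appear in a PDF.
sample = "Contact: ".b + "bob@example.com".encode('UTF-16LE').b + " end".b
masked = mask_emails_in_binary(sample)
puts masked.delete("\0")
```

Because the mask has the same number of characters as the email, the byte length of the data is unchanged, which matters when editing a PDF stream in place.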