@DanielBlanco
Created January 18, 2017 19:57
Checks email content differences.
#!/usr/bin/env ruby
# See https://workflow.advisory.com/browse/CAM-11198
if __FILE__ != $0
  exit 0
end
$stderr.sync = true
require "optparse"
usage = %{
Usage: ruby script/CAM-11198.rb

Writes an HTML diff for a sample of up to 1000 emails with HTML content,
comparing the original (unsanitized) HTML to its sanitized form.

Options:
  -h, --help    Shows usage information
}
file_name = "CAM-11198-output-diff.html"
ARGV.options do |opts|
  opts.on_tail("-h", "--help") do
    warn usage
    exit 1
  end
  opts.parse!
end
require "bundler/setup"
require File.expand_path("../../config/environment", __FILE__)
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Base.clear_reloadable_connections!
begin
  output = File.open(file_name, "w")
  output << <<-HTML
    <html>
    <head>
    <title>CAM-11198 diff</title>
    <style>
    #{Diffy::CSS_COLORBLIND_1}
    .message {
      border: 2px solid #000;
      margin-bottom: 10px;
    }
    p {
      border-bottom: 1px solid #777;
      margin: 0;
      background: #ddd;
      padding: 10px;
    }
    </style>
    </head>
    <body>
  HTML
  # Get the message list and compare.
  messages = GradesFirst::Message.all(
    select: "id, message",
    conditions: [
      "message LIKE ? AND created_at < ?",
      "%>%",
      Date.today.beginning_of_year
    ],
    order: "created_at",
    limit: 1000
  )
  output << "nothing to compare!" if messages.empty?
  messages.each do |msg|
    sanitized_message = Sanitize.fragment(msg.message, Sanitize::Config::RELAXED)
    output << '<div class="message">'
    output << "<p>Message ID: #{msg.id}</p>"
    output << Diffy::Diff.new(msg.message, sanitized_message).to_s(:html)
    output << '</div>'
  end
ensure
  # output is nil if File.open itself failed, so guard before writing and closing.
  if output
    output << "</body></html>"
    output.close
  end
end
@danielrehner

  1. Should the condition be created_at > ? rather than created_at < ?
  2. If there is no diff for a given message, don't output anything.
  3. It may be better to use find_each and iterate one record at a time rather than pulling all the messages into memory at once. The order may not be necessary.
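
Points 2 and 3 can be sketched in plain Ruby without loading Rails. This is only an illustration under stated assumptions, not the script's actual code: `Message` here is a plain Struct rather than the real model, `strip_scripts` is a hypothetical stand-in for `Sanitize.fragment`, and `each_in_batches` is a hypothetical helper that imitates the way `find_each` hands over records one at a time while fetching them a slice at a time.

```ruby
# Hypothetical stand-in for the ActiveRecord model used in the script.
Message = Struct.new(:id, :message)

# Hypothetical stand-in for Sanitize.fragment: it only drops <script> elements.
def strip_scripts(html)
  html.gsub(%r{<script\b.*?</script>}mi, "")
end

# Imitates find_each: yields records one at a time, taking them from the
# source a slice at a time instead of materializing a second full list.
def each_in_batches(records, batch_size = 2)
  records.each_slice(batch_size) do |batch|
    batch.each { |record| yield record }
  end
end

# Returns the ids of messages whose content changes when sanitized.
# Messages that come back unchanged are skipped, so nothing is emitted for them.
def changed_message_ids(records)
  ids = []
  each_in_batches(records) do |msg|
    next if strip_scripts(msg.message) == msg.message
    ids << msg.id
  end
  ids
end

messages = [
  Message.new(1, "<p>hello</p>"),
  Message.new(2, "<p>hi</p><script>alert(1)</script>"),
  Message.new(3, "plain > text")
]

p changed_message_ids(messages) # => [2]
```

In the real script the same `next if sanitized == original` check would sit at the top of the loop body, before the `<div class="message">` wrapper is written, so unchanged messages leave no empty block in the report.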
