@akcrono
Last active August 29, 2015 14:16
require 'fileutils'

class CustomMailExporter
  attr_accessor :service, :target_users, :target_start_date, :target_end_date,
                :path, :errors, :filename_counter, :emails_found

  # Dates should be in Date or DateTime format. Users should be an array.
  def initialize(service, target_users, target_start_date, target_end_date)
    @service = service
    @target_users = target_users.map(&:downcase)
    @target_start_date = target_start_date
    @target_end_date = target_end_date
    @path = "/mnt/#{service.id}/"
    @errors = []
    @filename_counter = 0
    @emails_found = 0
  end

  # Writes every matching email to disk. Returns true only if no write failed.
  def process
    FileUtils.mkdir_p(path) unless File.directory?(path)
    service.metadatum_class.find_each(service.id) do |datum|
      if target_user?(datum.from) && date_in_range?(datum.date)
        write_contents_to_file(datum)
        @emails_found += 1
      end
    end
    errors.empty?
  end

  # Extracts the bare address from a header such as "Name <user@example.com>".
  def convert_to_email_address(from)
    from.split("<").last.split(">").first
  end

  def target_user?(from)
    target_users.include?(convert_to_email_address(from).downcase)
  end

  def date_in_range?(date)
    date > target_start_date && date < target_end_date
  end

  # Streams the datum's content to an .eml file; failures are collected in errors.
  def write_contents_to_file(datum)
    path_and_name = path
    if datum.respond_to?(:content_filename) && datum.content_filename.present?
      path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
      #use gsub for problem characters in subjects
    else
      path_and_name += filename_counter.to_s
      # Increment the ivar: a bare `filename_counter += 1` creates a nil local and raises.
      @filename_counter += 1
    end
    path_and_name += ".eml" unless path_and_name.end_with?(".eml")
    File.open(path_and_name, 'wb') do |f|
      datum.content { |chunk| f << chunk }
    end
  rescue => e
    errors << [datum.key, e]
  end
end

ghost commented Feb 26, 2015

With regard to the date_in_range? method, you should make sure that you're comparing objects of a consistent type. The comment above your initialize method says that target_start_date and target_end_date should be Date or DateTime objects, but the GoogleMailDatum#date method returns a Time object. The differences between these types could lead to unexpected behavior when you compare them, as you do in date_in_range?. This script would be much more reliable and easier to work with if you picked one date/time class and used it throughout. I would suggest Time over Date or DateTime, since Time is the one used by the Datum class.
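
A minimal sketch of that normalization, assuming plain Ruby with only the standard `date` library. `normalize_to_time` is a hypothetical helper name, not something that exists in the codebase; the idea is to coerce both bounds once in initialize so date_in_range? only ever compares Time with Time.

```ruby
require 'date'

# Hypothetical helper: coerce a Date, DateTime, or Time into a Time.
def normalize_to_time(value)
  case value
  when Time then value
  when Date then value.to_time # DateTime is a subclass of Date, so it lands here too
  else
    raise ArgumentError, "expected Date, DateTime, or Time, got #{value.class}"
  end
end

start_time = normalize_to_time(Date.new(2015, 2, 1))
end_time   = normalize_to_time(DateTime.new(2015, 2, 28, 12, 0, 0))
sent_at    = Time.new(2015, 2, 14, 9, 30, 0)

# Both bounds are now Time objects, so the comparison is Time against Time.
in_range = sent_at > start_time && sent_at < end_time
```

Note that `Date#to_time` yields local midnight, so a Date bound marks the start of that day.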

ghost commented Feb 26, 2015

Also regarding date_in_range?, the comparison you're making, date > target_start_date && date < target_end_date, suggests that target_start_date and target_end_date are non-inclusive. Is this desirable?
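
To make the difference concrete, here is a small sketch of both variants as standalone methods (the method names and explicit arguments are made up for illustration; the real method reads the bounds from the instance). The inclusive version uses `Range#cover?`, which treats both endpoints as part of the range.

```ruby
# Exclusive bounds, matching the comparison currently in the gist.
def date_in_range_exclusive?(date, start_date, end_date)
  date > start_date && date < end_date
end

# Inclusive bounds: an email stamped exactly at either bound is kept.
def date_in_range_inclusive?(date, start_date, end_date)
  (start_date..end_date).cover?(date)
end

start_time = Time.new(2015, 2, 1)
end_time   = Time.new(2015, 2, 28)

on_boundary_exclusive = date_in_range_exclusive?(start_time, start_time, end_time)
on_boundary_inclusive = date_in_range_inclusive?(start_time, start_time, end_time)
```

With the exclusive version, an email stamped exactly at target_start_date is skipped; with the inclusive version it is exported.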

ghost commented Feb 26, 2015

Minor FYI: S3Datum already has a method for writing content to a file. See Concerns::S3Datum#write_content_to_file for details.

ghost commented Feb 26, 2015

Regarding lines 46-52, I don't think that you need to worry about Datums missing or not responding to content_filename. The content_filename method is required by the S3Datum interface for files to be stored in S3. Without it, we wouldn't be able to store or fetch content for them anyway. Also, with regard to

path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
#use gsub for problem characters in subjects

it seems like the GoogleMailDatum and GoogleMailRestModels::CannonicalDatum classes both use the message_id to generate the filename, not the subject, so the gsub call may not be completely necessary, though I guess it's possible that I'm missing something.
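
One quick way to check is to run the substitution on a sample name. The sketch below is only an illustration: `export_filename` is a hypothetical stand-in for the naming logic, and the message-id style input is invented, since I haven't checked what content_filename actually returns.

```ruby
# The same character class as the gsub call quoted above.
PROBLEM_CHARACTERS = /[.<>:"\/\\|\?\*']/

# Hypothetical stand-in for the naming logic in write_contents_to_file.
def export_filename(content_filename)
  name = content_filename.gsub(PROBLEM_CHARACTERS, "")
  name += ".eml" unless name.end_with?(".eml")
  name
end

puts export_filename("<abc.123@mail.example.com>.eml")
# prints abc123@mailexamplecomeml.eml
```

Because the character class includes the dot, the original extension's dot is stripped as well, so ".eml" always gets appended again and the old "eml" stays glued to the name.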

ghost commented Feb 26, 2015

Overall, I think it's good. Nice job!