@akcrono
Last active August 29, 2015 14:16
require 'fileutils'

class CustomMailExporter
  attr_accessor :service, :target_users, :target_start_date, :target_end_date,
                :path, :errors, :filename_counter, :emails_found

  # Dates should be in Date or DateTime format. Users should be an array.
  def initialize(service, target_users, target_start_date, target_end_date)
    @service = service
    @target_users = target_users.map(&:downcase)
    @target_start_date = target_start_date
    @target_end_date = target_end_date
    @path = "/mnt/#{service.id}/"
    @errors = []
    @filename_counter = 0
    @emails_found = 0
  end

  # Writes every matching email to disk. Returns true if no errors were recorded.
  def process
    FileUtils.mkdir_p(path) unless File.directory?(path)
    service.metadatum_class.find_each(service.id) do |datum|
      if target_user?(datum.from) && date_in_range?(datum.date)
        write_contents_to_file(datum)
        @emails_found += 1
      end
    end
    errors.empty?
  end

  # Extracts the bare address from a "Name <address>" style from field.
  def convert_to_email_address(from)
    from.split("<").last.split(">").first
  end

  def target_user?(from)
    target_users.include?(convert_to_email_address(from).downcase)
  end

  # Both bounds are exclusive.
  def date_in_range?(date)
    date > target_start_date && date < target_end_date
  end

  def write_contents_to_file(datum)
    path_and_name = path
    if datum.respond_to?(:content_filename) && datum.content_filename.present?
      path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
      # use gsub for problem characters in subjects
    else
      path_and_name += filename_counter.to_s
      # Assign through the instance variable: a bare `filename_counter += 1`
      # would create a new local variable that starts out nil.
      @filename_counter += 1
    end
    path_and_name += ".eml" unless path_and_name.end_with?(".eml")
    File.open(path_and_name, 'wb') do |f|
      # Stream the content in chunks instead of loading it all into memory.
      datum.content { |chunk| f << chunk }
    end
  rescue => e
    errors << [datum.key, e]
  end
end
@ericalexander

Also, create a folder under /mnt to be used for the path instead of /tmp. There's more space available under /mnt. I can help with this tomorrow if needed.

@ericalexander

I would not do the check for datum.content.nil? on line 50. This will load all content into memory. We don't want that. Instead, we'll wind up with some empty files. We can clean these up on the command line after the export finishes.
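If it turns out to be easier to do that cleanup from Ruby than from the shell, a rough sketch might look like the following. The helper name `remove_empty_exports` is hypothetical, and it assumes the exported files sit directly in the export directory with an `.eml` extension.

```ruby
require 'fileutils'

# Hypothetical helper, not part of the gist: deletes zero-byte .eml files
# found directly inside export_dir and returns the paths it removed.
def remove_empty_exports(export_dir)
  empty = Dir.glob(File.join(export_dir, '*.eml')).select { |f| File.zero?(f) }
  FileUtils.rm_f(empty)
  empty
end
```

This only inspects file sizes on disk, so it never pulls message content into memory.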

@ericalexander

Overall looks pretty good. Just a few comments. Nice work.

ghost commented Feb 26, 2015

What is the relationship between service and target_users?

ghost commented Feb 26, 2015

Rather than trying to extract the email address in convert_to_email_address, which could prove difficult if some from fields are formatted differently than you expect, you could try creating a regular expression from each of the target_users and using those to do the comparison in the target_user? method. The result would look something like this:

  def initialize(service, target_users, target_start_date, target_end_date)
    # Other initialization code
    @target_users = target_users.map do |email_address|
      # Regexp.escape will properly escape any special characters in the email address, such as '.'
      # The second boolean argument tells Regexp that the expression should be case insensitive.
      Regexp.new(Regexp.escape(email_address), true)
    end
    # Other initialization code
  end

  def target_user?(from)
    # The === method is defined on the Regexp class and returns true if the other String, in this case 'from',
    # matches the regular expression.
    target_users.any?{ |regexp| regexp === from }
  end

For more info on regular expressions in Ruby, check out http://ruby-doc.org//core-2.1.1/Regexp.html
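To make the behavior concrete, here is a small standalone sketch of the same idea. The addresses are made up, and `patterns` is passed in as an argument only so the snippet runs outside the class; `Regexp::IGNORECASE` is the named equivalent of passing `true`.

```ruby
# Hypothetical sample data; the address is invented for illustration.
target_users = ['Jane.Doe@example.com']
patterns = target_users.map do |email|
  Regexp.new(Regexp.escape(email), Regexp::IGNORECASE)
end

# Standalone version of target_user? that takes the patterns explicitly.
def target_user?(patterns, from)
  patterns.any? { |regexp| regexp === from }
end

target_user?(patterns, 'Jane Doe <jane.doe@example.com>') # matches despite display name and case
target_user?(patterns, 'jane.doe@example.com')            # a bare address matches too
target_user?(patterns, 'xjane.doe@example.com')           # also matches: the pattern is unanchored
```

One thing to keep in mind is the last case: because the pattern is not anchored, any from field that merely contains a target address as a substring will match. If that matters for this export, the pattern would need word or delimiter boundaries around the escaped address.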

ghost commented Feb 26, 2015

With regard to the date_in_range? method, you should make sure that you're dealing with consistent objects. The comment above your initialize method suggests that target_start_date and target_end_date should be Date or DateTime objects, but the GoogleMailDatum#date method returns a Time object. The differences between these types could lead to unexpected behavior when you try to make comparisons, as you do in date_in_range?. This script would be much more reliable and easier to work with if you just chose one type of Time object to use throughout the script. I would suggest using Time over Date or DateTime, since Time is the one used by the Datum class.
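A minimal sketch of that normalization, assuming the inputs may arrive as Date, DateTime, or Time: the helper name `normalize_to_time` is hypothetical, and the sample dates are invented. Note that `Date#to_time` yields midnight in the local time zone, which may or may not be the boundary you intend.

```ruby
require 'date'

# Hypothetical helper: coerces a Date, DateTime, or Time into a Time so that
# every comparison in date_in_range? is Time against Time.
def normalize_to_time(value)
  value.to_time
end

# Invented sample values for illustration.
start_time = normalize_to_time(Date.new(2015, 1, 1))
end_time   = normalize_to_time(DateTime.new(2015, 2, 1, 12, 0, 0))
datum_time = Time.new(2015, 1, 15, 9, 30, 0)

in_range = datum_time > start_time && datum_time < end_time
```

Calling the helper once in initialize would keep date_in_range? itself unchanged.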

ghost commented Feb 26, 2015

Also regarding date_in_range?, the comparison you're making, date > target_start_date && date < target_end_date, suggests that target_start_date and target_end_date are non-inclusive. Is this desirable?
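If inclusive bounds turn out to be what you want, one possible sketch uses `Range#cover?`, which includes both endpoints of a `..` range. The keyword argument names here are assumptions made so the snippet runs on its own; in the class they would be the existing target_start_date and target_end_date.

```ruby
# Sketch of an inclusive variant; start_time and end_time are assumed to be
# Time objects, as is the date being tested.
def date_in_range?(date, start_time:, end_time:)
  (start_time..end_time).cover?(date)
end
```

With this version, a message stamped exactly at the start or end time is exported, whereas the current `>` and `<` comparison skips it.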

ghost commented Feb 26, 2015

Minor FYI, but S3Datum has a method for writing content to a file, see Concerns::S3Datum#write_content_to_file for details.

ghost commented Feb 26, 2015

Regarding lines 46-52, I don't think you need to worry about Datums missing or not responding to content_filename. The content_filename method is required by the S3Datum interface for files to be stored in S3. Without it, we wouldn't be able to store or fetch content for them anyway. Also, with regard to

path_and_name += datum.content_filename.gsub(/[.<>:"\/\\|\?\*']/, "")
#use gsub for problem characters in subjects

it seems like the GoogleMailDatum and GoogleMailRestModels::CannonicalDatum classes both use the message_id, not the subject, to generate the filename, so the gsub call may not be strictly necessary, though it's possible that I'm missing something.

ghost commented Feb 26, 2015

Overall, I think it's good. Nice job!
