First, generate a list of job IDs for reports generated in the last N days, adjusting the date in the WHERE clause to match. Run the following in Mixpanel...
SELECT job_id
FROM public.reports AS reports
WHERE reports.updated_at > '2015-06-29 00:00:00'
AND reports.kind = 1 /* Full Reports == 1 */
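The date literal in the query above is hard-coded. As a minimal sketch, the cutoff for "the last N days" could be computed in Ruby and pasted into the WHERE clause. `report_cutoff` is a hypothetical helper, not part of the existing tooling, and it assumes the cutoff should fall on midnight as in the query.

```ruby
require 'date'

# Hypothetical helper: midnight timestamp `days` days before `today`,
# formatted like the literal used in the WHERE clause.
def report_cutoff(days, today = Date.today)
  (today - days).strftime('%Y-%m-%d 00:00:00')
end

# Print the cutoff for reports from the last 7 days.
puts report_cutoff(7)
```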
The size of each report can then be obtained using Fog in Make's production environment...
- SSH into Make web and open a console
- Create a new Fog storage object...
storage = Fog::Storage.new(provider: 'AWS', aws_access_key_id: ENV.fetch('AWS_ACCESS_KEY_ID'), aws_secret_access_key: ENV.fetch('AWS_SECRET_ACCESS_KEY'))
- Copy the job IDs from Mixpanel and transform them into an array named jobs that can be pasted into the console
- Map each job ID to the size of its report in bytes with the following...
# head_object requests only the headers, so the report itself is not downloaded
jobs.map { |job_id| [job_id, storage.head_object('crowdflower_prod', "f#{job_id}.csv.zip").headers['Content-Length'].to_i] }
- Copy the resulting array from the console, transform it into a string of [job_id, size] pairs resembling the following, and save it as a text file.
[1234, 1234], [1234, 1234], [1234, 1234]
- Run report_size_distro.rb against the text file
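
The two manual transformation steps above (query output to array, array to text file) can be sketched in plain Ruby. This is an illustration only: `parse_job_ids` and `format_pairs` are hypothetical helper names, the pasted query output is assumed to contain nothing but job IDs, and the sizes shown are placeholders rather than real values from storage.

```ruby
# Hypothetical helpers; the names are illustrative and not part of the existing tooling.

# Turn pasted query output (one job ID per line, or comma-separated)
# into an Array of Integers. Assumes the text contains no other digits.
def parse_job_ids(text)
  text.scan(/\d+/).map(&:to_i)
end

# Render [[job_id, bytes], ...] as "[1, 2], [3, 4]" with no outer brackets,
# matching the text file format shown above.
def format_pairs(pairs)
  pairs.map { |job_id, bytes| "[#{job_id}, #{bytes}]" }.join(', ')
end

jobs = parse_job_ids("1234\n5678\n")

# In the console the sizes come from the storage lookup; zeros are placeholders here.
pairs = jobs.map { |job_id| [job_id, 0] }

puts format_pairs(pairs)
```

In the console, the printed string could be written out with File.write instead of puts and the resulting file passed to report_size_distro.rb.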