@jlecour
Created October 10, 2011 20:33
Identify paperclip attachment files that are not attached to any record

Let's say you have a model with files attached using Paperclip. You have a couple of million of those files, and you're not sure that every one of them (and all its thumbnails) is still used by a database record.

You could use this rake task to recursively scan all the directories and check if the files need to be kept or destroyed.

In this example, the model is called `Picture`, the attachment is `image`, and the path is partitioned like `images/001/412/497/actual_file.jpg`.

The task walks down the directory tree. Each time it reaches a directory with no subdirectories whose path ends with three groups of three digits ("001/412/497" for example), it looks for a record with the ID 1412497. If no such record exists, the whole directory is moved to a parallel `images_deleted` directory, and directories left empty by those moves are removed. At the end you can delete the moved files if you like, or move them to an archive location.
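For reference, here is a minimal sketch of the mapping between a record ID and its directory. The forward direction assumes Paperclip's `:id_partition` behaviour for integer IDs (zero-padded to 9 digits, then split into groups of 3); both helpers are standalone illustrations, not Paperclip's API, and `reverse_id_partition` mirrors the helper of the same name in the task.

```ruby
# Hypothetical helper: assumes Paperclip-style partitioning of integer IDs
# (zero-padded to 9 digits, then split into 3-digit groups).
def id_partition(id)
  format('%09d', id).scan(/\d{3}/).join('/')
end

# Inverse: "images/001/412/497" => 1412497, or nil when the last three
# path segments are not all 3-digit groups.
def reverse_id_partition(path)
  parts = path.to_s.split('/').last(3)
  return nil unless parts.size == 3 && parts.all? { |e| e.match?(/\A\d{3}\z/) }
  parts.join.to_i
end

puts id_partition(1412497)                      # prints 001/412/497
puts reverse_id_partition('images/001/412/497') # prints 1412497
p reverse_id_partition('system/images/001')     # prints nil
```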

You can use "dry run" mode to print what would be done, without moving or deleting anything:

```sh
rake paperclip:clean_orphan_files DRY_RUN=1
```

You'd get a line for each orphan attachment, with its ID. You can also redirect the output to a file for later inspection:

```sh
rake paperclip:clean_orphan_files DRY_RUN=1 > clean_orphan_files.out
```

If you think you've made a huge mistake, you can revert this from the `public/system` directory:

```sh
cp -r images_deleted/* images/
rm -r images_deleted
```

and you'll be back to normal.
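If you'd rather do the revert from Ruby (a Rails console, for instance), `FileUtils` can do the same merge. This is a sketch: `restore_deleted` is a hypothetical helper, and it assumes both directories live under the same base directory (`public/system` in this example).

```ruby
require 'fileutils'

# Hypothetical helper: merge <base>/images_deleted back into <base>/images,
# then remove images_deleted. Files already present in images are overwritten.
def restore_deleted(base_dir)
  deleted = File.join(base_dir, 'images_deleted')
  images  = File.join(base_dir, 'images')
  return unless File.directory?(deleted)

  FileUtils.mkdir_p(images)
  # The trailing "/." copies the *contents* of images_deleted into images,
  # merging into the partition directories that already exist there.
  FileUtils.cp_r(File.join(deleted, '.'), images)
  FileUtils.rm_r(deleted)
end
```

For example, `restore_deleted(Picture::PAPERCLIP_BASEDIR.to_s)` from a console.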

NB: this code has run on a "production" server without any issue, but it's not covered by automated tests. For the moment it's still a couple of methods in a rake task; it would really benefit from being extracted into a class. It's also not particularly well coded: I'm pretty sure some parts could be improved and made more readable. Feel free to comment.

```ruby
require 'pathname'

namespace :paperclip do
  desc "Destroy paperclip attachment files that are not attached to any record"
  task :clean_orphan_files => :environment do
    @last_path = nil
    @dry_run = %w(true 1).include? ENV['DRY_RUN']

    # `kill -USR1 <pid>` prints the directory currently being scanned
    Signal.trap('USR1') do
      puts "#{Time.now.strftime('%Y-%m-%d %H:%M:%S')} #{@last_path}"
    end

    # ".../images/001/412/497" => 1412497
    # nil when the last 3 path segments are not all 3-digit groups
    def reverse_id_partition(path)
      parts = path.to_s.split('/').last(3)
      if parts.size == 3 && parts.all? { |e| e =~ /\A\d{3}\z/ }
        parts.join.to_i
      end
    end

    def is_orphan?(model, id)
      !model.exists?(id)
    end

    # ".../images/001/412/497" => ".../images_deleted/001/412/497"
    def move_to_deleted_directory(old_path)
      parts = old_path.to_s.split('/')
      if parts.include?('images')
        new_path = Pathname.new(old_path.to_s.gsub(/\bimages\b/, 'images_deleted'))
        # create the parents only; the rename creates the leaf directory itself
        new_path.dirname.mkpath
        old_path.rename(new_path)
      end
    end

    # A directory counts as empty if it has no subdirectory and no file with
    # an extension (extension-less files such as .DS_Store don't count)
    def delete_dir_if_empty(dir)
      if dir.children.none? { |e| (e.file? && e.extname != '') || e.directory? }
        if @dry_run
          puts "delete #{dir}"
        else
          dir.rmtree
        end
      end
    end

    def move_dir_if_orphan(dir, model)
      id = reverse_id_partition(dir)
      if id && is_orphan?(model, id)
        if @dry_run
          puts "#{model}##{id} : orphan"
        else
          move_to_deleted_directory(dir)
        end
      end
    end

    def verify_directory(start_dir, model)
      @last_path = start_dir.to_s
      if start_dir.children.none?(&:directory?)
        # leaf directory: it holds all the files (every style) of one record
        move_dir_if_orphan(start_dir, model)
      else
        # Pathname#children already returns full paths, no need to join them
        start_dir.children.sort.each do |entry|
          verify_directory(entry, model) if entry.directory?
        end
        delete_dir_if_empty(start_dir)
      end
    end

    start_dir = Pathname.new(Picture::PAPERCLIP_BASEDIR + 'images')
    verify_directory(start_dir, Picture)
  end
end
```
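Because the layout has a fixed depth (three levels below `images`), the recursive walk could also be replaced by a single glob. The sketch below is a dry-run-only variant under that assumption; `find_orphan_dirs` is a hypothetical helper, and it takes the existing IDs as a plain `Set` instead of querying the model, so it runs outside Rails.

```ruby
require 'set'

# Hypothetical dry-run helper: returns [id, dir] pairs for the leaf directories
# under images_root whose ID (rebuilt from the three 3-digit segments) is not
# in existing_ids. Assumes the fixed images/NNN/NNN/NNN layout.
def find_orphan_dirs(images_root, existing_ids)
  orphans = []
  Dir.glob(File.join(images_root, '*', '*', '*')).sort.each do |dir|
    next unless File.directory?(dir)
    parts = dir.split('/').last(3)
    next unless parts.all? { |e| e.match?(/\A\d{3}\z/) }
    id = parts.join.to_i
    orphans << [id, dir] unless existing_ids.include?(id)
  end
  orphans
end
```

Inside the task it could be called as `find_orphan_dirs(start_dir.to_s, Set.new(Picture.pluck(:id)))` (`pluck` exists from Rails 3.2). The trade-off: one query instead of one `exists?` per directory, but every ID is held in memory, which matters with millions of records.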
The `Picture` model used in this example:

```ruby
class Picture < ActiveRecord::Base
  PAPERCLIP_BASEDIR = Rails.root + 'public/system'
  PAPERCLIP_PATH = ":attachment/:id_partition/:basename-:style.:extension"

  has_attached_file :image,
    :storage => :filesystem,
    :url => "/system/#{PAPERCLIP_PATH}",
    :path => "#{PAPERCLIP_BASEDIR.to_s}/#{PAPERCLIP_PATH}"
end
```
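With this configuration, every style of one attachment ends up in the same leaf directory, because the style is part of the file name (`-:style`) rather than of the directory path. That is why moving a whole leaf directory also takes the thumbnails with it. The sketch below expands the path pattern by hand for illustration: `expand_path` is a hypothetical helper that only handles the five placeholders used here, it assumes `:attachment` becomes the pluralized attachment name (`images`), and the `thumb` style is an assumption since the model above declares no styles.

```ruby
# Same pattern as Picture::PAPERCLIP_PATH, copied here so the sketch runs alone
PAPERCLIP_PATH = ":attachment/:id_partition/:basename-:style.:extension"

# Hypothetical helper: expands PAPERCLIP_PATH for one file of one style.
# Assumes :attachment => "images" and Paperclip-style :id_partition.
def expand_path(id:, basename:, style:, extension:)
  partition = format('%09d', id).scan(/\d{3}/).join('/')
  PAPERCLIP_PATH
    .sub(':attachment', 'images')
    .sub(':id_partition', partition)
    .sub(':basename', basename)
    .sub(':style', style)
    .sub(':extension', extension)
end

%w[original thumb].each do |style|
  puts expand_path(id: 1412497, basename: 'actual_file', style: style, extension: 'jpg')
end
# prints:
# images/001/412/497/actual_file-original.jpg
# images/001/412/497/actual_file-thumb.jpg
```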