@rafaelgaspar
Last active March 22, 2024 21:18
Remove duplicate jobs from a Resque queue.
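```ruby
require 'json'

queues = %w{cache contents courses export_file mailer manager notifications ranking report statused subscriptions users}

# De-duplicate every queue in parallel, one thread per queue.
queues.map do |queue|
  Thread.new do
    key = "queue:#{queue}"
    members = Resque.redis.lrange(key, 0, -1)
    next if members.empty? # queue does not exist or is already empty

    # Identical job strings are duplicates; keep only those queued more than once.
    duplicate_members = members.group_by { |e| e }.select { |_, copies| copies.size > 1 }
    puts "#{key}: has #{duplicate_members.count} duplicated jobs"
    next if duplicate_members.empty?

    duplicate_members.each do |job, copies|
      print '.'
      # A negative count removes matches starting from the tail, so only the
      # oldest copy (nearest the head) stays in the queue.
      Resque.redis.lrem(key, -(copies.size - 1), job)
    end
    puts ''
  end
end.each(&:join)

# Failed jobs: each record also stores failed_at, the worker, the backtrace, etc.,
# so records for the same job differ as strings. Group them by payload instead.
key = 'failed'
members = Resque.redis.lrange(key, 0, -1)
duplicate_members = members.group_by { |e| JSON.parse(e)['payload'].to_json }.select { |_, v| v.size > 1 }
puts "#{key}: has #{duplicate_members.count} duplicated jobs"

if duplicate_members.any?
  # Spread the work over threads, 16,000 duplicated payloads per thread.
  # each_slice does not pad the last group with nil (in_groups_of does by default).
  threads = duplicate_members.each_slice(16_000).map do |group_members|
    Thread.new do
      group_members.each do |_payload, records|
        print '.'
        # Keep the first record and remove every other one by its own value,
        # since the records are usually distinct strings.
        records.drop(1).each { |record| Resque.redis.lrem(key, -1, record) }
      end
    end
  end
  threads.each(&:join)
  puts ''
end

# Prune dead workers, then unregister every registered worker
# (assumes no workers are running, see Notes).
Resque.workers.each(&:prune_dead_workers).each(&:unregister_worker)
```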
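The script removes the extra copies with Redis `LREM` and a negative count. In Redis, a negative count removes up to |count| matching elements scanning from tail to head, a positive count scans from head to tail, and zero removes every match. The sketch below is a hypothetical pure-Ruby model of those semantics on a plain Array (no Redis involved; the `lrem` method here is illustrative only). It shows which copy survives: removing `n - 1` of `n` copies from the tail keeps the head-most one, which in a Resque queue is the oldest job, since new jobs are appended at the tail.

```ruby
# Hypothetical pure-Ruby model of Redis LREM, for illustration only.
# count > 0: remove up to count matches scanning head -> tail
# count < 0: remove up to |count| matches scanning tail -> head
# count == 0: remove all matches
# Returns the number of elements removed, like the Redis command.
def lrem(list, count, value)
  limit = count.zero? ? list.size : count.abs
  indices = list.each_index.select { |i| list[i] == value }
  indices = indices.reverse if count.negative?
  doomed = indices.first(limit)
  # Delete from the highest index down so earlier indices stay valid.
  doomed.sort.reverse_each { |i| list.delete_at(i) }
  doomed.size
end

queue = %w[a b a c a]
removed = lrem(queue, -(queue.count('a') - 1), 'a')
p queue    # => ["a", "b", "c"]
p removed  # => 2
```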

remove_duplicates.rb

Remove duplicate Resque jobs that have already been queued for processing.
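Regular queues are de-duplicated by exact string match, but the `failed` list needs a different key: every failure record also carries fields such as `failed_at`, the worker and the backtrace, so two failures of the same job are different strings. The sketch below uses simplified, hypothetical records containing only `failed_at` and `payload` to show why the script groups failed entries by their serialized payload.

```ruby
require 'json'

# Hypothetical, simplified failure records: the first two share a payload
# but were recorded at different times, so their JSON strings differ.
failed = [
  { 'failed_at' => '2024/03/01 10:00:00', 'payload' => { 'class' => 'Mailer', 'args' => [1] } },
  { 'failed_at' => '2024/03/01 10:05:00', 'payload' => { 'class' => 'Mailer', 'args' => [1] } },
  { 'failed_at' => '2024/03/01 10:06:00', 'payload' => { 'class' => 'Mailer', 'args' => [2] } }
].map(&:to_json)

# Grouping by the whole string finds no duplicates...
by_string = failed.group_by { |e| e }.select { |_, v| v.size > 1 }
# ...while grouping by the serialized payload finds the repeated job.
by_payload = failed.group_by { |e| JSON.parse(e)['payload'].to_json }.select { |_, v| v.size > 1 }

puts by_string.size   # => 0
puts by_payload.size  # => 1
```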

Execute

To run this against a Rails installation in production, use the following command, so that Resque is loaded and configured by the application:

`bundle exec rails runner -e production /path/to/script/remove_duplicates.rb`

See the `rails runner` documentation for more information.

Notes

Ideally, no workers should be running while this script runs; note that its last line also unregisters every registered worker. This is NOT meant to be a real-time solution: the script is not meant to run continuously to work around a producer that keeps enqueueing duplicate jobs. If that is the problem you are encountering, see the resque-loner gem. This is meant as a one-time fix for a situation where a large queue was created manually and would be difficult to recreate without the duplicates.
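For the continuous case, the usual fix is to stop duplicates at enqueue time instead of cleaning them up afterwards. The sketch below illustrates only that general idea with a hypothetical in-memory `UniqueQueue` class; it is not resque-loner's API. It remembers which payloads are currently waiting and refuses to enqueue an identical one until it has been popped. A real implementation would have to keep this bookkeeping in Redis, atomically with the push, to work across processes.

```ruby
require 'json'
require 'set'

# Hypothetical in-memory queue that refuses to enqueue a job whose payload
# is already waiting. Illustration only: not thread- or process-safe.
class UniqueQueue
  def initialize
    @jobs = []
    @pending = Set.new
  end

  # Returns true if the job was enqueued, false if an identical one is pending.
  def push(klass, *args)
    payload = { 'class' => klass, 'args' => args }.to_json
    return false unless @pending.add?(payload)

    @jobs << payload
    true
  end

  # Removes the oldest job and forgets it, so it may be enqueued again later.
  def pop
    payload = @jobs.shift
    @pending.delete(payload) if payload
    payload
  end

  def size
    @jobs.size
  end
end

q = UniqueQueue.new
q.push('Mailer', 1)  # => true
q.push('Mailer', 1)  # => false
q.push('Mailer', 2)  # => true
q.size               # => 2
```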

Caveats

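This is not an efficient way to remove duplicates from a queue: the script loads the entire list into memory with LRANGE, and each LREM call can scan the whole list, so on a queue with more than 1M records it may take a significant amount of time. It will, however, eventually de-duplicate the queue. See: Removing Duplicates from a List.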
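One faster alternative, sketched here as an assumption rather than something the script does: read the list once, de-duplicate it in memory in a single pass, then replace the list (for example with DEL followed by RPUSH inside a MULTI transaction). That is linear in the list length instead of one list scan per duplicated job, but any job pushed between the read and the write would be lost, so producers and workers must be stopped first. The in-memory step is shown below with a hypothetical `dedupe` helper.

```ruby
# Keeps the first (head-most) copy of each job, which is the copy the script's
# tail-first LREM calls leave in place. Array#uniq preserves first occurrences.
# Returns the de-duplicated list and the number of copies dropped.
def dedupe(members)
  kept = members.uniq
  [kept, members.size - kept.size]
end

kept, dropped = dedupe(%w[a b a c b a])
p kept     # => ["a", "b", "c"]
p dropped  # => 3
```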
