
Start and stop tasks for Resque workers, with a Capistrano deploy hook (without God)

deploy.rb
after "deploy:symlink", "deploy:restart_workers"

##
# Rake helper task.
# http://pastie.org/255489
# http://geminstallthat.wordpress.com/2008/01/27/rake-tasks-through-capistrano/
# http://ananelson.com/said/on/2007/12/30/remote-rake-tasks-with-capistrano/
def run_remote_rake(rake_cmd)
  rake_args = ENV['RAKE_ARGS'].to_s.split(',')
  cmd = "cd #{fetch(:latest_release)} && #{fetch(:rake, "rake")} RAILS_ENV=#{fetch(:rails_env, "production")} #{rake_cmd}"
  cmd += "['#{rake_args.join("','")}']" unless rake_args.empty?
  run cmd
  set :rakefile, nil if exists?(:rakefile)
end

namespace :deploy do
  desc "Restart Resque Workers"
  task :restart_workers, :roles => :db do
    run_remote_rake "resque:restart_workers"
  end
end
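To see exactly what run_remote_rake sends to the server, here is a minimal sketch of the same string-building logic pulled out into a plain function. `build_remote_rake_cmd` and its keyword parameters are hypothetical stand-ins for the Capistrano `fetch(...)` calls and for `ENV['RAKE_ARGS']`, and the release path is made up; the defaults mirror the ones in deploy.rb.

```ruby
# Hypothetical helper: mirrors the string building in run_remote_rake,
# with plain arguments standing in for Capistrano's fetch(...) calls.
def build_remote_rake_cmd(rake_cmd, release_path:, rake: "rake",
                          rails_env: "production", rake_args_env: nil)
  # RAKE_ARGS is a comma-separated list, e.g. RAKE_ARGS="high,low"
  rake_args = rake_args_env.to_s.split(',')
  cmd = "cd #{release_path} && #{rake} RAILS_ENV=#{rails_env} #{rake_cmd}"
  # Rake task arguments are appended in task['arg1','arg2'] form
  cmd += "['#{rake_args.join("','")}']" unless rake_args.empty?
  cmd
end

puts build_remote_rake_cmd("resque:restart_workers",
                           release_path: "/home/app/current",
                           rake: "bundle exec rake")
# prints: cd /home/app/current && bundle exec rake RAILS_ENV=production resque:restart_workers

puts build_remote_rake_cmd("resque:stop_workers",
                           release_path: "/home/app/current",
                           rake_args_env: "high")
# prints: cd /home/app/current && rake RAILS_ENV=production resque:stop_workers['high']
```

This also shows that RAKE_ARGS is the hook for passing task arguments (such as a queue name) from the deploy machine to the remote rake task.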
resque.rake
# Start a worker with proper env vars and output redirection
def run_worker(queue, count = 1)
  puts "Starting #{count} worker(s) with QUEUE: #{queue}"
  ops = {:pgroup => true, :err => [(Rails.root + "log/resque_err").to_s, "a"],
                          :out => [(Rails.root + "log/resque_stdout").to_s, "a"]}
  env_vars = {"QUEUE" => queue.to_s}
  count.times {
    ## Using Kernel.spawn and Process.detach because regular system() call would
    ## cause the processes to quit when capistrano finishes
    pid = spawn(env_vars, "rake resque:work", ops)
    Process.detach(pid)
  }
end

namespace :resque do
  task :setup => :environment

  desc "Restart running workers"
  task :restart_workers => :environment do
    Rake::Task['resque:stop_workers'].invoke
    Rake::Task['resque:start_workers'].invoke
  end

  desc "Quit running workers"
  task :stop_workers => :environment do
    pids = Array.new
    Resque.workers.each do |worker|
      pids.concat(worker.worker_pids)
    end
    if pids.empty?
      puts "No workers to kill"
    else
      syscmd = "kill -s QUIT #{pids.join(' ')}"
      puts "Running syscmd: #{syscmd}"
      system(syscmd)
    end
  end

  desc "Start workers"
  task :start_workers => :environment do
    run_worker("*", 2)
    run_worker("high", 1)
  end
end
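The spawn-and-detach pattern in run_worker can be tried outside Rails. In this sketch a throwaway `ruby -e` child (started via `RbConfig.ruby`) stands in for `rake resque:work`, which is an assumption for illustration only, and the log goes to a temporary directory instead of `Rails.root + "log/..."`. It shows the same ingredients: an environment hash, `:out` redirection in append mode, `:pgroup => true`, and `Process.detach`.

```ruby
require 'rbconfig'
require 'tmpdir'

# Captured outside the block because Dir.mktmpdir deletes the directory afterwards.
output = nil
status = nil

Dir.mktmpdir do |dir|
  log = File.join(dir, "resque_stdout")
  env_vars = { "QUEUE" => "*" }
  # Stand-in for "rake resque:work": a child Ruby that just prints its QUEUE variable.
  pid = spawn(env_vars, RbConfig.ruby, "-e", "puts ENV['QUEUE']",
              :pgroup => true, :out => [log, "a"])
  # spawn returns immediately (system would wait for the child); Process.detach
  # starts a thread that reaps the child when it exits, so no zombie is left.
  waiter = Process.detach(pid)
  status = waiter.value # demo only: wait for the child; run_worker never does this
  output = File.read(log)
end

puts output # prints *
```

Because `:pgroup => true` puts each worker in its own process group, a signal sent to the rake process's group does not reach the workers.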

Hi!

I'm using this code to start my workers, and it works great, but I receive the following Capistrano output:

  * executing "cd /home/app/releases/20110512064921 && bundle exec rake RAILS_ENV=production resque:restart_workers"
    servers: ["..com"]
    [..com] executing command
 ** out :: ..com
 ** [out :: ..com] Running syscmd: kill -s QUIT 2616 2617 30415
 ** [out :: ..com] Starting 1 worker(s) with QUEUE: *
    command finished [nil] # this is the output of the command
failed: "sh -c 'cd /home/app/releases/20110512064921 && bundle exec rake RAILS_ENV=production resque:restart_workers'" on ..com

The exit code of the command is nil, so Capistrano treats the command as failed, but I've checked and the workers were started correctly.

Do you have any idea what is happening?

Thanks in advance

Hi Paco,
What output do you get when you run "sh -c 'cd /home/app/releases/20110512064921 && bundle exec rake RAILS_ENV=production resque:restart_workers'" straight on the server?
There might be a problem with the StdErr and StdOut redirection to log/resque_err and log/resque_stdout, which is set up in line 4 of the resque.rake script (the ops hash in run_worker).

This is the output; I don't notice anything wrong. I run only one worker on the * queue:

(in /home/app/releases/20110425142917)
Running syscmd: kill -s QUIT 2635 3992 3993
Starting 1 worker(s) with QUEUE: *

Hi there,

I am using Resque 1.15.0, and this script hangs the Capistrano deploy when it comes to starting the workers. After waiting a couple of minutes, I have to press Ctrl-C and roll back. Is there something I need to do differently with an older script like this? I am also running Ruby 1.8.7, so I am using posix/spawn to emulate the spawn function.

I wanted to add a word of caution. The worker_pids method finds every process whose command line contains the term 'resque'. If you run the restart from the resque namespace (resque:restart_workers), it will also find the processes Capistrano started for the restart, including the rake process itself, and kill them, so the start_workers task is never executed.

This is the command the Worker class uses to find non-Solaris pids:

ps -A -o pid,command | grep "[r]esque" | grep -v "resque-web"
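To see why a restart can kill itself, this sketch applies the same selection as that pipeline (keep lines containing "resque", drop lines containing "resque-web") to some `ps -A -o pid,command` lines. The sample lines and PIDs are invented for illustration; the point is that the rake process running resque:restart_workers contains the string "resque" and therefore passes the filter.

```ruby
# Invented sample of `ps -A -o pid,command` output (illustration only).
PS_OUTPUT = <<~PS
  PID COMMAND
  2616 resque-1.15.0: Waiting for *
  2617 resque-web /home/app/current/config.ru
  30415 sh -c cd /home/app/releases/1 && bundle exec rake RAILS_ENV=production resque:restart_workers
  30420 ruby /usr/bin/rake RAILS_ENV=production resque:restart_workers
  31000 nginx: worker process
PS

# Same selection as: grep "[r]esque" | grep -v "resque-web"
# (the [r] trick only stops grep from matching its own command line)
def resque_pids(ps_output)
  ps_output.each_line
           .select { |line| line.include?("resque") }
           .reject { |line| line.include?("resque-web") }
           .map(&:to_i) # String#to_i reads the leading PID column
end

p resque_pids(PS_OUTPUT) # => [2616, 30415, 30420]
```

PIDs 30415 and 30420 are the shell and the rake process doing the restart, so `kill -s QUIT` on this list also stops the task that was supposed to start the new workers.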

I ran into the same problem: the task killed itself before completing and returned an unsuccessful status code. After seeing the post above, I decided to determine the worker PIDs manually instead of using the built-in Resque method.

The following snippet, while not the cleanest code, does the job correctly.

pids = Array.new
`ps -A -o pid,command | grep "[r]esque" | grep -v "resque-web" | grep -v "restart_workers" | grep -v "stop_workers" | grep -v "start_workers"`.each_line do |l| 
  pids << l.to_i 
end
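The same filtering can be done in Ruby instead of a chain of `grep -v` calls. This is a sketch under the assumption that every process you want to spare is either resque-web or a rake process with a *_workers task name on its command line, as in the snippet above. As an extra precaution (not part of the original snippet), it also skips the current process and its parent by PID; `SKIP_PATTERN` and `worker_pids_from_ps` are hypothetical names.

```ruby
# Processes that must survive: resque-web and the rake tasks doing the restart.
SKIP_PATTERN = /resque-web|restart_workers|stop_workers|start_workers/

# Parse `ps -A -o pid,command` output and keep only Resque worker PIDs.
def worker_pids_from_ps(ps_output, exclude: [Process.pid, Process.ppid])
  ps_output.each_line.map { |line|
    pid, command = line.strip.split(/\s+/, 2)
    next unless command&.include?("resque")
    next if command.match?(SKIP_PATTERN)
    pid = Integer(pid, exception: false)
    pid unless pid.nil? || exclude.include?(pid)
  }.compact
end

# In the rake task this would be called as: worker_pids_from_ps(`ps -A -o pid,command`)
sample = <<~PS
    PID COMMAND
   2616 resque-1.15.0: Waiting for *
  30420 ruby /usr/bin/rake RAILS_ENV=production resque:restart_workers
PS

p worker_pids_from_ps(sample, exclude: []) # => [2616]
```

Parsing in Ruby also means the header line and any non-numeric PID column are dropped explicitly rather than being turned into 0 by `to_i`.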

I did not know Resque used ps and grepped for the term "[r]esque". That seems quite brittle.

I haven't used this script in a while, and would probably use Foreman with a Procfile these days.

Alternatively, we could simply change one line of the :stop_workers task from

Resque.workers.each do |worker|
  pids.concat(worker.worker_pids)
end

to

Resque.workers.each do |worker|
  pids << worker.id.split(':')[1]
end

This relies on the implementation of Resque::Worker#to_s rather than on the public API. It's bad, but it works.
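In Resque 1.x a worker id has the form `hostname:pid:queue1,queue2`, which is why splitting on `:` recovers the PID. One caveat: `Resque.workers` lists every worker registered in Redis, including workers on other machines, so this sketch also keeps only ids whose hostname matches the local one (via `Socket.gethostname`). `FakeWorker` is a hypothetical `Struct` standing in for `Resque::Worker` so the sketch runs without Redis, and `local_worker_pids` is likewise a hypothetical name.

```ruby
require 'socket'

# Hypothetical stand-in for Resque::Worker: only #to_s (a.k.a. #id) is used.
FakeWorker = Struct.new(:id) do
  def to_s
    id
  end
end

# Extract PIDs of workers running on this host from "hostname:pid:queues" ids.
def local_worker_pids(workers, hostname = Socket.gethostname)
  workers.map { |w| w.to_s.split(':') }
         .select { |host, _pid, _queues| host == hostname }
         .map { |_host, pid, _queues| Integer(pid) }
end

workers = [FakeWorker.new("web1:2616:*"),
           FakeWorker.new("web1:2617:high"),
           FakeWorker.new("web2:4242:*")]

p local_worker_pids(workers, "web1") # => [2616, 2617]
```

Without the hostname check, PID 4242 from web2 would be passed to `kill` on web1, where it may belong to an unrelated process.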

I ran into a case that I could not fix by modifying the ps command:
I have two applications running on the same server, and both of them use Resque. The resque:restart_workers task kills the workers belonging to both applications, when I only want to kill the workers of one specific application.

Anyway, the best solution to this problem is probably to use something like 'god' or 'monit' to manage the workers.

I ended up breaking a production server with this. Note:

Resque.workers.each do |worker|
  pids.concat(worker.worker_pids)
end

This does not distinguish between queues. Each time I deployed, it killed the workers on ALL queues and restarted only its own.

In the short term I solved it with:

Resque.workers.each do |worker|
  pids.concat(worker.worker_pids) if worker.queues.include?(@queue_name)
end
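@queue_name has to be set somewhere before the task runs. One option, sketched here as an assumption rather than what the commenter did, is a Rake task argument, which run_remote_rake in deploy.rb can already pass along through RAKE_ARGS (e.g. `RAKE_ARGS=high` produces `resque:stop_workers['high']`). The worker list is a hypothetical `Struct` stand-in, the `:environment` dependency is omitted so the sketch runs without Rails, and PIDs are collected instead of being killed.

```ruby
require 'rake'

# Hypothetical stand-in for Resque::Worker: only a pid and #queues are needed here.
FakeWorker = Struct.new(:pid, :queues)
WORKERS = [FakeWorker.new(2616, ["*"]),
           FakeWorker.new(2617, ["high"]),
           FakeWorker.new(2618, ["high", "low"])]

# Collected here instead of being passed to `kill`, so the sketch is harmless.
SELECTED_PIDS = []

extend Rake::DSL

namespace :resque do
  desc "Quit running workers (optionally only those on one queue)"
  task :stop_workers, [:queue] do |_t, args|
    queue = args[:queue] # nil when the task is called without an argument
    workers = WORKERS
    workers = workers.select { |w| w.queues.include?(queue) } if queue
    SELECTED_PIDS.concat(workers.map(&:pid))
  end
end

# Equivalent to running: rake resque:stop_workers[high]
Rake::Task["resque:stop_workers"].invoke("high")
p SELECTED_PIDS # => [2617, 2618]
```

Called without an argument, the task falls back to every worker, matching the original behaviour.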

In the long term, I am going to look into Foreman, god, monit, or something similar to monitor and restart the workers.

I found this worked best for me in the :stop_workers task:

workers = Resque.workers
workers.select! { |w| w.queues.include? queue } if queue
pids = workers.map { |w| w.to_s.sub /.+:(\d+):.+/, '\1' }

It's a combination of @kenniz's pid extraction technique (it is bad, but it's also used in parts of the resque code itself!), plus @kmcphillips's queue-specificity.

This slight mod to the regex accounts for processes with multiple (threaded) workers:

pids = workers.map { |w| w.to_s.sub /.+:(\d+)[-:].+/, '\1' }
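Putting the last two comments together, here is a small sketch of the combined selection: optional queue filtering, then PID extraction with the `[-:]` regex so that both `host:pid:queues` and threaded `host:pid-N:queues` ids work. `FakeWorker` is a hypothetical stand-in for `Resque::Worker`, and the sample ids are invented. Because `sub` returns the string unchanged when the pattern does not match, the sketch swaps in `String#[]` with a capture group, which returns `nil` for an unexpected id format; `uniq` then collapses threaded workers that share one process.

```ruby
# Hypothetical stand-in for Resque::Worker: #to_s is the worker id, #queues its queues.
FakeWorker = Struct.new(:id, :queues) do
  def to_s
    id
  end
end

# Select workers (optionally for one queue) and pull the PID out of each id.
def pids_for(workers, queue = nil)
  workers = workers.select { |w| w.queues.include?(queue) } if queue
  workers.map { |w| w.to_s[/.+:(\d+)[-:].+/, 1] } # nil if the id doesn't match
         .compact
         .map(&:to_i)
         .uniq # threaded workers in one process share a PID
end

workers = [FakeWorker.new("web1:2616:*", ["*"]),
           FakeWorker.new("web1:3001-1:high", ["high"]),
           FakeWorker.new("web1:3001-2:high", ["high"])]

p pids_for(workers)         # => [2616, 3001]
p pids_for(workers, "high") # => [3001]
```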
