secret
Last active

Signal Handling with Resque and Heroku

  • Download Gist
.env
1 2 3
REDISCLOUD_URL=redis://localhost:6379
PORT=3000
RESQUE_TERM_TIMEOUT=10
Gemfile
Ruby
1 2 3 4
source "https://rubygems.org"
 
gem "resque"
gem "rake"
Procfile
1 2
web: bundle exec rackup -p $PORT
worker: env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work
README.md
Markdown

Signal Handling with Resque and Heroku

This is sample code to help people understand why Signal Handling works the way it does.

We'll be using Resque in the new signal handling mode to better work with POSIX signal handling. Let's set this up on Heroku.

$ git clone https://gist.github.com/54bc80701a83d2cb2913.git
$ heroku create
$ heroku addons:add heroku addons:add rediscloud:20
$ git push heroku master

When a dyno gets cycled due to a restart or having to be killed off Heroku will send a SIGTERM and then a SIGKILL 10 seconds later if the process hasn't been gracefully shutdown. You'll see a "R12 (Exit timeout)" in your logs if this happens. Now let's set Resque's SIGTERM timeout to demonstrate this.

$ heroku config:add RESQUE_TERM_TIMEOUT=10

We need to enqueue a job and cause a SIGTERM to be sent.

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Now when running heroku logs -t -p worker you'll see something like this:

013-09-16T23:05:36.422618+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:05:38.249677+00:00 app[worker.1]: PID: 2
2013-09-16T23:05:38.432792+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:06:53.682989+00:00 app[worker.1]: Performing work: 2013-09-16 23:06:53 +0000
2013-09-16T23:06:53.682989+00:00 app[worker.1]: sleep 1
2013-09-16T23:06:54.686517+00:00 app[worker.1]: sleep 2
2013-09-16T23:06:56.687011+00:00 app[worker.1]: sleep 3
2013-09-16T23:06:59.688428+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:03.688640+00:00 app[worker.1]: sleep 5
2013-09-16T23:07:08.689879+00:00 app[worker.1]: sleep 6
2013-09-16T23:07:11.180422+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:07:11.978975+00:00 app[worker.1]: ensure started 2013-09-16 23:07:11 +0000
2013-09-16T23:07:11.978975+00:00 app[worker.1]: sleep 1
2013-09-16T23:07:12.985010+00:00 app[worker.1]: sleep 2
2013-09-16T23:07:14.989352+00:00 app[worker.1]: sleep 3
2013-09-16T23:07:17.993877+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:19.433624+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2013-09-16T23:07:19.433934+00:00 heroku[worker.1]: Stopping remaining processes with SIGKILL
2013-09-16T23:07:21.716823+00:00 heroku[worker.1]: Process exited with status 137

Notice the "R12" that happens. This is because the SIGTERM timeout takes longer than the 10 second allotted time by Heroku. "But we set it to 10!" That's a good point, but remember this is the time from SIGTERM to SIGKILL and in this time Resque and other code must run. Iwould recommend setting this to some value less than 10, like 5. This gives Resque enough time to clean up after itself as well.

Resque needs time to clean up when a SIGTERM happens and if it gets SIGKILL happens, it denies Resque the chance to do this. You'll notice stale workers in your Resque UI. If you deploy this code. And this is due to the fact that Resque can't clean up workers if the worker dies on a distributed system. If you try this code locally, you won't notice stale workers b/c prune_dead_workers works on a local box but not in a distributed environment like Heroku. Resque can't tell if a worker is alive or dead.

Now let's set the timeout to something more reasonable like 5.

$ heroku config:add RESQUE_TERM_TIMEOUT=5

Re-enquing the job and restarting:

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Let's check our logs:

2013-09-16T23:53:54.348137+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:53:55.836592+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:53:55.897159+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:36.031549+00:00 app[worker.1]: Performing work: 2013-09-16 23:54:36 +0000
2013-09-16T23:54:36.031549+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:37.031835+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:39.032235+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:39.729718+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:54:40.527182+00:00 app[worker.1]: ensure started 2013-09-16 23:54:40 +0000
2013-09-16T23:54:40.527182+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:41.527583+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:43.528268+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:44.404581+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:41.985160+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`

Notice the "R12" error is gone and it doesn't add more stale workers. :)

Rakefile
Ruby
1 2 3 4 5 6 7 8 9 10 11 12 13 14
require 'resque'
require 'resque/tasks'
require './config_resque'
require './worker'
$stdout.sync = true
 
task :setup_resque do
File.open("resque.pid", 'w') do |file|
puts "PID: #{Process.pid}"
file.puts Process.pid
end
end
 
Rake::Task["resque:work"].enhance [:setup_resque]
config.ru
Ruby
1 2 3 4
require 'resque'
require 'resque/server'
 
run Resque::Server.new
config_resque.rb
Ruby
1 2 3 4
require 'resque'
 
Resque.logger.level = Logger::DEBUG
Resque.redis = ENV["REDISCLOUD_URL"]
enqueue.rb
Ruby
1 2 3 4 5 6 7
#!/usr/bin/env ruby
 
require 'resque'
require './config_resque'
require './worker'
 
Resque.enqueue(Worker)
enqueue_kill.rb
Ruby
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#!/usr/bin/env ruby
 
$stdout.sync = true
 
require 'resque'
require './config_resque'
require './worker'
 
pid = File.read("resque.pid").chomp.to_i
puts "Resque @ PID: #{pid}"
puts "Enqueue Worker job"
Resque.enqueue(Worker)
1.upto(3) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "TERM #{pid}"
Process.kill("TERM", pid)
1.upto(3) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "KILL #{pid}"
Process.kill("KILL", pid)
worker.rb
Ruby
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
class Worker
@queue = :worker
 
def self.perform
puts "Performing work: #{Time.now}"
1.upto(30) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "Done working: #{Time.now}"
rescue StandardError => e
puts "Rescued #{e.message}"
ensure
puts "ensure started #{Time.now}"
1.upto(60) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "ensure finished #{Time.now}"
end
end

Please sign in to comment on this gist.

Something went wrong with that request. Please try again.