Instantly share code, notes, and snippets.

@hone /.env Secret
Last active Dec 23, 2015

What would you like to do?
Signal Handling with Resque and Heroku

Signal Handling with Resque and Heroku

This is sample code to help people understand why Signal Handling works the way it does.

We'll be using Resque in the new signal handling mode to better work with POSIX signal handling. Let's set this up on Heroku.

$ git clone
$ heroku create
$ heroku addons:add heroku addons:add rediscloud:20
$ git push heroku master

When a dyno gets cycled due to a restart or having to be killed off Heroku will send a SIGTERM and then a SIGKILL 10 seconds later if the process hasn't been gracefully shutdown. You'll see a "R12 (Exit timeout)" in your logs if this happens. Now let's set Resque's SIGTERM timeout to demonstrate this.

$ heroku config:add RESQUE_TERM_TIMEOUT=10

We need to enqueue a job and cause a SIGTERM to be sent.

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Now when running heroku logs -t -p worker you'll see something like this:

013-09-16T23:05:36.422618+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:05:38.249677+00:00 app[worker.1]: PID: 2
2013-09-16T23:05:38.432792+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:06:53.682989+00:00 app[worker.1]: Performing work: 2013-09-16 23:06:53 +0000
2013-09-16T23:06:53.682989+00:00 app[worker.1]: sleep 1
2013-09-16T23:06:54.686517+00:00 app[worker.1]: sleep 2
2013-09-16T23:06:56.687011+00:00 app[worker.1]: sleep 3
2013-09-16T23:06:59.688428+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:03.688640+00:00 app[worker.1]: sleep 5
2013-09-16T23:07:08.689879+00:00 app[worker.1]: sleep 6
2013-09-16T23:07:11.180422+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:07:11.978975+00:00 app[worker.1]: ensure started 2013-09-16 23:07:11 +0000
2013-09-16T23:07:11.978975+00:00 app[worker.1]: sleep 1
2013-09-16T23:07:12.985010+00:00 app[worker.1]: sleep 2
2013-09-16T23:07:14.989352+00:00 app[worker.1]: sleep 3
2013-09-16T23:07:17.993877+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:19.433624+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2013-09-16T23:07:19.433934+00:00 heroku[worker.1]: Stopping remaining processes with SIGKILL
2013-09-16T23:07:21.716823+00:00 heroku[worker.1]: Process exited with status 137

Notice the "R12" that happens. This is because the SIGTERM timeout takes longer than the 10 second allotted time by Heroku. "But we set it to 10!" That's a good point, but remember this is the time from SIGTERM to SIGKILL and in this time Resque and other code must run. Iwould recommend setting this to some value less than 10, like 5. This gives Resque enough time to clean up after itself as well.

Resque needs time to clean up when a SIGTERM happens and if it gets SIGKILL happens, it denies Resque the chance to do this. You'll notice stale workers in your Resque UI. If you deploy this code. And this is due to the fact that Resque can't clean up workers if the worker dies on a distributed system. If you try this code locally, you won't notice stale workers b/c prune_dead_workers works on a local box but not in a distributed environment like Heroku. Resque can't tell if a worker is alive or dead.

Now let's set the timeout to something more reasonable like 5.

$ heroku config:add RESQUE_TERM_TIMEOUT=5

Re-enquing the job and restarting:

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Let's check our logs:

2013-09-16T23:53:54.348137+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:53:55.836592+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:53:55.897159+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:36.031549+00:00 app[worker.1]: Performing work: 2013-09-16 23:54:36 +0000
2013-09-16T23:54:36.031549+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:37.031835+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:39.032235+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:39.729718+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:54:40.527182+00:00 app[worker.1]: ensure started 2013-09-16 23:54:40 +0000
2013-09-16T23:54:40.527182+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:41.527583+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:43.528268+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:44.404581+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:41.985160+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`

Notice the "R12" error is gone and it doesn't add more stale workers. :)

require 'resque'
require 'resque/server'
require 'resque'
Resque.logger.level = Logger::DEBUG
Resque.redis = ENV["REDISCLOUD_URL"]
#!/usr/bin/env ruby
require 'resque'
require './config_resque'
require './worker'
#!/usr/bin/env ruby
$stdout.sync = true
require 'resque'
require './config_resque'
require './worker'
pid ="").chomp.to_i
puts "Resque @ PID: #{pid}"
puts "Enqueue Worker job"
1.upto(3) do |i|
puts "sleep #{i}"
puts "TERM #{pid}"
Process.kill("TERM", pid)
1.upto(3) do |i|
puts "sleep #{i}"
puts "KILL #{pid}"
Process.kill("KILL", pid)
source ""
gem "resque"
gem "rake"
mono_logger (1.1.0)
multi_json (1.8.0)
rack (1.5.2)
rack-protection (1.5.0)
rake (10.1.0)
redis (3.0.4)
redis-namespace (1.3.1)
redis (~> 3.0.0)
resque (1.25.0)
mono_logger (~> 1.0)
multi_json (~> 1.0)
redis-namespace (~> 1.2)
sinatra (>= 0.9.2)
vegas (~> 0.1.2)
sinatra (1.4.3)
rack (~> 1.4)
rack-protection (~> 1.4)
tilt (~> 1.3, >= 1.3.4)
tilt (1.4.1)
vegas (0.1.11)
rack (>= 1.0.0)
web: bundle exec rackup -p $PORT
worker: env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work
require 'resque'
require 'resque/tasks'
require './config_resque'
require './worker'
$stdout.sync = true
task :setup_resque do"", 'w') do |file|
puts "PID: #{}"
Rake::Task["resque:work"].enhance [:setup_resque]
class Worker
@queue = :worker
def self.perform
puts "Performing work: #{}"
1.upto(30) do |i|
puts "sleep #{i}"
puts "Done working: #{}"
rescue StandardError => e
puts "Rescued #{e.message}"
puts "ensure started #{}"
1.upto(60) do |i|
puts "sleep #{i}"
puts "ensure finished #{}"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment