Skip to content

Instantly share code, notes, and snippets.

@hone

hone/.env Secret

Last active December 23, 2015 05:29
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save hone/54bc80701a83d2cb2913 to your computer and use it in GitHub Desktop.
Save hone/54bc80701a83d2cb2913 to your computer and use it in GitHub Desktop.
Signal Handling with Resque and Heroku
REDISCLOUD_URL=redis://localhost:6379
PORT=3000
RESQUE_TERM_TIMEOUT=10

Signal Handling with Resque and Heroku

This is sample code to help people understand why Signal Handling works the way it does.

We'll be using Resque in the new signal handling mode to better work with POSIX signal handling. Let's set this up on Heroku.

$ git clone https://gist.github.com/54bc80701a83d2cb2913.git
$ heroku create
$ heroku addons:add heroku addons:add rediscloud:20
$ git push heroku master

When a dyno gets cycled due to a restart or having to be killed off Heroku will send a SIGTERM and then a SIGKILL 10 seconds later if the process hasn't been gracefully shutdown. You'll see a "R12 (Exit timeout)" in your logs if this happens. Now let's set Resque's SIGTERM timeout to demonstrate this.

$ heroku config:add RESQUE_TERM_TIMEOUT=10

We need to enqueue a job and cause a SIGTERM to be sent.

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Now when running heroku logs -t -p worker you'll see something like this:

013-09-16T23:05:36.422618+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:05:38.249677+00:00 app[worker.1]: PID: 2
2013-09-16T23:05:38.432792+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:06:53.682989+00:00 app[worker.1]: Performing work: 2013-09-16 23:06:53 +0000
2013-09-16T23:06:53.682989+00:00 app[worker.1]: sleep 1
2013-09-16T23:06:54.686517+00:00 app[worker.1]: sleep 2
2013-09-16T23:06:56.687011+00:00 app[worker.1]: sleep 3
2013-09-16T23:06:59.688428+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:03.688640+00:00 app[worker.1]: sleep 5
2013-09-16T23:07:08.689879+00:00 app[worker.1]: sleep 6
2013-09-16T23:07:11.180422+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:07:11.978975+00:00 app[worker.1]: ensure started 2013-09-16 23:07:11 +0000
2013-09-16T23:07:11.978975+00:00 app[worker.1]: sleep 1
2013-09-16T23:07:12.985010+00:00 app[worker.1]: sleep 2
2013-09-16T23:07:14.989352+00:00 app[worker.1]: sleep 3
2013-09-16T23:07:17.993877+00:00 app[worker.1]: sleep 4
2013-09-16T23:07:19.433624+00:00 heroku[worker.1]: Error R12 (Exit timeout) -> At least one process failed to exit within 10 seconds of SIGTERM
2013-09-16T23:07:19.433934+00:00 heroku[worker.1]: Stopping remaining processes with SIGKILL
2013-09-16T23:07:21.716823+00:00 heroku[worker.1]: Process exited with status 137

Notice the "R12" that happens. This is because the SIGTERM timeout takes longer than the 10 second allotted time by Heroku. "But we set it to 10!" That's a good point, but remember this is the time from SIGTERM to SIGKILL and in this time Resque and other code must run. Iwould recommend setting this to some value less than 10, like 5. This gives Resque enough time to clean up after itself as well.

Resque needs time to clean up when a SIGTERM happens and if it gets SIGKILL happens, it denies Resque the chance to do this. You'll notice stale workers in your Resque UI. If you deploy this code. And this is due to the fact that Resque can't clean up workers if the worker dies on a distributed system. If you try this code locally, you won't notice stale workers b/c prune_dead_workers works on a local box but not in a distributed environment like Heroku. Resque can't tell if a worker is alive or dead.

Now let's set the timeout to something more reasonable like 5.

$ heroku config:add RESQUE_TERM_TIMEOUT=5

Re-enquing the job and restarting:

$ heroku run "bundle exec ruby enqueue.rb"
$ heroku restart

Let's check our logs:

2013-09-16T23:53:54.348137+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`
2013-09-16T23:53:55.836592+00:00 heroku[worker.1]: State changed from starting to up
2013-09-16T23:53:55.897159+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:36.031549+00:00 app[worker.1]: Performing work: 2013-09-16 23:54:36 +0000
2013-09-16T23:54:36.031549+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:37.031835+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:39.032235+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:39.729718+00:00 heroku[worker.1]: Stopping all processes with SIGTERM
2013-09-16T23:54:40.527182+00:00 app[worker.1]: ensure started 2013-09-16 23:54:40 +0000
2013-09-16T23:54:40.527182+00:00 app[worker.1]: sleep 1
2013-09-16T23:54:41.527583+00:00 app[worker.1]: sleep 2
2013-09-16T23:54:43.528268+00:00 app[worker.1]: sleep 3
2013-09-16T23:54:44.404581+00:00 app[worker.1]: PID: 2
2013-09-16T23:54:41.985160+00:00 heroku[worker.1]: Starting process with command `env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work`

Notice the "R12" error is gone and it doesn't add more stale workers. :)

require 'resque'
require 'resque/server'
run Resque::Server.new
require 'resque'
Resque.logger.level = Logger::DEBUG
Resque.redis = ENV["REDISCLOUD_URL"]
#!/usr/bin/env ruby
require 'resque'
require './config_resque'
require './worker'
Resque.enqueue(Worker)
#!/usr/bin/env ruby
$stdout.sync = true
require 'resque'
require './config_resque'
require './worker'
pid = File.read("resque.pid").chomp.to_i
puts "Resque @ PID: #{pid}"
puts "Enqueue Worker job"
Resque.enqueue(Worker)
1.upto(3) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "TERM #{pid}"
Process.kill("TERM", pid)
1.upto(3) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "KILL #{pid}"
Process.kill("KILL", pid)
source "https://rubygems.org"
gem "resque"
gem "rake"
GEM
remote: https://rubygems.org/
specs:
mono_logger (1.1.0)
multi_json (1.8.0)
rack (1.5.2)
rack-protection (1.5.0)
rack
rake (10.1.0)
redis (3.0.4)
redis-namespace (1.3.1)
redis (~> 3.0.0)
resque (1.25.0)
mono_logger (~> 1.0)
multi_json (~> 1.0)
redis-namespace (~> 1.2)
sinatra (>= 0.9.2)
vegas (~> 0.1.2)
sinatra (1.4.3)
rack (~> 1.4)
rack-protection (~> 1.4)
tilt (~> 1.3, >= 1.3.4)
tilt (1.4.1)
vegas (0.1.11)
rack (>= 1.0.0)
PLATFORMS
ruby
DEPENDENCIES
rake
resque
web: bundle exec rackup -p $PORT
worker: env TERM_CHILD=1 QUEUES=* bundle exec rake resque:work
require 'resque'
require 'resque/tasks'
require './config_resque'
require './worker'
$stdout.sync = true
task :setup_resque do
File.open("resque.pid", 'w') do |file|
puts "PID: #{Process.pid}"
file.puts Process.pid
end
end
Rake::Task["resque:work"].enhance [:setup_resque]
class Worker
@queue = :worker
def self.perform
puts "Performing work: #{Time.now}"
1.upto(30) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "Done working: #{Time.now}"
rescue StandardError => e
puts "Rescued #{e.message}"
ensure
puts "ensure started #{Time.now}"
1.upto(60) do |i|
puts "sleep #{i}"
sleep(i)
end
puts "ensure finished #{Time.now}"
end
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment