rexmortus/background-jobs-and-sidekiq.md

## background-jobs-and-sidekiq.md

      
    Background Jobs & Sidekiq

A web application often has time-consuming work to do. If we run that work synchronously inside a controller, the user cannot receive a response until it finishes, which they’ll experience as painfully slow load times. In today’s world, that is completely and utterly unacceptable!
Here are some examples:


Preparing a download, say a zip file, where file compression happens on the fly (e.g. Google Docs)


POSTing a request to an external API (e.g. sending a message to Slack)


Kicking off a continuous-integration build (e.g. Circle CI, etc.)


Sending an email (e.g. action_mailer or Mailchimp)


These are all actions which should be “kicked off” by a controller, but we simply cannot force the user to wait around for their completion - we want to render a page for them to see while they wait for the main work to be completed.
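To make the idea concrete, here is a minimal plain-Ruby sketch (no Rails or Sidekiq involved) of handing slow work to a background thread so the "controller" can respond straight away. The names jobs, results, worker and handle_request are made up for illustration, and a real job queue is far more robust than this.

```ruby
# A toy in-process job queue: the "controller" pushes work and returns at once,
# while a background thread does the slow part.
jobs    = Queue.new   # thread-safe FIFO from Ruby's core library
results = Queue.new   # collects finished work so we can inspect it

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop is our shutdown signal.
  while (job = jobs.pop) != :stop
    sleep 0.2                       # stands in for slow work (zip, API call, email)
    results << "finished #{job}"
  end
end

# Hypothetical request handler: enqueue the work and respond immediately.
handle_request = lambda do |name|
  jobs << name
  "Your #{name} is being prepared"
end

started  = Time.now
response = handle_request.call("download")
elapsed  = Time.now - started       # tiny, because we did not wait for the slow work

jobs << :stop
worker.join                         # wait here only so the script can exit cleanly
```

The response is ready long before the slow work finishes, which is exactly the behaviour we want from a controller.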
In this lesson, we’re going to create an application that creates and enqueues background jobs and provides a dashboard interface for inspecting them.
0.0 Prerequisites

0.1 Check which version of Ruby is installed:
ruby -v
ruby 2.6.3p62 (2019-04-16 revision 67580) 

Better yet, use which because it gives us the full path of our Ruby interpreter (and indicates whether we’re using rvm):
which ruby
~/.rvm/rubies/ruby-2.6.3/bin/ruby

0.2 What version of Rails is installed?
rails -v
Rails 5.2.3

Or:
which rails
~/.rvm/gems/ruby-2.6.3/bin/rails

0.3 Is Postgres running?
If you have the Postgres OS X app installed, simply look at its menu-bar icon.
There are multiple ways to check the status of postgres from the command line, including pg_ctl, but we won’t cover that now.
0.4 Is yarn installed?
Because we’re using the webpacker gem, we’ll need yarn (the JavaScript dependency manager).
brew install yarn

Then install the JavaScript dependencies:
yarn install

Depending on your setup, you may also need to manually install some of the JavaScript dependencies:
yarn add bootstrap jquery popper.js

1. Create application

Let’s start by using rails new to create a new Rails project:
rails new \
  --database postgresql \
  --webpack \
  -m https://raw.githubusercontent.com/lewagon/rails-templates/master/devise.rb \
  background-jobs-demo

Notice that we’ll use the devise and webpacker gems, which means our application will have the concept of a User and will render a front-end. We’ll also apply a template using -m, just to speed things up a bit.
2. Creating the models

Now, remember that we’re going to create a dashboard showing all queued background jobs… this is definitely not something we want to expose to any person off the street! Only admins should be able to view this list, so let’s begin by updating the default devise User model to add an admin field, thereby providing a mechanism for restricting access to just admin users.
We’ll use rails generate to create a migration:
rails generate migration AddAdminToUsers

Now, open the new migration file and add:
def change
  add_column :users, :admin, :boolean, null: false, default: false
end
With this migration we’re adding a boolean field called admin, which cannot be null, and whose default value is false. Basically, when we create a new User they will not be an admin - which makes sense.
Now let’s run the migration:
rake db:migrate
== 20190530000944 AddAdminToUsers: migrating ==================================
-- add_column(:users, :admin, :boolean, {:null=>false, :default=>false})
   -> 0.0049s
== 20190530000944 AddAdminToUsers: migrated (0.0050s) =========================

Now, just for fun let’s open rails console and create a new User:
rails console
Running via Spring preloader in process 49386
Loading development environment (Rails 5.2.3)
[1] pry(main)> User.create! :email => 'admin@gmail.com', :password => 'password', :admin => true
   (0.2ms)  BEGIN
  User Exists (1.9ms)  SELECT  1 AS one FROM "users" WHERE "users"."email" = $1 LIMIT $2  [["email", "admin@gmail.com"], ["LIMIT", 1]]
  User Create (0.9ms)  INSERT INTO "users" ("email", "encrypted_password", "created_at", "updated_at", "admin") VALUES ($1, $2, $3, $4, $5) RETURNING "id"  [["email", "admin@gmail.com"], ["encrypted_password", "$2a$11$zNeVCENAHLxCNC7SLYkhxuBdVsA4GarjNflZgXrrUmFO185BgNsmW"], ["created_at", "2019-05-30 00:11:25.211103"], ["updated_at", "2019-05-30 00:11:25.211103"], ["admin", true]]
   (0.5ms)  COMMIT
=> #<User id: 1, email: "admin@gmail.com", created_at: "2019-05-30 00:11:25", updated_at: "2019-05-30 00:11:25", admin: true>
3. Creating a Job

The next step is to create our background job. We’ll use rails generate:
rails generate job fake
Running via Spring preloader in process 49679
      invoke  test_unit
      create    test/jobs/fake_job_test.rb
      create  app/jobs/fake_job.rb
Open FakeJob at app/jobs/fake_job.rb and add:
class FakeJob < ApplicationJob
  queue_as :default

  def perform
    puts "I'm starting the fake job"
    sleep 3
    puts "OK I'm done now"
  end
end
This job is a simple illustration of the kind of activities that are appropriate for background jobs. Here, we are simply printing "I'm starting the fake job", sleeping for 3 seconds, and then printing "OK I'm done now", with sleep 3 standing in for any kind of action that takes a long time to complete (e.g. an HTTP request).
Let’s test our new job in the rails console:
rails console
[1] pry(main)> FakeJob.perform_now
Performing FakeJob (Job ID: c15d93d2-be23-4da4-80c9-ba31c2d7c57b) from Async(default)
I'm starting the fake job
OK I'm done now
Performed FakeJob (Job ID: c15d93d2-be23-4da4-80c9-ba31c2d7c57b) from Async(default) in 3000.4ms
=> nil

A few things to note:


Calling perform_now executes the job synchronously, so… basically the same as if we executed it inside a controller. The end goal is to execute it asynchronously so it gets out of the user’s way.


If you watch the execution, you’ll see 3 seconds elapse between "I'm starting the fake job" and "OK I'm done now".


The job returns nil.


And though returning nil is rather boring and useless, a job can do anything! Remember that.
4. Running jobs in the background

We have an admin User and a Job, but we need some infrastructure to execute those jobs in the background.
For this lesson we’ll use Sidekiq, but it isn’t the only option: Rails’ Active Job framework supports other queue adapters too (e.g. Resque or Delayed Job).
Sidekiq is a job queue built on Redis, an in-memory key-value store. Basically, Redis is like a simple database that keeps its data in memory (writing to disk is optional). Its main advantage is speed - a perfect fit for implementing a job queue.
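Conceptually, enqueueing a job means describing it as JSON and pushing that onto a list in Redis, and a worker later pops the description off and runs it. Here is a rough sketch of that round trip, with a plain Array standing in for Redis. The payload keys mirror the general shape of a Sidekiq job, but this is a simplification, and EchoJob, enqueue and work_one are hypothetical names.

```ruby
require "json"
require "securerandom"

# An Array stands in for the Redis list that would hold the queue.
queue = []

# A hypothetical job class; a real job would do something slow here.
class EchoJob
  def perform(message)
    "echo: #{message}"
  end
end

# Enqueue: describe the job as JSON and push it onto the list.
enqueue = lambda do |job_class, *args|
  payload = { "class" => job_class.name, "args" => args,
              "jid" => SecureRandom.hex(12), "queue" => "default" }
  queue.push(JSON.generate(payload))
  payload["jid"]
end

# Work: pop the oldest payload, rebuild the job object, and run it.
work_one = lambda do
  payload = JSON.parse(queue.shift)
  Object.const_get(payload["class"]).new.perform(*payload["args"])
end

jid    = enqueue.call(EchoJob, "hello")
result = work_one.call
```

Because only JSON crosses the boundary, job arguments need to be serializable. Passing a small value such as a record id keeps the payload simple.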
We’ll use homebrew to install Redis:
brew update
brew install redis

This should work fine… unless you’ve set up homebrew incorrectly or are using homebrew from a user account that isn’t the one you installed it with, in which case you may need to change some file permissions. A note: DON’T use sudo to overcome these errors, for really important reasons which I won’t cover here. Instead, use chown to give the current user access to any affected directories (e.g. /usr/local/Homebrew).
Now, let’s run redis-server:
brew services start redis
==> Tapping homebrew/services
Cloning into '/usr/local/Homebrew/Library/Taps/homebrew/homebrew-services'...
remote: Enumerating objects: 12, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 12 (delta 0), reused 7 (delta 0), pack-reused 0
Unpacking objects: 100% (12/12), done.
Tapped 1 command (41 files, 58.7KB).
==> Successfully started `redis` (label: homebrew.mxcl.redis)

5. Configuring our application to use Redis

Ok, now that redis-server is up-and-running, we have to configure our Rails application to connect to it, and plug into Sidekiq specifically.
So, let’s install Sidekiq. Open your Gemfile and add:
gem 'sidekiq'
gem 'sidekiq-failures', '~> 1.0'
Then run:
bundle install
Fetching gem metadata from https://rubygems.org/............
Resolving dependencies...
...
Fetching rack-protection 2.0.5
Installing rack-protection 2.0.5
...
Fetching sidekiq 5.2.7
Installing sidekiq 5.2.7
Fetching sidekiq-failures 1.0.0
Installing sidekiq-failures 1.0.0
...
Bundle complete! 22 Gemfile dependencies, 81 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

Awesome. Now, we have to create a binstub that wraps the sidekiq gem (running bundle binstubs sidekiq will generate one in bin/). A binstub is:

… wrapper scripts around executables (sometimes referred to as "binaries", although they don't have to be compiled) whose purpose is to prepare the environment before dispatching the call to the original executable.

sidekiq is an “executable”, in other words: an entirely separate application from our Rails web application. The binstub we’ve just created configures the execution environment of sidekiq so that it plugs into our Rails app. Got it?
By the way, both sidekiq and our Rails app connect to redis-server: the app pushes jobs onto the queue, and sidekiq pulls them off and runs them.
Next, we have to configure our application to use sidekiq as its default job queue. Open config/application.rb and add:
config.active_job.queue_adapter = :sidekiq
Though this next step is optional, it’s pretty neat. The sidekiq rubygem can serve a simple web dashboard showing us information about its queue. To configure this, open config/routes.rb and add:
require "sidekiq/web"
# [...]
authenticate :user, lambda { |u| u.admin } do
  mount Sidekiq::Web => '/sidekiq'
end
If you run rake routes you’ll see we’ve got the new route:
sidekiq_web        /sidekiq                      Sidekiq::Web
It doesn’t specify an HTTP verb because Sidekiq::Web is a mounted Rack application, which handles every verb under /sidekiq itself.
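The lambda we passed to authenticate is just a predicate on the signed-in user. Here is a tiny stand-alone sketch of that check. The Struct below is a stand-in for the real devise User model, and the email for the non-admin user is invented.

```ruby
# A stand-in for the devise-backed User model, with only the fields we care about.
User = Struct.new(:email, :admin)

# The same predicate passed to `authenticate` in config/routes.rb.
admin_only = lambda { |u| u.admin }

admin   = User.new("admin@gmail.com", true)
visitor = User.new("someone@example.com", false)

admin_allowed   = admin_only.call(admin)    # admins pass the check
visitor_allowed = admin_only.call(visitor)  # everyone else is refused
```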
Finally, let’s configure Sidekiq within our Rails application. Create config/sidekiq.yml and add:
:concurrency: 3
:timeout: 60
:verbose: true
:queues:
  - default
  - mailers
A few notes on this configuration:


With :concurrency: 3 we’re saying that sidekiq is allowed to process three background jobs simultaneously


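With :timeout: 60 we’re saying that, when sidekiq is asked to shut down, it should give running jobs up to 60 seconds to finish before terminating them (this is a shutdown timeout, not a per-job time limit)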


With :verbose: true we’re saying that sidekiq should print more detailed log output (this will help us debug)


And finally, we’re telling sidekiq to process two separate queues: default and mailers.
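If the leading colons in that file look odd, this short sketch shows what they do: when Ruby's YAML library loads the file, keys written as :name: come back as symbols. The settings are inlined here so the snippet is self-contained, and the constant name SIDEKIQ_CONFIG is just for illustration.

```ruby
require "yaml"

# The same settings as config/sidekiq.yml, inlined for the sake of the example.
SIDEKIQ_CONFIG = <<~CONFIG
  :concurrency: 3
  :timeout: 60
  :verbose: true
  :queues:
    - default
    - mailers
CONFIG

config = YAML.load(SIDEKIQ_CONFIG)

concurrency = config[:concurrency]   # 3
queues      = config[:queues]        # ["default", "mailers"]
by_string   = config["concurrency"]  # nil, because the key is a symbol, not a string
```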


6. Running sidekiq and queueing background jobs

Now that we’ve configured our application to use Sidekiq, let’s start it up.
In your terminal, open a new tab and execute:
sidekiq
2019-05-30T01:11:02.493Z 59109 TID-ovkeg76yd INFO: ==================================================================
2019-05-30T01:11:02.493Z 59109 TID-ovkeg76yd INFO:   Please point sidekiq to a Rails 4/5 application or a Ruby file  
2019-05-30T01:11:02.493Z 59109 TID-ovkeg76yd INFO:   to load your worker classes with -r [DIR|FILE].
2019-05-30T01:11:02.493Z 59109 TID-ovkeg76yd INFO: ==================================================================
2019-05-30T01:11:02.493Z 59109 TID-ovkeg76yd INFO: sidekiq [options]

What’s this?! Turns out that if you run sidekiq outside of the Rails application root, it doesn’t work properly.
Remember that binstub we created earlier? Well, if we run sidekiq within our application root, it will run sidekiq with all the configuration we provided:
sidekiq
...
DEBUG: {:queues=>["default", "mailers"], :labels=>[], :concurrency=>3, :require=>".", :environment=>nil, :timeout=>60, :poll_interval_average=>nil, :average_scheduled_poll_interval=>5, :error_handlers=>[#<Sidekiq::ExceptionHandler::Logger:0x00007f917216b5d8>], :death_handlers=>[], :lifecycle_events=>{:startup=>[], :quiet=>[], :shutdown=>[], :heartbeat=>[]}, :dead_max_jobs=>10000, :dead_timeout_in_seconds=>15552000, :reloader=>#<Sidekiq::Rails::Reloader @app=BackgroundJobsDemo::Application>, :verbose=>true, :config_file=>"./config/sidekiq.yml", :strict=>true, :tag=>"background-jobs-demo", :identity=>"Charitys-MacBook-Pro.local:59363:6495772df58b"}

Shaboom shaboom.
Now, we should be able to enqueue jobs from anywhere inside our Rails application! Let’s start with creating one with the rails console:
[1] pry(main)> FakeJob.perform_later
Enqueued FakeJob (Job ID: 842d7ede-4075-4cab-b9bc-fecf953e810b) to Sidekiq(default)
=> #<FakeJob:0x00007f91e210f470
 @arguments=[],
 @executions=0,
 @job_id="842d7ede-4075-4cab-b9bc-fecf953e810b",
 @priority=nil,
 @provider_job_id="5420faaaeffc1e400e87bbac",
 @queue_name="default">

Meanwhile, if we check on sidekiq we’ll see that our background job has been enqueued and executed:
2019-05-30T01:16:05.343Z 59363 TID-ouxvy6kxb FakeJob JID-5420faaaeffc1e400e87bbac INFO: start
I'm starting the fake job
OK I'm done now
2019-05-30T01:16:08.389Z 59363 TID-ouxvy6kxb FakeJob JID-5420faaaeffc1e400e87bbac INFO: done: 3.046 sec

7. Example 1: (not actually) using Clearbit

Clearbit is a service that gathers data about people from the public web. The Enrichment API in particular takes an email address and returns a detailed profile of all the associated public information.
A natural place to make such a request to Clearbit is whenever a new User signs up to our app.
There are two main steps:

Create the Job which queries the Clearbit API
Enqueue the Job from the User model

Using rails generate, let’s create the Job:
rails generate job UpdateUser
Running via Spring preloader in process 60709
      invoke  test_unit
      create    test/jobs/update_user_job_test.rb
      create  app/jobs/update_user_job.rb

Now, let’s open app/jobs/update_user_job.rb and add:
class UpdateUserJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    user = User.find(user_id)
    puts "Calling Clearbit API for #{user.email}..."
    # TODO: perform a time-consuming task like Clearbit's Enrichment API.
    sleep 2
    puts "Done! Enriched #{user.email} with Clearbit"
  end
end
We’ve defined #perform to accept one parameter: user_id. The method will use this id to retrieve a user from the database (we created one earlier), fake doing an API call to Clearbit, and then falsely state that we’ve “enriched” the user with Clearbit information.
Of course, we’ve done no such thing, but you absolutely could and that’s the point.
Now, open app/models/user.rb and add:
class User < ApplicationRecord
  # [...]

  after_save :async_update # Run on create & update

  private

  def async_update
    UpdateUserJob.perform_later(self.id)
  end
end
Now, whenever we create or update a User, #async_update is invoked, which enqueues our new UpdateUserJob. Let’s test it out in the rails console:
[1] pry(main)> user = User.find(1)
  User Load (0.9ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
=> #<User id: 1, email: "admin@gmail.com", created_at: "2019-05-30 00:11:25", updated_at: "2019-05-30 00:11:25", admin: true>
[2] pry(main)> user.save
   (0.2ms)  BEGIN
Enqueued UpdateUserJob (Job ID: a2f04afb-df96-4419-af3a-0b82b0aa589c) to Sidekiq(default) with arguments: 1
   (0.5ms)  COMMIT
=> true

And if we check sidekiq we’ll see the job has been executed:
INFO: start
Calling Clearbit API for admin@gmail.com...
Done! Enriched admin@gmail.com with Clearbit
2019-05-30T01:38:21.436Z 59363 TID-ouxvy6l3n UpdateUserJob JID-001291260c2ce3ddfec6e812 INFO: done: 2.075 sec

Awesome! Amazing. Wonderful…
8. Example 2: Doing the same thing… in a controller

Deciding where to enqueue your background tasks is an important design choice.
For example, in this lesson we enqueued UpdateUserJob whenever a User model was created or updated, using the after_save callback on the model itself. To me, this makes sense because there are theoretically a number of places within our application where User objects are saved - for example, when a new user signs up (and a User instance is created for them), or when an existing User instance is updated.
In other words, this particular job is tied to the act of saving a User, which is very much the concern of our model layer - not our controller layer!
That said, if we wanted to enqueue this job in a controller for some reason (???), we could do so. We won’t actually go through the steps of creating this controller and view, but this is how it would be done:
# app/controllers/profiles_controller.rb
class ProfilesController < ApplicationController
  def update
    if current_user.update(user_params)
      UpdateUserJob.perform_later(current_user.id)  # <- The job is queued
      flash[:notice] = "Your profile has been updated"
      redirect_to root_path
    else
      render :edit
    end
  end

  private

  def user_params
    # Some strong params of your choice
  end
end
Of course, if you implemented this exactly as-is, the UpdateUserJob task would be enqueued twice - once in the controller, and once in the User model when after_save is invoked. This makes no sense, so don’t do it.
Some better examples for a controller would be: compressing a large file on-the-fly, making an external API request, and so on.
9. Example 3: Enqueueing tasks from a rake task

rake (short for “Ruby make”) is an awesome tool, and writing rake tasks is, to me at least, rather enjoyable.
We can use rake for all kinds of tasks, but one of the more common themes is updating data en masse. For example, let’s say you’ve just done a database migration where you’ve added a new field to the User model that will store Clearbit information. Rather than waiting for each User to be saved (and thus enqueueing the UpdateUserJob task), we can proactively update our data with a rake task instead!
Let’s generate our rake task using rails generate:
rails generate task user update_all
Running via Spring preloader in process 64234
      create  lib/tasks/user.rake

Open lib/tasks/user.rake and add:
namespace :user do
  desc "Enriching all users with Clearbit (async)"
  task :update_all => :environment do
    users = User.all
    puts "Enqueuing update of #{users.size} users..."
    users.each do |user|
      UpdateUserJob.perform_later(user.id)
    end
    # rake task will return when all jobs are _enqueued_ (not done).
  end
end
This task will load all User instances and enqueue an UpdateUserJob for each of them. Now let’s run the task:
rake user:update_all
Enqueuing update of 1 users...

In sidekiq we should see:
2019-05-30T02:04:00.982Z 59363 TID-ouxvy6kof UpdateUserJob JID-e93889ccb32bca6f121df643 INFO: start
Calling Clearbit API for admin@gmail.com...
Done! Enriched admin@gmail.com with Clearbit
2019-05-30T02:04:03.010Z 59363 TID-ouxvy6kof UpdateUserJob JID-e93889ccb32bca6f121df643 INFO: done: 2.028 sec

There are a few benefits to having our rake tasks use Jobs in this way, the biggest one being concurrency.
If instead we’d made the calls to Clearbit’s API synchronously within the task, overall it would take much longer since we’d only be completing one request at a time. With jobs and sidekiq’s concurrency setting, we can complete up to 3 API calls simultaneously! Of course, more powerful servers can run with a higher concurrency.
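Here is a rough way to see that difference in plain Ruby, without Sidekiq: six fake "API calls" of 0.1 seconds each, run one at a time and then spread across three threads. The names fake_api_call and elapsed_seconds are made up, and sleep stands in for waiting on the network (which is when Ruby threads can overlap).

```ruby
# Each fake call just waits, the way an HTTP request mostly waits on the network.
fake_api_call = ->(user_id) { sleep 0.1; user_id }
user_ids = (1..6).to_a

# Run the given block and return how long it took, in seconds.
def elapsed_seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# One request at a time, as a synchronous rake task would do it.
sequential = elapsed_seconds do
  user_ids.each { |id| fake_api_call.call(id) }
end

# Three workers sharing one queue, similar in spirit to :concurrency: 3.
concurrent = elapsed_seconds do
  pending = Queue.new
  user_ids.each { |id| pending << id }
  workers = Array.new(3) do
    Thread.new do
      loop do
        id = begin
          pending.pop(true)        # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break                    # nothing left to do, so this worker stops
        end
        fake_api_call.call(id)
      end
    end
  end
  workers.each(&:join)
end
```

On a typical machine the concurrent run takes roughly a third of the sequential time, since three calls are always waiting at once.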
In the previous rake task, we update all User instances, but what if we want to update just one instance? We’d need to provide our task with the id of that user… fortunately rake tasks can be executed with parameters. We want something like this:
rake user:update[1]

Open lib/tasks/user.rake and add:
desc "Enriching a given user with Clearbit (sync)"
task :update, [:user_id] => :environment do |t, args|
  user = User.find(args[:user_id])
  puts "Enriching #{user.email}..."
  UpdateUserJob.perform_now(user.id)
  # rake task will return when job is _done_
end
Notice the difference here?
task :update, [:user_id] => :environment do |t, args|
We’ve configured the user:update task to accept one parameter, :user_id, and so in the definition of this task we can access it with args[:user_id], like so:
user = User.find(args[:user_id])
Also, notice that we run UpdateUserJob synchronously (via perform_now) rather than enqueueing it, since it’s just one API request that shouldn’t take more than a few seconds to complete. In other words, doing this job asynchronously doesn’t really confer much benefit, so we won’t bother giving it to sidekiq at all.
Now let’s run the task:
rake user:update[1]
Enriching admin@gmail.com...
Calling Clearbit API for admin@gmail.com...
Done! Enriched admin@gmail.com with Clearbit

Notice that the stdout output of our job appears in this terminal window, not sidekiq. If this were enqueued as an asynchronous job, the output would appear there, instead.
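One detail worth knowing about task arguments: on the command line they always arrive as strings. The sketch below defines a throwaway task outside Rails (so there is no :environment dependency) and invokes it the way rake demo_update[1] would. The task name demo_update and the received array are invented for the example.

```ruby
require "rake"

received = []

# Equivalent in spirit to: task :demo_update, [:user_id] do |t, args| ... end
Rake::Task.define_task(:demo_update, [:user_id]) do |_task, args|
  received << args[:user_id]
end

# `rake demo_update[1]` on the command line would pass the string "1".
Rake::Task[:demo_update].invoke("1")

# Convert explicitly if you need a number rather than a string.
user_id = Integer(received.first)
```

User.find is happy with either a string or an integer id, which is why the task above works without any conversion.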
10. Example 4: Sending emails

Sending emails can be quite time-intensive, especially if you have to compose a custom email with information from your database (e.g. “Hello <username>, thanks for doing <some thing>”). And if you’re using an external mailing service like Mailchimp, you also have to make HTTP requests to its API, which takes time too.
So, better to send emails in the background, and not block the user from interacting with your application.
When we configured sidekiq, remember that we created two distinct queues:
# ...
:queues:
  - default
  - mailers
action_mailer makes this easy… we don’t even have to create a custom job! All we need to do is call UserMailer#welcome with deliver_later in our User model… so let’s open app/models/user.rb and add:
# ...
after_create :send_welcome_email
# ...
private

def send_welcome_email
  UserMailer.with(user: self).welcome.deliver_later  # the mailer reads this as params[:user]
end
Of course, UserMailer hasn’t yet been created. Let’s use rails generate to do this:
rails generate mailer UserMailer
Running via Spring preloader in process 79934
      create  app/mailers/user_mailer.rb
      invoke  erb
      create    app/views/user_mailer
      invoke  test_unit
      create    test/mailers/user_mailer_test.rb
      create    test/mailers/previews/user_mailer_preview.rb

action_mailer allows you to send emails from your application using mailer classes and views. Mailers work very similarly to controllers. They inherit from ActionMailer::Base and live in app/mailers, and they have associated views that live in app/views.
Now, let’s open app/mailers/user_mailer.rb and add:
def welcome
  @user = params[:user]
  @url  = 'http://example.com/login'
  mail(to: @user.email, subject: 'Welcome to My Awesome Site')
end
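Because we call deliver_later, action_mailer enqueues the delivery as a background job (on the mailers queue by default in Rails 5.2) and sidekiq sends the email. Note that the mailer also needs a view template for the welcome action (e.g. app/views/user_mailer/welcome.html.erb).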
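The instance variables set in the mailer method (@user and @url) are what the email template gets to use. Rails handles that through its view layer, but the idea can be sketched with Ruby's built-in ERB. The template text and the WelcomePreview and WelcomeUser names below are hypothetical, not part of the app.

```ruby
require "erb"

# Hypothetical welcome template; in Rails this would live in app/views/user_mailer/.
WELCOME_TEMPLATE = <<~TEXT
  Hello <%= @user.email %>,
  thanks for signing up. Log in here: <%= @url %>
TEXT

# Stand-in for the real User model.
WelcomeUser = Struct.new(:email)

class WelcomePreview
  def initialize(user, url)
    @user = user
    @url  = url
  end

  # Render the template with this object's instance variables in scope.
  def body
    ERB.new(WELCOME_TEMPLATE).result(binding)
  end
end

body = WelcomePreview.new(WelcomeUser.new("admin@gmail.com"), "http://example.com/login").body
```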
11. Delaying jobs

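By default, sidekiq executes enqueued jobs as soon as it’s able.
However, we may want to delay the execution of a job. sidekiq holds delayed jobs separately and polls for due ones every few seconds (see :average_scheduled_poll_interval=>5 in the debug output earlier). The API for delaying jobs looks like this: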
FakeJob.set(wait: 1.minute).perform_later

FakeJob.set(wait_until: Date.tomorrow.noon).perform_later
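Under the hood, a delayed job is kept apart from the normal queue together with the time it should run, and a poller moves it across once that time has passed. Here is a toy version of that mechanism in plain Ruby. It only illustrates the idea: schedule and promote_due are invented names, and the arrays stand in for structures that Sidekiq keeps in Redis.

```ruby
# Toy scheduler: each entry is [run_at, job_name], kept apart from the ready queue.
scheduled = []
ready     = []

# Remember the job together with the time it becomes due.
schedule = ->(job_name, wait:, now:) { scheduled << [now + wait, job_name] }

# A poller would call this periodically, moving due jobs to the ready queue.
promote_due = lambda do |now|
  due, later = scheduled.partition { |run_at, _| run_at <= now }
  scheduled.replace(later)
  due.sort_by(&:first).each { |_, job_name| ready << job_name }
end

t0 = Time.at(0)
schedule.call("FakeJob", wait: 60, now: t0)

promote_due.call(t0 + 30)   # too early: nothing is ready yet
early = ready.dup
promote_due.call(t0 + 60)   # the wait has elapsed: the job becomes runnable
```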
12. Running sidekiq in production

Ok, now that we’ve succeeded running sidekiq in our development environment, how do we get it working in production?
Briefly, let’s review all the moving parts:

redis-server: our in-memory key-value store
sidekiq: running in its own process and connected to redis-server
rails server: our web app, which pushes jobs onto the queue in redis-server

Let’s take this a step at a time using Heroku as the production environment.
Using the heroku CLI application, we can set up redis-server with the rediscloud add-on:
heroku addons:create rediscloud
Next, create a configuration file at config/initializers/redis.rb and add:
$redis = Redis.new

url = ENV["REDISCLOUD_URL"]

if url
  Sidekiq.configure_server do |config|
    config.redis = { url: url }
  end

  Sidekiq.configure_client do |config|
    config.redis = { url: url }
  end
  $redis = Redis.new(:url => url)
end
This will configure sidekiq (and our own $redis client) to connect to the production rediscloud instance we just created.
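The heart of that initializer is "use the URL from the environment if there is one". As an alternative way of writing the same decision (not what the initializer above does), here is a small sketch that always returns an options hash and falls back to a local URL. The redis_options helper and the fallback URL are assumptions for illustration.

```ruby
# Assumed local default, used when REDISCLOUD_URL is not set.
LOCAL_REDIS_URL = "redis://localhost:6379/0"

# Build a Redis options hash from an environment-like hash (ENV works the same way).
def redis_options(env)
  { url: env.fetch("REDISCLOUD_URL", LOCAL_REDIS_URL) }
end

development = redis_options({})
production  = redis_options({ "REDISCLOUD_URL" => "redis://example.test:12345" })
```

In the app you would call redis_options(ENV) and hand the result to config.redis inside the Sidekiq.configure_server and Sidekiq.configure_client blocks.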
Next, we have to tell heroku to add an additional process: the sidekiq worker. Open your Procfile and add:
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
Notice that we’re telling heroku to create two separate processes, one for the rails app (i.e. web) and one for the sidekiq worker (i.e. worker), and plugging our new configuration files into the processes.
Now, commit the changes and push them to heroku.
Finally, scale up the sidekiq worker like so:
heroku ps:scale worker=1
heroku ps