@casperisfine
Last active June 27, 2022 23:10
# frozen_string_literal: true
begin
  require "bundler/inline"
rescue LoadError => e
  $stderr.puts "Bundler version 1.10 or later is required. Please update your Bundler"
  raise e
end
gemfile(true) do
  source "https://rubygems.org"
  gem "activerecord", github: 'Shopify/rails', branch: 'ar-async-query'
  gem 'benchmark-ips'
  gem 'mysql2'
  gem 'byebug'
end
require 'active_record'
require 'logger'
require 'benchmark/ips'
# The scopes below call MySQL's sleep() to simulate slow queries, so this needs a local MySQL database named ar_benchmark.
ActiveRecord::Base.establish_connection(adapter: 'mysql2', database: 'ar_benchmark', username: 'root')
ActiveRecord::Base.logger = Logger.new(nil)
ActiveRecord::Schema.define do
  create_table :posts, if_not_exists: true do |t|
    t.string :title
  end
  create_table :categories, if_not_exists: true do |t|
    t.string :title
  end
end
class Post < ActiveRecord::Base
  scope :complex_query, -> { select('*, sleep(0.001)') }
end
class Category < ActiveRecord::Base
  scope :complex_query, -> { select('*, sleep(0.001)') }
end
puts "== Seed DB =="
Post.delete_all
1_000.times { |i| Post.create(title: "Post ##{i}") }
Category.delete_all
1_000.times { |i| Category.create(title: "Category ##{i}") }
def sync_action(count)
  @posts = Post.complex_query.limit(count)
  @categories = Category.complex_query.limit(count)
  raise 'fail' unless @posts.each.count == count
  raise 'fail' unless @categories.each.count == count
end
def async_action(count)
  @posts = Post.complex_query.limit(count).defer
  @categories = Category.complex_query.limit(count).defer
  raise 'fail' unless @posts.each.count == count
  raise 'fail' unless @categories.each.count == count
end
ActiveRecord::Base.logger = Logger.new(STDOUT)
puts "== Sanity Checks =="
sync_action(50)
async_action(50)
ActiveRecord::Base.logger = Logger.new(nil)
puts "== Benchmarking =="
Benchmark.ips do |x|
  x.report("sync(10)") { sync_action(10) }
  x.report("async(10)") { async_action(10) }
  x.report("sync(50)") { sync_action(50) }
  x.report("async(50)") { async_action(50) }
end

This is an idea Rafael and I have been entertaining for a while.

Use case

A pattern that often emerges in large apps is that the controller needs to perform several DB queries, and then uses the results to render the view. Sometimes the queries are interdependent, so they have to happen one after the other, but sometimes they are totally independent and could be parallelized.

e.g.

class BlogController
  def index
    @categories = Category.all # to render the sidebar or something
    @posts = Post.order(published_at: :desc)
  end
end

Naive implementation

In this example both Post and Category could be queried in parallel. Of course, on paper you could just run the queries from threads:

class BlogController
  def index
    categories_future = Thread.new { Category.all.to_a }
    @posts = Post.order(published_at: :desc).to_a
    @categories = categories_future.value
  end
end

But then you are executing a lot of user code in a background thread, which breaks lots of expectations. You lose what's stored in Thread.current, so things like Marginalia break, CurrentAttributes are lost, and I'm pretty sure ActiveSupport's instrumentation also breaks. Long story short, this won't work in most realistic scenarios, and for it to work users would need to be extremely careful to be truly thread safe: not just thread safe across requests like today, but thread safe inside each request.
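To make the Thread.current problem concrete, here is a minimal plain-Ruby sketch. The :request_id key is a hypothetical stand-in for whatever per-request state a library keeps there; it is not taken from Marginalia or CurrentAttributes.

```ruby
# Hypothetical per-request state stored the way many libraries do it.
# Thread.current[] reads and writes fiber-local storage.
Thread.current[:request_id] = "req-42"

seen_in_main = Thread.current[:request_id]

# A new thread starts with empty storage, so the value is not visible there.
seen_in_thread = Thread.new { Thread.current[:request_id] }.value

puts "main thread:       #{seen_in_main.inspect}"   # prints "req-42"
puts "background thread: #{seen_in_thread.inspect}" # prints nil
```

Any code that runs inside the background thread and relies on that state sees nil instead.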

This PR

So instead I implemented a quick and dirty proof of concept that only performs the query in a thread pool; the rest of the work (instantiating models, etc.) is still done on the main thread. That takes care of the vast majority of the per-thread/fiber context problems. Some things, such as AS instrumentation, would still need to be ironed out.
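The split described here can be sketched in plain Ruby. This is only an illustration of the idea, not the code from the branch: fetch_raw_rows and Row are hypothetical stand-ins for the database driver call and a model class, and a bare Thread stands in for the thread pool.

```ruby
# Hypothetical stand-in for a model class.
Row = Struct.new(:id, :title)

# Hypothetical stand-in for the driver call; it returns raw tuples only.
def fetch_raw_rows
  [[1, "Post #1"], [2, "Post #2"]]
end

Thread.current[:request_id] = "req-42"

# Only the I/O-bound fetch runs off the main thread.
raw_future = Thread.new { fetch_raw_rows }

# Instantiation happens back on the main thread, where per-request
# context stored in Thread.current is still available.
posts = raw_future.value.map do |id, title|
  raise "context lost" unless Thread.current[:request_id] == "req-42"
  Row.new(id, title)
end

puts posts.map(&:title).inspect # prints ["Post #1", "Post #2"]
```

Because no user-facing code (callbacks, type casting, instrumentation subscribers) runs in the background thread in this sketch, it never needs to see the request context.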

The script above showcases how it works: you have to explicitly call .defer on a Relation to schedule the query in the background. Then, when you use the relation, one of three things happens: the query has completed and it just uses the results, the query is still being executed and it waits for it to complete, or the thread pool was busy and it simply executes the query in the foreground.
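The three outcomes can be modelled with a small state machine. The sketch below is a guess at the general shape, assuming a mutex-guarded claim on the work; DeferredResult, execute_in_background and executed_in are hypothetical names and are not taken from the branch.

```ruby
# Hypothetical sketch; these names are NOT from the Rails branch.
class DeferredResult
  attr_reader :executed_in

  def initialize(&query)
    @query = query
    @mutex = Mutex.new
    @done = ConditionVariable.new
    @state = :pending
  end

  # Entry point for a pool worker.
  def execute_in_background
    run(:background)
  end

  # Entry point for the main thread, called when the rows are needed.
  def value
    run(:foreground) # no-op unless the work is still unclaimed
    @mutex.synchronize { @done.wait(@mutex) until @state == :done }
    raise @error if @error
    @value
  end

  private

  def run(where)
    # Claim the work atomically so the query executes exactly once.
    @mutex.synchronize do
      return unless @state == :pending
      @state = :running
    end

    result = failure = nil
    begin
      result = @query.call
    rescue StandardError => e
      failure = e
    end

    @mutex.synchronize do
      @value = result
      @error = failure
      @executed_in = where
      @state = :done
      @done.broadcast
    end
  end
end

# 1. The query completed in the background before it was needed.
finished = DeferredResult.new { [:post_1, :post_2] }
Thread.new { finished.execute_in_background }.join
finished.value       # returns the stored rows without re-running the query
finished.executed_in # => :background

# 2. The query is still running: the caller waits for it.
started = Queue.new
in_flight = DeferredResult.new { started << true; sleep 0.05; [:tag_1] }
Thread.new { in_flight.execute_in_background }
started.pop          # wait until the worker has started the query
in_flight.value      # blocks until the worker finishes
in_flight.executed_in # => :background

# 3. No worker picked it up: the caller runs it in the foreground.
skipped = DeferredResult.new { [:category_1] }
skipped.value
skipped.executed_in  # => :foreground
```

The key design choice in this sketch is that whichever thread claims the pending work first runs it, so the foreground fallback needs no cancellation logic.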

It's far from perfect, and making it production quality would require a bunch more work, but at this stage it seems totally doable.

Fetching https://github.com/Shopify/rails.git
Fetching gem metadata from https://rubygems.org/.......
Fetching gem metadata from https://rubygems.org/............
Fetching gem metadata from https://rubygems.org/............
Resolving dependencies...
Using concurrent-ruby 1.1.6
Using i18n 1.8.5
Using minitest 5.14.1
Using tzinfo 2.0.2
Using zeitwerk 2.4.0
Using activesupport 6.1.0.alpha from source at `/Users/byroot/src/github.com/Shopify/rails`
Using activemodel 6.1.0.alpha from source at `/Users/byroot/src/github.com/Shopify/rails`
Using activerecord 6.1.0.alpha from source at `/Users/byroot/src/github.com/Shopify/rails`
Using benchmark-ips 2.8.2
Using bundler 2.1.4
Using byebug 11.1.3
Using mysql2 0.5.3
-- create_table(:posts, {:if_not_exists=>true})
-> 0.0035s
-- create_table(:categories, {:if_not_exists=>true})
-> 0.0002s
== Seed DB ==
== Sanity Checks ==
D, [2020-08-06T15:30:47.091875 #13675] DEBUG -- : Post Load (146.2ms) SELECT *, sleep(0.001) FROM `posts` LIMIT 50
D, [2020-08-06T15:30:47.185073 #13675] DEBUG -- : Category Load (92.6ms) SELECT *, sleep(0.001) FROM `categories` LIMIT 50
I, [2020-08-06T15:30:47.186539 #13675] INFO -- : Executing in foreground 2527104.539794
I, [2020-08-06T15:30:47.186613 #13675] INFO -- : Executing in background 2527104.539867
D, [2020-08-06T15:30:47.283428 #13675] DEBUG -- : Category Load (96.0ms) SELECT *, sleep(0.001) FROM `categories` LIMIT 50
D, [2020-08-06T15:30:47.293737 #13675] DEBUG -- : Post Load (107.0ms) SELECT *, sleep(0.001) FROM `posts` LIMIT 50
== Benchmarking ==
Warming up --------------------------------------
            sync(10)     2.000  i/100ms
           async(10)     4.000  i/100ms
            sync(50)     1.000  i/100ms
           async(50)     1.000  i/100ms
Calculating -------------------------------------
            sync(10)     22.053  (±18.1%) i/s -    108.000  in   5.034257s
           async(10)     42.279  (±14.2%) i/s -    208.000  in   5.017963s
            sync(50)      4.552  (±22.0%) i/s -     23.000  in   5.145624s
           async(50)      8.834  (±11.3%) i/s -     44.000  in   5.100323s
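For a rough reading of these numbers, the snippet below computes the async-to-sync throughput ratio from the iterations-per-second figures reported above. The reported error margins are wide (±11% to ±22%), so the ratios should be treated as approximate.

```ruby
# Iterations per second, copied from the benchmark output above.
results = {
  10 => { sync: 22.053, async: 42.279 },
  50 => { sync: 4.552,  async: 8.834 },
}

speedups = results.transform_values { |r| (r[:async] / r[:sync]).round(2) }

speedups.each do |count, ratio|
  puts "limit(#{count}): async is #{ratio}x sync"
end
# prints:
# limit(10): async is 1.92x sync
# limit(50): async is 1.94x sync
```

Both ratios are a little under 2x, which is about what one would hope for when two independent queries of similar duration overlap instead of running back to back.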