Skip to content

Instantly share code, notes, and snippets.

@searls
Created January 31, 2024 16:15
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save searls/bf64e5b2224cdf701512bb17b40271c9 to your computer and use it in GitHub Desktop.
Save searls/bf64e5b2224cdf701512bb17b40271c9 to your computer and use it in GitHub Desktop.
Uncovered a race condition when multiple preprocessed variants are specified for a previewable attachment (e.g. PDF, videos) and multiple TransformJob jobs are enqueued after create

TIL this script helped me figure out that

  1. Given a previewable attachment (video, pdf) has 2 preprocessed variants
  2. After being attached, these are enqueued [ActiveStorage::AnalyzeJob, ActiveStorage::TransformJob, ActiveStorage::TransformJob]
  3. All 3 jobs will (I assume?) separately download the video from storage to a tmp file
  4. Running asynchronously, a race condition results: both transform jobs will see no blob.preview_image exists yet and will BOTH create one
  5. Since a blob only has one preview_image attached, one preview image (and any variants associated with it) will be orphaned
  6. As a result when the attachment is purged, those orphaned blobs and variant record will fail to be purged

Not awesome things that happen in this case:

  1. If a video is large, downloading it 3 times is wasteful
  2. Transforming each variant in a separate job creates a race condition as both jobs will see no preview_image exists yet and will both create one
  3. Orphaned blobs and variant records are left behind in the database and storage

Ideas for improvement:

  1. Test: We could write a test for this by performing the jobs async, either by adding an async: option (and thread executor) to the TestQueueAdapter (https://github.com/rails/rails/blob/main/activejob/lib/active_job/test_helper.rb#L606 )
  2. Fix: We could batch AnalyzeJob, and 1…n TransformJob jobs into a single MultiJob that downloads the blob once and processes all variants synchronously
  3. Workaround: if a video attachment needs multiple variants, user could set preprocessed: false and then write a custom job that manually processes them all in sequence, and enqueue that after create
#!/usr/bin/env ruby
require_relative "../config/environment"
puts "Creating a post and uploading a video"
now = Time.zone.now
PRE_EXISTING_IDS = {
ActiveStorage::VariantRecord => ActiveStorage::VariantRecord.all.map(&:id)
}
QUEUE_ADAPTER = Rails.application.config.active_job.queue_adapter = if ENV["QUEUE_ADAPTER"]
puts "Setting queue adapter to #{ENV["QUEUE_ADAPTER"]}"
ENV["QUEUE_ADAPTER"].to_sym
else
puts "Using test queue adapter"
:test
end
def hr
"=" * 80
end
def record_counts_after(time)
[
Post,
Visual,
ActiveStorage::Blob,
ActiveStorage::Attachment,
ActiveStorage::VariantRecord
].map { |klass|
rel = if PRE_EXISTING_IDS.key?(klass)
klass.where.not(id: PRE_EXISTING_IDS[klass]).order("id ASC")
else
klass.where("created_at > ?", time).order("created_at ASC")
end
" - #{rel.count} #{klass.name.pluralize(rel.count)} [#{rel.map(&:id).join(", ")}]"
}.join("\n")
end
def print_summary(title, time)
puts <<~MSG
#{hr}
Record counts #{title} (created after #{time}):
#{record_counts_after(time)}
MSG
end
def blob_for(file_path)
file = Pathname.new(file_path)
ActiveStorage::Blob.create_before_direct_upload!(
key: nil,
filename: file_path,
byte_size: file.size,
checksum: OpenSSL::Digest::MD5.file(file).base64digest,
content_type: Marcel::MimeType.for(file),
record: nil
).tap do |blob|
ActiveStorage::Blob.service.upload(blob.key, file.open)
end
end
def run_test_enqueued_jobs!
puts ActiveJob::Base.queue_adapter.enqueued_jobs.each { |job|
args = job[:args].map { |arg|
if arg.is_a?(Hash) && arg.key?("_aj_globalid")
GlobalID::Locator.locate(arg["_aj_globalid"])
elsif arg.is_a?(Hash) && arg.key?("_aj_symbol_keys")
arg.without("_aj_symbol_keys")
else
arg
end
}
job[:job].new.perform(*args)
}
end
def clear_test_enqueued_jobs!
ActiveJob::Base.queue_adapter.enqueued_jobs.clear
end
print_summary("before doing anything", now)
user = User.first
file = ENV["FILE"] || Rails.root.join("test/fixtures/files/video.mp4")
post = Post.create!(
user: user,
published_at: now,
slug: "my-first-post",
title: "My first post",
caption: "This is my first post",
visuals: [
Visual.new(file: blob_for(file))
]
)
print_summary("after attaching '#{file}'", now)
if QUEUE_ADAPTER == :test
run_test_enqueued_jobs!
clear_test_enqueued_jobs!
elsif QUEUE_ADAPTER != :inline
sleep 5
print_summary("after letting preprocess jobs run", now)
end
if ENV["INSPECT"]
puts "Hitting debugger to inspect state before destroying post"
binding.irb
end
post.destroy!
print_summary("after destroying post", now)
if QUEUE_ADAPTER == :test
run_test_enqueued_jobs!
clear_test_enqueued_jobs!
elsif QUEUE_ADAPTER != :inline
sleep 5
end
print_summary("after letting purge jobs run", now)
if ENV["INSPECT"]
blobs = ActiveStorage::Blob.where("created_at > ?", now)
attachments = ActiveStorage::Attachment.where("created_at > ?", now)
variant_records = ActiveStorage::VariantRecord.where.not(id: PRE_EXISTING_IDS[ActiveStorage::VariantRecord])
binding.irb
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment