@abachman
Last active December 15, 2023 22:12
Google Cloud Spanner Emulator transaction hangup
# frozen_string_literal: true

# 01_force_transaction_error.rb
##
# Force a stale-open transaction on the Spanner Emulator by running this script
# and then killing it with Ctrl-C.
#
# Rerunning the script will result in a hung process followed by the "one
# transaction at a time" error.
##
require "securerandom"
require_relative "emulator_util"

$stdout.sync = true

EmulatorUtil.setup!
client = EmulatorUtil.client

loop do
  client.transaction(deadline: 3) do |tx|
    puts "starting transaction #{tx.transaction_id}"
    tx.execute "INSERT INTO Customers (Id) VALUES ('#{SecureRandom.hex(6)}')"
    # leave the transaction open long enough to be killed mid-flight
    sleep 0.5
    tx.execute "INSERT INTO Customers (Id) VALUES ('#{SecureRandom.hex(6)}')"
  end
rescue Google::Cloud::AbortedError => e
  # run.sh greps the log for this class name to detect a stuck emulator
  puts "#{e.class} #{e.message}"
  exit 1
rescue => e
  puts "#{e.class} #{e.message}"
end
# 02a_session_reset.rb
require "logger"
require_relative "emulator_util"
# stick with ENV defaults
EmulatorUtil.logger = Logger.new("/dev/null")
EmulatorUtil.reset_all_emulator_transactions!
# 02b_session_release.rb
require "logger"
require_relative "emulator_util"
# stick with ENV defaults
EmulatorUtil.logger = Logger.new("/dev/null")
EmulatorUtil.release_all_emulator_sessions!
# 03_try_transaction.rb
require "securerandom"
require "google/cloud/spanner"

project = Google::Cloud::Spanner.new(
  project_id: ENV["SPANNER_PROJECT_ID"],
  emulator_host: ENV["SPANNER_EMULATOR_HOST"]
)
client = project.client(ENV["SPANNER_INSTANCE_ID"], ENV["SPANNER_DATABASE_ID"])

begin
  client.transaction(deadline: 5) do |tx|
    tx.execute "INSERT INTO Customers (Id) VALUES ('#{SecureRandom.hex(6)}')"
  end
rescue Google::Cloud::AbortedError
  # a non-zero exit status tells run.sh that the transaction failed
  exit 1
end

This gist illustrates the "hung transactions" problem in the Google Cloud Spanner emulator.

If a process opens a transaction on a database in the emulator and crashes without committing or rolling back, the database will not permit any future transactions.

One workaround (linked below) is to force all hung open transactions to close by listing every session on the database, creating an empty transaction on it, and then immediately rolling the transaction back.

BUT, if sessions are released instead of going through the empty transaction workaround, whether manually or by some background process (something in the emulator?), they can no longer be listed. That means no new transaction can be opened on them to trigger a rollback, so any transactions left open on them stay open. This effectively kills the database and requires a restart of the emulator.

I've been using the workaround in a project, but I still occasionally see open transactions with no sessions when I return to work the day after running tests in the emulator.

usage

setup:

# setup
$ gem install google-cloud-spanner
$ docker run -p 9010:9010 -p 9020:9020 gcr.io/cloud-spanner-emulator/emulator:latest

run:

$ sh run.sh
1..8
ok 1 - emulator is stuck
ok 2 - emulator sessions reset
ok 3 - transaction succeeded
ok 4 - emulator is stuck
ok 5 - emulator sessions released
not ok 6 - transaction failed
ok 7 - emulator sessions reset
not ok 8 - transaction failed

The test output illustrates the problem scenario:

  1. a transaction worker is started and killed repeatedly until a transaction is left hanging open
  2. emulator sessions are reset by opening and rolling back an empty transaction on each one
  3. another transaction attempted on the same database succeeds
  4. same as 1: flail until the database is locked up
  5. emulator sessions are released, which removes them from the emulator
  6. the transaction attempt fails with the error "Google::Cloud::AbortedError ... The emulator only supports one transaction at a time"
  7. resetting sessions is tried again
  8. the transaction attempt still fails with the Google::Cloud::AbortedError error

Ideally, every transaction attempt should succeed.
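
To make the eight steps easier to follow, here is a toy, in-memory sketch of the behaviour described above. It is not the emulator's implementation and does not talk to Spanner at all: `ToyEmulator`, its methods, and the helper functions are hypothetical names invented for this illustration. It assumes that the emulator holds one database-wide transaction slot, that beginning a new transaction on the owning session and rolling it back frees that slot, and that releasing a session removes it from the listing without freeing the slot.

```ruby
# Toy model only: all names here are hypothetical and nothing talks to Spanner.
class ToyEmulator
  class AbortedError < StandardError; end

  def initialize
    @sessions = {}       # session id => true while the session can be listed
    @open_tx_owner = nil # id of the session holding the single open transaction
    @next_id = 0
  end

  def create_session
    id = "session-#{@next_id += 1}"
    @sessions[id] = true
    id
  end

  def list_sessions
    @sessions.keys
  end

  # Only one transaction may be open at a time. Beginning again on the
  # owning session replaces its stale transaction (an assumption of the model).
  def begin_transaction(session_id)
    raise ArgumentError, "unknown session" unless @sessions.key?(session_id)
    if @open_tx_owner && @open_tx_owner != session_id
      raise AbortedError, "The emulator only supports one transaction at a time"
    end
    @open_tx_owner = session_id
  end

  def commit(session_id)
    @open_tx_owner = nil if @open_tx_owner == session_id
  end
  alias rollback commit

  # Releasing hides the session but leaves its open transaction in place.
  def release(session_id)
    @sessions.delete(session_id)
  end

  def stuck?
    !@open_tx_owner.nil?
  end
end

# A worker that opens a transaction and is killed before finishing it.
def crash_mid_transaction(emulator)
  emulator.begin_transaction(emulator.create_session)
end

# The workaround: empty transaction plus rollback on every listable session.
def reset_sessions(emulator)
  emulator.list_sessions.each do |id|
    emulator.begin_transaction(id)
    emulator.rollback(id)
  rescue ToyEmulator::AbortedError
    next # this session does not own the stale transaction
  end
end

def release_sessions(emulator)
  emulator.list_sessions.each { |id| emulator.release(id) }
end

# A well-behaved client: true if its transaction commits, false if aborted.
def try_transaction(emulator)
  session = emulator.create_session
  emulator.begin_transaction(session)
  emulator.commit(session)
  true
rescue ToyEmulator::AbortedError
  false
ensure
  emulator.release(session)
end

emulator = ToyEmulator.new
results = []
crash_mid_transaction(emulator)      # step 1
reset_sessions(emulator)             # step 2
results << try_transaction(emulator) # step 3
crash_mid_transaction(emulator)      # step 4
release_sessions(emulator)           # step 5
results << try_transaction(emulator) # step 6
reset_sessions(emulator)             # step 7: nothing left to list
results << try_transaction(emulator) # step 8
p results # → [true, false, false]
```

In this model the reset in step 7 is a no-op because the session that owns the stale transaction is no longer listable, which matches the failure at step 8.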

links

# emulator_util.rb
# What if instead of opening and rolling back transactions, we release every session?
require "rubygems"
require "google/cloud/spanner"

# patch the service to add a missing method
module Google
  module Cloud
    module Spanner
      class Service
        # add a missing list_sessions method
        # @param database [String] in the form of a full Spanner identifier like
        #   "projects/.../instances/.../databases/..."
        def list_sessions(database:, call_options: nil, token: nil, max: nil)
          opts = default_options call_options: call_options
          request = {
            database: database,
            page_size: max,
            page_token: token
          }
          paged_enum = service.list_sessions request, opts
          # return the raw response so callers can follow next_page_token
          paged_enum.response
        end
      end
    end
  end
end
module EmulatorUtil
  PROJECT_ID = ENV["SPANNER_PROJECT_ID"]
  INSTANCE_ID = ENV["SPANNER_INSTANCE_ID"]
  DATABASE_ID = ENV["SPANNER_DATABASE_ID"]
  EMULATOR_HOST = ENV["SPANNER_EMULATOR_HOST"]

  extend self

  attr_accessor :logger

  def project
    @project ||= Google::Cloud::Spanner.new(project_id: PROJECT_ID, emulator_host: EMULATOR_HOST)
  end

  def client
    project.client(INSTANCE_ID, DATABASE_ID)
  end

  # create the instance and database if they do not exist yet
  def setup!
    unless project.instance(INSTANCE_ID)
      project
        .create_instance(INSTANCE_ID, name: INSTANCE_ID, nodes: 1)
        .wait_until_done!
    end

    unless project.instance(INSTANCE_ID).database(DATABASE_ID)
      schema = "CREATE TABLE Customers (Id STRING(36) NOT NULL) PRIMARY KEY (Id)"
      project
        .instance(INSTANCE_ID)
        .create_database(DATABASE_ID, statements: [schema])
        .wait_until_done!
    end
  end

  # open an empty transaction and rollback immediately on every session in the emulator
  def reset_all_emulator_transactions!
    project.instances.all do |instance|
      puts "instance: #{id(instance)}"
      instance.databases.all do |database|
        puts " database: #{id(database)}"
        each_session_for_database(database) do |session|
          puts " resetting session: #{id(session)}"
          tx = session.create_empty_transaction
          session.rollback tx.transaction_id
        rescue => e
          # e.message works for any error class, not only Google::Cloud errors
          puts " error resetting session: #{e.message}"
          raise
        end
      end
    end
  end

  # call .release! on every session in the emulator
  def release_all_emulator_sessions!
    project.instances.all do |instance|
      puts "instance: #{id(instance)}"
      instance.databases.all do |database|
        puts " database: #{id(database)}"
        each_session_for_database(database) do |session|
          puts " releasing session: #{id(session)}"
          session.release!
        rescue => e
          puts " error releasing session: #{e.message}"
          raise
        end
      end
    end
  end

  # yield every session on the database, following page tokens until exhausted
  def each_session_for_database(database)
    # patched method, paginated
    session_result = database.service.list_sessions(database: database.path)
    next_page_token = session_result.next_page_token
    loop do
      session_result.sessions.each do |grpc_session|
        yield Google::Cloud::Spanner::Session.new(grpc_session, database.service)
      end
      break if next_page_token.empty?
      session_result = database.service.list_sessions(database: database.path, token: next_page_token)
      next_page_token = session_result.next_page_token
    end
  end

  # short identifier: the last two segments of a resource path
  def id(path_haver)
    path_haver.path.split("/").last(2).join("/")
  end

  # route output to the logger; stay silent when no logger is set
  def puts(message)
    if logger
      logger.debug(message)
    end
  end
end
# run.sh
export SPANNER_PROJECT_ID="example-project"
export SPANNER_INSTANCE_ID="example-instance"
export SPANNER_DATABASE_ID="example-database"
export SPANNER_EMULATOR_HOST="localhost:9010"

background_pid=

function kill_background_job {
  if [ -z "$background_pid" ]; then
    return
  fi
  kill -9 "$background_pid" > /dev/null 2>&1
  wait "$background_pid" > /dev/null 2>&1
}

function echo_and_kill {
  echo 'killing background job'
  kill_background_job
}

trap kill_background_job EXIT

function check_for_emulator {
  # the check uses the REST port (9020), so that port must be published too
  if ! curl -s localhost:9020/v1/projects > /dev/null
  then
    echo 'emulator is not running, start with:'
    echo
    echo ' docker run -p 9010:9010 -p 9020:9020 gcr.io/cloud-spanner-emulator/emulator'
    echo
    exit 1
  fi
}

function force_open_transaction {
  echo '' > transaction-worker.log
  until grep "Google::Cloud::AbortedError" transaction-worker.log > /dev/null
  do
    # start transaction worker in the background
    ruby 01_force_transaction_error.rb > transaction-worker.log 2>&1 &
    background_pid=$!
    # sleep long enough for any attempted transaction to reach the deadline
    sleep 7
    # unceremoniously kill the transaction worker
    kill_background_job
  done
  echo "ok $1 - emulator is stuck"
}

function reset_sessions {
  if ruby 02a_session_reset.rb; then
    echo "ok $1 - emulator sessions reset"
  else
    echo "not ok $1 - emulator sessions could not be reset"
  fi
}

function release_sessions {
  if ruby 02b_session_release.rb; then
    echo "ok $1 - emulator sessions released"
  else
    echo "not ok $1 - emulator sessions could not be released"
  fi
}

function try_transaction {
  if ruby 03_try_transaction.rb
  then
    echo "ok $1 - transaction succeeded"
  else
    echo "not ok $1 - transaction failed"
  fi
}

check_for_emulator
echo '1..8'
force_open_transaction '1'
reset_sessions '2'
try_transaction '3' # this step only fails when the emulator is stuck and sessions are gone
force_open_transaction '4'
release_sessions '5'
try_transaction '6'
reset_sessions '7'
try_transaction '8'