@no-reply
Last active June 4, 2016 22:06
HAMT RDF::Repository Thread Safety

Threading

While our new RDF::Repository implementation theoretically improves the concurrency story for RDF.rb, it isn't, in itself, thread safe. The underlying data representation may be purely functional, but the Repository itself is swimming in shared mutable state. Specifically, we have the potential for a data race during execution of code like @data = data; and, more generally, for race conditions wherever our changes depend on previous reads. Notably, this affects #transaction, as demonstrated in the following snippet:

require 'rdf'
repo = RDF::Repository.new

threads = []
err_count = 0

# make 10 threads, processing 1000 transactions each
10.times do |n|
  threads << Thread.new do
    1_000.times do |i|
      begin
        repo.transaction(mutable: true) do
          # insert a unique statement for each transaction
          insert RDF::Statement("thread_#{n}".to_sym,
                                RDF::URI('http://example.com/num'),
                                i)
        end
      rescue RDF::Transaction::TransactionError
        # count up the statements that fail in execution
        err_count += 1
      end
    end
  end
end

threads.each(&:join)

# not even close to 10_000!
repo.count + err_count # => 5587

(Running this in your environment may yield different results; you may even see the expected total of 10,000. Nevertheless, trust me, this code is not safe.)
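Because the threaded result varies from run to run, it can help to reproduce the underlying lost update deterministically, without threads. The following is a minimal sketch using only the standard library; `ToyRepository` and its `commit` method are hypothetical stand-ins invented for illustration, not RDF.rb's API. It assumes only what is described above: an immutable snapshot held in a mutable instance variable, and writes that depend on an earlier read.

```ruby
require 'set'

# Hypothetical stand-in for a repository (not RDF.rb's API): an immutable
# snapshot stored in a mutable instance variable.
class ToyRepository
  attr_reader :data

  def initialize
    @data = Set.new.freeze
  end

  # Read-modify-write: the new snapshot is built from a snapshot that was
  # read earlier, then assigned over whatever @data currently holds.
  def commit(base, statement)
    @data = (base + [statement]).freeze
  end
end

repo = ToyRepository.new

# Two writers read the same snapshot before either one writes back. This is
# the interleaving a thread scheduler can produce; here it is forced by hand.
base_a = repo.data
base_b = repo.data

repo.commit(base_a, :statement_a)
repo.commit(base_b, :statement_b) # replaces the snapshot containing :statement_a

lost_update_result = repo.data.to_a
puts lost_update_result.inspect
```

Neither snapshot is ever mutated, yet the first write is silently discarded. The purely functional data structure protects readers, but not the assignment that publishes a new snapshot.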

The good news is that the races are reasonably isolated. Any dreams of perfectly asynchronous concurrency are dashed, but the need for synchronization is minimized. For transactions, we need only synchronize #execute; in place of the transaction block above, we have:

# ...
begin
  tx = repo.transaction(mutable: true)
  tx.insert RDF::Statement("thread_#{n}".to_sym,
                           RDF::URI('http://example.com/num'),
                           i)
  mutex.synchronize { tx.execute }
rescue RDF::Transaction::TransactionError
# ...

Still, as an implementation-specific solution, this leaves something to be desired. Giving more thought to thread safety will likely uncover better options.
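The synchronized-#execute pattern above can be sketched end to end with a toy model. This is an illustration under stated assumptions, not the RDF.rb implementation: `ToyRepository` and `ToyTransaction` are hypothetical classes, the transaction simply buffers inserts and applies them to the repository's current snapshot on `execute`, and the thread and iteration counts are reduced to keep the run short.

```ruby
require 'set'

# Hypothetical toy classes for illustration only; not RDF.rb's API.
class ToyRepository
  attr_accessor :data

  def initialize
    @data = Set.new.freeze
  end

  def transaction
    ToyTransaction.new(self)
  end
end

class ToyTransaction
  def initialize(repository)
    @repository = repository
    @inserts = []
  end

  # Buffering changes touches only transaction-local state.
  def insert(statement)
    @inserts << statement
    self
  end

  # Unsynchronized read-modify-write against the shared repository.
  # Callers are expected to serialize this step.
  def execute
    @repository.data = (@repository.data + @inserts).freeze
  end
end

repo  = ToyRepository.new
mutex = Mutex.new

# 10 threads, 100 transactions each (smaller than the example above).
threads = 10.times.map do |n|
  Thread.new do
    100.times do |i|
      tx = repo.transaction
      tx.insert([n, i])                 # no lock needed while building
      mutex.synchronize { tx.execute }  # only the commit is serialized
    end
  end
end

threads.each(&:join)

final_count = repo.data.size
puts final_count
```

Only the commit step holds the lock, so threads remain free to build their transactions concurrently. The cost is that every caller has to know about, and share, the same mutex, which is part of why this reads as an implementation-specific workaround.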

no-reply commented Jun 4, 2016

A proof of concept for a thread safe transaction is up at ruby-rdf/rdf@c8080e2
