@manuel
Last active February 19, 2024 17:47
Transactional RPC across heterogeneous data stores and facades

An idea for how to do transactions across heterogeneous data stores (e.g. key-value (KV) stores and Git repositories) in an RPC system. Probably not novel.

Overview

There are primitive data stores, like KV stores and Git repositories.

There are facades: processes that enforce some invariant on top of an underlying data store. Example: a facade that only allows numbers to be stored as values in a KV store.
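
As an illustration of the numbers-only facade, here is a minimal Python sketch. The classes `KVStore` and `NumbersOnlyFacade` are hypothetical, the store is an in-memory dict, and the HTTP-like status codes (204 on success, 422 on rejection) are an assumed convention, not part of the original design.

```python
# Sketch only: KVStore and NumbersOnlyFacade are hypothetical names.
from typing import Dict


class KVStore:
    """Minimal in-memory primitive KV store."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def put(self, key: str, value: object) -> int:
        self.data[key] = value
        return 204  # HTTP-like "No Content": write accepted


class NumbersOnlyFacade:
    """Facade enforcing the invariant "every value is a number"."""

    def __init__(self, store: KVStore) -> None:
        self.store = store  # the underlying data store

    def put(self, key: str, value: object) -> int:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return 422  # HTTP-like "Unprocessable Content": invariant violated
        return self.store.put(key, value)
```

The facade never touches the dict directly; it only decides whether to forward a write to the store underneath it.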

Let’s use the term storage process to refer to both primitive stores and facades.

Storage processes can be manipulated via a RESTful protocol. For example, several KV mappings may be changed with a single PUT or PATCH request, and a ref (branch) in a Git repo may be updated with a PUT.
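
A rough sketch of what such RESTful messages might look like. It assumes a hypothetical `Request` type and made-up resource paths: `/` for the whole KV store, `/<key>` for a single mapping, and `/refs/heads/<branch>` for a Git branch. Neither store is real; both are in-memory stand-ins.

```python
# Sketch only: Request, KVStore, GitRepo and the paths are hypothetical.
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Request:
    method: str  # "PUT" or "PATCH"
    path: str    # e.g. "/", "/<key>" or "/refs/heads/<branch>"
    body: Any


class KVStore:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def handle(self, req: Request) -> int:
        if req.method == "PATCH" and req.path == "/":
            # One PATCH on the store root changes several mappings at once.
            self.data.update(req.body)
            return 204
        if req.method == "PUT" and len(req.path) > 1:
            # PUT /<key> replaces a single mapping.
            self.data[req.path[1:]] = req.body
            return 204
        return 400  # unsupported method/path combination


class GitRepo:
    def __init__(self) -> None:
        self.refs: Dict[str, str] = {}  # ref name -> commit id

    def handle(self, req: Request) -> int:
        if req.method == "PUT" and req.path.startswith("/refs/heads/"):
            # Moving a branch is a PUT of the new commit id to the ref.
            self.refs[req.path[1:]] = req.body
            return 204
        return 400
```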

A facade may implement any higher-level protocol (e.g. "increase" and "decrease" operations for numbers), which it translates into primitive messages for the underlying primitive store. However, a facade may also simply expose the same protocol as the underlying primitive store and, for example, reject messages that would violate its invariants.
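
The "increase"/"decrease" facade could translate its messages roughly like this. This is a sketch under assumptions: the message classes are hypothetical, the facade is handed a read-only view of the current values, and missing keys are treated as 0.

```python
# Sketch only: Increase, Decrease, Put and CounterFacade are hypothetical.
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Increase:
    key: str
    amount: int


@dataclass
class Decrease:
    key: str
    amount: int


@dataclass
class Put:
    """Primitive message understood by the underlying KV store."""
    key: str
    value: int


class CounterFacade:
    def __init__(self, current: Dict[str, int]) -> None:
        # Read-only view of the underlying store, used to compute new values.
        self.current = current

    def translate(self, msg: Union[Increase, Decrease]) -> Put:
        old = self.current.get(msg.key, 0)  # assumption: missing key means 0
        if isinstance(msg, Increase):
            return Put(msg.key, old + msg.amount)
        return Put(msg.key, old - msg.amount)
```

The primitive store only ever sees `Put`; the higher-level operations exist solely at the facade.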

All primitive stores use the same underlying transactional storage substrate. E.g. in the browser, all primitive stores would use a single IndexedDB database.
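
One way to picture the shared substrate, assuming a plain dict in place of IndexedDB: every primitive store is a namespace inside a single key space, so one substrate transaction can touch several stores at once. `Substrate` and `PrimitiveKV` are hypothetical names.

```python
# Sketch only: Substrate and PrimitiveKV are hypothetical; a dict stands in
# for the real transactional storage (e.g. a single IndexedDB database).
from typing import Dict, List, Tuple


class Substrate:
    """One shared key space for all primitive stores: (store, key) -> value."""

    def __init__(self) -> None:
        self.data: Dict[Tuple[str, str], object] = {}

    def commit(self, writes: List[Tuple[str, str, object]]) -> None:
        # Build the new state off to the side, then swap it in, so a failure
        # part-way through the loop leaves self.data untouched.
        staged = dict(self.data)
        for store, key, value in writes:
            staged[(store, key)] = value
        self.data = staged


class PrimitiveKV:
    """A primitive store is just a namespace inside the substrate."""

    def __init__(self, substrate: Substrate, name: str) -> None:
        self.substrate = substrate
        self.name = name

    def get(self, key: str) -> object:
        return self.substrate.data.get((self.name, key))
```

Because a KV store and a Git repo's refs live in the same substrate, a single `commit` call can update both.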

Transaction processing

To perform a transaction across multiple storage processes (primitive or facade), a process sends a multi-request to the kernel. The multi-request contains a sub-request for each storage process that should be changed in the transaction. A sub-request contains a change in the protocol understood by the storage process in question.
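
A multi-request could be modeled as a simple list of (target, message) pairs. The sketch below is hypothetical; the message payloads are arbitrary dicts standing in for whatever protocol each storage process speaks.

```python
# Sketch only: SubRequest and MultiRequest are hypothetical names.
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SubRequest:
    target: str   # name of the storage process (primitive store or facade)
    message: Any  # a change expressed in that process's own protocol


@dataclass
class MultiRequest:
    sub_requests: List[SubRequest] = field(default_factory=list)

    def add(self, target: str, message: Any) -> "MultiRequest":
        self.sub_requests.append(SubRequest(target, message))
        return self  # allow chaining

    def targets(self) -> List[str]:
        return [s.target for s in self.sub_requests]
```

For example, `MultiRequest().add("counters", {"op": "increase", "key": "hits", "amount": 1}).add("repo", {"method": "PUT", "path": "/refs/heads/main", "body": "c2"})` bundles a facade operation and a Git ref update into one transaction.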

The kernel sends each sub-request to the targeted storage process in a "prepare" phase. The storage process inspects the request and may reject it by replying with an error status code. If it accepts the request, it answers with an "edit script" that tells the kernel how to achieve the transaction's results.
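
The prepare phase can be sketched as a loop that collects replies. Assumptions: each storage process is modeled as a hypothetical `prepare` callable returning either `Reject` (with an HTTP-like status) or `Accept` (with an edit script of `(store, key, value)` writes), and the kernel stops at the first rejection, which is one possible policy rather than something the design prescribes.

```python
# Sketch only: Reject, Accept and prepare_all are hypothetical names.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple, Union

Write = Tuple[str, str, Any]  # (primitive store, key, new value)


@dataclass
class Reject:
    status: int  # HTTP-like error status code, e.g. 409 or 422
    reason: str


@dataclass
class Accept:
    edit_script: List[Write]


Prepare = Callable[[Any], Union[Reject, Accept]]


def prepare_all(
    processes: Dict[str, Prepare], sub_requests: List[Tuple[str, Any]]
) -> Union[Reject, List[Write]]:
    """Prepare phase: ask every targeted storage process to inspect its
    sub-request. Nothing is written here; the processes only reply."""
    script: List[Write] = []
    for target, message in sub_requests:
        reply = processes[target](message)
        if isinstance(reply, Reject):
            return reply  # one rejection is enough to abort the transaction
        script.extend(reply.edit_script)
    return script
```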

For a facade, the edit script is always expressed in terms of the underlying primitive store. E.g. a facade that supports increase/decrease operations returns an edit script containing a key PUT for the underlying primitive KV store.

(In cases where the facade supports the same protocol as the underlying primitive store, the sub-request itself may serve as the edit script, and the facade can simply tell the kernel to use it as-is.)
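
For the pass-through case, a facade might reply with a marker instead of an edit script, and the kernel would then reuse the sub-request as-is. A minimal sketch: `UseAsIs`, `Reject`, `NumbersOnlyFacade` and `resolve_edit_script` are hypothetical, and a sub-request is simplified to a `(key, value)` PUT.

```python
# Sketch only: every name here is hypothetical.
from dataclasses import dataclass
from typing import List, Tuple, Union

Write = Tuple[str, str, object]  # (primitive store, key, new value)


@dataclass
class UseAsIs:
    """Reply meaning: the sub-request itself is the edit script."""


@dataclass
class Reject:
    status: int  # HTTP-like error status code


class NumbersOnlyFacade:
    """Exposes the same PUT protocol as the primitive KV store; only validates."""

    def __init__(self, store_name: str) -> None:
        self.store_name = store_name  # the underlying primitive store

    def prepare(self, sub_request: Tuple[str, object]) -> Union[UseAsIs, Reject]:
        _, value = sub_request
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return Reject(422)
        return UseAsIs()


def resolve_edit_script(
    store_name: str, sub_request: Tuple[str, object], reply: Union[UseAsIs, List[Write]]
) -> List[Write]:
    """Kernel side: turn a facade's reply into writes on the primitive store."""
    if isinstance(reply, UseAsIs):
        key, value = sub_request
        return [(store_name, key, value)]  # the PUT is applied unchanged
    return reply  # the facade supplied an explicit edit script
```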

Once the kernel has received a go-ahead and the corresponding edit scripts from all targeted storage processes, it applies all the edit scripts in a single transaction across the primitive stores in the storage substrate. Either all fail, or all succeed.
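
Putting the two phases together, a kernel might look roughly like the following. It is a sketch, not the author's implementation: `Kernel` is hypothetical, the substrate is a dict keyed by `(store, key)`, a prepare function returns `None` to reject (a simplification of an error status), and atomicity comes from staging all writes in a copy and swapping it in at the end.

```python
# Sketch only: Kernel and its parameters are hypothetical names.
from typing import Any, Callable, Dict, List, Optional, Tuple

Write = Tuple[str, str, Any]  # (primitive store, key, new value)
# A prepare function returns an edit script, or None to reject.
Prepare = Callable[[Any], Optional[List[Write]]]


class Kernel:
    def __init__(self, processes: Dict[str, Prepare]) -> None:
        self.processes = processes
        # One shared substrate for all primitive stores: (store, key) -> value.
        self.substrate: Dict[Tuple[str, str], Any] = {}

    def transact(self, sub_requests: List[Tuple[str, Any]]) -> bool:
        # Phase 1 (prepare): collect edit scripts; any rejection aborts.
        writes: List[Write] = []
        for target, message in sub_requests:
            script = self.processes[target](message)
            if script is None:
                return False  # nothing has been written yet
            writes.extend(script)
        # Phase 2 (commit): stage every write, then swap the result in,
        # so either all edit scripts take effect or none do.
        staged = dict(self.substrate)
        for store, key, value in writes:
            staged[(store, key)] = value
        self.substrate = staged
        return True
```

If one process rejects its sub-request, `transact` returns `False` before the commit phase, so writes from the other, accepting processes in the same transaction are never applied.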

Optimistic concurrency control

Sub-requests will typically include a compare-and-swap-like condition, effectively expressing "only perform this change if the version of the data store is $VERSION".

Put differently, this is effectively equivalent to a conditional HTTP request with an If-Match header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Match
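
A version check with If-Match semantics might look like this. The sketch assumes a single integer version per store (HTTP ETags are opaque strings) and a hypothetical `VersionedKV` class; a mismatch yields 412 Precondition Failed, the status HTTP uses when an If-Match condition fails.

```python
# Sketch only: VersionedKV and put_if_match are hypothetical names.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VersionedKV:
    data: Dict[str, Any]
    version: int = 0  # one version number for the whole store

    def put_if_match(self, key: str, value: Any, if_match: Optional[int]) -> int:
        """Apply the write only if the store is still at `if_match`.

        Returns 204 on success, or 412 (Precondition Failed) if another
        writer has bumped the version in the meantime.
        """
        if if_match is not None and if_match != self.version:
            return 412
        self.data[key] = value
        self.version += 1
        return 204
```

Two clients that both read version 0 can race; whichever writes first wins, and the other gets 412 and must re-read before retrying.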

This enables a form of optimistic concurrency control.

Optimistic concurrency control may also be used in an additional way: a sub-request of a transaction may express "only perform the whole transaction if data store $FOO still has current version $VERSION". In other words, a transaction may include sub-requests for any number of storage processes that it does not change, but whose state/version serves as a correctness criterion. This lets a process perform a transaction only if all data stores it has read are still at the versions it read.
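
Read-only version assertions can be folded into the same check as the writes. In this hypothetical sketch, `writes` and `reads` are lists of `(store, expected_version)` pairs; every pair must match the current version for the commit to proceed, but only written stores get a new version.

```python
# Sketch only: validate, commit and the version model are hypothetical.
from typing import Dict, List, Tuple

Versions = Dict[str, int]       # storage process name -> current version
Expect = List[Tuple[str, int]]  # (storage process, version the client saw)


def validate(current: Versions, writes: Expect, reads: Expect) -> bool:
    """True only if every store the transaction writes, or merely read,
    is still at the version the client observed."""
    return all(current.get(store) == seen for store, seen in writes + reads)


def commit(current: Versions, writes: Expect, reads: Expect) -> bool:
    if not validate(current, writes, reads):
        return False  # some store moved on: abort the whole transaction
    for store, _ in writes:
        current[store] += 1  # only stores that are written get a new version
    return True
```

If another process changes `prices` between the read and the commit, a transaction that writes only `orders` but asserted the old `prices` version aborts, even though it never writes `prices`.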
