Description of concurrent operations by underlying object store (mostly S3) in lakeFS alternatives

How concurrent are lakeFS alternatives?

DeltaLake

Delta Lake concurrency goes through its LogStore. On S3 this amounts to single-writer concurrency (a writer being a whole Spark cluster):

Delta Lake supports concurrent reads from multiple clusters, but concurrent writes to S3 must originate from a single Spark driver in order for Delta Lake to provide transactional guarantees.
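For context: the reason all writes must funnel through one driver is that a check-then-write against plain S3 is not atomic, so two independent writers can both conclude that the next Delta log entry is free and silently overwrite each other. A minimal sketch of that race, with hypothetical bucket and key names (boto3):

```python
# Sketch of why concurrent Delta writers on plain S3 are unsafe without extra
# coordination: the "does the next log entry exist yet?" check and the
# subsequent write are two separate S3 calls, so two writers can interleave.
# Bucket, key, and the JSON body are hypothetical.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-table-bucket"                                   # hypothetical
NEXT_COMMIT = "table/_delta_log/00000000000000000042.json"   # hypothetical next version

def try_commit(actions_json: bytes) -> bool:
    # Step 1: check that nobody has written this version yet.
    try:
        s3.head_object(Bucket=BUCKET, Key=NEXT_COMMIT)
        return False                              # someone else already committed 42
    except ClientError as e:
        if e.response["Error"]["Code"] not in ("404", "NoSuchKey", "NotFound"):
            raise
    # Step 2: write it.  Another writer may have done the same between
    # steps 1 and 2, and the later PUT silently wins -- a lost update.
    s3.put_object(Bucket=BUCKET, Key=NEXT_COMMIT, Body=actions_json)
    return True
```

Funneling every write through a single Spark driver lets Delta serialize steps 1 and 2 in process memory instead of relying on the object store.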

Nessie

Nessie supports (or tries to, depending on the underlying system?) these three isolation levels:

  • Read committed (Iceberg if refresh is allowed, Delta);
  • Repeatable read (delegated somehow through Hive's "HMS Bridge", or Iceberg if refreshes are avoided);
  • Serializable (no examples given).

Documentation about the Nessie commit kernel says:

Nessie’s production commit kernel is optimized to provide high commit throughput against a distributed key value store that provides record-level ACID guarantees. Today, this kernel is built on top of DynamoDB. The commit kernel is the heart of Nessie’s operations and enables it to provide lightweight creation of new tags and branches, merges, and rebases, all with a very high concurrent commit rate.

And everything goes through the DynamoDB refs table:

The refs table will have objects equal to the current number of active tags and branches. This will generally be small (10s-1000s). All commits run through this table and thus the writes and reads of this table should be provisioned based on the amount of read and write operations expected per second. Since the dataset is small, sharding will be unlikely to happen on this table. Scans are regularly done on this table.

The design goal is modest: 300 writes/second.
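A plausible shape for "all commits run through this table" is a conditional update: advance a branch pointer only if it still points at the commit the writer based its work on. A minimal sketch with boto3; the table name, key, and attribute names are assumptions for illustration, not Nessie's actual schema:

```python
# Conditional (compare-and-swap) advance of a branch head in DynamoDB.
# Table and attribute names are hypothetical, not Nessie's real schema.
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
REFS_TABLE = "refs"   # hypothetical

def advance_branch(branch: str, expected_head: str, new_head: str) -> bool:
    """Move `branch` from expected_head to new_head; fail if someone else
    committed first (record-level ACID is all this needs from DynamoDB)."""
    try:
        dynamodb.update_item(
            TableName=REFS_TABLE,
            Key={"ref_name": {"S": branch}},
            UpdateExpression="SET head_hash = :new",
            ConditionExpression="head_hash = :expected",
            ExpressionAttributeValues={
                ":new": {"S": new_head},
                ":expected": {"S": expected_head},
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False   # lost the race; caller rebases and retries
        raise
```

One conditional write per commit against a small, hot table is consistent with the modest 300 writes/second goal above.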

Nessie-on-DeltaLake

Odd...

Nessie is able to interact with Delta Lake by implementing a custom version of Delta's LogStore interface. This ensures that all filesystem changes are recorded by Nessie as commits. The benefit of this approach is that the core ACID primitives are handled by Nessie. The limitations around concurrency that Delta would normally have are removed: any number of readers and writers can simultaneously interact with a Nessie-managed Delta Lake table.
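As an illustration of that claim (and not Nessie's actual code): if the LogStore's "write this log entry, but fail if it already exists" operation is delegated to a transactional commit, a writer that loses a race gets a conflict it can retry instead of silently clobbering another writer's log entry. A hypothetical sketch:

```python
# Hypothetical sketch of a LogStore-like write path that delegates atomicity
# to a transactional ref store ("nessie_commit" is a stand-in, not a real API).
class ConflictError(Exception):
    pass

def nessie_commit(expected_head: str, log_entry_path: str, content: bytes) -> str:
    """Stand-in for an atomic commit against Nessie; raises ConflictError if
    the branch moved since expected_head.  Returns the new head."""
    raise NotImplementedError

def write_log_entry(get_head, prepare_entry, max_retries: int = 5) -> str:
    """Many writers may call this concurrently; conflicts are detected,
    not silently lost, so Delta's single-writer restriction goes away."""
    for _ in range(max_retries):
        head = get_head()                       # current branch head
        path, content = prepare_entry(head)     # next _delta_log entry for that head
        try:
            return nessie_commit(head, path, content)
        except ConflictError:
            continue                            # someone committed first; rebase and retry
    raise ConflictError("too many concurrent commits")
```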

Iceberg

Best (recent) summary is on this issue from 2020-09.

Using Iceberg requires a catalog that can atomically swap the pointer to the current metadata file. This can be done with a compare-and-swap or a lock/unlock API.
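In the compare-and-swap flavor, a commit writes the new metadata file first (which needs no coordination) and only then atomically swaps the table's pointer, failing on conflict so the caller can refresh and retry. A hedged sketch of that protocol; `catalog_cas` is a stand-in for whatever primitive the catalog provides, not a real Iceberg API:

```python
# Sketch of the commit protocol an Iceberg catalog must support: swap the
# pointer to the table's current metadata file atomically.  `catalog_cas` is
# a stand-in for the catalog's primitive (Hive lock, DynamoDB conditional
# write, etc.); it is not a real Iceberg API.
def catalog_cas(table: str, expected_location: str, new_location: str) -> bool:
    """Atomically set the table's metadata pointer to new_location iff it is
    still expected_location.  Returns False on conflict."""
    raise NotImplementedError

def commit(table: str, current_location: str, write_new_metadata) -> str:
    """write_new_metadata(current_location) writes a new metadata.json to the
    object store and returns its path; the pointer swap makes it visible."""
    new_location = write_new_metadata(current_location)
    if catalog_cas(table, current_location, new_location):
        return new_location
    raise RuntimeError("concurrent commit: refresh metadata and retry")
```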

The Iceberg folks work with a Hive metastore, and "[a]nyone could easily build an integration for any catalog". Nessie shows up on that issue and mentions that they can do it for you... with DynamoDB.

There's PR 1608, which adds support for "Glue", but it also "uses DynamoDB for the locking support missing in Glue". It seems to have gone in recently; no idea whether it has already been released. A lock manager on top of DynamoDB is in #2034.
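The locking pattern involved is essentially a conditional put that succeeds only if no lock item exists yet, plus a conditional delete to release. A minimal sketch with boto3; the table and attribute names are assumptions, and a real lock manager also needs a lease/expiry to recover from crashed holders:

```python
# Minimal DynamoDB-backed lock: acquire by creating the lock item only if it
# does not already exist, release by deleting it.  Table/attribute names are
# hypothetical; production code also needs a lease/expiry for crashed holders.
import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")
LOCK_TABLE = "iceberg_locks"   # hypothetical

def acquire(lock_id: str, owner: str) -> bool:
    try:
        dynamodb.put_item(
            TableName=LOCK_TABLE,
            Item={
                "lock_id": {"S": lock_id},
                "owner": {"S": owner},
                "acquired_at": {"N": str(int(time.time()))},
            },
            ConditionExpression="attribute_not_exists(lock_id)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False   # someone else holds the lock
        raise

def release(lock_id: str, owner: str) -> None:
    # Only the current owner may release, so a stale client can't unlock others.
    dynamodb.delete_item(
        TableName=LOCK_TABLE,
        Key={"lock_id": {"S": lock_id}},
        ConditionExpression="#o = :owner",
        ExpressionAttributeNames={"#o": "owner"},
        ExpressionAttributeValues={":owner": {"S": owner}},
    )
```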
