Distributed Locking With DynamoDB
This article unpacks information supporting DynamoDB as an excellent choice for a distributed locking service while briefly exploring the what, why & how of locking.
Why DynamoDB is a good choice for locking
To build a lock, all you need is access to a file system. To build a distributed lock, all you need is network access to the file system. To build a highly available, distributed lock, all you need is a highly available, network reslient, durable filesystem. It is for this reason that your file system choice becomes difficult. There are several popular services that will satisfy one or all of the aforementioned scenarios:
- Doozerd
- DynamoDB
- MySQL
- PostgreSQL
- Redis
- Zookeeper
The data stores in the list are far from equal. Their feature sets are as varied as the integers. However, what makes DynamoDB stand out is the fact that it is highly available and easily accessible. You can setup a highly available, distributed lock service in minutes. DynamoDB offers a simple set of primitives that make distributed locking easy.
Create item unless exists
Delete item
Since DynamoDB offers conditional writes, we can attempt to lock an item by creating the item unless an item already exists. Once we are done with the lock, we delete it.
Gotchas
When using DynamoDB, we trade lock expiration robustness for ease of use. This tradeoff is manifested when we consider what would happen if a processor acquires a lock, then is partitioned from DynamoDB. What is the processor to do? The process of detecting and deleting stale locks is a difficult problem. Solutions like Doozerd & Zookeeper use complicated consensus algorithms to agree if a lock has been abounded before deleting. However, there are plenty of use cases which do not require this level of sophistication.
One work around is to set a TTL on the lock item in DynamoDB. Subsequent attempts to acquire the lock will check the TTL and preempt an abandoned lock. This implies that the process that acquired the lock needs to terminate execution past the TTL. This can be achieved by wrapping your critical section in a timeout.
How you can start using DynamoDB as a lock
The recipe for using DynamoDB as a locking service is quite simple, you do not need anything but and HTTP client to take full advantage of locking on top of DynamoDB. The basic algorithm is:
Prune expired lock. (see Gotchas)
Create Item unless exists
Yield to critical section inside of timeout
Delete Item
If you are looking for a codified recepie, the following libraries exist:
- Ruby - lock-smith
- Go - ddbsync
What's a lock
Locking is a another word for incorporating mutual exclusion into a set of parallel executing processes. Mutual exclusion was discovered by E. W. Dijkstra and summarized as: The problem of ensuring that no two processes or threads can be in their critical section at the same time. The critical section being any section of code that assume no other code is in the same section at the same instant.
For instance:
x = get_global_count
x = x + 1
Imagine if 2 processes were accessing the previous code at the same instant, the CPU might allow the 1st process to execute the first line, then pausing the 1st process to allow the 2nd process to execute the first line. It is now impossible for each process to accurately compute the global count. Thus we are motivated to build a system in which the 1st process can lock the code so it can execute both lines with the confidence that no other process is manipulating the global count.
lock
x = get_global_count
x = x + 1
unlock
Why you would use a lock
The previous example demonstrated a classic scenario which requires local locking. DynamoDB is not the best choice for that type of locking; it is likely that your progamming language offers locking APIs that are much more suitable. However, there are a plethora of good examples for which DynamoDB is an excellent choice for locking. Automated DNS configuration is one such example:
lock("herokuapp.com.")
dns_records = get_records
if !includes(dns_records, "foo.herokuapp.com.") {
create_record("foo.herokuapp.com.")
}
unlock("herokuapp.com.")
In this example, we have a process that is updating DNS records. Presumably, some other service is providing "foo.herokuapp.com." as an input to the process. Thus we have at least 3 network-indepoendent processes at play. In addition to this example, other scenarios include process partitioning, failing over a redis master, aggregating data accross shards, etc...
Conclusion
DynamoDB is not the most advanced locking service. However, DynamoDB offers a lot of features at a great price. Most importantly, DynamoDB is offered as an Amazon service. This allows the engineer not to have to maintain a cluster of Zookeepers or Doozerds. Whenever we can trade a feature for a dramatic reduction in operational burden, we should.
Updates
- 2012-10-21 - Inception
Author
Ryan Smith
@ryandotsmith, ace hacker, builds distributed systems at heroku.
This article was motivated by many success and failures experienced with production systems at Heroku.
I apologize for my naiveness, but I'm having trouble thinking of concrete scenarios where a locking service would be required. The DNS example given above seems like it could be solved in other ways. For example, a distinct queue of changes to the domain could be applied by multiple workers at any concurrency. Am I missing something there?
However, assuming that I do need a locking service you make a very strong case for DynamoDB.