- R | Minimum number of nodes that must participate in a successful read operation
- W | Minimum number of nodes that must participate in a successful write operation
- N | Number of replica nodes that retain copies of a system's data (or a partition thereof)
How many node failures can the system tolerate while still guaranteeing that read operations will return up-to-date values?
- Given that R is the minimum number of replicas that must respond for a read operation to succeed, the system can tolerate up to N − R node failures while still guaranteeing an up-to-date read: as long as the number of remaining healthy replicas is at least R, reads can reach quorum.
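The relationship above can be sketched as a small helper; the function name and values are illustrative, not from the paper:

```python
# Quorum read tolerance: a read needs R healthy replicas out of N,
# so the system tolerates N - R failures before reads lose quorum.

def tolerated_read_failures(n_replicas: int, read_quorum: int) -> int:
    """Maximum node failures before reads can no longer reach quorum."""
    return n_replicas - read_quorum

# Typical Dynamo-style configuration: N = 3, R = 2.
print(tolerated_read_failures(3, 2))  # -> 1: one replica may fail
```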
Describes a system where updates to its persistent state are eventually propagated to all replica nodes (N). This implies that there is a window during which not all replicas contain a particular update.
- Advantages include:
- Consumers can access the data store with very little concern for downtime (in the case of Dynamo, availability of at least 99.9%)
- Eventually consistent systems can sustain network partitions or outages of some, if not most, of their replica nodes while maintaining consumer access. By comparison, a strong consistency guarantee would fail an update if any, or even a fraction, of a system's replica nodes were unavailable.
- Disadvantages include:
- The need to resolve conflicts among read operations that yield different, potentially contradictory, states for the same item.
- Amazon's shopping cart service, exemplified in the text, demonstrates that under its eventually consistent model customer carts occasionally include stale data (deleted items). When conflicts surface between two different states returned by the same get operation, the preference is to merge the conflicting states, under the tradeoff that it is better to include all writes (regardless of when they happened) at the risk of a small fraction of bad reads. Conflict resolution is pushed to the application layer, where the context for how that particular data is used is richest.
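A minimal sketch of this "keep every write" merge, with hypothetical names (the real service reconciles vector-clocked versions, not plain sets):

```python
# Application-level conflict resolution for a shopping cart: when two
# replica versions diverge, take the union of their items. A deleted item
# may resurface (a stale read), but no added item is ever lost.

def merge_carts(version_a: set, version_b: set) -> set:
    """Merge two divergent cart states, preserving all writes."""
    return version_a | version_b

replica_a = {"book", "lamp"}        # "lamp" added on this replica
replica_b = {"book", "headphones"}  # "lamp" deleted here, "headphones" added
merged = merge_carts(replica_a, replica_b)
# "lamp" reappears even though one replica deleted it -- the tradeoff above.
print(sorted(merged))
```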
Why does "consistent hashing" (compared to, say, hash(key) % N) reduce the amount of data that needs to be moved when a new node comes online?
- Computing a hash's modulo relative to the number of available nodes requires a significant amount of data movement when either (a) a node is removed, (b) a node is added, or (c) a node is swapped out for another. In both (a) and (b), nearly all stored items need to be rehashed and reassigned, because addition or removal changes the bound imposed by the modulo operator. If this reorganization doesn't occur, certain read/write operations will hash to a position serviced by no node (in the case of removal) or by a different node (in the case of addition). As for (c), rehashing is only necessary to copy data from the removed node to its successor. Cases (a) and (b) are an undesirable fit for a system that assumes faults (i.e. some nodes being unavailable) are routine, since every membership change would degrade performance.
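The churn caused by the modulo approach can be demonstrated empirically; the key names below are illustrative:

```python
# Show why hash(key) % N reshuffles most keys when N changes. Going from
# 4 nodes to 5, a key keeps its node only when hash % 4 == hash % 5,
# which happens for roughly 1 in 5 keys -- so about 80% of keys move.

import hashlib

def node_for(key: str, n_nodes: int) -> int:
    """Assign a key to a node by taking its hash modulo the node count."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_nodes

keys = [f"item-{i}" for i in range(10_000)]
moved = sum(node_for(k, 4) != node_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")
```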
- By contrast, consistent hashing does away with the modulo approach's dependency on the number of replica nodes. To do this, consistent hashing produces a hash ring that abstractly demarcates 'zones' of responsibility along its circumference. Each replica node is assigned a zone, and each data key hashes to a point along the circle. Intuitively, if a data item's key hashes into a node's zone, that node is responsible for the item. If a particular node is removed, only the keys hashed to its zone need to be remapped (to its successor on the ring), while the remaining nodes keep their respective zones. The addition of a replica node resolves in a similar fashion. The total amount of work involved in remapping is considerably reduced as a result.
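A minimal ring sketch, assuming MD5 as the hash and omitting virtual nodes (which Dynamo adds for load balance); all names are illustrative:

```python
# Minimal consistent-hash ring: each node hashes to a point on the ring,
# and a key is owned by the first node clockwise from the key's hash.
# Removing a node remaps only the keys that fell in that node's zone.

import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sorted (hash, node) points define the zones of the ring.
        self._points = sorted((_hash(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        h = _hash(key)
        # First node clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self._points, (h, ""))
        return self._points[idx % len(self._points)][1]

before = Ring(["node-a", "node-b", "node-c"])
after = Ring(["node-a", "node-b"])  # node-c removed

keys = [f"key-{i}" for i in range(1_000)]
moved = [k for k in keys if before.owner(k) != after.owner(k)]
# Every moved key was previously in node-c's zone; all others stayed put.
assert all(before.owner(k) == "node-c" for k in moved)
```

Compare this with the modulo scheme, where a membership change relocates most keys: here the remapped set is exactly the departed node's zone.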