@PharkMillups
Created May 28, 2010 15:55
florian # hi guys
seancribbs # howdy
florian # i'm an engineer at Twitter and have started to look into Riak a little bit
benblack # florian: do you work with rk?
florian # what's the largest cluster that you guys know of that's currently running?
seancribbs # 60 nodes is the largest we know of
florian # great. how is churn handled in riak?
benblack # churn?
seancribbs # what specifically about churn?
florian # i.e. if you have a cluster that's set up for multi-tenancy
seancribbs # you mean like concurrent updates?
benblack # set up for multi-tenancy in what way?
florian # nope, more like nodes joining and leaving pretty regularly
seancribbs # vector clocks with sibling/divergent objects. oh - we don't recommend rapidly changing cluster sizes
benblack # nodes joining and leaving should be infrequent events
seancribbs # causes too much handoff
florian # could you elaborate on how this is handled?
benblack # all dynamo-like systems dislike cluster membership churn
benblack # florian: have you read the amazon dynamo paper?
florian # yes
benblack # that's how it is handled.
florian # i see
benblack # florian: do you work with rk or jmhodges?
florian # they're both on the infrastructure team but yeah, somewhat. this is a one-off project
seancribbs # florian: basically each leaving node is going to have to hand off ownership of partitions to other nodes
and the inverse for joining nodes
benblack # which means lots of data transfer and cleanup
benblack # membership changes are very expensive
seancribbs # too much leaving/joining may also cause aggressive reshuffling which hurts
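
To make the ownership-transfer point concrete, here is a minimal Python sketch of partitions on a ring changing owner when a node joins. The round-robin claim, the node names, and the 64-partition ring size are illustrative assumptions, not Riak's actual claim algorithm; the point is only that a membership change reassigns partitions, and each reassignment means data transfer.

```python
# Illustrative sketch only -- not Riak's actual claim algorithm.
# A fixed number of partitions is divided among the cluster members;
# when membership changes, some partitions change owner and their
# data must be handed off to the new owner.

NUM_PARTITIONS = 64  # illustrative ring size

def assign(nodes, num_partitions=NUM_PARTITIONS):
    """Round-robin claim: partition i is owned by nodes[i % len(nodes)]."""
    return {i: nodes[i % len(nodes)] for i in range(num_partitions)}

before = assign(["node1", "node2", "node3"])
after = assign(["node1", "node2", "node3", "node4"])  # node4 joins

moved = [p for p in before if before[p] != after[p]]
print(f"{len(moved)} of {NUM_PARTITIONS} partitions change owner when node4 joins")
```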
florian # so i've worked with coherence in the past and have been burned by their "dealing" with nodes joining / leaving
i.e. with coherence, if your network was interrupted for just a bit and then reconnected, the shuffling around / re-balancing of the keyspace would often take down the entire cluster
benblack # florian: do you mean permanent leaving of nodes or just temporary?
arg # the overhead comes when nodes *permanently* join/leave the network
florian # temporarily
arg # partitions don't cause much overhead at all
benblack # right, then that's easier
seancribbs # just hinted handoff which is ez
arg # reads/writes will be handled by other nodes and any written data will be handed off when the node rejoins
^^
florian # would someone explain to me the algorithm used to a) decide when a partition is handed off, and b) how it is handed off?
benblack # florian: the partition is not handed off
benblack # in case of temporary failure the mutations to a partition are given to other nodes
"hints"
arg # each partition is represented by a process that periodically checks to see if it is "home", i.e. it owns the partition in the consistent hashing state
benblack # hence the name hinted handoff
seancribbs # partition (network failure) != partition (subdivision of the keyspace)
arg # if it doesn't own it, it either means membership changed or it has taken some handoff data
benblack # when the node rejoins, the hints are handed off
* arg # stands down
benblack # good point, seancribbs
seancribbs # that may have been confusing
* arg # stands up
seancribbs # we usually mean the keyspace one when we say partition
florian # ok cool.
benblack # a network partition is the thing that can cause hints to be stored on other nodes for later handoff when the partition is healed
florian # so what will happen if i have a cluster with 6 nodes, 2 of them are temporarily disconnected (let's say for 10 minutes), then they come back
arg # other nodes will take writes and answer reads
when the nodes come back, the nodes that took the writes will see them come back
arg # and hand back the data they accepted for them while they were down/partitioned
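
A minimal sketch of the hinted-handoff flow described in this exchange, assuming a toy cluster model: when the owning node is unreachable, a fallback node accepts the write along with a hint of the intended owner, and a periodic "home check" (like the per-partition process arg mentions) hands the data back once the owner is reachable again. The Node class and the put/home_check names are illustrative assumptions, not Riak's implementation.

```python
# Illustrative sketch of hinted handoff, assuming a simplified cluster
# model; this is the shape of the mechanism, not Riak's code.

class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # keys this node owns
        self.hints = {}   # (owner, key) -> value held for an unreachable owner

def put(owner, fallback, key, value):
    """Write to the owner if reachable; otherwise store a hint on a fallback node."""
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints[(owner.name, key)] = value

def home_check(node, cluster):
    """Periodic check: hand any hints back to owners that are reachable again."""
    for (owner_name, key), value in list(node.hints.items()):
        owner = cluster[owner_name]
        if owner.up:
            owner.data[key] = value
            del node.hints[(owner_name, key)]

# The temporary-outage scenario from the conversation:
cluster = {n: Node(n) for n in ["node1", "node2", "node3"]}
cluster["node2"].up = False                                 # node2 is partitioned away
put(cluster["node2"], cluster["node3"], "k", "v")           # node3 takes the write as a hint
cluster["node2"].up = True                                  # partition heals
home_check(cluster["node3"], cluster)                       # hint handed back to node2
print(cluster["node2"].data)                                # {'k': 'v'}
```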
* seancribbs # steps away since arg and benblack have this covered
florian # so does it generally work pretty well across datacenters?
benblack # florian: the dynamo paper really does cover this
florian # benblack: i think that's also what coherence would say :)
seancribbs # florian: we generally consider a cluster to be within a single DC
florian # this is awesome info
benblack # coherence would say it is covered in the dynamo paper?
arg # florian: in the enterprise product, yes. currently multi-datacenter is a for-pay feature but that might not always be the case.
florian # ok thanks guys
roidrage # indeed thanks! it's like interactive learning reading through this conversation :)
seancribbs # roidrage: Gradus ad Parnassum
arg # but to actually answer the multi-datacenter question, it is handled well. each object is tagged with a vector clock that is used to detect/expose conflicting/non-causally-related writes, in which case you can either get all the conflicting objects or choose to let the latest timestamp win for a given bucket/key pair
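
A minimal sketch of the vector-clock behavior arg describes, assuming clocks are plain {actor: counter} dicts: writes dominated by another clock are discarded, and the remaining concurrent writes are either exposed as siblings or resolved by latest timestamp, depending on the per-bucket setting. The function names and the resolve wiring are illustrative assumptions, not Riak's internal representation or API.

```python
# Illustrative sketch of vector-clock conflict detection.

def descends(a, b):
    """True if clock a has seen everything clock b has (a is a descendant of b)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def resolve(writes, allow_mult=True):
    """writes: list of (vclock, timestamp, value). Drop dominated writes,
    then either expose the concurrent survivors as siblings or let the
    latest timestamp win (the per-bucket choice described above)."""
    survivors = [w for w in writes
                 if not any(o is not w and descends(o[0], w[0]) for o in writes)]
    if allow_mult:
        return [v for _, _, v in survivors]        # siblings: the client resolves
    return max(survivors, key=lambda w: w[1])[2]   # last-write-wins

# Two non-causally-related writes (e.g. from two datacenters):
writes = [({"a": 1}, 100, "value-from-dc1"),
          ({"b": 1}, 105, "value-from-dc2")]
print(resolve(writes, allow_mult=True))    # ['value-from-dc1', 'value-from-dc2']
print(resolve(writes, allow_mult=False))   # 'value-from-dc2'
```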