@moonpolysoft
Created July 30, 2009 04:45
rev2 bootstrap procedure
* X new nodes come up and optionally join an existing distributed Erlang cluster. They DO NOT issue Dynomite join commands.
* When a new node comes up, it starts in single-node mode with a number of local storage servers and is ready to serve requests, unless it already has membership/partition state on disk.
* Once all of the new nodes to be merged are up and running, an administrator or script must issue a request to the admin API enumerating the nodes to be joined.
* The admin API returns a token that can be used to check the status of the bootstrap process, or an outright failure if any member node is unreachable when bootstrap starts (see the join sketch after this list).
* An ad-hoc master process is started to help monitor and coordinate the bootstrap.
* Each node's membership server will receive a join command with the new member node list and partition mapping.
* The individual membership servers will all compute listings of the partitions gained and lost across the cluster (see the partition-diff sketch after this list).
* If a node is losing a master partition (first in the preference list), it will start a bootstrap sender primed to accept a set number of receivers.
* The sender, as well as the receivers, will register with the ad-hoc master for failure detection (open question: how well will this scale to a massive node count?).
* Senders will time out if the requisite number of receivers do not ping and request data (the sender sketch after this list shows this priming loop).
* The ad-hoc master(s?) will monitor for failures and other errors. If a failure happens, it will provide atomicity at the level of a partition, across the nodes past and present responsible for that partition (see the monitoring sketch after this list).
* For each partition, once the transfer is completed and verified on all of the receiving replicas, the ad-hoc master will start a storage server on each receiving node for the new replica.
* Once the new storage servers are running, a sync process will run each one against an old storage server that stayed up during the bootstrap, bringing it up to speed with any writes that happened during the potentially long bootstrap period (see the sync sketch after this list).
* Each of the old storage servers is then shut down, one at a time. After each shutdown, its data on disk is deleted to make room available. Auto-deletion should be configurable on or off.
* The ad-hoc master will then record that everything happened correctly.
* Once every bootstrap operation has been verified as completed (either successfully or unsuccessfully), the ad-hoc master will wait for an admin request polling the ending status and report what happened, including any bootstraps that failed to complete.
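
A minimal sketch of the admin join entry point. The module name dynomite_admin and the message to spawn the master are hypothetical; the reachability check and the status token are the parts the procedure above actually calls for.

    -module(dynomite_admin).
    -export([join/1]).

    %% Fails outright if any enumerated node is unreachable; otherwise
    %% returns a token the caller can later poll for bootstrap status.
    join(Nodes) when is_list(Nodes) ->
        case [N || N <- Nodes, net_adm:ping(N) =/= pong] of
            [] ->
                Token = erlang:make_ref(),
                %% the ad-hoc master would be spawned here, keyed by Token
                {ok, Token};
            Unreachable ->
                {error, {unreachable, Unreachable}}
        end.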
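
A sketch of the per-node partition diff from the "partitions gained and lost" step, assuming a partition map is a list of {Partition, PreferenceList} pairs; that representation is an assumption, not necessarily Dynomite's actual one.

    -module(partition_diff).
    -export([diff/3]).

    %% Returns {Gained, Lost} for Node between the old and new maps.
    diff(Node, OldMap, NewMap) ->
        Old = owned(Node, OldMap),
        New = owned(Node, NewMap),
        {ordsets:subtract(New, Old), ordsets:subtract(Old, New)}.

    %% Partitions whose preference list includes Node.
    owned(Node, Map) ->
        ordsets:from_list([P || {P, Prefs} <- Map, lists:member(Node, Prefs)]).

For example, partition_diff:diff(n2, [{p1, [n1, n2]}], [{p1, [n1, n3]}]) returns {[], [p1]}: node n2 loses p1 under the new mapping and gains nothing.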
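
A sketch of the sender's priming loop: it registers with the ad-hoc master, waits for the set number of receivers to check in, and times out if they never do. The message shapes and the 60-second timeout are assumptions.

    -module(bootstrap_sender).
    -export([start/3]).

    -define(RECEIVER_TIMEOUT, 60000).  %% assumed: 60s to collect receivers

    start(Partition, NReceivers, Master) ->
        spawn(fun() ->
                  %% register with the ad-hoc master for failure detection
                  Master ! {sender_up, Partition, self()},
                  wait_for_receivers(Partition, NReceivers, [])
              end).

    wait_for_receivers(Partition, 0, Receivers) ->
        %% all receivers have checked in; begin streaming partition data
        [R ! {start_transfer, Partition, self()} || R <- Receivers],
        ok;
    wait_for_receivers(Partition, N, Receivers) ->
        receive
            {receiver_ready, Pid} ->
                wait_for_receivers(Partition, N - 1, [Pid | Receivers])
        after ?RECEIVER_TIMEOUT ->
            %% the requisite receivers never pinged; give up
            exit({receiver_timeout, Partition})
        end.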
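
A monitoring sketch for the ad-hoc master's failure detection, using standard process monitors. It assumes partition-level atomicity means: if any participant in a partition's transfer dies abnormally, every other participant in that transfer is aborted. The state shape and messages are assumptions.

    -module(bootstrap_master).
    -export([start/0]).

    start() -> spawn(fun() -> loop([]) end).

    %% State is a list of {Partition, [ParticipantPid]} pairs.
    loop(Parts) ->
        receive
            {register, Partition, Pid} ->
                erlang:monitor(process, Pid),
                Pids = proplists:get_value(Partition, Parts, []),
                loop(lists:keystore(Partition, 1, Parts,
                                    {Partition, [Pid | Pids]}));
            {'DOWN', _Ref, process, _Pid, normal} ->
                loop(Parts);  %% clean exit: nothing to do
            {'DOWN', _Ref, process, Pid, _Reason} ->
                %% a participant failed: abort its whole partition transfer
                case [E || {_, Ps} = E <- Parts, lists:member(Pid, Ps)] of
                    [{Partition, Members} | _] ->
                        [exit(Q, {abort, Partition}) || Q <- Members, Q =/= Pid],
                        loop(lists:keydelete(Partition, 1, Parts));
                    [] ->
                        loop(Parts)
                end
        end.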
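
A sketch of the post-transfer sync pass, assuming both old and new storage servers are gen_servers and that the message shapes (all, {get, Key}, {put, Key, Value}) exist; Dynomite's real storage API may differ. It replays every key/value held by the old replica into the new one unless the new one already holds the same value.

    -module(bootstrap_sync).
    -export([sync/2]).

    %% Returns the number of entries copied into the new replica.
    sync(OldPid, NewPid) ->
        Pairs = gen_server:call(OldPid, all, infinity),
        lists:foldl(
          fun({Key, Value}, Copied) ->
                  case gen_server:call(NewPid, {get, Key}) of
                      {ok, Value} -> Copied;  %% already up to date
                      _Missing ->
                          ok = gen_server:call(NewPid, {put, Key, Value}),
                          Copied + 1
                  end
          end, 0, Pairs).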