@Qard
Created November 22, 2013 05:13

Ringo

Ringo is a self-healing, distributed storage engine built in Go, using a neighbor-coordinated ring network topology. It is a simple key/value storage engine on top of which more sophisticated database interfaces can be layered.

What is a neighbor-coordinated ring network topology?

You might be familiar with the typical ring network topology. This is a network structure where each node in a cluster has a connection to a left node and a right node. These connections form a ring.

The problem with a typical ring topology is that, if a link in the ring is broken, messages may not reach their destination. To get around this, some designs use a reversible ring: when a node fails to send to its neighbor, the message reverses direction and should eventually reach its destination by looping around the other side. This can result in terrible latency, but more importantly, it does not self-heal. If another link breaks, the cluster is severed in two.

To solve this problem, an evolution of the ring network topology is needed. In addition to maintaining a connection to its left and right nodes, each node in a neighbor-coordinated ring also maintains a connection to the next node beyond each of its immediate neighbors. If a link is broken, messages can be sent through this secondary channel while a reconnection is attempted. Meanwhile, the node asks the second-level neighbor whether it has also lost its connection to the node in question. If so, the second-level neighbors are promoted to first-level neighbors, and new second-level links are opened one node beyond.
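The promotion step above can be sketched with a simple doubly linked structure. This is a minimal illustration, not Ringo's actual API: the type, field, and function names are assumptions, and real links would be network connections rather than pointers.

```go
package main

import "fmt"

// Node holds the two link levels described above.
type Node struct {
	ID            int
	Left, Right   *Node // first-level neighbors
	Left2, Right2 *Node // second-level neighbors, one hop beyond
}

// makeRing wires count nodes into a ring with both link levels.
func makeRing(count int) []*Node {
	nodes := make([]*Node, count)
	for i := range nodes {
		nodes[i] = &Node{ID: i}
	}
	for i, n := range nodes {
		n.Right = nodes[(i+1)%count]
		n.Left = nodes[(i-1+count)%count]
		n.Right2 = nodes[(i+2)%count]
		n.Left2 = nodes[(i-2+count)%count]
	}
	return nodes
}

// healRight runs once the right neighbor's loss is confirmed by the
// second-level neighbor: that neighbor is promoted to first level, and a
// new second-level link is opened one node beyond it.
func (n *Node) healRight() {
	n.Right = n.Right2
	n.Right.Left = n
	n.Right2 = n.Right.Right
	n.Right.Left2 = n.Left
}

func main() {
	nodes := makeRing(5) // ring: 0-1-2-3-4-0
	nodes[0].healRight() // simulate node 1 dying
	fmt.Println(nodes[0].Right.ID, nodes[0].Right2.ID) // 2 3
}
```

After the heal, node 0's old second-level neighbor (node 2) has become its right neighbor, and a fresh second-level link points one node further around the ring.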

When a new node is added, it is inserted between two existing immediate neighbors. It forms a link to each and notifies them to move their first-level neighbor links over to it, and to point their second-level neighbor links at each other.
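The insertion can be sketched the same way. As before, the names are illustrative assumptions; note that the outer neighbors would presumably also be notified to retarget their second-level links at the new node, a step omitted here for brevity.

```go
package main

import "fmt"

// Node holds the two link levels used by the ring.
type Node struct {
	ID            int
	Left, Right   *Node // first-level neighbors
	Left2, Right2 *Node // second-level neighbors
}

// makeRing wires count nodes into a ring with both link levels.
func makeRing(count int) []*Node {
	nodes := make([]*Node, count)
	for i := range nodes {
		nodes[i] = &Node{ID: i}
	}
	for i, n := range nodes {
		n.Right = nodes[(i+1)%count]
		n.Left = nodes[(i-1+count)%count]
		n.Right2 = nodes[(i+2)%count]
		n.Left2 = nodes[(i-2+count)%count]
	}
	return nodes
}

// insertAfter splices m between n and n.Right: both existing neighbors
// move their first-level links to m and point their second-level links
// at each other, as described above.
func (n *Node) insertAfter(m *Node) {
	r := n.Right
	m.Left, m.Right = n, r
	m.Left2, m.Right2 = n.Left, r.Right
	n.Right, r.Left = m, m
	n.Right2, r.Left2 = r, n
}

func main() {
	nodes := makeRing(3)       // ring: 0-1-2-0
	m := &Node{ID: 3}
	nodes[0].insertAfter(m)    // splice node 3 between 0 and 1
	fmt.Println(nodes[0].Right.ID, m.Right.ID, nodes[0].Right2.ID) // 3 1 1
}
```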

What does this mean for my data?

In Ringo, key/value pairs are distributed evenly across nodes. A lookup service in the balancer tracks which node owns each key. When a key is created or modified, a master node is elected for that piece of data. The data is sent to that node, which marks itself as the master of that data; the data is also sent to the left neighbor node, which marks it as slave data.
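A minimal sketch of that write path, assuming a simple hash-based mapping in place of the balancer's lookup service; all names here are assumptions, not Ringo's actual API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// node holds a master map for data it owns and a slave map for replicas
// of its right neighbor's data.
type node struct {
	id     int
	left   *node
	master map[string]string
	slave  map[string]string
}

// newRing builds count nodes with left links wired into a ring.
func newRing(count int) []*node {
	nodes := make([]*node, count)
	for i := range nodes {
		nodes[i] = &node{id: i, master: map[string]string{}, slave: map[string]string{}}
	}
	for i, n := range nodes {
		n.left = nodes[(i-1+count)%count]
	}
	return nodes
}

// ownerFor stands in for the balancer's lookup service, electing a master
// node for a key by hashing it.
func ownerFor(key string, nodes []*node) *node {
	h := fnv.New32a()
	h.Write([]byte(key))
	return nodes[int(h.Sum32())%len(nodes)]
}

// write stores the pair on the elected master and replicates it to the
// master's left neighbor as slave data.
func write(key, val string, nodes []*node) {
	owner := ownerFor(key, nodes)
	owner.master[key] = val
	owner.left.slave[key] = val
}

func main() {
	nodes := newRing(3)
	write("user:1", "alice", nodes)
	owner := ownerFor("user:1", nodes)
	fmt.Println(owner.master["user:1"], owner.left.slave["user:1"]) // alice alice
}
```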

Whenever a link in the network is broken and the links self-heal, the left node gives its new right neighbor the full list of its slave data, and the new right neighbor marks itself as the new master of that data. At the same time, the new right neighbor shares the master list it held before the break, replicating it to its new left neighbor as slave data. This ensures zero data loss when a node leaves the network. When a node joins the network, it claims half the slave list and half the master list of its new neighbors.
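That handoff can be sketched as follows, under the reading that the new right neighbor is the one re-replicating its master list (since its replica lived on the departed node). This is an assumed interpretation, with assumed names.

```go
package main

import "fmt"

// node holds a master map for owned data and a slave map for replicas.
type node struct {
	master map[string]string
	slave  map[string]string
}

// healData runs after the node between l and r has left the ring and
// l and r have become direct neighbors.
func healData(l, r *node) {
	// l's slave copies were the departed node's master data;
	// r re-masters them so the data keeps an owner.
	for k, v := range l.slave {
		r.master[k] = v
	}
	// r's master data lost its replica (it lived on the departed node);
	// replicate r's full master list to its new left neighbor as slave data.
	l.slave = map[string]string{}
	for k, v := range r.master {
		l.slave[k] = v
	}
}

func main() {
	l := &node{master: map[string]string{}, slave: map[string]string{"a": "1"}}
	r := &node{master: map[string]string{"b": "2"}, slave: map[string]string{}}
	healData(l, r)
	fmt.Println(r.master["a"], r.master["b"], l.slave["a"], l.slave["b"]) // 1 2 1 2
}
```

After the heal, every key again has exactly one master and one slave copy on the master's left neighbor, matching the replication invariant described above.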

This method stripes data evenly across the ring, which allows scaling to massive datasets, and it also ensures durability of the data.

Potential issues

The obvious one is ACID compliance.

Atomicity is difficult to achieve when the records being written are distributed across many nodes. A coordinator service in the balancer could allow writes to be queued, then committed or rolled back if any failure occurs in the network. It's a problem I want to find a solution for, but the priority is getting the ring to work first.

Consistency and isolation are also difficult to achieve, and safe constraint rollbacks require both. This may be achievable with an append-only storage model that tracks revisions and marks each revision as queued or committed.
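A minimal sketch of that append-only idea: every write appends a new revision carrying a committed flag, and readers only ever see the newest committed revision. This is an illustration of the concept, not a worked-out design, and the names are assumptions.

```go
package main

import "fmt"

// revision is one append-only entry for a record.
type revision struct {
	value     string
	committed bool
}

// record is an append-only revision log for a single key.
type record struct {
	revs []revision
}

// propose queues a new revision without exposing it to readers.
func (r *record) propose(v string) {
	r.revs = append(r.revs, revision{value: v})
}

// commit marks the newest revision as visible; a rollback would simply
// never commit it.
func (r *record) commit() {
	r.revs[len(r.revs)-1].committed = true
}

// read returns the newest committed value, skipping queued revisions.
func (r *record) read() (string, bool) {
	for i := len(r.revs) - 1; i >= 0; i-- {
		if r.revs[i].committed {
			return r.revs[i].value, true
		}
	}
	return "", false
}

func main() {
	var rec record
	rec.propose("v1")
	rec.commit()
	rec.propose("v2") // queued; readers still see v1 until commit
	v, _ := rec.read()
	fmt.Println(v) // v1
}
```

Because queued revisions are invisible until committed and nothing is ever overwritten in place, readers get a consistent snapshot even while a coordinated write is in flight.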
