jpillora/peer-data-sync.md

## peer-data-sync.md

      
    Raw
  

              peer-data-sync.md
            
          
    Peer Data Synchronisation

One-way data synchronisation is trivial. Forward all mutating operations to your slave store to also perform them. Master-Slave replication complete.
Two-way data synchronisation - on the other hand - is hard. It involves conflict resolution, varying of complexity, from choosing the newest operation, most relavant operation, or using operational transforms. All of these require vaguely accurate time synchronisation which then produce another set of problems, solvable with comparing timers, using a shared time server, or using atomic clocks.
The current problem

Currently, I'm trying to synchronise session data across multiple authentication servers. I've added to the complexity of the problem by assuming that each node is transient and may only exist for a limited amount of time (i.e. AWS deciding they need to unplug my instance). Also, the connections between each node may fail.
The current solutions

The most common solution to this is Redis. Then possibly MongoDB. Redis would be the safest choice right now, it's stable and many people use it in production. By using Redis or Mongo however, we're adding an extra piece into an already complex puzzle. I think it'd be much simpler if the applications themselves could simply share a small chunk of memory?
The proposition

A node module which runs on a set of node applications, which will open a server and also become a client to each connected application, creating a network of peers, allowing them to synchronise shared "buckets" of data between them.
Goals


Maintain high performance
Allow arbitrary "backend" data-stores
Allow peers to join the network at any time and retrieve all data to date
Allow peers to ask for partial chunks of data from another specific peer

Here's what I've got so far: https://github.com/jpillora/node-peer-store though, definitely not production ready.
Questions

To the reader, I've got a few problems that I'm facing, please give your two-cents in the comments. Questions numbered for easy referencing :)

What are the bottlenecks and possible ways to solve them ?

With what I've currently build and tested, in one case, I found that a set of local servers (not a great test!), performs best at about 1.2k operations/sec. When you ramp it up though, it dives down into the 400-500 region. It would be nice to be able to detect where the bottleneck is and then attempt to balance system resources. Maybe, using compression to trade CPU for Network bandwidth, or using an in-memory data-store vs disk data-store for faster access for less capacity?

In this case, what's the best way resolve conflicts?

I'm currently attempting to synchronise time between each node (using system clocks, correcting for network latency), then when new data comes in, times are compared and the newer one wins.


Follow on question, what's the best way to synchronise time (while not being Google)?


When a new peer joins, during high volume, how do you splice them into the data flow efficiently and effectively?


I've drawn an algorithm for this though haven't implemented it. Basically, it's: join, grab all existing data, broadcast readiness, accept new data. However I'm going for 100% synchronisation and there are many gaps for packets to split by.

When a peer drops for a short time then reconnects, it will be missing some data, how do we retrieve only what's missing?

Currently, I'm storing a giant history array of all operations (local and remote), with time, operation and key, so at the moment, I can query everything between a given time period. The aforementioned algorithm will change this though, history will only store local operations, remote operations are just confirmed with a sequence number. So receiving a seq10 when you're expecting seq8 means, you missed seq8 and seq9. Will allow queries by sequence number.


I’m slightly stealing some of the nice ideas from the SLEEP specification, about using a History table to essentially allow for revisions, is there a better way?


I'd like any "backend" data-store behind this module, however each will come with different use cases, thoughts?


Scaling up. How do Google and Facebook solve these problems? So many uses hitting the same data, they have redundancy and replication, but how? Can it be applied here?


I'm guessing some form of eventual consistency... Currently, my set() method only calls back when it gets confirmation from all peers. Having more than 3-4 peers in a network would become very chatty, however, I'm sure I could do better somehow.