@PharkMillups
Created January 22, 2011 00:09
17:27 <echosystm> hi guys
17:27 <echosystm> i just read that "A Riak cluster is generally run on a set of
well-connected physical hosts"
17:27 <echosystm> is it unsuitable to be run on poorly connected physical hosts?
17:29 <echosystm> ie. hosts in different datacentres
17:30 <echosystm> how does it avoid split brain problems?
17:30 <aphyr> echosystm: They sell a replication system for relaying state
between clusters.
17:31 <aphyr> That's intended for use between datacenters.
17:31 <aphyr> Though if you use allow_mult cleverly, it's definitely possible
to solve the partitioning problem.
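
A minimal sketch of the allow_mult knob aphyr is referring to, assuming a Riak node's HTTP interface on localhost:8098, the /riak/<bucket> URL scheme of this era, and the Python requests library; the bucket name is made up for illustration:

import json
import requests

RIAK = "http://localhost:8098"   # assumed local Riak node (HTTP interface)
BUCKET = "sessions"              # hypothetical bucket name

# Turn on allow_mult so that concurrent writes (e.g. from both sides of a
# partition) are kept as siblings instead of being resolved last-write-wins.
requests.put(
    "%s/riak/%s" % (RIAK, BUCKET),
    data=json.dumps({"props": {"allow_mult": True}}),
    headers={"Content-Type": "application/json"},
)
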
17:31 <echosystm> well, basically all i want is a step up from log shipping
between two sql databases
17:32 <echosystm> and i dont mind using a document database, because they all
seem to be better at distribution
17:32 <aphyr> You might also look into couchdb
17:32 <echosystm> think two databases, synchronous writes and automatic failover
17:32 <echosystm> i imagine for the failover, there would need to be a third
node or a witness of some kind?
17:32 <aphyr> Synchronous writes *across* the datacenter boundary?
17:32 <echosystm> yep
17:32 <aphyr> Prepare for slowness!
17:33 <echosystm> performance isnt an issue
17:33 <aphyr> I would definitely recommend couch
17:34 <echosystm> would that be more suitable for this use case than riak?
17:34 <aphyr> I don't really understand your use case fully
17:34 <aphyr> In Riak there is no privileged node
17:34 <aphyr> Hence no failover
17:34 <aphyr> Nodes just join and leave the cluster and rearrange data to
compensate.
17:34 <echosystm> well, i want it to be active-active, so there is no 'master'
17:35 <aphyr> OK. For Riak you typically run 4+ nodes in a cluster
17:35 <echosystm> when i say failover, what i really mean is ensuring that if
connectivity is lost to a database, that database knows to shut itself down
17:35 <aphyr> You mean that when a client disconnects the DB should shut down?
17:35 <aphyr> That doesn't sound like failover.
17:35 <echosystm> no
17:36 <echosystm> imagine you have two databases
17:36 <echosystm> the link goes down between them
17:36 <echosystm> who keeps running and who shuts down?
17:36 <aphyr> Both keep running.
17:36 <echosystm> thats not acceptable
17:36 <aphyr> I suppose you could kill every one of them.
17:36 <aphyr> But really dude, how do you expect to choose a privileged master
without destroying service?
17:37 <echosystm> i dont
17:37 <aphyr> Let me suggest an example of how Riak handles partitioning and you
can see if it applies to you.
17:37 <aphyr> There are four nodes in a cluster.
17:37 <echosystm> there would need to be a third node to help reach some consensus on which node should turn off
17:37 <aphyr> A partition occurs and splits the cluster into two 2-node segments.
17:38 <echosystm> ie. the node should turn itself off if it cant reach the other
two or if it is specifically told to by another node
17:38 <aphyr> Those nodes continue to serve requests, both reads and writes,
as normal.
17:38 <aphyr> All data is accessible in both partitions assuming your durability
parameters are tuned correctly.
17:38 <aphyr> When the partition ends the nodes rejoin each other.
17:38 <aphyr> They then resolve conflicts in one of two ways
17:38 <aphyr> 1. By last-write wins
17:38 <aphyr> 2. By allow-mult.
17:39 <aphyr> In the case of allow-mult, all written versions are stored,
and returned to the client on read.
17:39 <aphyr> The client is then responsible for negotiating the merge.
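
A rough sketch of what that client-side merge can look like over the same HTTP interface, assuming allow_mult is on, the stored values are JSON lists merged by a naive set union, and illustrative bucket/key names; the r and w query parameters are the tunable durability quorums mentioned above:

import json
import requests

RIAK = "http://localhost:8098"
URL = "%s/riak/%s/%s" % (RIAK, "carts", "user42")   # hypothetical bucket/key

# Read with an explicit read quorum. If siblings exist, Riak answers
# 300 Multiple Choices with a text/plain list of vtags.
resp = requests.get(URL, params={"r": 2})

if resp.status_code == 300:
    # Concurrent writes happened (e.g. on both sides of a partition):
    # fetch each sibling by its vtag and merge them application-side.
    vtags = [line for line in resp.text.splitlines()
             if line and line != "Siblings:"]
    siblings = [requests.get(URL, params={"vtag": v}).json() for v in vtags]
    merged = sorted(set(item for s in siblings for item in s))  # naive union merge
else:
    merged = resp.json()

# Write the merged value back with the causal context (X-Riak-Vclock header)
# so Riak knows this write supersedes the siblings it was derived from.
requests.put(
    URL,
    params={"w": 2, "returnbody": "true"},
    data=json.dumps(merged),
    headers={
        "Content-Type": "application/json",
        "X-Riak-Vclock": resp.headers.get("X-Riak-Vclock", ""),
    },
)
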
17:39 <echosystm> yeah, that is too complicated
17:39 <aphyr> Riak is designed for high availability.
17:39 <echosystm> if you get a split brain problem like that, i just want one shut down
17:39 <aphyr> You can, if you like, devise a system to do that.
17:40 <aphyr> But consider, first, that on short time scales this is
*always* occurring in a cluster.
17:40 <jdmaturen> there is no way to stop nodes from crashing and networks
from partitioning
17:41 <aphyr> It sounds like you might be more interested in a synchronous
database with directed replication to a hot standby.
17:41 <echosystm> probably
17:41 <aphyr> In which case couchdb or any of the big RDBMS's might be good candidates.
17:42 <aphyr> Riak is aimed more at high availability; it sounds like you
actually want your system to fail.
17:42 <aphyr> You might also look into using something like Heartbeat to
handle your failover.
17:42 <echosystm> well, i dont want the state of the application to get all
messed up because it has been partitioned
17:43 <aphyr> You have two realistic choices: more complicated algorithms to
handle concurrent modification/partitioning, or having the cluster fail.
17:43 <echosystm> unless i'm missing something, clients on partition A are
going to be doing all kinds of things based on what they see there, while clients
on partition B are doing that also
17:44 <echosystm> when you merge the partitions back together, its all going to get messed up
17:44 <aphyr> That is really an application problem.
17:44 <aphyr> I'm building a nontrivial system in Riak right now; concurrent
writes and partitions are a part of my test suite.
17:44 <aphyr> It's definitely possible to handle.
17:44 <jdmaturen> http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/ may be of use
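
The idea in that post, reduced to a sketch: a vector clock is a per-actor counter map, and two versions are siblings exactly when neither clock descends from the other. Illustrative Python, not Riak's internal representation:

# Minimal vector clock sketch: {actor_id: counter}.
def increment(clock, actor):
    """Return a new clock with `actor`'s counter bumped (done on each write)."""
    clock = dict(clock)
    clock[actor] = clock.get(actor, 0) + 1
    return clock

def descends(a, b):
    """True if clock `a` has seen everything clock `b` has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def conflict(a, b):
    """Neither descends from the other: concurrent writes, i.e. siblings."""
    return not descends(a, b) and not descends(b, a)

def merge(a, b):
    """Pairwise max; used once the application has merged the values themselves."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}

# Two clients update the same object on opposite sides of a partition:
base = {"alice": 1}
left = increment(base, "bob")     # {"alice": 1, "bob": 1}
right = increment(base, "carol")  # {"alice": 1, "carol": 1}
assert conflict(left, right)      # neither side subsumes the other -> siblings
assert descends(merge(left, right), left) and descends(merge(left, right), right)
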
17:46 <aphyr> There are some situations for which vector clock merges as in
Riak are unwieldy; unique ID generation being one of them. I think most
people have found it worthwhile to use a hybrid approach where Riak handles
their mergeable persistent data, and some small locking service handles synchronization.
17:46 <echosystm> i think this is all far too overkill for my purposes
17:47 <aphyr> Probably. Tell you what: go look at the couchdb replication docs. If
that's not what you like, and mysql hot standbys aren't either, take a look at vector clocks.
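
For the couchdb route, setting replication up is one call per direction to _replicate; a sketch assuming two CouchDB nodes with made-up hostnames and a hypothetical database name, with continuous push replication both ways to get the active-active shape discussed above:

import json
import requests

LOCAL = "http://dc1.example.com:5984"    # hypothetical hosts, one per datacentre
REMOTE = "http://dc2.example.com:5984"

def replicate(source_base, target_base, db="appdata"):
    """Start continuous replication of `db` from one CouchDB node to another."""
    return requests.post(
        "%s/_replicate" % source_base,
        data=json.dumps({
            "source": db,
            "target": "%s/%s" % (target_base, db),
            "continuous": True,
        }),
        headers={"Content-Type": "application/json"},
    )

# Push changes both ways; conflicting revisions are kept by CouchDB and
# surfaced to the application (via _conflicts), much like Riak siblings.
replicate(LOCAL, REMOTE)
replicate(REMOTE, LOCAL)
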
17:47 <echosystm> ok
17:47 <echosystm> will do
17:47 <echosystm> thanks for your help