Created
January 22, 2011 00:09
17:27 <echosystm> hi guys
17:27 <echosystm> i just read that "A Riak cluster is generally run on a set of
well-connected physical hosts"
17:27 <echosystm> is it unsuitable to be run on poorly connected physical hosts?
17:29 <echosystm> ie. hosts in different datacentres
17:30 <echosystm> how does it avoid split brain problems?
17:30 <aphyr> echosystm: They sell a replication system for relaying state
between clusters.
17:31 <aphyr> That's intended for use between datacenters.
17:31 <aphyr> Though if you use allow_mult cleverly, it's definitely possible
to solve the partitioning problem.
17:31 <echosystm> well, basically all i want is a step up from log shipping
between two sql databases
17:32 <echosystm> and i dont mind using a document database, because they all
seem to be better at distribution
17:32 <aphyr> You might also look into couchdb
17:32 <echosystm> think two databases, synchronous writes and automatic failover
17:32 <echosystm> i imagine for the failover, there would need to be a third
node or a witness of some kind?
17:32 <aphyr> Synchronous writes *across* the datacenter boundary?
17:32 <echosystm> yep
17:32 <aphyr> Prepare for slowness!
17:33 <echosystm> performance isnt an issue
17:33 <aphyr> I would definitely recommend couch
17:34 <echosystm> would that be more suitable for this use case than riak?
17:34 <aphyr> I don't really understand your use case fully
17:34 <aphyr> In Riak there is no privileged node
17:34 <aphyr> Hence no failover
17:34 <aphyr> Nodes just join and leave the cluster and rearrange data to
compensate.
17:34 <echosystm> well, i want it to be active-active, so there is no 'master'
17:35 <aphyr> OK. For Riak you typically run 4+ nodes in a cluster
17:35 <echosystm> when i say failover, what i really mean is ensuring that if
connectivity is lost to a database, that database knows to shut itself down
17:35 <aphyr> You mean that when a client disconnects the DB should shut down?
17:35 <aphyr> That doesn't sound like failover.
17:35 <echosystm> no
17:36 <echosystm> imagine you have two databases
17:36 <echosystm> the link goes down between them
17:36 <echosystm> who keeps running and who shuts down?
17:36 <aphyr> Both keep running.
17:36 <echosystm> thats not acceptable
17:36 <aphyr> I suppose you could kill every one of them.
17:36 <aphyr> But really dude, how do you expect to choose a privileged master
without destroying service?
17:37 <echosystm> i dont
17:37 <aphyr> Let me suggest an example of how Riak handles partitioning and you
can see if it applies to you.
17:37 <aphyr> There are four nodes in a cluster.
17:37 <echosystm> there would need to be a third node to help reach some
consensus on which node should turn off
17:37 <aphyr> A partition occurs and splits the cluster into two 2-node segments.
17:38 <echosystm> ie. the node should turn itself off if it cant reach the other
two or if it is specifically told to by another node
17:38 <aphyr> Those nodes continue to serve requests, both reads and writes,
as normal.
17:38 <aphyr> All data is accessible in both partitions assuming your durability
parameters are tuned correctly.
17:38 <aphyr> When the partition ends the nodes rejoin each other.
17:38 <aphyr> They then resolve conflicts in one of two ways
17:38 <aphyr> 1. By last-write wins
17:38 <aphyr> 2. By allow_mult.
17:39 <aphyr> In the case of allow_mult, all written versions are stored,
and returned to the client on read.
17:39 <aphyr> The client is then responsible for negotiating the merge.
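
The two conflict-resolution options aphyr lists can be illustrated with a toy model. This is only a sketch and does not use any Riak client API: the Version class, its timestamp field, and the shopping-list data are hypothetical, and the union merge is just one possible client-side strategy for set-valued data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """One stored version of a key (hypothetical model, not a Riak type)."""
    value: frozenset
    timestamp: float  # assumed wall-clock time of the write


def resolve_last_write_wins(siblings):
    """Option 1: keep the newest version and silently drop the rest."""
    return max(siblings, key=lambda v: v.timestamp)


def resolve_by_merge(siblings):
    """Option 2: the client receives every sibling and merges them itself.

    Here the merge is a set union, which suits data such as a shopping list.
    """
    merged = frozenset().union(*(v.value for v in siblings))
    newest = max(v.timestamp for v in siblings)
    return Version(merged, newest)


# Two partitions accept writes to the same key while disconnected.
side_a = Version(frozenset({"milk", "eggs"}), timestamp=100.0)
side_b = Version(frozenset({"milk", "bread"}), timestamp=101.0)
siblings = [side_a, side_b]

lww = resolve_last_write_wins(siblings)
merged = resolve_by_merge(siblings)

print(sorted(lww.value))     # ['bread', 'milk']  (the "eggs" write is lost)
print(sorted(merged.value))  # ['bread', 'eggs', 'milk']
```

With last-write-wins, the write made on the other side of the partition disappears; with a client-side merge, both survive, at the cost of the application having to define what a merge means for its data.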
17:39 <echosystm> yeah, that is too complicated
17:39 <aphyr> Riak is designed for high availability.
17:39 <echosystm> if you get a split brain problem like that, i just want one shut down
17:39 <aphyr> You can, if you like, devise a system to do that.
17:40 <aphyr> But consider, first, that on short time scales this is
*always* occurring in a cluster.
17:40 <jdmaturen> there is no way to stop nodes from crashing and networks
from partitioning
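
The shutdown rule echosystm describes earlier (a node turns itself off when it cannot reach the other two) amounts to a majority check. A minimal sketch follows, assuming each node knows the total cluster size and can count the peers it currently reaches; the function names are hypothetical and this is not a feature of Riak.

```python
def has_majority(reachable_peers: int, cluster_size: int) -> bool:
    """True if this node plus the peers it can reach form a strict majority."""
    return (reachable_peers + 1) * 2 > cluster_size


def should_shut_down(reachable_peers: int, cluster_size: int) -> bool:
    """A node on the minority side of a partition stops serving requests."""
    return not has_majority(reachable_peers, cluster_size)


# Two databases plus a witness (3 nodes): the isolated node stops,
# while the pair that can still see each other keeps running.
print(should_shut_down(reachable_peers=0, cluster_size=3))  # True
print(should_shut_down(reachable_peers=1, cluster_size=3))  # False

# Four nodes split into two 2-node segments: neither side has a
# strict majority, so both halves would stop.
print(should_shut_down(reachable_peers=1, cluster_size=4))  # True
```

The last case is why an odd number of voters (or a witness) matters: with an even split, a strict-majority rule takes the whole cluster down.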
17:41 <aphyr> It sounds like you might be more interested in a synchronous
database with directed replication to a hot standby.
17:41 <echosystm> probably
17:41 <aphyr> In which case couchdb or any of the big RDBMS's might be good candidates.
17:42 <aphyr> Riak is aimed more at high availability; it sounds like you
actually want your system to fail.
17:42 <aphyr> You might also look into using something like Heartbeat to
handle your failover.
17:42 <echosystm> well, i dont want the state of the application to get all
messed up because it has been partitioned
17:43 <aphyr> You have two realistic choices: more complicated algorithms to
handle concurrent modification/partitioning or having the cluster fail.
17:43 <echosystm> unless i'm missing something, clients on partition A are
going to be doing all kinds of things based on what they see there, while clients
on partition B are doing that also
17:44 <echosystm> when you merge the partitions back together, its all going to get messedip
17:44 <aphyr> That is really an application problem.
17:44 <echosystm> *messed up
17:44 <aphyr> I'm building a nontrivial system in Riak right now; concurrent
writes and partitions are a part of my test suite.
17:44 <aphyr> It's definitely possible to handle.
17:44 <jdmaturen> http://blog.basho.com/2010/01/29/why-vector-clocks-are-easy/ may be of use
17:46 <aphyr> There are some situations for which vector clock merges as in
Riak are unwieldy; unique ID generation being one of them. I think most
people have found it worthwhile to use a hybrid approach where Riak handles
their mergeable persistent data, and some small locking service handles synchronization.
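
For readers unfamiliar with the vector clocks mentioned here, the core idea can be sketched in a few lines. This is a toy model under stated assumptions, not Riak's internal implementation: a clock is a plain dict from an actor name to a counter, and the actor names "x" and "y" are hypothetical.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two vector clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # neither saw the other's write: a real conflict


def merge(a: dict, b: dict) -> dict:
    """Clock for a value that reconciles a and b: per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


base = {"x": 1}
update_a = {"x": 2}          # actor x wrote again on one side of a partition
update_b = {"x": 1, "y": 1}  # actor y wrote on the other side

print(compare(update_a, base))      # a supersedes b
print(compare(update_a, update_b))  # concurrent
print(merge(update_a, update_b) == {"x": 2, "y": 1})  # True
```

Only the "concurrent" case needs application-level merging; when one clock descends from the other, the older value can be discarded safely.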
17:46 <echosystm> i think this is all far too overkill for my purposes
17:47 <aphyr> Probably. Tell you what: go look at the couchdb replication docs. If
that's not what you like, and mysql hot standbys aren't either, take a look at vector clocks.
17:47 <echosystm> ok
17:47 <echosystm> will
17:47 <echosystm> *will do
17:47 <echosystm> thanks for your help