Created October 6, 2010 17:31
16:08 <grantr> hey guys me again
16:08 <grantr> i've been thinking about how to implement a queue in riak (or any similar kv store)
16:08 <grantr> any advice on that?
16:09 <jdmaturen> afaict there are no great ways to concurrently read/write a given key [queue]. /me waves hands at all the MQs out there
16:09 <grantr> trying to figure out how to structure it so an add or remove can be a single write
16:10 <benblack> with an unmodified store, you can't.
16:10 <grantr> ah
16:11 <benblack> you could do something where you just wrote stuff in, then fixed it up on read
16:12 <benblack> but it would not have very consistent performance
16:13 <grantr> you'd have to do a lot of list_keys or mapred
16:13 <benblack> no
16:13 <benblack> you'd allow_mult, retrieve all siblings on read, and write back an ordered set of linked documents
16:13 <benblack> or similar
16:14 <grantr> oh i never thought of that
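[editor's note: a minimal sketch of the sibling-merge approach benblack describes above — each enqueue is a single write that creates a new sibling under `allow_mult`, and a read collects all siblings, merges them into one ordered list, and writes that back. The in-memory `SiblingStore` below is a hypothetical stand-in for Riak, not a real client API.]

```python
# Toy store: concurrent writes accumulate as siblings (like allow_mult),
# and the reader is responsible for resolving them.
class SiblingStore:
    def __init__(self):
        self.data = {}  # key -> list of sibling values

    def put(self, key, value):
        self.data.setdefault(key, []).append(value)  # new sibling, no overwrite

    def get(self, key):
        return self.data.get(key, [])

    def resolve(self, key, merged):
        self.data[key] = [merged]  # write back a single merged value

def enqueue(store, key, item):
    # a single write: just add a sibling; item carries its own sort hint
    store.put(key, [item])

def read_queue(store, key):
    # "fix it up on read": flatten all siblings into one ordered list
    siblings = store.get(key)
    merged = sorted(item for sib in siblings for item in sib)
    store.resolve(key, merged)
    return merged

store = SiblingStore()
enqueue(store, "q", (2, "b"))
enqueue(store, "q", (1, "a"))
print(read_queue(store, "q"))  # [(1, 'a'), (2, 'b')]
```

As benblack warns, read cost grows with the number of unresolved siblings, which is why the performance is not very consistent.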
16:14 <benblack> you'd be better off writing something for the purpose on riak_core, not on _kv
16:14 <grantr> yeah i had a crazy idea
16:14 <grantr> seems like it would be possible (though maybe not prudent) to wrap rabbit's queue implementation in riak core
16:15 <grantr> you would probably lose strict ordering
16:15 <benblack> been discussed.
16:16 <grantr> any conclusions?
16:16 <benblack> yes, it's hard.
16:16 <grantr> damn ;)
16:16 <grantr> hard to maintain queue semantics?
16:18 <grantr> on the face of it, it seems relatively easy - routing table maintained by the ring, each vnode is assigned a queue or some shard of a queue (number of shards decided at queue creation time)
16:19 <benblack> yeah, so, if you could never say "shard" again in the context of a dynamo system that'd be awesome
16:20 <grantr> isnt that sort of how vnodes work though? segment of the keyspace == shard
16:20 <benblack> no
16:20 <benblack> it is really not
16:20 <benblack> those are partitions
16:20 <benblack> and they are integral to the design
16:20 <benblack> not bolted on later
16:20 <grantr> integral to the design of the vnode implementation?
16:20 <benblack> integral to the design of dynamo
16:21 <benblack> as in, it is even lower level than the storage system
16:21 <grantr> hmmm
16:21 <benblack> that's how riak_core can exist
16:21 <grantr> i should read the paper again
16:21 <benblack> you can shard mysql, right?
16:22 <grantr> i guess i had this idea that it's basically consistent hashing like with memcached, but managed by the cluster instead of the client
16:22 <grantr> right
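[editor's note: a minimal sketch of the client-side consistent hashing grantr is contrasting with riak_core — keys and nodes hash onto one ring, and each key goes to the first node clockwise. Everything here lives in the client: every client must agree on the node list, and nothing manages membership or handoff, which is exactly the gap the conversation turns to next. All names are hypothetical.]

```python
import bisect
import hashlib

def ring_hash(s):
    # stable hash onto the ring's integer keyspace
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, points_per_node=64):
        # each node claims many points on the ring to smooth the distribution
        self.ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.points = [h for h, _ in self.ring]

    def node_for(self, key):
        # first ring point clockwise of the key's hash, wrapping at the end
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

nodes = ["node-a", "node-b", "node-c"]
ring = HashRing(nodes)
print(ring.node_for("user:42"))  # deterministic, but only because every
                                 # client was handed the same node list
```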
16:22 <benblack> ok, or memcache consistent hashing. what is the equivalent of riak_core in those sharded setups?
16:22 <grantr> gizzard i suppose. some proxy that sits in between the client and the nodes, and decides which node the request goes to
16:23 <benblack> correct, the answer is "there is no equivalent"
16:23 <benblack> you bolt some stuff on and hope for the best
16:24 <grantr> what is riak core giving you that's more than a consistent hashing proxy then?
16:24 <grantr> hinted handoff?
16:24 <benblack> it's not a proxy
16:24 <benblack> group membership, partitioning, hh, yes
16:24 <benblack> online reconfiguration and redistribution
16:24 <benblack> and, again, it isn't a proxy
16:24 <grantr> shared state
16:25 <benblack> the nodes are first class actors in the system
16:25 <benblack> aware of each other
16:25 <benblack> shards are generally not aware of each other
16:26 <benblack> the mongo replica sets stuff is similarly bolted on. very limited set of nodes in a set, weird constraints because of their simplistic leader election, and, again, sets are not aware of other sets.
16:27 <grantr> i see
16:28 <benblack> witness the 4sq outage report
16:28 <benblack> the glory of autosharding in full flower
16:28 <grantr> i didnt even hear about that, looking
16:29 <benblack> http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
16:33 <grantr> oh interesting, they're using mongo
16:35 <grantr> so how could that have been avoided with a dynamo system? seems like a case of unbalanced load, which can happen in dynamo. i guess dynamo systems would be easier to rebalance?
16:36 <benblack> you add more nodes, partitions are moved
16:36 <pharkmillups> grantr: i can only speak for Riak, but the..
16:36 <benblack> there's no special case
16:36 <pharkmillups> yeah, what benblack said
16:37 <benblack> pharkmillups has returned, i leave you in his capable hands
16:37 <grantr> in riak thats true, in cassandra there is a manual step. but essentially the same thing. and you get that because rebalancing is built into the system
16:37 <grantr> thanks benblack
16:38 <grantr> i always learn something in this channel!
16:39 <grantr> i should reread the dynamo paper with queues in mind
16:39 <grantr> maybe there is a difficulty with partitioning that i'm missing
16:41 <benblack> it is the eventual consistency that bites you on this one, generally, rather than partitioning.
16:42 <grantr> got it
16:42 <grantr> two pops at the same time to different nodes could get the same item
16:44 <grantr> and i guess that's a result of all the reads and writes (deletes) being in the same "place"
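[editor's note: a minimal illustration of the double-delivery grantr describes above — two clients pop via replicas that have not yet converged, so both receive the same item. The "replicas" are plain dicts with no real replication protocol; everything here is hypothetical.]

```python
# Two replicas of the queue's key, identical before either pop.
replica_a = {"q": ["job-1", "job-2"]}
replica_b = {"q": ["job-1", "job-2"]}

def pop(replica, key):
    # a pop is a read of the head plus a write (delete) of the same key
    items = replica[key]
    head, replica[key] = items[0], items[1:]
    return head

got_a = pop(replica_a, "q")  # client 1 pops via replica A
got_b = pop(replica_b, "q")  # client 2 pops via replica B before A's
                             # write has propagated
print(got_a, got_b)          # job-1 job-1 -- the same item, delivered twice
```

Because every pop reads and writes the same key, eventual consistency turns concurrent pops into duplicate deliveries, which is the mismatch with the typical kv-store access pattern noted below.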
16:45 <grantr> that's how it's different from the typical usage pattern of a kv store
16:46 <grantr> hibari's chain replication could be a better system than dynamo then for replicated queues
16:46 <grantr> anything with strong consistency
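[editor's note: a minimal sketch of the chain replication idea grantr closes with — writes enter at the head and flow node-to-node to the tail, and reads are served only by the tail, so a read never observes a write that has not reached every replica. Real chain replication also handles node failure and chain repair, all omitted here; the class and wiring below are hypothetical.]

```python
class ChainNode:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain, None for the tail

    def write(self, key, value):
        self.store[key] = value
        if self.next:                 # forward the write down the chain
            self.next.write(key, value)

    def read(self, key):
        # only the tail serves reads in chain replication
        assert self.next is None, "reads must go to the tail"
        return self.store[key]

head, mid, tail = ChainNode("head"), ChainNode("mid"), ChainNode("tail")
head.next, mid.next = mid, tail

head.write("q:item", "job-1")  # all writes enter at the head
print(tail.read("q:item"))     # job-1 -- visible only once fully replicated
```

The trade against dynamo-style systems is availability: a write is not acknowledged until the whole chain has it, which is what buys the strong consistency a queue needs.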