@PharkMillups
Created October 6, 2010 17:31
16:08 <grantr> hey guys me again
16:08 <grantr> i've been thinking about how to implement a queue in riak (or any similar kv store)
16:08 <grantr> any advice on that?
16:09 <jdmaturen> afaict there are no great ways to concurrently read/write a given key [queue]. /me waves hands at all the MQs out there
16:09 <grantr> trying to figure out how to structure it so an add or remove can be a single write
16:10 <benblack> with an unmodified store, you can't.
16:10 <grantr> ah
16:11 <benblack> you could do something where you just wrote stuff in, then fixed it up on read
16:12 <benblack> but it would not have very consistent performance
16:13 <grantr> you'd have to do a lot of list_keys or mapred
16:13 <benblack> no
16:13 <benblack> you'd allow_mult, retrieve all siblings on read, and write back an ordered set of linked documents
16:13 <benblack> or similar
16:14 <grantr> oh i never thought of that
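A minimal sketch of the "write blind, fix up on read" approach benblack describes, assuming a KV store with sibling support (like Riak with allow_mult=true). The client object and its put/get_all_siblings calls are hypothetical stand-ins, not a real Riak client API:

```python
# Sketch of a sibling-based queue on a KV store that keeps concurrent
# writes as siblings (e.g. Riak with allow_mult=true). The client API
# here (put / get_all_siblings) is hypothetical, not a real Riak client.

import time
import uuid


class SiblingQueue:
    def __init__(self, client, bucket, key):
        self.client = client          # hypothetical KV client
        self.bucket = bucket
        self.key = key

    def enqueue(self, item):
        # Concurrent enqueues write to the same key without reading first;
        # with allow_mult the store keeps each blind write as a sibling.
        entry = {"id": str(uuid.uuid4()), "ts": time.time(), "item": item}
        self.client.put(self.bucket, self.key, [entry])

    def read_and_fix_up(self):
        # Fetch every sibling, merge into one ordered set, and write the
        # merged value back (resolving the siblings) before returning.
        siblings = self.client.get_all_siblings(self.bucket, self.key)
        merged, seen = [], set()
        for value in siblings:              # each sibling is a list of entries
            for entry in value:
                if entry["id"] not in seen:
                    seen.add(entry["id"])
                    merged.append(entry)
        merged.sort(key=lambda e: (e["ts"], e["id"]))   # best-effort ordering
        self.client.put(self.bucket, self.key, merged)  # resolve siblings
        return [e["item"] for e in merged]
```

As benblack notes, the fix-up cost lands on the reader, so read latency varies with how many unresolved siblings have piled up since the last read.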
16:14 <benblack> you'd be better off writing something for the purpose on riak_core, not on _kv
16:14 <grantr> yeah i had a crazy idea
16:14 <grantr> seems like it would be possible (though maybe not prudent) to wrap rabbit's queue implementation in riak core
16:15 <grantr> you would probably lose strict ordering
16:15 <benblack> been discussed.
16:16 <grantr> any conclusions?
16:16 <benblack> yes, it's hard.
16:16 <grantr> damn ;)
16:16 <grantr> hard to maintain queue semantics?
16:18 <grantr> on the face of it, it seems relatively easy - routing table maintained by the ring, each vnode is assigned a queue or some shard of a queue (number of shards decided at queue creation time)
16:19 <benblack> yeah, so, if you could never say "shard" again in the context of a dynamo system that'd be awesome
16:20 <grantr> isn't that sort of how vnodes work though? segment of the keyspace == shard
16:20 <benblack> no
16:20 <benblack> it is really not
16:20 <benblack> those are partitions
16:20 <benblack> and they are integral to the design
16:20 <benblack> not bolted on later
16:20 <grantr> integral to the design of the vnode implementation?
16:20 <benblack> integral to the design of dynamo
16:21 <benblack> as in, it is even lower level than the storage system
16:21 <grantr> hmmm
16:21 <benblack> that's how riak_core can exist
16:21 <grantr> i should read the paper again
16:21 <benblack> you can shard mysql, right?
16:22 <grantr> i guess i had this idea that it's basically consistent hashing like with memcached, but managed by the cluster instead of the client
16:22 <grantr> right
16:22 <benblack> ok, or memcache consistent hashing. what is the equivalent of riak_core in those sharded setups?
16:22 <grantr> gizzard i suppose. some proxy that sits in between the client and the nodes, and decides which node the request goes to
16:23 <benblack> correct, the answer is "there is no equivalent"
16:23 <benblack> you bolt some stuff on and hope for the best
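A minimal sketch of the client-side consistent hashing grantr describes for memcached-style setups; the node names are made up. Each key maps to the first ring point clockwise from its hash, so no central routing table is needed - but note there is no membership, handoff, or rebalancing here, which is exactly the gap benblack says riak_core fills:

```python
# Minimal client-side consistent hash ring, memcached-style.
# Node names are illustrative; there is no membership protocol --
# the client just hashes keys onto a static ring.

import bisect
import hashlib


def _hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, points_per_node=64):
        # Place several points ("virtual nodes") per physical node so
        # load spreads more evenly around the ring.
        self._ring = sorted(
            (_hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._hashes = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("queue/42"))   # deterministic, no coordinator needed
```

Adding or removing a node here means every client must learn the new ring out of band; in a dynamo system the ring, its membership, and its reconfiguration are part of the system itself.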
16:24 <grantr> what is riak core giving you that's more than a consistent hashing proxy then?
16:24 <grantr> hinted handoff?
16:24 <benblack> it's not a proxy
16:24 <benblack> group membership, partitioning, hh, yes
16:24 <benblack> online reconfiguration and redistribution
16:24 <benblack> and, again, it isn't a proxy
16:24 <grantr> shared state
16:25 <benblack> the nodes are first class actors in the system
16:25 <benblack> aware of each other
16:25 <benblack> shards are generally not aware of each other
16:26 <benblack> the mongo replica sets stuff is similarly bolted on. very limited set of nodes in a set, weird constraints because of their simplistic leader election, and, again, sets are not aware of other sets.
16:27 <grantr> i see
16:28 <benblack> witness the 4sq outage report
16:28 <benblack> the glory of autosharding in full flower
16:28 <grantr> i didn't even hear about that, looking
16:29 <benblack> http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
16:33 <grantr> oh interesting, they're using mongo
16:35 <grantr> so how could that have been avoided with a dynamo system? seems like a case of unbalanced load, which can happen in dynamo. i guess dynamo systems would be easier to rebalance?
16:36 <benblack> you add more nodes, partitions are moved
16:36 <pharkmillups> grantr: i can only speak for Riak, but the..
16:36 <benblack> there's no special case
16:36 <pharkmillups> yeah, what benblack said
16:37 <benblack> pharkmillups has returned, i leave you in his capable hands
16:37 <grantr> in riak that's true, in cassandra there is a manual step. but essentially the same thing. and you get that because rebalancing is built into the system
16:37 <grantr> thanks benblack
16:38 <grantr> i always learn something in this channel!
16:39 <grantr> i should reread the dynamo paper with queues in mind
16:39 <grantr> maybe there is a difficulty with partitioning that i'm missing
16:41 <benblack> it is the eventual consistency that bites you on this one, generally, rather than partitioning.
16:42 <grantr> got it
16:42 <grantr> two pops at the same time to different nodes could get the same item
16:44 <grantr> and i guess that's a result of all the reads and writes (deletes) being in the same "place"
16:45 <grantr> that's how it's different from the typical usage pattern of a kv store
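A toy illustration of the race grantr identifies: two consumers popping through different, not-yet-converged replicas both receive the head item. The replica model below is a deliberately crude stand-in for real replication; it only exists to show why a queue's read-then-delete pattern fights eventual consistency:

```python
# Toy illustration of the duplicate-pop race under eventual consistency.
# Two replicas of the same queue key briefly diverge; consumers popping
# through different replicas each see item "a".

class Replica:
    def __init__(self, items):
        self.items = list(items)

    def pop(self):
        # read-then-delete against this replica only
        return self.items.pop(0) if self.items else None


replica1 = Replica(["a", "b", "c"])
replica2 = Replica(["a", "b", "c"])   # converged... for now

# Two consumers pop "at the same time" through different replicas,
# before either delete has replicated to the other side:
print(replica1.pop())   # -> 'a'
print(replica2.pop())   # -> 'a'  (duplicate delivery)
```

Once the deletes do replicate, the replicas converge again, but both consumers have already processed "a" - at best you get at-least-once delivery and have to deduplicate downstream.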
16:46 <grantr> hibari's chain replication could be a better system than dynamo then for replicated queues
16:46 <grantr> anything with strong consistency
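A minimal sketch of the chain replication idea behind Hibari that grantr points at, with illustrative class and method names (this is the general scheme, not Hibari's actual code): writes enter at the head and are acknowledged only once they reach the tail, and reads are served by the tail, so a read never observes a write the other replicas haven't applied:

```python
# Minimal chain replication sketch (the scheme Hibari builds on).
# Writes flow head -> middle -> tail and are acked at the tail;
# reads go to the tail only, so every read reflects a write that
# all replicas in the chain have already applied.

class ChainNode:
    def __init__(self, next_node=None):
        self.store = {}
        self.next = next_node

    def write(self, key, value):
        self.store[key] = value
        if self.next is not None:
            return self.next.write(key, value)  # forward down the chain
        return "ack"                            # tail acknowledges

    def read(self, key):
        return self.store.get(key)


tail = ChainNode()
middle = ChainNode(next_node=tail)
head = ChainNode(next_node=middle)

assert head.write("q/0", "item-a") == "ack"
print(tail.read("q/0"))   # reads only hit the tail -> 'item-a'
```

That strong consistency is what makes the "two pops, one item" race avoidable - at the cost of the full chain's write latency and more involved failure handling when a link in the chain dies.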