Skip to content

Instantly share code, notes, and snippets.

@jtuple
Created August 30, 2012 21:24
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jtuple/3541459 to your computer and use it in GitHub Desktop.
Save jtuple/3541459 to your computer and use it in GitHub Desktop.

A Glimpse Into The Future of Riak at RICON 2012

On October 10th, the two-day RICON 2012 conference kicks off with a focus on distributed systems. Yes, there will be a large focus on Riak. The talk schedule makes that clear. However, the focus on the conference is really about distributed systems and getting smart people to come together and hang out for a couple days. Like many conferences, the most memorable element is going to be the people you meet and the side conversations over coffee and beer.

With that said, the Riak content in this conference is going to be excellent. In my opinion, there are three talks concerning on-going research and development in Riak that really shouldn't be missed: Sean and Russell's talk on adding convergent data structures to Riak, my talk about adding true strong consistency to Riak, and Ryan's talk about Riak/Solr integration.

Data Structures in Riak. In this talk, Sean and Russell will talk about adding convergent data structures to Riak. Think: distributed counters, sets, maps, etc. With Riak handling all sibling resolution. The key here is that this work is rooted in academic theory, and provides strong guarantees. It looks like Cassandra may end up being the first Dynamo-inspired NoSQL database to offer complex data structures with their upcoming collections feature, but it's unclear to me if this work is based on convergent data types or just an ad-hoc solution.

The important questions when talking about complex data types are the following. Can it tolerate concurrent requests, node failures, and network partitions? What about network partitions where clients can talk to each side of the partition, but object replicas cannot communicate? What happens if you delete and add the same element concurrently on both sides of a partition? Does the system use wall-clock time for resolution? If so, can the system handle incorrect / not-in-sync clocks without clobbering your collections? The work that Sean and Russell are working can handle the above scenarios. Come to this talk to learn how.

Bringing Consistency to Riak. My talk will focus on adding a form of true strong consistency to Riak. There's no "beating the CAP theorem" here. You can't have requests that provide consistency (C), availability (A), and partition-tolerance (P) at once; but you can have a system that provides both AP and CP semantics on a per-request or per-object basis. The initial prototype work is focused on per-object basis: objects in certain buckets have CP semantics, the rest maintain normal Riak AP semantics.

The focus here is to provide per-object timeline consistency, much like Yahoo's PNUTS database. In fact, prior to joining Basho, I ended up implementing something very much like PNUTS. I wrote riak_zab, a port of the Zookeeper atomic broadcast protocol to Erlang/riak_core, and wrote riakual on-top of it -- an access layer to Riak objects with CP semantics. The Riak keyspace was broken up to into multiple ensembles based on the nodes that owned different replicas within the keyspace, and each ensemble elected a leader responsible for serializing accesses to a given object and relied upon the atomic broadcast layer to make everything work. Unfortunately, this approach, much like the use of Yahoo's Message Broker in PNUTS, implies a two-phase commit for every access. We can do better.

The new approach is more closely integrated with Riak itself. It still breaks the ring into multiple ensembles, and still has an elected leader per ensemble that coordinates accesses to a given set of replicas, but everything is written optimistically without two-phase commit. If a write ever fails to be acknowledged by a quorum of replicas, we handle resolving/repairing object consistency on the next read -- much like read-repair in classic Riak/Dynamo. Head over to my talk at RICON to find out how, and learn all the ins and outs of this work.

Introducing Yokozuna - Searching Riak With Solr. Ryan's talk will focus on new development on tightly integrating Riak with Solr. Riak already includes a built-in full-text search system called Riak Search. However, Riak Search isn't Solr. Come to Ryan's talk to learn why he's interested in integrating Riak with Solr, how things are implemented, and more. Also, Yokozuna is not the first time someone has tried to integrate Riak with Solr, both internally and externally to Basho. Take a look at Mecha for an alternate approach from @jrecursive.

RICON also has keynotes scheduled from Professors Eric Brewer and Joseph Hellerstein of UC Berkeley and Dr. Gary Flake of Clipboard.com, as well as several other talks. Other talks that I'm personally interested in are the Riak CS talk by Reid and Kelly, the Boundary talk using network-analytics derived "radiological imagery" by Dietrich Featherston, and the scaling Twitter talk by Dana Contreras.

See also:
Original blog announcement of RICON 2012 (here).
List of talks by programming languages (here).

-Joe
@jtuple

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment