Created May 17, 2012 20:08
Riak Users,

I've been working on some new features for Riak's secondary indexes (2i), and I wanted to get some feedback from the community before proceeding much further.

How does 2i work currently?

2i lets you query index values with an equality query, or with an inclusive range query. The results are not sorted in any way.

Basic proposed changes

  1. Results will now be sorted by [IndexValue, PrimaryKey] in ascending order. I'm unsure whether supporting descending-order queries is important enough to do now, or whether it can be added later.

  2. Queries can specify a limit for the number of results to return. As expected, the limit is applied after sorting.

  3. Queries can constrain the results returned based on the index value being eq (equal), lt (less-than), lte (less-than-or-equal), gt (greater-than) or gte (greater-than-or-equal). Queries can also provide no constraints, which returns the whole index (before limiting, that is).

  4. Queries can "continue" where a previous query left off. This provides a limited form of pagination. Note that there is no snapshotting or isolation between queries using a continuation. Continuations hold no server-side state and never expire.
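
To make the ordering of operations in points 1 to 3 concrete, here is a minimal in-memory sketch in Python. It is only a model of the proposed semantics, not Riak code: `run_query`, `OPS`, and the list-of-pairs index are hypothetical names invented for illustration.

```python
import operator

# Hypothetical mapping from the proposed constraint names to comparisons.
OPS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def run_query(index, query=None, limit=None):
    """Model of the proposed semantics.

    index: iterable of (index_value, primary_key) pairs, in any order.
    query: dict of constraints such as {"gte": "bar", "lt": "foo"};
           None or {} means "return the whole index".
    limit: maximum number of results, applied after sorting.
    """
    query = query or {}
    matches = [
        (value, key)
        for value, key in index
        if all(OPS[name](value, bound) for name, bound in query.items())
    ]
    # Tuples compare element by element, so this sorts by index value
    # first and breaks ties on the primary key.
    matches.sort()
    return matches if limit is None else matches[:limit]
```

Because the limit is applied to the sorted list, two entries with the same index value always come back in primary-key order, which is what makes continuing from a known position well defined.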


Queries are described as JSON data structures. Each query targets a specific Bucket/Index, which is specified either in the path (for HTTP) or in the PB message.

// return the first 100 keys
// whose index value is
// greater-than-or-equal to "bar" and
// less-than "foo"
{
    "limit": 100,
    "query": {
        "gte": "bar",
        "lt": "foo"
    }
}
// this query will return a continuation that will
// let you ask for more results, starting at
// the 101st value

// return all the keys whose index
// value is greater-than 50
{
    "query": {
        "gt": 50
    }
}

// return all of the keys whose
// index value is equal to 33
{
    "query": {
        "eq": 33
    }
}

// ask for 100 more results using a continuation
// from a previous query. Note, requests with
// continuations should _not_ include a query
// section, as that is wrapped up in the continuation
// already
{
    "continuation": "a85hYGBgymBKYWBKLcwFshklQRyWnMTiErAUR",
    "limit": 100
}
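
One way a continuation can carry no server-side state is to pack the original query and the last [IndexValue, PrimaryKey] position into the opaque token itself. The Python sketch below illustrates that idea only. The token layout (base64-encoded JSON) and the names `encode_continuation`, `decode_continuation`, and `page` are assumptions made for this example, not the actual Riak format.

```python
import base64
import json


def encode_continuation(query, last_value, last_key):
    """Pack the query and the last returned position into an opaque token."""
    state = {"query": query, "after": [last_value, last_key]}
    raw = json.dumps(state).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_continuation(token):
    """Recover (query, position) from a token; no server lookup needed."""
    state = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return state["query"], tuple(state["after"])


def page(sorted_entries, limit, after=None):
    """Return up to `limit` (value, key) entries strictly after `after`.

    sorted_entries must already be sorted by (index value, primary key).
    """
    rest = [e for e in sorted_entries if after is None or e > after]
    return rest[:limit]
```

Since the token only records a position, a follow-up request simply resumes after that position in whatever the index looks like at that moment, which is why there is no snapshotting or isolation between the two requests.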


  1. Will it help you do things with 2i that you wanted to do but previously weren't able?

  2. How important is it to be able to specify the sort order (ASC vs. DESC)?

  3. Any other thoughts are appreciated as well.

ghost commented May 18, 2012

Rather like jeraymond I would find being able to query on more than one index more useful than sorting - as the hacks to get around multiple index queries are more unpleasant than applying a sort myself - not that being able to sort Riak-side is unwelcome!

And how about result limits? Since the data is spread across different vnodes, you can't effectively stop searching for further results once you have enough.

@ddosia The coordinating FSM simply stops the vnode folds when enough results have been accumulated.
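
A toy Python sketch of that early-stop idea, under the assumption that each vnode yields its entries lazily in sorted order. This is not the actual FSM code; `vnode_fold` and `coordinate` are hypothetical names, and the `log` list exists only to count how many entries were read.

```python
import heapq
from itertools import islice


def vnode_fold(entries, log):
    """Lazily yield one vnode's entries in sorted order.

    Each entry is appended to `log` when it is actually read, so the
    caller can see how much of the fold was consumed.
    """
    for entry in sorted(entries):
        log.append(entry)
        yield entry


def coordinate(vnode_entries, limit):
    """Merge per-vnode sorted streams and stop once `limit` results exist.

    Returns (results, number_of_entries_read).
    """
    log = []
    folds = [vnode_fold(entries, log) for entries in vnode_entries]
    merged = heapq.merge(*folds)  # lazy k-way merge of sorted iterators
    results = list(islice(merged, limit))  # stops pulling after `limit`
    return results, len(log)
```

In this model the coordinator reads only a handful of entries from each vnode before it has enough results, rather than draining every fold.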

@jeraymond, @Bufferine I agree multi-index queries would be useful. They're unfortunately outside the scope of this project, but rest assured we're thinking about them for the future. In the meantime, Riak Search might fit your use case.

ghost commented May 18, 2012

Ah well can't have everything all at once :)

Yep, using those, as well as some creative secondary indexing where it's appropriate.

armon commented May 18, 2012

It would be incredibly nice to have an option of "materializing" the results and getting the actual objects back instead of just the keys. This would allow us to avoid an additional M/R step.

@armon Yes, that's very much on our radar as well :)

I am late to this party, but descending sorting would be rather useful. Many online applications make queries that ask for the latest N items. For that to work efficiently with a limit you need a descending sort order. There are ways to get around it. You can issue multiple bracketed queries, but that is inefficient. Or you can rewrite your index value so that your lower values become your highest and vice versa. But it would be great to have native support.
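
The index-rewriting workaround can be sketched in Python as follows, assuming integer timestamps as index values. `MAX_TS`, `inverted`, and `latest` are hypothetical names, and the sorted list stands in for an ascending-only index queried with a limit.

```python
# Assumed upper bound on timestamps; any constant larger than every
# real index value works.
MAX_TS = 2**63 - 1


def inverted(ts):
    """Map a timestamp so that newer values sort first in ascending order.

    The mapping is its own inverse: inverted(inverted(ts)) == ts.
    """
    return MAX_TS - ts


def latest(entries, n):
    """Return the n newest (timestamp, key) entries.

    Simulates storing inverted timestamps in an ascending-only index
    and asking for the first n results.
    """
    index = sorted((inverted(ts), key) for ts, key in entries)
    return [(inverted(value), key) for value, key in index[:n]]
```

The cost of this trick is that every reader and writer has to know about the inversion, which is part of why native descending support would be cleaner.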
