@seancribbs
Created December 7, 2010 17:10
# Riak KV 0.14 will add key-filters to MapReduce queries. riak-client needs
# a nice and efficient syntax for this feature. Please leave a comment with
# the format that you like best.
# For more info on how key-filters work see:
# http://www.slideshare.net/hemulen/riak-mapred-preso
# Preliminaries so you know what we're talking about
client = Riak::Client.new
mr = Riak::MapReduce.new(client)
# Option 1: method_missing hack to support key-filters by name
# Pros: simple to add new filters as they become available
# Cons: leaky abstraction, doesn't enforce that the input is an entire-bucket query
mr.add("bucket").tokenize("-", 3).string_to_int.between(2009, 2010)
# Option 2: simple filter method
# Pros: feels more like phase additions
# Cons: doesn't enforce that the input is an entire-bucket query
mr.add("bucket").filter(:tokenize, "-", 3).filter(:string_to_int).filter(:between, 2009, 2010)
# Option 3: DSL-ish block syntax
# Pros: enforces entire-bucket query, encapsulation of filter sequence
# Cons: verbose, instance_eval can be ugly/problematic (instance_eval optional)
mr.filter("bucket") do
  tokenize "-", 3
  string_to_int
  between 2009, 2010
end
# Option 4: dumb pass-through
# Pros: simplest to implement
# Cons: hard to verify or constrain the format of inputs
mr.add("bucket", [[:tokenize, "-", 3], [:string_to_int], [:between, 2009, 2010]])
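
The four options above all describe the same filter sequence, so one way to compare them is to reduce each to the nested-array form of Option 4. The following is a minimal sketch under stated assumptions: `KeyFilterBuilder` and its `build` method are hypothetical names invented for illustration, not part of riak-client. It shows the `filter` method of Option 2 and the block syntax of Option 3, with `instance_eval` used only when the block takes no argument.

```ruby
# Hypothetical builder, not riak-client's API: records each filter as
# [name, *args], the same shape as the pass-through in Option 4.
class KeyFilterBuilder
  attr_reader :filters

  def initialize
    @filters = []
  end

  # Option 2 style: append one filter and return self so calls chain.
  def filter(name, *args)
    @filters << [name.to_sym, *args]
    self
  end

  # Option 3 style: a block that takes one argument is handed the builder;
  # a block with no parameters is instance_eval'd against it.
  def self.build(&block)
    builder = new
    if block.arity == 1
      block.call(builder)
    else
      builder.instance_eval(&block)
    end
    builder.filters
  end
end

# Chained calls, as in Option 2.
chained = KeyFilterBuilder.new
  .filter(:tokenize, "-", 3)
  .filter(:string_to_int)
  .filter(:between, 2009, 2010)
  .filters

# Block with an explicit argument: no instance_eval involved.
yielded = KeyFilterBuilder.build do |f|
  f.filter :tokenize, "-", 3
  f.filter :string_to_int
  f.filter :between, 2009, 2010
end

# Block without an argument: evaluated in the builder's context.
evaluated = KeyFilterBuilder.build do
  filter :tokenize, "-", 3
  filter :string_to_int
  filter :between, 2009, 2010
end
```

All three variables end up holding the same array of arrays, which suggests the choice between the options is mostly about call-site ergonomics and validation rather than the data handed to the query.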
@seancribbs
Author

@nfo: I was told that you can only use one bucket, and you either have a bucket with filters, or the previously available options (including the riak_search invocation). But I'm sure Kevin would appreciate the other ideas for his future improvements.

@seancribbs
Author

Implementation has started on a branch: https://github.com/seancribbs/ripple/compare/master...key-filters

@bbhoss: define_method prevented a number of stack level too deep errors I got, and feels cleaner (although I might switch to class_eval since it's faster).
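
For readers unfamiliar with the trade-off, here is a rough sketch of the `define_method` approach to named filters. It is not the code on the branch: the class name `NamedFilters` and the `FILTERS` list are assumptions, with the filter names taken from the examples in the gist.

```ruby
# Hypothetical sketch: the class name and the FILTERS list are assumptions.
class NamedFilters
  FILTERS = [:tokenize, :string_to_int, :between].freeze

  attr_reader :filters

  def initialize
    @filters = []
  end

  # One real method per known filter, generated once when the class is defined.
  FILTERS.each do |name|
    define_method(name) do |*args|
      @filters << [name, *args]
      self # return the builder so calls chain
    end
  end
end

f = NamedFilters.new.tokenize("-", 3).string_to_int.between(2009, 2010)
f.filters                # the recorded [name, *args] entries, in call order
f.respond_to?(:tokenize) # the generated methods really exist on the object
```

Because the methods are defined up front, a misspelled filter name raises an ordinary `NoMethodError` rather than being routed through a catch-all `method_missing` handler.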
