@PharkMillups
Created October 29, 2010 23:35
14:39 <jlouis> Question time... A bucket has 868497 keys in it. Is there a way to run a mapred query over the whole bucket without first calling list_keys(Bucket) to generate a list of 868497 keys and then consuming it straight after?
14:42 <ericflo> jlouis: you can just pass a bucket name to mapreduce
14:44 <ericflo> jlouis: but in my understanding it does do a list_keys anyway
14:45 <jlouis> ah, the erlang riakc_pb_socket interface has a mapred_bucket function, I'll go with that for now
14:45 <hemulen> it does do a list_keys, but we stream the keys internally so as not to load the entire key list into memory, fwiw
14:45 <hemulen> that's what caused the huge memory footprint in prior releases
14:45 <jlouis> noted.
14:46 justinsheehy joined
14:47 <jlouis> hemulen: my goal was mostly to push the burden of key maintenance down into riak, so that exactly that case could be fixed there: now if it isn't already, or later
14:48 <hemulen> yep. full-bucket mapred is something we didn't support as well before, but there are improvements in later releases and we've got more improvements on the way.
14:49 <hemulen> long-winded way of agreeing with you :)
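hemulen's point is that a full-bucket mapred still lists the keys, but feeds them to the job as they arrive instead of building the whole key list first. The sketch below only illustrates that difference; it is not Riak's implementation and not the riakc_pb_socket API. All names (stream_keys, map_reduce_streaming, map_reduce_materialized) and the batch size of 1000 are hypothetical, chosen for illustration. The second value each function returns is the largest number of keys held in memory at once.

```python
from typing import Callable, Iterator, List, Tuple

def stream_keys(num_keys: int, batch_size: int = 1000) -> Iterator[List[str]]:
    """Hypothetical stand-in for a key listing that yields keys in batches."""
    for start in range(0, num_keys, batch_size):
        end = min(start + batch_size, num_keys)
        yield [f"key-{i}" for i in range(start, end)]

def map_reduce_streaming(batches: Iterator[List[str]],
                         map_fn: Callable[[str], int],
                         reduce_fn: Callable[[int, int], int],
                         initial: int) -> Tuple[int, int]:
    """Consume keys batch by batch; only one batch is held at a time."""
    acc, peak_keys_held = initial, 0
    for batch in batches:
        peak_keys_held = max(peak_keys_held, len(batch))
        for key in batch:
            acc = reduce_fn(acc, map_fn(key))
    return acc, peak_keys_held

def map_reduce_materialized(num_keys: int,
                            map_fn: Callable[[str], int],
                            reduce_fn: Callable[[int, int], int],
                            initial: int) -> Tuple[int, int]:
    """List every key up front (the list_keys-then-mapred pattern)."""
    keys = [f"key-{i}" for i in range(num_keys)]  # entire key list in memory
    acc = initial
    for key in keys:
        acc = reduce_fn(acc, map_fn(key))
    return acc, len(keys)

# Count the 868497 keys from the question without materializing them all.
count, peak = map_reduce_streaming(stream_keys(868497), lambda k: 1,
                                   lambda a, b: a + b, 0)
print(count, peak)  # 868497 1000
```

Both approaches compute the same result; the streaming version keeps at most one batch of keys alive, while the materialized version holds all of them, which mirrors the memory footprint hemulen mentions for prior releases.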