14:39 <jlouis> Question time... A bucket has 868497 keys in it. Is there a way to run a mapred query over the whole bucket without calling list_keys(Bucket) first to generate a list of size 868497 and then consume it straight after?
14:42 <ericflo> jlouis: you can just pass a bucket name to mapreduce
14:44 <ericflo> jlouis: but in my understanding it does do a list_keys anyway
14:45 <jlouis> ah, the erlang riakc_pb_socket interface has a mapred_bucket function, I'll go with that for now
14:45 <hemulen> it does do a list_keys but we stream the keys internally so as to not load the entire key list into memory, fwiw
14:45 <hemulen> that's what caused the huge memory footprint in prior releases
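The difference hemulen describes, building the whole key list and then consuming it versus streaming keys into the map phase one at a time, can be illustrated with a toy Python model. This is only a sketch of the general idea, not Riak's internals or client API: `list_keys`, `stream_keys`, `map_phase` and `peak_bytes` are hypothetical names, the keys are synthetic strings, and the 868497 figure is simply the bucket size from the question.

```python
from __future__ import annotations

import tracemalloc
from typing import Callable, Iterable, Iterator


def list_keys(n: int) -> list[str]:
    """Hypothetical stand-in for a key listing that materializes every key at once."""
    return [f"key-{i}" for i in range(n)]


def stream_keys(n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streamed key listing: yields one key at a time."""
    for i in range(n):
        yield f"key-{i}"


def map_phase(keys: Iterable[str]) -> int:
    """Toy 'map' step: consumes the keys it is fed and counts them."""
    count = 0
    for _ in keys:
        count += 1
    return count


def peak_bytes(fn: Callable[[], int]) -> tuple[int, int]:
    """Run fn and return (its result, peak traced Python allocation in bytes)."""
    tracemalloc.start()
    result = fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    n = 868_497  # bucket size taken from the question above
    listed, listed_peak = peak_bytes(lambda: map_phase(list_keys(n)))
    streamed, streamed_peak = peak_bytes(lambda: map_phase(stream_keys(n)))
    print(f"list-then-consume: {listed} keys, peak {listed_peak} bytes")
    print(f"streamed:          {streamed} keys, peak {streamed_peak} bytes")
```

Both paths visit the same number of keys, but the streamed path only ever holds one key at a time, so its peak memory stays roughly constant instead of growing with the bucket size.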
14:45 <jlouis> noted.
14:46 justinsheehy joined
14:47 <jlouis> hemulen: my goal mostly was to push the burden of the key maintenance down into riak such that exactly that case could be fixed if not already, then later
14:48 <hemulen> yep. full-bucket mapred is something we didn't support as well before but there are improvements in later releases and we've more improvements on the way.
14:49 <hemulen> long-winded way of agreeing with you :)