Created
October 29, 2010 23:35
14:39 <jlouis> Question time... A bucket has 868497 keys in it. Is there a way to run a mapred query over the whole bucket without calling list_keys(Bucket) first to generate a list of size 868497 and then consume it straight after?
14:42 <ericflo> jlouis: you can just pass a bucket name to mapreduce
14:44 <ericflo> jlouis: but in my understanding it does do a list_keys anyway
14:45 <jlouis> ah, the erlang riakc_pb_socket interface has a mapred_bucket function, I'll go with that for now
14:45 <hemulen> it does do a list_keys but we stream the keys internally so as to not load the entire key list into memory, fwiw
14:45 <hemulen> that's what caused the huge memory footprint in prior releases
14:45 <jlouis> noted.
14:46 justinsheehy joined
14:47 <jlouis> hemulen: my goal mostly was to push the burden of the key maintenance down into riak such that exactly that case could be fixed if not already, then later
14:48 <hemulen> yep. full-bucket mapred is something we didn't support as well before but there are improvements in later releases and we've more improvements on the way.
14:49 <hemulen> long-winded way of agreeing with you :)
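
Editor's note: the difference hemulen describes, between building the whole key list and streaming keys through the map phase, can be sketched roughly as below. This is a conceptual illustration in Python, not Riak's implementation and not the riakc_pb_socket API. The names `list_keys`, `stream_keys`, `map_over_stream`, the synthetic `key-N` keys, and the batch size of 1000 are all hypothetical and chosen only for the example.

```python
from typing import Callable, Iterable, Iterator, List


def list_keys(n: int) -> List[str]:
    """Hypothetical stand-in: materialize every key in one list."""
    return [f"key-{i}" for i in range(n)]


def stream_keys(n: int, batch_size: int = 1000) -> Iterator[List[str]]:
    """Hypothetical stand-in: yield keys in batches, one batch in memory at a time."""
    for start in range(0, n, batch_size):
        yield [f"key-{i}" for i in range(start, min(start + batch_size, n))]


def map_over_stream(
    batches: Iterable[List[str]], map_fn: Callable[[str], int]
) -> Iterator[int]:
    """Apply map_fn to each key as its batch arrives, without keeping old batches."""
    for batch in batches:
        for key in batch:
            yield map_fn(key)


# Count the keys in the bucket from the log (868497) without ever
# holding the full key list; the "reduce" step here is just sum().
total = sum(map_over_stream(stream_keys(868497), lambda key: 1))
```

The caller only ever sees one batch at a time, so memory use is bounded by the batch size rather than the bucket size, which is the property hemulen attributes to the streamed key listing.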