Created
August 2, 2010 18:17
12:01 <NemesisD> hi all. im having some problems grasping reduce phases. i need
to basically partition the dataset into 2 groups in the map phase and then sum up
the counts of the two groups as 2 separate numbers in the reduce. would storing the
result in a hash like {group1:10, group2:20} not conform to the associativity,
commutativity and idempotency rules?
12:07 <bingeldac> it would
12:08 <NemesisD> hmm
12:09 <NemesisD> does riak use some special subset/javascript extension? in the docs
i see functions like reduce() and forEach
12:09 <NemesisD> and object access to hashes
12:09 <bingeldac> reduce is valid on an array
12:11 <NemesisD> reduce would not be valid on a hash? (not as input but as a result i guess?)
12:12 <bingeldac> same thing
12:12 <bingeldac> you can, I will show you some code
12:12 <bingeldac> so the return of your map is that hash?
12:13 <NemesisD> bingeldac: yeah.
12:13 <NemesisD> i see from the docs that even if its a hash you wrap it in an array,
but what confuses me is that in subsequent calls to reduce its simply creating another
array so i don't know what happens to that first one
12:14 <bingeldac> http://gist.github.com/503601
12:14 <bingeldac> something like that
12:15 <bingeldac> just cut up some other code, but I think that is more or less what
you are looking to do
12:17 <benblack> NemesisD: what do you mean what happens to that first one? the first
array is the input to reduce, the second is the output from reduce.
12:19 <NemesisD> benblack: so the example on the wiki, it's doing var r = {} for each reduce call.
it would seem like this gets done every time the reduce is called so it isn't clear
when these separate hashes would be combined
12:20 <NemesisD> upon rereading it i guess it makes some more sense to me actually
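
The two-group count discussed above can be sketched in plain JavaScript. This is a minimal illustration, not the code from bingeldac's gist: `isGroup1`, `mapGroups`, `reduceGroups`, and the `score` field are hypothetical names, and the functions are not wired to Riak's actual map/reduce calling conventions. The point it shows is that a fresh accumulator per reduce call is fine, because each call returns its partial hash wrapped in an array, and those partial hashes become inputs to a later reduce call.

```javascript
// Hypothetical predicate that splits records into two groups (not part of Riak).
function isGroup1(record) {
  return record.score >= 50;
}

// Map step: emit one partial count per record, always wrapped in an array.
function mapGroups(record) {
  return isGroup1(record)
    ? [{ group1: 1, group2: 0 }]
    : [{ group1: 0, group2: 1 }];
}

// Reduce step: takes an array of partial hashes, returns an array holding one
// merged hash. The output has the same shape as the input elements, so the
// result of one call can be fed into another call (re-reduce).
function reduceGroups(values) {
  var totals = values.reduce(function (acc, v) {
    return { group1: acc.group1 + v.group1, group2: acc.group2 + v.group2 };
  }, { group1: 0, group2: 0 });
  return [totals];
}

var records = [{ score: 10 }, { score: 70 }, { score: 90 }];
var mapped = [].concat.apply([], records.map(mapGroups));

// Reduce in two batches, then reduce the partial results together.
var batch1 = reduceGroups(mapped.slice(0, 2));
var batch2 = reduceGroups(mapped.slice(2));
var batched = reduceGroups(batch1.concat(batch2));

// Reduce everything in one call for comparison.
var direct = reduceGroups(mapped);
```

Here `batched` and `direct` hold the same totals, which is what lets a system call the reduce function on arbitrary batches and in any order.
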
12:21 <benblack> can you give me the link to the page you mean?
12:21 <benblack> i'm sure if there is a way to make it more clear the basho peeps
would want to do that
12:21 <bingeldac> yeah
12:21 <bingeldac> already been discussed and we are :)
12:22 <benblack> always one step ahead
12:22 <bingeldac> I assume he means https://wiki.basho.com/display/RIAK/MapReduce
12:22 <NemesisD> yeah, the word count one
12:22 <bingeldac> NemesisD: did you watch the M/R webinar?
12:22 <bingeldac> also I think sean's yakriak app has some good m/r examples
12:23 <NemesisD> bingeldac: i don't think i watched a webinar, just the short video
introducing it (i think kevin smith(?) did it)
12:23 <bingeldac> but we could do better to have a more end to end description
and code thing
12:23 <bingeldac> http://vimeo.com/13554436
12:23 <benblack> *heart* yakriak ( ---> http://github.com/seancribbs/yakriak)
12:23 <bingeldac> yeah
12:23 <bingeldac> it is neato
12:23 <benblack> first interesting m/r could i grokked instantly.
12:23 <benblack> s/could/code/
12:24 <bingeldac> we have taken that model too
12:24 <benblack> which reminds me i still haven't posted my slides on it
12:24 <bingeldac> and are working on a m/r query tool
12:24 <bingeldac> all served via riak like yakriak
12:24 <benblack> nice!
12:24 <bingeldac> but those customer obligations keep getting in our way
12:24 <NemesisD> bingeldac: also where would i go to get a full list of Riak's
functions (for example, i had no idea about filterNotFound)
12:24 <bingeldac> NemesisD: well
12:24 <bingeldac> there is a link
12:24 <bingeldac> oh crap it is hosed
12:25 <bingeldac> it references it from the m/r page
12:25 <NemesisD> hosery!
12:25 <benblack> NemesisD: filterNotFound() removes empty elements from the array
12:25 <benblack> so your function only processes elements with data
12:25 <NemesisD> ah. yeah thought it would do something along those lines
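
The behaviour benblack describes for `filterNotFound()` can be approximated with a small stand-in. This is a sketch only: `filterNotFoundSketch` is a hypothetical name, and the assumption that a missing object shows up in the inputs as a value carrying a `not_found` property is mine, not something stated in the log. The authoritative definition is in the `mapred_builtins.js` file linked just below.

```javascript
// Stand-in for Riak.filterNotFound, for illustration only.
// Assumption: a missing object appears in the inputs as { not_found: ... }.
function filterNotFoundSketch(values) {
  return values.filter(function (value) {
    return value.not_found === undefined;
  });
}

var inputs = [
  { count: 3 },
  { not_found: { bucket: "b", key: "k" } }, // placeholder for a missing object
  { count: 4 }
];

// Only the entries that carry data are left for the rest of the phase.
var found = filterNotFoundSketch(inputs);
```
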
12:26 <bingeldac> http://hg.basho.com/riak_kv/src/tip/priv/mapred_builtins.js
12:27 <bingeldac> there were some nice changes to m/r this weekend..
12:27 <bingeldac> hopefully they will get out this week
12:27 <NemesisD> this is more of a js newb question but is the values2
necessary in your example, bingeldac? wouldn't you be able to just
do values = Riak.filterNotFound(values); ?
12:28 <benblack> that's destructive
12:28 <benblack> you don't want m/r phases to modify their inputs, you want
them to produce new output
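
The difference between modifying an input and producing new output can be shown with two versions of the same cleanup step. Both function names are hypothetical and `null` stands in for a missing element. Note that in JavaScript, rebinding the `values` parameter does not by itself change the caller's array; a separate name such as `values2` mainly keeps the input and the output visibly distinct.

```javascript
// Destructive: removes entries from the caller's array in place.
function dropMissingInPlace(values) {
  for (var i = values.length - 1; i >= 0; i--) {
    if (values[i] === null) {
      values.splice(i, 1);
    }
  }
  return values;
}

// Non-destructive: leaves the input untouched and returns a new array.
function dropMissing(values) {
  return values.filter(function (v) {
    return v !== null;
  });
}

var a = [1, null, 2];
var cleaned = dropMissing(a);   // a still has three elements

var b = [1, null, 2];
dropMissingInPlace(b);          // b itself has been shortened
```
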
12:29 <NemesisD> oh ok
12:29 <benblack> write some erlang, then you'll write js like that automatically ;)
12:30 <bingeldac> haha
12:33 <NemesisD> benblack: lol im trying to learn haskell right now actually.
i know theyre very different but its
still my first step into functional programming
12:34 <benblack> immutable data is addictive
12:41 <NemesisD> ok this part kind of surprised me. using whole buckets
in a map reduce is discouraged except in development. how else would you
get the list of bucket/key pairs to use for any sort of meaningful aggregation?
12:42 <benblack> NemesisD: one way is by storing lists of keys in index documents
12:43 <benblack> another is by link walking
12:49 <NemesisD> benblack: so store a big ass json array in an index doc and
then fetch that, parse it and feed it as a key list to my langs riak driver?
and this is faster than the native key list?
12:50 <benblack> link walking would be the preferred technique, but, yes
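
The index-document approach NemesisD summarizes could look roughly like this. It is a sketch under several assumptions: the index document is a JSON array of keys, the bucket name `events` and the keys are made up, `buildInputs` is a hypothetical helper, and the job layout (bucket/key pairs as `inputs`, phases under `query`, the built-ins `Riak.mapValuesJson` and `Riak.reduceSum`) reflects my understanding of Riak's JSON map/reduce format rather than anything shown in the log. Fetching the index document and submitting the job are left out.

```javascript
// Assumption: this string is the body of an index document fetched from Riak.
var indexDocBody = '["k1", "k2", "k3"]';

// Hypothetical helper: turn a JSON array of keys into [bucket, key] pairs.
function buildInputs(bucket, indexJson) {
  return JSON.parse(indexJson).map(function (key) {
    return [bucket, key];
  });
}

// Job description with an explicit key list instead of a whole bucket.
var job = {
  inputs: buildInputs("events", indexDocBody),
  query: [
    { map: { language: "javascript", name: "Riak.mapValuesJson" } },
    { reduce: { language: "javascript", name: "Riak.reduceSum" } }
  ]
};

// The serialized job is what a client would send to the map/reduce endpoint.
var jobJson = JSON.stringify(job);
```
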
12:52 <NemesisD> i have not looked into links too much. i was under the impression they were
to be considered analogous to foreign keys, except my application only has one conceptual
model/bucket as it were
12:52 <NemesisD> and none of the records are related to each other in any meaningful way
13:43 <NemesisD> bingeldac: just had a question on your sample code
http://gist.github.com/503601 where does the initial value of the accumulator get defined?
13:48 <NemesisD> oh disregard, didn't realize reduce was a feature of the language.
my understanding of js is so outdated
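
On the accumulator question: `Array.prototype.reduce` takes the initial value as an optional second argument. The snippet below is a generic illustration of that language feature, not a reconstruction of the gist; `add` and `counts` are made-up names.

```javascript
function add(acc, n) {
  return acc + n;
}

var counts = [10, 20, 30];

// Explicit initial value: the accumulator starts at 0.
var withInitial = counts.reduce(add, 0);

// No initial value: the accumulator starts as counts[0] and
// iteration begins at counts[1].
var withoutInitial = counts.reduce(add);

// An empty array is only safe when an initial value is supplied.
var emptySum = [].reduce(add, 0);

var threw = false;
try {
  [].reduce(add); // no initial value and nothing to start from
} catch (e) {
  threw = e instanceof TypeError;
}
```

Supplying the initial value explicitly is the safer habit in a reduce phase, since a batch may arrive empty.
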
14:12 <seancribbs> NemesisD: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference is indispensable