@PharkMillups
Created August 2, 2010 18:17
12:01 <NemesisD> hi all. im having some problems grasping reduce phases. i need
to basically partition the dataset into 2 groups in the map phase and then sum up
the counts of the two groups as 2 separate numbers in the reduce. would storing the
result in a hash like {group1:10, group2:20} not conform to the associativity,
commutativity and idempotency rules?
12:07 <bingeldac> it would
12:08 <NemesisD> hmm
12:09 <NemesisD> does riak use some special subset/javascript extension? in the docs
i see functions like reduce() and forEach
12:09 <NemesisD> and object access to hashes
12:09 <bingeldac> reduce is valid on an array
12:11 <NemesisD> reduce would not be valid on a hash? (not as input but as a result i guess?)
12:12 <bingeldac> same thing
12:12 <bingeldac> you can, I will show you some code
12:12 <bingeldac> so the return of your map is that hash?
12:13 <NemesisD> bingeldac: yeah.
12:13 <NemesisD> i see from the docs that even if its a hash you wrap it in an array,
but what confuses me is that in subsequent calls to reduce its simply creating another
array so i don't know what happens to that first one
12:14 <bingeldac> http://gist.github.com/503601
12:14 <bingeldac> something like that
12:15 <bingeldac> just cut up some other code, but I think that is more or less what
you are looking to do
12:17 <benblack> NemesisD: what do you mean what happens to that first one? the first
array is the input to reduce, the second is the output from reduce.
12:19 <NemesisD> benblack: so the example on the wiki, it's doing var r = {} for each reduce call.
it would seem like this gets done every time the reduce is called so it isn't clear
when these separate hashes would be combined
12:20 <NemesisD> upon rereading it i guess it makes some more sense to me actually
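A minimal sketch of the point being discussed here, assuming each map result is a hash of per-group counts wrapped in an array. The function name `sumGroups` and the group names are hypothetical; this is not the code from the linked gist or the wiki. It shows why a fresh `{}` in every reduce call is fine: each call returns its totals wrapped in an array, and a later call receives those arrays as input and merges them.

```javascript
// Hypothetical reduce phase: every input is a hash of per-group counts,
// either straight from a map phase ({group1: 1}) or from an earlier
// reduce call ({group1: 10, group2: 20}).
function sumGroups(values) {
  var totals = {}; // fresh accumulator, local to this call only
  values.forEach(function (counts) {
    for (var group in counts) {
      if (counts.hasOwnProperty(group)) {
        totals[group] = (totals[group] || 0) + counts[group];
      }
    }
  });
  return [totals]; // same shape as the input, so it can be reduced again
}

// Simulate two partial reduces followed by a reduce over their outputs.
var partA = sumGroups([{group1: 1}, {group2: 1}, {group1: 1}]);
var partB = sumGroups([{group2: 1}, {group2: 1}]);
var combined = sumGroups(partA.concat(partB));

// Reducing everything in one call gives the same totals.
var allAtOnce = sumGroups([
  {group1: 1}, {group2: 1}, {group1: 1}, {group2: 1}, {group2: 1}
]);
```

Because the output has the same shape as the input, it should not matter how the inputs are batched across reduce calls.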
12:21 <benblack> can you give me the link to the page you mean?
12:21 <benblack> i'm sure if there is a way to make it more clear the basho peeps
would want to do that
12:21 <bingeldac> yeah
12:21 <bingeldac> already been discussed and we are :)
12:22 <benblack> always one step ahead
12:22 <bingeldac> I assume he means https://wiki.basho.com/display/RIAK/MapReduce
12:22 <NemesisD> yeah, the word count one
12:22 <bingeldac> NemesisD: did you watch the M/R webinar?
12:22 <bingeldac> also I think sean's yakriak app has some good m/r examples
12:23 <NemesisD> bingeldac: i don't think i watched a webinar, just the short video
introducing it (i think kevin smith(?) did it)
12:23 <bingeldac> but we could do better to have a more end to end description
and code thing
12:23 <bingeldac> http://vimeo.com/13554436
12:23 <benblack> *heart* yakriak ( ---> http://github.com/seancribbs/yakriak)
12:23 <bingeldac> yeah
12:23 <bingeldac> it is neato
12:23 <benblack> first interesting m/r could i grokked instantly.
12:23 <benblack> s/could/code/
12:24 <bingeldac> we have taken that model too
12:24 <benblack> which reminds me i still haven't posted my slides on it
12:24 <bingeldac> and are working on a m/r query tool
12:24 <bingeldac> all served via riak like yakriak
12:24 <benblack> nice!
12:24 <bingeldac> but those customer obligations keep getting in our way
12:24 <NemesisD> bingeldac: also where would i go to get a full list of Riak's
functions (for example, i had no idea about filterNotFound)
12:24 <bingeldac> NemesisD: well
12:24 <bingeldac> there is a link
12:24 <bingeldac> oh crap it is hosed
12:25 <bingeldac> it references it from the m/r page
12:25 <NemesisD> hosery!
12:25 <benblack> NemesisD: filterNotFound() removes empty elements from the array
12:25 <benblack> so your function only processes elements with data
12:25 <NemesisD> ah. yeah thought it would do something along those lines
12:26 <bingeldac> http://hg.basho.com/riak_kv/src/tip/priv/mapred_builtins.js
12:27 <bingeldac> there were some nice changes to m/r this weekend..
12:27 <bingeldac> hopefully they will get out this week
12:27 <NemesisD> this is more of a js newb question but is the values2
necessary in your example, bingeldac? wouldn't you be able to just
do values = Riak.filterNotFound(values); ?
12:28 <benblack> that's destructive
12:28 <benblack> you don't want m/r phases to modify their inputs, you want
them to produce new output
12:29 <NemesisD> oh ok
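One way to keep a phase from touching its input can be sketched as below. The `filterNotFound` here is a stand-in written for this sketch so it runs outside Riak; it is an assumption about the behaviour described above, not Riak's actual built-in, and `countPresent` and the sample objects are hypothetical too.

```javascript
// Stand-in for Riak.filterNotFound (assumption): returns a NEW array
// that omits entries carrying a not_found marker.
function filterNotFound(values) {
  return values.filter(function (value) {
    return value.not_found === undefined;
  });
}

// Hypothetical phase-style function: the filtered copy gets its own name
// and the `values` argument is left exactly as it was passed in.
function countPresent(values) {
  var present = filterNotFound(values);
  return [present.length];
}

var input = [{n: 1}, {not_found: {bucket: "b", key: "k"}}, {n: 2}];
var result = countPresent(input);
```

After the call, `input` still holds all three entries, while `result` reflects only the two that carried data.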
12:29 <benblack> write some erlang, then you'll write js like that automatically ;)
12:30 <bingeldac> haha
12:33 <NemesisD> benblack: lol im trying to learn haskell right now actually.
i know they're very different but it's still my first step into functional programming
12:34 <benblack> immutable data is addictive
12:41 <NemesisD> ok this part kind of surprised me. using whole buckets
in a map reduce is discouraged except in development. how else would you
get the list of bucket/key pairs to use for any sort of meaningful aggregation?
12:42 <benblack> NemesisD: one way is by storing lists of keys in index documents
12:43 <benblack> another is by link walking
12:49 <NemesisD> benblack: so store a big ass json array in an index doc and
then fetch that, parse it and feed it as a key list to my langs riak driver?
and this is faster than the native key list?
12:50 <benblack> link walking would be the preferred technique, but, yes
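The index-document approach might look roughly like the sketch below, assuming the index document body is a JSON array of keys. The bucket name, keys, and `buildJob` helper are hypothetical, and the job layout (`inputs` as bucket/key pairs, `query` as a list of phases) follows Riak's JSON MapReduce job format as best understood here; check it against the MapReduce docs before relying on it.

```javascript
// Hypothetical index document body, already fetched from Riak:
// a JSON array of the keys to aggregate over.
var indexDocBody = '["order-1", "order-2", "order-3"]';

// Build a job description whose inputs are explicit bucket/key pairs
// instead of a whole bucket name.
function buildJob(bucket, indexBody, mapSource, reduceName) {
  var keys = JSON.parse(indexBody);
  return {
    inputs: keys.map(function (key) { return [bucket, key]; }),
    query: [
      {map: {language: "javascript", source: mapSource}},
      {reduce: {language: "javascript", name: reduceName}}
    ]
  };
}

var job = buildJob(
  "orders",                        // hypothetical bucket
  indexDocBody,
  "function(v) { return [1]; }",   // emit 1 per object
  "Riak.reduceSum"                 // built-in named in mapred_builtins.js
);
var jobJson = JSON.stringify(job); // body to submit to the m/r endpoint
```

The driver would then send `jobJson` as the job body; how it is submitted depends on the client library in use.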
12:52 <NemesisD> i have not looked into links too much. i was under the impression they were
to be considered analogous to foreign keys, except my application only has one conceptual
model/bucket as it were
12:52 <NemesisD> and none of the records are related to each other in any meaningful way
13:43 <NemesisD> bingeldac: just had a question on your sample code
http://gist.github.com/503601 where does the initial value of the accumulator get defined?
13:48 <NemesisD> oh disregard, didn't realize reduce was a feature of the language.
my understanding of js is so outdated
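For reference, a short sketch of how the standard `Array.prototype.reduce` picks its starting accumulator. This is plain JavaScript and is not taken from the linked gist; the variable names are illustrative.

```javascript
var counts = [3, 4, 5];

// No initial value: the first element seeds the accumulator and the
// callback starts at the second element.
var sumNoInit = counts.reduce(function (acc, n) { return acc + n; });

// Explicit initial value: passed as the second argument to reduce().
var sumWithInit = counts.reduce(function (acc, n) { return acc + n; }, 0);

// On an empty array the explicit initial value is returned as-is...
var emptySum = [].reduce(function (acc, n) { return acc + n; }, 0);

// ...while omitting it on an empty array throws a TypeError.
var threwTypeError = false;
try {
  [].reduce(function (acc, n) { return acc + n; });
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
```

Passing the initial value explicitly is the safer habit when the input array might be empty.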
14:12 <seancribbs> NemesisD: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference is indispensable