Skip to content

Instantly share code, notes, and snippets.

@AnthonyNystrom
Created May 1, 2013 16:36
Show Gist options
  • Save AnthonyNystrom/5496413 to your computer and use it in GitHub Desktop.
Save AnthonyNystrom/5496413 to your computer and use it in GitHub Desktop.

#Map Reduce is Simple (in Javascript)

##The Implementation

var mf = function (item){return item;};//Sample MAP FUNCTION -- this one does nothings to the initial values of the DATA array 
    //For more information, check out https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/map
 
var rf = function (iv,current) {iv.push(current);return iv;};//Sample REDUCE FUNCTION -- this one simply adds mapped values to the INITIAL VALUE array
    //For more information, check out https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/reduce
 
var iv = [];//Sample INITIAL VALUE for reduction step -- values must be "folded" into this using the REDUCE FUNCTION
 
var d = [];//Sample DATA to be processed
 
console.log(d
    .map(mf)//Map Step using MAP FUNCTION
    .reduce(rf,iv));//Reduce Step using REDUCE FUNCTION and INITIAL VALUE

##The Idea

The idea behind MAP REDUCE is that you have a data set (as an array) in which each member of the set is processed using a MAP FUNCTION. The results are then be combined using a REDUCE FUNCTION.

The "trick" lies in the fact that values produced by the REDUCE FUNCTION can be combined progressively, that is, data can start off on several different machines where the mapping (and possibly some of the reducing) is done before it's reduced at a central location into it's final form. I haven't read all the way through Google's famous paper on the subject (http://research.google.com/archive/mapreduce.html) but I imagine that it revolves around the fact that Google can use this "trick" to efficiently process their data as many of their data sets span several machines. I think there's some extra on keys using keys to sort data, but there's a few ways you can handle that. I also imagine that there might be some interesting stuff on NoSQL databases -- I really should read that thing...

BONUS KNOWLEDGE: If you ensure that the REDUCE FUNCTION is an ASSOCIATIVE FUNCTION, then the order in which data is combined will have no effect on the final outcome of the data. Associative means f(a,b) === f(b,a)

Keep in mind that Map Reduce is an absurdly simple concept. That's it.

##Examples ###Just a few to get you started Please note that in many examples, I create arrays of string splitting strings rather than creating arrays explicityly.

//Example: Calculate 15% tip on a list of bills and order them
    //No keys used
    //Reduce simply creates a sorted array
var mapper = function (item){return Math.round(parseFloat(item)*1.15*100)/100}; 
var reducer = function (intermediate, current) { intermediate.push(current); return intermediate.sort();};
var initial = [];
var data = "14.22 15.28 19.42 16.80 11.96".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Zip Arrays
    //No keys used
    //Reduce simply creates a sorted array
var mapper = function (item){return item;}
var reducer = function (intermediate, current) {
    for(var i = 0; i<current.length; i++){
        intermediate[i] = intermediate[i] || [];
        intermediate[i].push(current[i]);
    }
    return intermediate;
}
var initial = [];
var data = [[1,2,3],[4,5,6],[7,8,9]]
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Get count of each word in a sentence
    //Keys are generated dynamically from data
    //Inspired by http://java.dzone.com/articles/javascript-mapreduce-one-liner
var mapper = function (item){return item.toLowerCase()};
var reducer = function (intermediate, current) { if (current) intermediate[current] = (intermediate[current] || 0) + 1; return intermediate;};
var initial = {};
var data = "Did that little red hen just run over that larger, redder black hen over there?".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Count the number of characeters in each word in a sentence and group them by parity
    //Possible keys know before hand and are incorporated into the initial value
        //Doing this can simply the reduce function
var mapper = function (item){
    return {
        parity : (item.length % 2)? "odd" : "even",
        len : item.length
    }
};
var reducer = function (intermediate, current) {
    if (current)
        intermediate[current.parity].push(current);
    return intermediate;
};
var initial = {"even":[],"odd":[]};
var data = "1 111 11 1111 11 111 1111 111 11 111 11 111111 11 1 11 11 1111".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Count the number of characeters in each word in a sentence and group them by parity
    //Possible keys not know before hand
var mapper = function (item){
    return {
        parity : item.length % 2,
        len : item.length
    }
};
var reducer = function (intermediate, current) {
    if (current)
        intermediate[current.parity] = intermediate[current.parity]? intermediate[current.parity].concat(current.len) : [current.len];
    return intermediate;
};
var initial = {};
var data = "1 111 11 1111 11 111 1111 111 11 111 11 111111 11 1 11 11 1111".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));

##License -- (MIT License) Copyright (c) 2013 John Henry <john@iamjohnhenry.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

//
////
////////
//Map Reduce is Simple (in Javascript)
////////
////
//
var mf = function (item){return item;};//Sample MAP FUNCTION -- this one does nothings to the initial values of the DATA array
//For more information, check out https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/map
var rf = function (iv,current) {iv.push(current);return iv;};//Sample REDUCE FUNCTION -- this one simply adds mapped values to the INITIAL VALUE array
//For more information, check out https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array/reduce
var iv = [];//Sample INITIAL VALUE for reduction step -- values must be "folded" into this using the REDUCE FUNCTION
var d = [];//Sample DATA to be processed
console.log(d
.map(mf)//Map Step using MAP FUNCTION
.reduce(rf,iv));//Reduce Step using REDUCE FUNCTION and INITIAL VALUE
//The idea behind MAP REDUCE is that you have a data set (as an array) in which each member of the set is processed using a MAP FUNCTION.
//The results are then be combined using a REDUCE FUNCTION.
//The "trick" lies in the fact that values produced by the REDUCE FUNCTION can be combined progressively, that is,
//data can start off on several different machines where the mapping (and possibly some of the reducing) is done before it's reduced at a central location into it's final form.
//I haven't read all the way through Google's famous paper on the subject (http://research.google.com/archive/mapreduce.html)
//but I imagine that it revolves around the fact that Google can use this "trick" to efficiently process their data
//as many of their data sets span several machines.
//I think there's some extra on keys using keys to sort data, but there's a few ways you can handle that.
//I also imagine that there might be some interesting stuff on NoSQL databases -- I really should read that thing...
//BONUS KNOWLEDGE: If you ensure that the REDUCE FUNCTION is an ASSOCIATIVE FUNCTION, then the order in which data is combined will have no effect on the final outcome of the data.
//Associative means f(a,b) === f(b,a)
//Keep in mind that Map Reduce is an absurdly simple concept
//That's it.
//////////
//Examples -- Just a few to get you started
//////////
//Note: Please note that in many examples, I create arrays of string splitting strings rather than creating arrays explicityly.
//Example: Calculate 15% tip on a list of bills and order them
//No keys used
//Reduce simply creates a sorted array
var mapper = function (item){return Math.round(parseFloat(item)*1.15*100)/100};
var reducer = function (intermediate, current) { intermediate.push(current); return intermediate.sort();};
var initial = [];
var data = "14.22 15.28 19.42 16.80 11.96".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Zip Arrays
//No keys used
//Reduce simply creates a sorted array
var mapper = function (item){return item;}
var reducer = function (intermediate, current) {
for(var i = 0; i<current.length; i++){
intermediate[i] = intermediate[i] || [];
intermediate[i].push(current[i]);
}
return intermediate;
}
var initial = [];
var data = [[1,2,3],[4,5,6],[7,8,9]]
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Get count of each word in a sentence
//Keys are generated dynamically from data
//Inspired by http://java.dzone.com/articles/javascript-mapreduce-one-liner
var mapper = function (item){return item.toLowerCase()};
var reducer = function (intermediate, current) { if (current) intermediate[current] = (intermediate[current] || 0) + 1; return intermediate;};
var initial = {};
var data = "Did that little red hen just run over that larger, redder black hen over there?".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Count the number of characeters in each word in a sentence and group them by parity
//Possible keys know before hand and are incorporated into the initial value
//Doing this can simply the reduce function
var mapper = function (item){
return {
parity : (item.length % 2)? "odd" : "even",
len : item.length
}
};
var reducer = function (intermediate, current) {
if (current)
intermediate[current.parity].push(current);
return intermediate;
};
var initial = {"even":[],"odd":[]};
var data = "1 111 11 1111 11 111 1111 111 11 111 11 111111 11 1 11 11 1111".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//Example: Count the number of characeters in each word in a sentence and group them by parity
//Possible keys not know before hand
var mapper = function (item){
return {
parity : item.length % 2,
len : item.length
}
};
var reducer = function (intermediate, current) {
if (current)
intermediate[current.parity] = intermediate[current.parity]? intermediate[current.parity].concat(current.len) : [current.len];
return intermediate;
};
var initial = {};
var data = "1 111 11 1111 11 111 1111 111 11 111 11 111111 11 1 11 11 1111".split(" ");
console.log(data.map(mapper).reduce(reducer,initial));
//////////
//License -- (MIT License)
//////////
/*
Copyright (c) 2013 John Henry &lt;john@iamjohnhenry.com&gt;
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment