@xcaspar
Created July 21, 2015 08:24

Using MapReduce in MongoDB

@(mongodb)[mapreduce]

Introduction

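MapReduce is a programming model that hides the details of parallel computation, fault tolerance, data distribution, and load balancing. A MapReduce job starts with the map step, which applies an operation to every document in the collection. The emitted results are then grouped by key, and the values emitted for each key are collected into a list under that key. The reduce step collapses the values in a list into a single value. That value is returned and grouped by key again, and the process repeats until the list for each key holds only one value.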
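The map, group, and reduce steps described above can be sketched in plain JavaScript, without a database. This is only an illustration under stated assumptions: `runMapReduce` is a hypothetical helper, the array stands in for a collection, and `emit` is passed to the map function as an argument, whereas in the mongo shell it is a global provided by the server. It is not how MongoDB implements the feature.

```javascript
// Hypothetical in-memory sketch of map -> group by key -> reduce.
// `runMapReduce` is an illustrative name, not a MongoDB API.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();

  // Map step: run `map` with `this` bound to each document and
  // collect every emitted value into the list for its key.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) {
    map.call(doc, emit);
  }

  // Reduce step: collapse each list into a single value.
  // A key with only one value needs no reduce call.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// Sample documents (made up for this sketch).
const docs = [
  { name: "zhangsan" },
  { name: "xiaofancn" },
  { name: "zhangsan" },
];

const counts = runMapReduce(
  docs,
  function (emit) { emit(this.name, { count: 1 }); },
  function (key, values) {
    let total = 0;
    for (const v of values) total += v.count;
    return { count: total };
  }
);
console.log(counts);
```

The single-pass reduce here is a simplification; a real engine may reduce partial lists and then reduce the partial results again.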

Implementation (key parts of the code)

For example, suppose there is a user document with the following structure:

{
    "_id" : "00000000001",
    "name" : "zhangsan",
    "login_time" : "2013-07-20"
}

To count the number of records per name, the mongo shell code is as follows:

// Map: emit one {count: 1} per document, keyed by name.
var map = function() {
    emit(this.name, {count: 1});
};
// Reduce: sum the counts emitted for a single key.
var reduce = function(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i].count;
    }
    return {count: total};
};
emit specifies the key and the value, and it also determines the data structure of the result. Running the job produces the following result:

db.person.mapReduce(map, reduce, {out : "resultCollection"});
{
"result" : "resultCollection",
"timeMillis" : 112,
"counts" : {
"input" : 10,
"emit" : 10,
"reduce" : 2,
"output" : 2
},
"ok" : 1
}
db.resultCollection.find();
{ "_id" : "xiaofancn", "value" : { "count" : 3 } }
{ "_id" : "zhangsan", "value" : { "count" : 7 } }
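
The numbers above can be cross-checked with a little arithmetic. Because the map function emits exactly one `{count: 1}` per document, the counts in the result should add up to the `emit` figure, and the number of result documents should match `output`. The sketch below copies the values from the output shown above into plain JavaScript objects; the variable names are illustrative only.

```javascript
// Values copied from the mapReduce output and the find() result above.
const stats = { input: 10, emit: 10, reduce: 2, output: 2 };
const rows = [
  { _id: "xiaofancn", value: { count: 3 } },
  { _id: "zhangsan", value: { count: 7 } },
];

// Sum the per-name counts: 3 + 7.
const totalCount = rows.reduce((sum, row) => sum + row.value.count, 0);

// Both checks should hold for a counting job like this one.
console.log(totalCount === stats.emit);   // true
console.log(rows.length === stats.output); // true
```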

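The most important part of using MapReduce is writing the JavaScript: the functions traverse the documents according to the key and value being aggregated, the computation runs in parallel, and the result is produced.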
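
Since values may be reduced, regrouped, and reduced again, a reduce function should return the same shape it receives and give the same answer whether it sees all values at once or sees partial results. A minimal check of that property for the counting reducer, in plain JavaScript with made-up input (seven emitted values for one key), might look like this:

```javascript
// Same counting reducer as in the example, written as a plain function.
function reduce(key, values) {
  let total = 0;
  for (let i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return { count: total };
}

// Seven emitted values for one key (illustrative data).
const values = Array.from({ length: 7 }, () => ({ count: 1 }));

// Reduce everything in one call.
const oneShot = reduce("zhangsan", values);

// Reduce two partial lists, then reduce the partial results again.
const partialA = reduce("zhangsan", values.slice(0, 4));
const partialB = reduce("zhangsan", values.slice(4));
const reReduced = reduce("zhangsan", [partialA, partialB]);

console.log(oneShot.count, reReduced.count); // 7 7
```

A reducer that returned a bare number instead of `{count: ...}` would fail this check, because the second pass would read `.count` from a number.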
