@dmitryame
Created October 8, 2010 14:21
MongoDB standard deviation calculation with map reduce
// sample data
{ "_id" : ObjectId("4caf19200d282159bf000001"), "date" : "2010-10-06", "seq" : "00:00:00,000", "method" : "getUserByScbeId", "duration" : 3 }
{ "_id" : ObjectId("4caf19200d282159bf000002"), "date" : "2010-10-06", "seq" : "00:00:00,116", "method" : "createTicket", "duration" : 116 }
{ "_id" : ObjectId("4caf19200d282159bf000003"), "date" : "2010-10-06", "seq" : "00:00:00,131", "method" : "getCollectionMetadata", "duration" : 11 }
{ "_id" : ObjectId("4caf19200d282159bf000004"), "date" : "2010-10-06", "seq" : "00:00:00,137", "method" : "getParticipation", "duration" : 6 }
{ "_id" : ObjectId("4caf19200d282159bf000005"), "date" : "2010-10-06", "seq" : "00:00:00,139", "method" : "updateSocialObjectModified", "duration" : 371 }
{ "_id" : ObjectId("4caf19200d282159bf000006"), "date" : "2010-10-06", "seq" : "00:00:00,143", "method" : "getUserByScbeId", "duration" : 4 }
// map reduce implementation
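map = function() {
  // Emit one value per log entry, keyed by method name.
  emit(this.method, { duration : this.duration, count : 1 });
}
reduce = function(key, emits) {
  // Sum counts and durations. The result has the same shape as the emitted
  // value because MongoDB may feed reduce output back into reduce.
  var n = { count : 0, duration : 0 };
  for (var i = 0; i < emits.length; i++) {
    n.count += emits[i].count;
    n.duration += emits[i].duration;
  }
  return n;
}
fin = function(key, value) {
  value.avg = value.duration / value.count;
  // Keys with a single emit skip reduce entirely, so min and max are
  // initialized here. Number.MIN_VALUE is the smallest positive number,
  // not the most negative one, hence -Number.MAX_VALUE.
  value.min = Number.MAX_VALUE;
  value.max = -Number.MAX_VALUE;
  var sum2 = 0;
  // Second pass over the collection for this method: track min and max,
  // and accumulate squared deviations from the average.
  db.logs.find({ method : key }).forEach(function(v) {
    if (v.duration > value.max) {
      value.max = v.duration;
    }
    if (v.duration < value.min) {
      value.min = v.duration;
    }
    var tmp1 = v.duration - value.avg;
    sum2 += tmp1 * tmp1;
  });
  // Population standard deviation (divides by count, not count - 1).
  value.stdiv = Math.sqrt(sum2 / value.count);
  return value;
}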
res = db.logs.mapReduce(map, reduce, {finalize : fin});
db[res.result].find();
@roeioved

You can no longer access db in the finalize function...

Any other suggestions for doing it without the db?

@kmpm

kmpm commented Sep 12, 2011

working on it...

@roeioved

Actually, what I did was calculate the deviation in the reduce function:

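function Reduce(key, values) {
  // Note: this reads v.duration, which only the raw emitted values carry.
  // If MongoDB re-reduces earlier output of this function (which has sum,
  // not duration), the results are no longer correct.
  var ret = { count: 0, sum: 0 };

  // First pass: total count and total duration.
  values.forEach(function(v){
    ret.count += v.count;
    ret.sum += v.duration;
  });

  ret.avg = ret.count == 0 ? 0 : ret.sum / ret.count;

  // Second pass: sum of squared deviations from the average.
  var sum2 = 0;
  values.forEach(function(v){
    var tmp1 = (v.duration - ret.avg);
    sum2 += tmp1 * tmp1;
  });

  // Population standard deviation.
  ret.stdiv = ret.count == 0 ? 0 : Math.sqrt(sum2 / ret.count);

  return ret;
}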
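
One caveat with computing the deviation inside reduce: MongoDB can call reduce more than once per key and pass earlier reduce output back in, so the returned value needs the same shape as the emitted one. A minimal sketch of one way around that, assuming the map step can emit a running sum of squares, is below. The names `mapSums`, `reduceSums`, `finalizeSums` and the `sumsq` field are made up for this example and are not from the gist. It is plain JavaScript so it can be checked outside MongoDB.

```javascript
// Sketch only: sum-of-squares variant. `sumsq` is a field introduced
// for this example; it does not appear in the original gist.

// Value to emit per log document; same shape as what reduceSums returns.
function mapSums(doc) {
  return { count: 1, sum: doc.duration, sumsq: doc.duration * doc.duration };
}

// Adds up partial results. Works on raw emitted values and on earlier
// output of itself, because input and output shapes match.
function reduceSums(key, values) {
  var ret = { count: 0, sum: 0, sumsq: 0 };
  values.forEach(function (v) {
    ret.count += v.count;
    ret.sum += v.sum;
    ret.sumsq += v.sumsq;
  });
  return ret;
}

// Derives the average and the population standard deviation once per key,
// using variance = E[x^2] - (E[x])^2. No db access is needed.
function finalizeSums(key, value) {
  value.avg = value.count === 0 ? 0 : value.sum / value.count;
  var variance = value.count === 0
    ? 0
    : value.sumsq / value.count - value.avg * value.avg;
  // Clamp at zero in case rounding pushes the variance slightly negative.
  value.stdiv = Math.sqrt(Math.max(variance, 0));
  return value;
}
```

In the shell, the map function would presumably emit the same shape, e.g. `emit(this.method, { count: 1, sum: this.duration, sumsq: this.duration * this.duration })`, with `reduceSums` and `finalizeSums` passed as the reduce and finalize functions. The sum-of-squares formula can lose precision when durations are large and close together, so treat this as a starting point.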

@kmpm

kmpm commented Sep 12, 2011

I made another, similar solution also using the reduce function. It does the calculation in a single pass.
Look in my fork.
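
The fork itself is not reproduced in this thread, so the following is only a sketch of what a single-pass reduce could look like, not necessarily what the fork does. It assumes each value carries a count, a mean, and the sum of squared deviations from that mean, and merges partial results pairwise. The names `mapMoments`, `reduceMerge`, `finalizeMerge` and the fields `mean` and `m2` are invented for this example.

```javascript
// Sketch only: merging partial statistics in one pass over the values.
// `mean` and `m2` (sum of squared deviations from the mean) are names
// chosen for this example, not taken from the gist or the fork.

// Value to emit per log document: a single observation has m2 = 0.
function mapMoments(doc) {
  return { count: 1, mean: doc.duration, m2: 0 };
}

// Folds each partial result into the accumulator. Input and output share
// one shape, so earlier reduce output can be merged again.
function reduceMerge(key, values) {
  var acc = { count: 0, mean: 0, m2: 0 };
  values.forEach(function (v) {
    if (v.count === 0) { return; } // nothing to merge
    var total = acc.count + v.count;
    var delta = v.mean - acc.mean;
    // Combined squared deviations of two groups around the new mean.
    acc.m2 += v.m2 + delta * delta * acc.count * v.count / total;
    acc.mean += delta * v.count / total;
    acc.count = total;
  });
  return acc;
}

// Population standard deviation from the merged moments.
function finalizeMerge(key, value) {
  value.stdiv = value.count === 0 ? 0 : Math.sqrt(value.m2 / value.count);
  return value;
}
```

Compared with summing squares directly, keeping deviations relative to the running mean tends to be less sensitive to rounding, at the cost of a slightly more involved merge step.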

@roeioved

like :)
