@goshacmd
Created January 20, 2013 20:01
MongoDB: Map/Reduce vs Aggregation framework on 1M docs. The benchmark counts how many times each search query was requested (only about 250K of the requests had a search query). Aggregation turned out to be roughly 44x faster for the simple count and roughly 83x faster with filtering.
1000000 documents
249819 documents with non-empty 'search_query'

                            user     system      total        real
simple map/reduce       0.270000   0.010000   0.280000 (353.701946)
simple aggregation      0.180000   0.010000   0.190000 (  8.049170)
filtering map/reduce    0.250000   0.010000   0.260000 (337.955130)
filtering aggregation   0.150000   0.010000   0.160000 (  4.095468)
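The speedup factors implied by the "real" (wall-clock) column can be checked with a few lines of Ruby. This is only a sketch of the arithmetic: the constant and method names are made up for illustration, and the numbers are copied from the output above.

```ruby
# Wall-clock ("real") seconds copied from the benchmark output.
REAL_SECONDS = {
  "simple"    => { map_reduce: 353.701946, aggregation: 8.049170 },
  "filtering" => { map_reduce: 337.955130, aggregation: 4.095468 }
}

# Speedup = map/reduce time divided by aggregation time.
def speedup(times)
  times[:map_reduce] / times[:aggregation]
end

REAL_SECONDS.each do |name, times|
  puts format("%-10s aggregation is %.1fx faster", name, speedup(times))
end
```

The user/system/total columns are nearly identical across runs because they measure CPU time in the Ruby client only; the work happens inside the MongoDB server, so the real column is the one to compare.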
require 'benchmark'
require File.expand_path('../config/environment', __FILE__)

# Print the collection size and how many documents have a search query
puts "#{RequestEvent.count} documents"
puts "#{RequestEvent.ne("data.search_query" => nil).count} documents with non-empty 'search_query'"
puts

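# Simple map/reduce: count occurrences of every search_query value
def simple_map_reduce
  map = %Q{
    function() {
      emit(this.data.search_query, { count: 1 });
    }
  }

  # The accumulator starts at 0: every emitted value already carries its own
  # count, and MongoDB may run reduce again on earlier reduce output.
  reduce = %Q{
    function(key, values) {
      var result = { count: 0 };
      values.forEach(function(value) {
        result.count += value.count;
      });
      return result;
    }
  }

  RequestEvent.all.map_reduce(map, reduce).out(inline: 1).to_a
end
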
# Simple aggregation
def simple_aggregation
  pipeline = [{ "$group" => { "_id" => "$data.search_query", "count" => { "$sum" => 1 } } }]
  RequestEvent.collection.aggregate(pipeline)
end

# Map/reduce, exclude blank fields
def filtering_map_reduce
  map = %Q{
    function() {
      if (this.data.search_query)
        emit(this.data.search_query, { count: 1 });
    }
  }

  # The accumulator starts at 0: every emitted value already carries its own
  # count, and MongoDB may run reduce again on earlier reduce output.
  reduce = %Q{
    function(key, values) {
      var result = { count: 0 };
      values.forEach(function(value) {
        result.count += value.count;
      });
      return result;
    }
  }

  RequestEvent.all.map_reduce(map, reduce).out(inline: 1).to_a
end

# Aggregate, exclude blank fields
def filtering_aggregation
  pipeline = [
    { "$match" => { "data.search_query" => { "$ne" => nil } } },
    { "$group" => { "_id" => "$data.search_query", "count" => { "$sum" => 1 } } }
  ]
  RequestEvent.collection.aggregate(pipeline)
end

Benchmark.bm do |x|
  x.report("simple map/reduce") { simple_map_reduce }
  x.report("simple aggregation") { simple_aggregation }
  x.report("filtering map/reduce") { filtering_map_reduce }
  x.report("filtering aggregation") { filtering_aggregation }
end
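
A note on the reduce functions: MongoDB may call reduce more than once for the same key and feed earlier reduce output back in as input, so a counting reduce should start its accumulator at zero. The sketch below imitates that behaviour in plain Ruby, with no MongoDB involved. The method names and the batch size are hypothetical and only illustrate the effect.

```ruby
# Ruby stand-in for the JavaScript reduce function; `initial` is the
# starting value of the accumulator.
def reduce_counts(values, initial)
  values.inject({ count: initial }) do |result, value|
    { count: result[:count] + value[:count] }
  end
end

# Reduce in batches, then reduce the partial results again (a re-reduce).
def batched_reduce(values, batch_size, initial)
  partials = values.each_slice(batch_size).map { |batch| reduce_counts(batch, initial) }
  reduce_counts(partials, initial)
end

emitted = Array.new(10) { { count: 1 } } # ten emits for one key

puts batched_reduce(emitted, 5, 0)[:count] # prints 10, the true count
puts batched_reduce(emitted, 5, 1)[:count] # prints 13, inflated by one per reduce call
```

With a starting value of 1, the amount of over-counting depends on how many times reduce happens to run, which is why the result would not be stable from one run to the next.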