@nevans
Last active August 28, 2020 12:48
Improving speed on slow CouchDB reduce functions

A common pattern in my CouchDB view reductions is to "merge" together the objects generated by the map function while preserving the original form, and to query that view only with group=true. It's easiest to write the view reductions naively, so they continue merging the data all the way up to the top-level reduction (group=false).
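As a hypothetical illustration of that pattern (the field names, keys, and merge logic here are all invented for the example), the map function might emit small per-user objects, and the naive reduce merges them into an object of the same shape at every level of the tree:

```javascript
// Hypothetical "merge objects" reduce: every value is merged into one
// result object with the same shape as the map output, so the same
// function works for both the initial reduce and the rereduce passes.
function naiveReduce(keys, values, rereduce) {
  var result = { clicks: 0, pages: {} };
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    result.clicks += v.clicks;
    for (var page in v.pages) {
      result.pages[page] = (result.pages[page] || 0) + v.pages[page];
    }
  }
  return result;
}

// Simulated map output for a single key (as queried with group=true):
var values = [
  { clicks: 2, pages: { "/home": 2 } },
  { clicks: 3, pages: { "/home": 1, "/about": 2 } }
];
var merged = naiveReduce(null, values, false);
// merged is { clicks: 5, pages: { "/home": 3, "/about": 2 } }
```

Because the merged output has the same shape as the map output, CouchDB can keep running this same function on intermediate results, which is exactly what makes the naive version expensive at higher levels of the b+tree.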

But because CouchDB stores the reduction for each b+tree node inside that node, moderately sized objects can result in a lot of extra writes to disk, and moderately complicated functions can waste a lot of indexer CPU time running merge JavaScript whose output is never queried. Re-balancing the b+tree compounds this problem. This can slow down the initial creation of large indexes tremendously, and if the index becomes badly fragmented, it will also hurt query speed.

One solution: once the reduction is beyond the keys at the group level I care about, stop running the merge code and return the simplest data that works (e.g. null or an integer sum). The following example code does that, with very minimal constraints on the original reduce function.

I haven't run this exact code (it hasn't been tested yet and is probably buggy), but I do have code following this pattern running in production. For small indexes, this doesn't speed anything up and might even make it a little slower. But for large indexes with complicated merges, this code can result in a 2-5x speedup.

My question: is there a better approach, or better JavaScript, to accomplish this?

function(keys, values, rereduce) {
  // If all of the reduction values are for a single key, return that key.
  // Otherwise, return null.
  var getSingleKey = function() {
    if (!rereduce) {
      // The initial reduction pass is simple; just check the keys array.
      // Each entry is a [key, docid] pair, so compare the key portion.
      var key = keys[0][0];
      // Do any keys differ from the first key?
      for (var i = 1; i < keys.length; i++) {
        if (key !== keys[i][0]) { return null; }
      }
      return key;
    } else {
      // On rereduce, keys is null; inspect the intermediate values instead.
      // The first intermediate value may have already seen multiple keys.
      // (Check for null explicitly, since typeof null === "object".)
      if (values[0] === null || typeof(values[0]) !== "object" || !("key" in values[0])) {
        return null;
      }
      var key = values[0].key;
      // Have any other intermediate values already seen multiple keys,
      // or do they differ from the first key?
      for (var i = 1; i < values.length; i++) {
        if (values[i] === null || typeof(values[i]) !== "object" ||
            !("key" in values[i]) || key !== values[i].key) {
          return null;
        }
      }
      return key;
    }
  };
  // The original reduce function.
  // Rules: this *must* return an object, and result.key will be overridden.
  var originalReduce = function(keys, values, rereduce) {
    var result = {};
    // YOUR FANCY CODE HERE...
    return result;
  };
  var key = getSingleKey();
  if (key !== null) {
    var result = originalReduce(keys, values, rereduce);
    result.key = key; // or the portion of the key that you'll use with group_level
    return result;
  } else {
    return null; // or the count or stats or some other simplified view of the world
  }
}
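To sanity-check the pattern outside CouchDB, the wrapper can be exercised directly. Here it is condensed into a named function with a trivial originalReduce that just counts rows; the function name, the count-based reduce, and the sample keys are all invented for this demonstration:

```javascript
// A named, self-contained copy of the wrapper pattern for testing
// outside CouchDB. The originalReduce here simply counts rows.
function reduceFn(keys, values, rereduce) {
  var getSingleKey = function() {
    if (!rereduce) {
      // keys entries are [key, docid] pairs on the initial pass.
      var key = keys[0][0];
      for (var i = 1; i < keys.length; i++) {
        if (key !== keys[i][0]) { return null; }
      }
      return key;
    } else {
      // keys is null on rereduce; inspect the intermediate values.
      if (values[0] === null || typeof(values[0]) !== "object" || !("key" in values[0])) {
        return null;
      }
      var key = values[0].key;
      for (var i = 1; i < values.length; i++) {
        if (values[i] === null || typeof(values[i]) !== "object" ||
            !("key" in values[i]) || key !== values[i].key) {
          return null;
        }
      }
      return key;
    }
  };
  var originalReduce = function(keys, values, rereduce) {
    var count = 0;
    for (var i = 0; i < values.length; i++) {
      count += rereduce ? values[i].count : 1;
    }
    return { count: count };
  };
  var key = getSingleKey();
  if (key !== null) {
    var result = originalReduce(keys, values, rereduce);
    result.key = key;
    return result;
  } else {
    return null;
  }
}

// Initial reduce over rows for a single key: the merge runs normally.
var single = reduceFn([["alice", "doc1"], ["alice", "doc2"]], [1, 1], false);
// single is { count: 2, key: "alice" }

// Rereduce across b+tree nodes spanning different keys: the expensive
// merge is skipped entirely and a cheap null is stored instead.
var mixed = reduceFn(null, [{ count: 2, key: "alice" }, { count: 4, key: "bob" }], true);
// mixed is null
```

The second call is the case the optimization targets: any reduction above the group level collapses to null, so the indexer stores tiny values in interior nodes and never runs the merge code for reductions that group=true queries will never read.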