@rueckstiess
Created March 27, 2015 19:18
MongoDB shell helper for reservoir sampling via map/reduce
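if (typeof DBCollection !== 'undefined') {
  // Adds db.<collection>.sample(num): returns up to `num` random documents
  // (default 100), chosen by reservoir sampling inside a map/reduce job.
  DBCollection.prototype.sample = function(num) {
    // Map: keys 0..k-1 are the reservoir slots. The first k documents fill them;
    // each later document replaces a random slot with probability k / (i + 1).
    function mapFn() {
      if (i < k) {
        emit(i, this);
      } else {
        // i documents were seen before this one, so draw j from 0..i inclusive.
        var j = Math.floor(Math.random() * (i + 1));
        if (j < k) {
          emit(j, this);
        }
      }
      i++;
    }
    // Reduce: keep the last document emitted for each slot
    // (this assumes values arrive in emit order).
    function reduceFn(key, values) {
      return values[values.length - 1];
    }
    // Variables visible to every map call: sample size k and a running counter i.
    var scope = {
      k: num || 100,
      i: 0
    };
    var res = this._db.runCommand({
      mapReduce: this._shortName,
      map: mapFn,
      reduce: reduceFn,
      out: {
        inline: 1
      },
      scope: scope
    });
    // Inline results are {_id: slot, value: document} pairs; return the documents only.
    return res.results.map(function(d) {
      return d.value;
    });
  };
}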
@rueckstiess
Author

Usage:

Start the mongo shell like this:

mongo reservoir_mr.js --shell

You can also add the following line to your ~/.mongorc.js file to always load the file on shell startup (unless started with --norc):

load('/path/to/reservoir_mr.js')

In the mongo shell, you can now do:

db.users.sample(50)

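This returns 50 randomly sampled documents from the users collection (or all of them if the collection holds fewer than 50). Called without an argument, sample() returns up to 100 documents.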
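
For readers who want to see the sampling logic without a MongoDB server, here is a minimal plain-JavaScript sketch of reservoir sampling (the textbook "Algorithm R"). It is an illustration, not part of the gist: the name reservoirSample and the optional random parameter are hypothetical, and the array slot reservoir[j] plays the role that emit(j, this) followed by "reduce keeps the last value" plays in the helper.

```javascript
// Sketch of reservoir sampling over any iterable (assumed names, not the gist's API).
// items:  iterable of values to sample from
// k:      desired sample size
// random: optional RNG returning a float in [0, 1); defaults to Math.random
function reservoirSample(items, k, random) {
  random = random || Math.random;
  var reservoir = [];
  var i = 0; // number of items seen before the current one
  for (var item of items) {
    if (i < k) {
      // Fill phase: the first k items go straight into the reservoir.
      reservoir[i] = item;
    } else {
      // Replacement phase: draw j uniformly from 0..i inclusive and
      // overwrite slot j only if it falls inside the reservoir.
      var j = Math.floor(random() * (i + 1));
      if (j < k) {
        reservoir[j] = item;
      }
    }
    i++;
  }
  return reservoir;
}

// Usage: pick 3 of 10 values in a single pass.
var picked = reservoirSample([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3);
console.log(picked.length); // 3
```

The input is read exactly once and only k items are held at any time, which is why the same idea fits a single map pass over a collection.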
