@mcroydon
Created February 28, 2009 21:03
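// Map function: emits one row per document, keyed by document ID,
// whose value is an object mapping each lowercased word to its count.
function(doc) {
  // Skip documents that have no string "text" field.
  if (typeof doc.text !== "string") {
    return;
  }
  // Split on whitespace or common ASCII punctuation.
  // TODO: Is there a unicode-aware \W that I could use?
  // log('mapping ' + doc);
  var tokens = doc.text.split(/[\s,.;:'"()!?]+/);
  var words = {};
  // Plain indexed loop: "for each (... in ...)" is a non-standard
  // SpiderMonkey extension.
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i].toLowerCase();
    if (token !== "") {
      // hasOwnProperty keeps words such as "constructor" from matching
      // properties inherited from Object.prototype.
      if (Object.prototype.hasOwnProperty.call(words, token)) {
        words[token] += 1;
      } else {
        words[token] = 1;
      }
    }
  }
  emit(doc._id, words);
}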
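
To try the map function outside CouchDB, the following is a minimal sketch. It assumes a stand-in emit() that collects rows in an array; the names rows and wordCountMap and the sample document are hypothetical, chosen only for illustration. The function body is a version of the map function above written with a plain for loop so that it runs on engines other than SpiderMonkey.

```javascript
// Hypothetical stand-in for CouchDB's emit(): collects rows in an array.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The map function, given a name (wordCountMap, an assumption) so it can
// be called directly.
function wordCountMap(doc) {
  var tokens = doc.text.split(/[\s,.;:'"()!?]+/);
  var words = {};
  for (var i = 0; i < tokens.length; i++) {
    var token = tokens[i].toLowerCase();
    if (token !== "") {
      if (Object.prototype.hasOwnProperty.call(words, token)) {
        words[token] += 1;
      } else {
        words[token] = 1;
      }
    }
  }
  emit(doc._id, words);
}

// Made-up sample document.
wordCountMap({ _id: "doc1", text: "The cat saw the other cat. The end!" });
console.log(JSON.stringify(rows));
// prints [{"key":"doc1","value":{"the":3,"cat":2,"saw":1,"other":1,"end":1}}]
```

Each document yields a single row keyed by its _id, so per-word totals across documents would need a different key, for example emitting each word as its own key.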