@olivernn
Created January 19, 2016 16:49
Better handling of English contractions in lunr.
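// Strip common English contraction suffixes from the end of a token.
lunr.contractionTrimmer = function (token) {
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}
// Register the trimmer itself (not lunr.stopWordFilter) under its label.
lunr.Pipeline.registerFunction(lunr.contractionTrimmer, 'contractionTrimmer')
// Plugin: run the contraction trimmer immediately after lunr.trimmer.
var englishContractions = function (idx) {
  idx.pipeline.after(lunr.trimmer, lunr.contractionTrimmer)
}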
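
To see what the suffix regex does to individual tokens, here is a minimal standalone sketch. It copies the regex into a plain function so it runs without lunr; the name `trimContraction` and the sample tokens are illustrative assumptions, not part of the gist.

```javascript
// Standalone copy of the suffix regex (hypothetical helper, no lunr needed).
function trimContraction(token) {
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}

// Sample tokens chosen for illustration only.
var samples = ["they've", "isn't", "she'd", "we'll", "dog's", "you're", "burns"]
var trimmed = samples.map(trimContraction)

console.log(trimmed)
// Irregular contractions lose more than the suffix suggests:
console.log(trimContraction("can't"), trimContraction("won't"))
```

Note that "can't" becomes "ca" and "won't" becomes "wo", so irregular contractions may need separate handling. With the 0.x API this gist appears to target, the plugin would presumably be attached inside the index builder with `this.use(englishContractions)`.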
@albertsemple

I took a bit of a blunderbuss approach to this:

token.replace(/[^A-Za-z é]/g, "");

I had an issue where the possessive form of the surname "Burns" had been misspelt as "Burn's" in the corpus, and I wanted to add tolerance for that kind of misspelling.
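
For comparison, a rough sketch of how the character-stripping approach behaves on a few tokens. The function name `stripNonLetters` and the sample tokens are assumptions made for illustration; only the regex comes from the comment above.

```javascript
// Hypothetical wrapper around the regex from the comment: removes every
// character that is not an ASCII letter, a space, or "é".
function stripNonLetters(token) {
  return token.replace(/[^A-Za-z é]/g, "")
}

console.log(stripNonLetters("Burn's"))   // apostrophe inside the word is dropped
console.log(stripNonLetters("Burns'"))   // trailing apostrophe is dropped
console.log(stripNonLetters("they've"))  // contraction is collapsed, not trimmed
console.log(stripNonLetters("mp3"))      // digits are removed too
```

This makes "Burn's" and "Burns'" both index as "Burns", which gives the tolerance described. The trade-off is that contractions are collapsed rather than trimmed ("they've" becomes "theyve"), and digits and any accented letters other than "é" are discarded.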
