Better handling of English contractions in lunr.
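lunr.contractionTrimmer = function (token) {
  // Strip a trailing English contraction suffix, e.g. "we've" -> "we"
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}
// Register the trimmer itself (not lunr.stopWordFilter) under this label
lunr.Pipeline.registerFunction(lunr.contractionTrimmer, 'contractionTrimmer')
// Plugin: run the contraction trimmer right after lunr.trimmer
var englishContractions = function (idx) {
  idx.pipeline.after(lunr.trimmer, lunr.contractionTrimmer)
}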
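
The trimmer above calls replace directly on the token, so it assumes tokens are plain strings, as in lunr 0.x and 1.x. In lunr 2.x, pipeline functions receive lunr.Token objects, which, as far as I know, are changed through token.update(fn). Below is a minimal sketch that accepts either form. The names trimContraction and FakeToken are hypothetical: FakeToken only stands in for lunr.Token so the snippet runs without lunr installed.

```javascript
// Suffixes to strip; the same alternatives as the gist, with 've listed once.
var CONTRACTION_SUFFIX = /('ve|n't|'d|'ll|'s|'re)$/

// Hypothetical helper: accepts a plain string (lunr 0.x/1.x style) or an
// object exposing update(fn) (assumed to match lunr 2.x lunr.Token).
function trimContraction(token) {
  var strip = function (str) { return str.replace(CONTRACTION_SUFFIX, "") }
  if (typeof token === "string") return strip(token)
  return token.update(strip)
}

// Stand-in for lunr.Token so the sketch is self-contained; not part of lunr.
function FakeToken(str) { this.str = str }
FakeToken.prototype.update = function (fn) { this.str = fn(this.str); return this }
FakeToken.prototype.toString = function () { return this.str }

console.log(trimContraction("we've"))                            // we
console.log(trimContraction(new FakeToken("she'll")).toString()) // she
```

If this were wired into a real index, the registration and plugin lines from the gist would presumably stay the same, with trimContraction in place of lunr.contractionTrimmer.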

j1m1lo commented May 18, 2017

I'm considering using this in our production environment.

Questions:

  • Is there a specific reason why your trimmer replaces n't, and not just 't?
  • My trimmer, return token.replace(/('m|'ve|'t|'d|'ll|'ve|'s|'re)$/, ""), also trims "I'm", and that seems to work all right. Is there a downside? Did you leave 'm out on purpose?
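
One way to see how the two patterns differ is to run both over a few tokens. This is only a sketch and not part of the gist: the function names trimNt and trimT are hypothetical, and the duplicate 've alternative is dropped from both patterns since it has no effect.

```javascript
// Pattern from the gist: strips the whole n't suffix.
function trimNt(token) {
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}

// Pattern from the question: strips only 't, and additionally 'm.
function trimT(token) {
  return token.replace(/('m|'ve|'t|'d|'ll|'s|'re)$/, "")
}

// Print each token next to the result of both trimmers.
["isn't", "don't", "can't", "I'm"].forEach(function (token) {
  console.log(token, "->", trimNt(token), "|", trimT(token))
})
// isn't -> is | isn
// don't -> do | don
// can't -> ca | can
// I'm -> I'm | I
```

With n't, "isn't" and "don't" reduce to the base words "is" and "do", while 't leaves "isn" and "don". The trade-off flips for "can't", where n't yields "ca" and 't yields "can".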