Better handling of English contractions in lunr.
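lunr.contractionTrimmer = function (token) {
  // Strip a trailing English contraction suffix, e.g. "they've" -> "they"
  return token.replace(/('ve|n't|'d|'ll|'s|'re)$/, "")
}
// Register the trimmer itself (not lunr.stopWordFilter) under its label
lunr.Pipeline.registerFunction(lunr.contractionTrimmer, 'contractionTrimmer')
// Plugin: run the contraction trimmer right after lunr's default trimmer
var englishContractions = function (idx) {
  idx.pipeline.after(lunr.trimmer, lunr.contractionTrimmer)
}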
j1m1lo May 18, 2017

I'm considering using this in our production environment.

Questions:

  • Is there a specific reason why your trimmer replaces n't, and not just 't?
  • My trimmer, return token.replace(/('m|'ve|'t|'d|'ll|'ve|'s|'re)$/, ""), also trims "I'm", and it seems to work all right. Is there a downside? Did you leave it out on purpose?
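
To make the difference between the two patterns concrete, here is a small standalone sketch in plain JavaScript (no lunr required) that runs both regexes over a few lowercase tokens. The names NT_PATTERN, T_PATTERN, trimWithNt and trimWithT are hypothetical and exist only for this comparison; the duplicate 've alternative is dropped from both patterns because it never changes the result.

```javascript
// Standalone comparison of the two suffix patterns; no lunr needed.
// All names below are hypothetical and exist only for this sketch.
var NT_PATTERN = /('ve|n't|'d|'ll|'s|'re)$/;    // pattern from the gist
var T_PATTERN = /('m|'ve|'t|'d|'ll|'s|'re)$/;   // variant from the comment

function trimWithNt(token) {
  return token.replace(NT_PATTERN, "");
}

function trimWithT(token) {
  return token.replace(T_PATTERN, "");
}

// Lowercase samples, since lunr's tokenizer lowercases input by default.
var samples = ["don't", "isn't", "can't", "i'm", "they've", "cat's"];

samples.forEach(function (token) {
  // Prints: original -> n't result | 't result
  console.log(token + " -> " + trimWithNt(token) + " | " + trimWithT(token));
});
```

Under these assumptions, the n't pattern turns "don't" and "isn't" into "do" and "is" but turns "can't" into "ca", while the 't pattern yields "don", "isn" and "can". Only the 't variant touches "i'm". Neither pattern recovers the base word in every case, so the choice probably depends on which contractions matter most in the indexed text.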
