
@gregplaysguitar
Last active May 14, 2022 15:09
Django-haystack Whoosh backend with character folding
# -*- coding: utf-8 -*-
"""
Whoosh backend for Haystack that implements character folding, as per
http://packages.python.org/Whoosh/stemming.html#character-folding .

Tested with Haystack 2.4.0 and Whoosh 2.7.0.

To use, save this file on your path as folding_whoosh_backend.py and add it
to your Haystack settings, e.g.

    HAYSTACK_CONNECTIONS = {
        'default': {
            'ENGINE': 'folding_whoosh_backend.FoldingWhooshEngine',
            'PATH': 'path-to-whoosh-index',
        },
    }

Rebuild the index after switching backends.
"""
from haystack.backends.whoosh_backend import WhooshEngine, WhooshSearchBackend
from whoosh.analysis import CharsetFilter, StemmingAnalyzer
from whoosh.fields import TEXT
from whoosh.support.charset import accent_map


class FoldingWhooshSearchBackend(WhooshSearchBackend):

    def build_schema(self, fields):
        # The parent returns a (content_field_name, schema) tuple.
        schema = super(FoldingWhooshSearchBackend, self).build_schema(fields)

        # Give every TEXT field an analyzer that runs StemmingAnalyzer
        # and then folds accented characters via accent_map.
        for name, field in schema[1].items():
            if isinstance(field, TEXT):
                field.analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)

        return schema


class FoldingWhooshEngine(WhooshEngine):
    backend = FoldingWhooshSearchBackend
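To see roughly what the folding step does to tokens without setting up Django or Whoosh, here is a stdlib-only sketch. It approximates CharsetFilter(accent_map) with Unicode NFKD decomposition followed by dropping combining marks; that is an assumption, not Whoosh's actual table, and the two can differ on some characters. The `fold` and `analyze` helpers are hypothetical and, unlike StemmingAnalyzer, do no stemming or stop-word removal.

```python
import re
import unicodedata


def fold(text):
    """Approximate accent folding: decompose, then drop combining marks.

    Stand-in for CharsetFilter(accent_map); not Whoosh's actual table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Hypothetical analyzer: split on word characters, lowercase, fold.

    No stemming or stop-word removal, unlike StemmingAnalyzer.
    """
    return [fold(token.lower()) for token in re.findall(r"\w+", text)]


if __name__ == "__main__":
    print(analyze("Café crème"))  # ['cafe', 'creme']
    print(analyze("cafe"))        # ['cafe']
```

Because the same analyzer is applied to both documents and queries, "café" and "cafe" end up as the same indexed term.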
@paweloque

I still cannot search using words without accents, e.g. search for 'cafe' and get back results containing both 'café' and 'cafe'.
Do I have to do something additional, like changing the index template?

@gregplaysguitar
Author

@paweloque, no, you should just be able to change the backend. Make sure you reindex the content after doing this.
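The reindex matters because the analyzer runs at index time as well as query time: terms already stored in the index were produced by the old analyzer. The toy inverted index below is a stdlib-only sketch with hypothetical `build_index`/`search` helpers, and NFKD folding stands in (as an assumption) for Whoosh's analyzer. A folded "cafe" query misses the "Café" document until the documents are re-analyzed.

```python
import re
import unicodedata
from collections import defaultdict


def fold(text):
    # Approximate accent folding (NFKD + drop combining marks); an
    # assumption standing in for CharsetFilter(accent_map).
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def plain_analyzer(text):
    # Old analyzer: split on word characters and lowercase only.
    return [t.lower() for t in re.findall(r"\w+", text)]


def folding_analyzer(text):
    # New analyzer: same as above, plus folding.
    return [fold(t) for t in plain_analyzer(text)]


def build_index(docs, analyzer):
    # Map each indexed term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def search(index, query, analyzer):
    # A document matches if it contains every analyzed query term.
    terms = analyzer(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "Café de Flore", 2: "Cheap cafe near the station"}

# Index built with the old analyzer, queried after switching backends.
stale = build_index(docs, plain_analyzer)
print(sorted(search(stale, "cafe", folding_analyzer)))  # [2]

# After reindexing with the folding analyzer, both documents match.
fresh = build_index(docs, folding_analyzer)
print(sorted(search(fresh, "cafe", folding_analyzer)))  # [1, 2]
```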

@dmarcelino

Great stuff, thanks @gregplaysguitar

@benzkji

benzkji commented Jul 31, 2020

If you use an EdgeNgramField, you can use this:

# also needs: from whoosh.fields import NGRAMWORDS; from whoosh.analysis import NgramFilter
if isinstance(field, NGRAMWORDS):
    field.analyzer = StemmingAnalyzer() | NgramFilter(minsize=X) | CharsetFilter(accent_map)

Or, perhaps even better, keep the original analyzer and append the filter like this:

field.analyzer = field.analyzer | CharsetFilter(accent_map)
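The `field.analyzer | CharsetFilter(accent_map)` approach works because `|` chains stages, so each filter receives the previous stage's tokens. Below is a stdlib-only sketch of that idea. `Pipeline` is a hypothetical stand-in for a Whoosh analyzer chain, the edge-ngram sizes (2 to 15) are chosen for illustration only, and NFKD folding approximates accent_map. Appending the fold stage keeps the original tokenize/lowercase/ngram behaviour and only changes the accented grams.

```python
import re
import unicodedata


class Pipeline:
    """Hypothetical stand-in for a Whoosh analyzer chain built with `|`.

    Each stage takes an iterable and yields tokens.
    """

    def __init__(self, *stages):
        self.stages = stages

    def __or__(self, stage):
        # Return a new pipeline with `stage` appended; the original is untouched.
        return Pipeline(*self.stages, stage)

    def __call__(self, text):
        tokens = iter([text])
        for stage in self.stages:
            tokens = stage(tokens)
        return list(tokens)


def tokenize(texts):
    for text in texts:
        yield from re.findall(r"\w+", text)


def lowercase(tokens):
    for token in tokens:
        yield token.lower()


def edge_ngrams(minsize, maxsize):
    # Prefix grams only, roughly what an edge n-gram field indexes.
    def stage(tokens):
        for token in tokens:
            for size in range(minsize, min(maxsize, len(token)) + 1):
                yield token[:size]
    return stage


def fold(tokens):
    # Approximate accent folding (NFKD, drop combining marks); an
    # assumption standing in for CharsetFilter(accent_map).
    for token in tokens:
        decomposed = unicodedata.normalize("NFKD", token)
        yield "".join(ch for ch in decomposed if not unicodedata.combining(ch))


original = Pipeline(tokenize, lowercase, edge_ngrams(2, 15))
folding = original | fold  # keep the original chain, append folding

print(original("Café"))  # ['ca', 'caf', 'café']
print(folding("Café"))   # ['ca', 'caf', 'cafe']
```

Note that "caf" is produced either way, so a partial query like "caf" matched before the change too; folding is what makes the full "cafe" query match.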
