@krzynio
Forked from pslusarz/Solr for Polish
Created March 14, 2024 09:33
Works in Solr 5.3.0, Lucene 5.3.0
Refer to the Solr 101 gist to get Solr working: https://gist.github.com/pslusarz/7de913ac63e36e8983b8#file-gistfile1-txt
1. Download Lucene and copy the following jars into SOLR_ROOT/server/solr-webapp/WEB-INF/lib:
- lucene-analyzers-morfologik-X.X.jar
- apache-solr-analysis-extras-X.X.jar (not in Lucene, but in solr/dist)
- morfologik-fsa-X.X.jar
- morfologik-polish-X.X.jar
- morfologik-stemming-X.X.jar
2. Modify SOLR_ROOT/server/solr/<core>/conf/managed-schema.xml:
a) Field type definition:
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.MorfologikFilterFactory"/> <!-- case insensitive by default -->
    <filter class="solr.ASCIIFoldingFilterFactory"/> <!-- fold Polish characters to their ASCII (Latin) equivalents -->
  </analyzer>
</fieldType>
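b) Field definition: <field name="_text_" type="text_pl" multiValued="true" indexed="true" stored="true"/>
This is a hack: _text_ is the catch-all field, so giving it the text_pl type runs everything copied into it through the Polish analyzer.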
Reload the core from the admin console (or restart Solr). Important: don't Unload the core; you'll have a hard time getting it back.
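The reload can also be triggered over HTTP through the CoreAdmin API rather than the admin console. A minimal Python sketch, assuming Solr runs at http://localhost:8983/solr and the core is called radom (both taken from the example query in step 3); the helper name reload_core_url is made up for illustration:

```python
from urllib.parse import urlencode


def reload_core_url(core, base="http://localhost:8983/solr"):
    """Build a CoreAdmin RELOAD URL for the given core.

    The base URL is an assumption (a default local install); adjust it
    to match your own host and port.
    """
    params = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{base}/admin/cores?{params}"


if __name__ == "__main__":
    # Print the URL; open it in a browser or fetch it with
    # urllib.request.urlopen() to actually reload the core.
    print(reload_core_url("radom"))
```

Note the action is RELOAD, not UNLOAD, in line with the warning above.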
3. These parameters seem to work well for a query.
In particular, limit the returned fields (fl) so that the _text_ field does NOT come back: it's huge. Add "score" to fl, since it's not included by default.
To make the highlighting feature useful (samples of the found text), set hl=true and hl.snippets=100.
Example:
http://localhost:8983/solr/radom/select?q=wiecej&rows=100&fl=id%2Cscore&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E&hl.requireFieldMatch=true&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.snippets=100
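Typing the percent-encoding by hand is error-prone, so the same URL can be assembled from a plain parameter dictionary. A minimal Python sketch, assuming the host, port and core name (radom) from the example above; the function name build_select_url is hypothetical:

```python
from urllib.parse import urlencode


def build_select_url(query, core="radom", base="http://localhost:8983/solr"):
    """Build a /select URL with the parameters from the example above."""
    params = {
        "q": query,
        "rows": 100,
        "fl": "id,score",  # leave out the huge _text_ field, add score
        "wt": "json",
        "indent": "true",
        "hl": "true",
        "hl.simple.pre": "<em>",
        "hl.simple.post": "</em>",
        "hl.requireFieldMatch": "true",
        "hl.usePhraseHighlighter": "true",
        "hl.highlightMultiTerm": "true",
        "hl.snippets": 100,
    }
    # urlencode percent-encodes values, e.g. "," -> %2C and "<em>" -> %3Cem%3E
    return f"{base}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("wiecej"))
```

Passing a query with Polish characters, such as "więcej", is also fine: urlencode encodes it as UTF-8 before percent-escaping.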