@pslusarz pslusarz/Solr for Polish
Last active Mar 2, 2016

Works in Solr 5.3.0 with Lucene 5.3.0.
Refer to the Solr 101 gist to get Solr working: https://gist.github.com/pslusarz/7de913ac63e36e8983b8#file-gistfile1-txt
1. Download Lucene and copy the following JARs into SOLR_ROOT/server/solr-webapp/WEB-INF/lib:
- lucene-analyzers-morfologik-X.X.jar
- apache-solr-analysis-extras-X.X.jar (not part of the Lucene download; it is in solr/dist)
- morfologik-fsa-X.X.jar
- morfologik-polish-X.X.jar
- morfologik-stemming-X.X.jar
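The copy step can be scripted. What follows is a minimal Python sketch, not part of the original setup. The function name copy_jars and its directory arguments are hypothetical, and the right source directories depend on where you unpacked Lucene and Solr. It matches the JAR name prefixes listed above with any version suffix.

```python
import glob
import os
import shutil

# JAR name prefixes from the list above; the version suffix varies by release.
JAR_PREFIXES = [
    "lucene-analyzers-morfologik",
    "apache-solr-analysis-extras",
    "morfologik-fsa",
    "morfologik-polish",
    "morfologik-stemming",
]


def copy_jars(source_dirs, dest_dir, prefixes=JAR_PREFIXES):
    """Copy every JAR whose name starts with a listed prefix into dest_dir.

    source_dirs and dest_dir are assumptions: pass the directories where you
    unpacked Lucene and Solr, and your Solr WEB-INF/lib directory.
    Returns the sorted, de-duplicated list of copied file names.
    """
    copied = set()
    for source in source_dirs:
        for prefix in prefixes:
            # "**" with recursive=True also matches files directly in source.
            pattern = os.path.join(source, "**", prefix + "-*.jar")
            for path in glob.glob(pattern, recursive=True):
                shutil.copy(path, dest_dir)
                copied.add(os.path.basename(path))
    return sorted(copied)
```

Called as copy_jars(["/path/to/lucene", "/path/to/solr/dist"], "/path/to/WEB-INF/lib"), it returns the names of the files it copied, so a missing JAR is easy to spot.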
2. Modify SOLR_ROOT/server/solr/<core>/conf/managed-schema.xml:
a) Field type definition:
<fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.MorfologikFilterFactory"/> <!-- case insensitive by default -->
    <filter class="solr.ASCIIFoldingFilterFactory"/> <!-- convert Polish diacritics to plain ASCII letters -->
  </analyzer>
</fieldType>
b) Field definition (this is a hack: _text_ is the catch-all field):
<field name="_text_" type="text_pl" multiValued="true" indexed="true" stored="true"/>
Reload the core from the admin console (or restart Solr). Important: do not Unload the core, or you will have a hard time getting it back.
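The reload can also be done over HTTP, since Solr's CoreAdmin API has a RELOAD action. A minimal Python sketch, assuming Solr listens on localhost:8983 and the core is named radom, as in the example query in this gist. The helper names reload_url and reload_core are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def reload_url(core, base="http://localhost:8983/solr"):
    """Build the CoreAdmin RELOAD URL for one core (base is an assumption)."""
    params = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return base + "/admin/cores?" + params


def reload_core(core, base="http://localhost:8983/solr"):
    """Send the reload request and return the raw response body as text."""
    with urlopen(reload_url(core, base)) as response:
        return response.read().decode("utf-8")
```

Calling reload_core("radom") needs a running Solr instance. reload_url("radom") only builds the URL, which you can paste into a browser instead.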
3. These parameters seem to work well for a query.
In particular, use fl to limit the returned fields so that the _text_ field does not come back; it is huge. Add "score" to fl explicitly, since it is not included by default.
For highlighting (samples of the matched text) to be useful, set hl=true and hl.snippets=100.
Example:
http://localhost:8983/solr/radom/select?q=wiecej&rows=100&fl=id%2Cscore&wt=json&indent=true&hl=true&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E&hl.requireFieldMatch=true&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.snippets=100
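The same query is easier to read and adjust when it is built from named parameters. A minimal Python sketch, assuming the same host, port, and core name (radom) as the URL above. The function name build_query_url is hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(term, core="radom", base="http://localhost:8983/solr"):
    """Build a select URL with the parameters used in the example above."""
    params = [
        ("q", term),
        ("rows", 100),
        ("fl", "id,score"),             # leave out the huge _text_ field, add score
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),                 # turn highlighting on
        ("hl.simple.pre", "<em>"),      # markup placed before each match
        ("hl.simple.post", "</em>"),    # markup placed after each match
        ("hl.requireFieldMatch", "true"),
        ("hl.usePhraseHighlighter", "true"),
        ("hl.highlightMultiTerm", "true"),
        ("hl.snippets", 100),
    ]
    # urlencode percent-encodes the comma, slash, and angle brackets.
    return base + "/" + core + "/select?" + urlencode(params)
```

build_query_url("wiecej") produces the example URL character for character. Note that the query term is written without Polish diacritics, which works because the field type folds them to ASCII.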