

@joshdcollins
Last active October 21, 2015 18:14
SOLR Config - AutoPhrasing
cel-2000
CEL-2000
CEL 2000
CEL2000
Test data: a document whose entity_name is 'CEL-2000'.
<fieldType name="text_autophrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory" phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>
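With solr.KeywordTokenizerFactory, each field value enters the filter chain as a single token (for example, 'CEL 2000' remains one token, cel 2000, after lowercasing), so AutoPhrasingTokenFilterFactory has no sequence of separate tokens to join into a phrase. A variant for comparison, sketched as an assumption rather than as the gist author's configuration, tokenizes on whitespace first. Only the two tokenizer lines differ from the definition above, and the name text_autophrase_ws is hypothetical:

```xml
<!-- Hypothetical variant for comparison: whitespace tokenization instead of KeywordTokenizer -->
<fieldType name="text_autophrase_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Splits "CEL 2000" into "CEL" and "2000"; the autophrasing filter can then
         join them, provided autophrases.txt lists that phrase -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory" phrases="autophrases.txt" includeTokens="true" replaceWhitespaceWith="_" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
</fieldType>
```

The Analysis screen in the Solr Admin UI shows the token stream each field type produces for an input such as 'CEL 2000', which makes the two chains easy to compare side by side.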
webapp=/solr path=/autophrase params={q="cel-2000"&defType=dismax&qf=entity_name^100.0+content+entity_author&pf=entity_name+content&rows=100&wt=json&debugQuery=true}
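The logged request searches entity_name, content and entity_author, but their field declarations are not part of this gist. A sketch of what they might look like, assuming only entity_name uses text_autophrase and the other two use a general-purpose type. The type name text_general (present in Solr's default schemas) and all attributes are assumptions:

```xml
<!-- Assumed field declarations; the actual schema entries are not shown in this gist -->
<field name="entity_name" type="text_autophrase" indexed="true" stored="true" />
<field name="content" type="text_general" indexed="true" stored="true" />
<field name="entity_author" type="text_general" indexed="true" stored="true" />
```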
<requestHandler name="/autophrase" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">_text_</str>
  </lst>
  <lst name="invariants">
    <str name="defType">autophrasingParser</str>
  </lst>
</requestHandler>
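The invariants section refers to a query parser named autophrasingParser, which must also be registered in solrconfig.xml for the handler to resolve it. A sketch of that registration, modeled on the Lucidworks auto-phrase-tokenfilter project; the phrases parameter and the file name are assumptions to check against the plugin version in use:

```xml
<!-- Registers the parser name used in the /autophrase handler's invariants -->
<queryParser name="autophrasingParser" class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
</queryParser>
```

Because defType is set under invariants, it overrides any defType sent by the client, so the defType=dismax in the logged request does not take effect on this handler.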
CEL-2000,CEL-SCI,CEL_2000,CEL2000
- CEL-2000 - pass, but also returns many noisy matches on 'CEL' and '2000' alone
- "CEL-2000" - pass; only the matching record is returned
- CEL 2000 - fail (no results)
- "CEL 2000" - pass; only the matching record is returned
- CEL2000 - fail (no results)
- "CEL2000" - fail (no results)