peaeater / gist:5810540
Created June 18, 2013 23:46
solrconfig.xml /suggest request handler
<!-- request handler to return typeahead suggestions -->
<requestHandler name="/suggest" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="fl">universe,collection,title,score</str>
<str name="qf">title_suggest^30 title_suggest_ngram^50.0 collection_suggest^15 collection_suggest_ngram^25.0</str>
<str name="pf">title_suggest_edge^50.0 collection_suggest_edge^25.0</str>
<str name="group">true</str>
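A client usually sends only the typed prefix in `q`. Everything else comes from the handler defaults above. The Python sketch below only builds the request URL. The host, port 8983 (Solr's default) and the core name `mycore` are assumptions. It also does not add the `group.field` that grouping needs, because that part of the config is cut off here.

```python
from urllib.parse import urlencode

# Assumed Solr location; adjust host, port and core for a real install.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "mycore"  # hypothetical core name


def suggest_url(prefix, rows=10):
    """Build a GET URL for the /suggest handler.

    Only q, rows and wt are sent here. defType, qf, pf, fl and group
    come from the handler's <lst name="defaults"> block.
    """
    params = {"q": prefix, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/suggest?{urlencode(params)}"
```

For example, `suggest_url("star wa")` URL-encodes the space as `+`.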
peaeater / gist:5810550
Created June 18, 2013 23:47
schema copy fields for suggest
<!-- suggest fields -->
<copyField source="title" dest="title_suggest" />
<copyField source="title" dest="title_suggest_edge" />
<copyField source="title" dest="title_suggest_ngram" />
<copyField source="title" dest="title_s" />
<copyField source="collection" dest="collection_suggest" />
<copyField source="collection" dest="collection_suggest_edge" />
<copyField source="collection" dest="collection_suggest_ngram" />
<copyField source="collection" dest="collection_s" />
<copyField source="universe" dest="universe_suggest" />
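copyField copies the raw incoming value before any analysis runs. A single `title` value is therefore indexed four ways, and each destination's field type analyzes it with its own chain. The Python model below shows that fan-out in rough form. The dictionary and function names are illustrative, and the model simplifies Solr's behaviour by overwriting the destination where Solr adds each copied value to the destination field.

```python
# Mirror of the <copyField> rules above: source field -> destination fields.
COPY_FIELDS = {
    "title": ["title_suggest", "title_suggest_edge", "title_suggest_ngram", "title_s"],
    "collection": ["collection_suggest", "collection_suggest_edge",
                   "collection_suggest_ngram", "collection_s"],
    "universe": ["universe_suggest"],
}


def apply_copy_fields(doc):
    """Return a copy of `doc` with each source value duplicated into its destinations.

    Values are copied unchanged (pre-analysis). Simplification: destinations
    are overwritten here, whereas Solr adds the copied value to the field.
    """
    out = dict(doc)
    for source, dests in COPY_FIELDS.items():
        if source in doc:
            for dest in dests:
                out[dest] = doc[source]
    return out
```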
peaeater / gist:5810559
Created June 18, 2013 23:48
text_suggest field type
<!-- text_suggest : Matches whole terms in the suggest text -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
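With `generateWordParts`, `generateNumberParts`, `catenateWords` and `catenateNumbers` all set to 1, a token such as `wi-fi` is indexed both as its parts and as the joined form. The Python code below is a simplified approximation of that behaviour, not Lucene's implementation. It handles ASCII only, ignores case-change splitting and token positions, and joins only adjacent parts of the same kind. The function names are illustrative.

```python
import re


def _joined_runs(parts, is_kind):
    """Join each run of two or more adjacent parts that satisfy is_kind."""
    runs, current = [], []
    for part in parts + [""]:  # empty sentinel flushes the final run
        if part and is_kind(part):
            current.append(part)
        else:
            if len(current) > 1:
                runs.append("".join(current))
            current = []
    return runs


def word_delimiter(token, catenate_words=True, catenate_numbers=True):
    # Split on non-alphanumerics and on letter/digit boundaries
    # (generateWordParts / generateNumberParts).
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    out = list(parts)
    if catenate_words:      # e.g. wi-fi -> wifi
        out += _joined_runs(parts, str.isalpha)
    if catenate_numbers:    # e.g. 555-1234 -> 5551234
        out += _joined_runs(parts, str.isdigit)
    return out
```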
peaeater / powershell filename replace
Created August 22, 2013 17:00
Use a regex to replace part of each matching filename in the current directory.
Get-ChildItem | Where-Object { $_.Name -match '-replaceme-' } | Rename-Item -NewName { $_.Name -replace '^(.*)-replaceme-(.*)$', '$1-newvalue-$2' }
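Because `(.*)` is greedy, the pattern rewrites only the last `-replaceme-` in a name. You can preview the renames with a Python dry run of the same regex before touching any files. Python writes the backreferences as `\1` and `\2`, where .NET uses `$1` and `$2`. The function name is illustrative.

```python
import re

# Same pattern as the PowerShell one-liner.
PATTERN = re.compile(r"^(.*)-replaceme-(.*)$")


def plan_renames(names):
    """Return (old, new) pairs for the names the pattern actually changes."""
    plan = []
    for name in names:
        new = PATTERN.sub(r"\1-newvalue-\2", name)
        if new != name:  # skip non-matching names instead of renaming them to themselves
            plan.append((name, new))
    return plan
```

To apply the plan, loop over `plan_renames(os.listdir("."))` and call `os.rename(old, new)` for each pair.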
@peaeater
peaeater / text_suggest_ngram.xml
Last active August 22, 2016 19:27
Text suggest ngram Solr field type
<fieldType name="text_suggest_ngram" class="solr.TextField">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="20" minGramSize="1"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
</analyzer>
<analyzer type="query">
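At index time this type turns every word into its prefixes, from 1 up to 20 characters long. A partial word typed anywhere in a title can therefore match. The Python sketch below models the lowercase, edge n-gram and final pattern-strip steps. It substitutes a plain `\w+` split for StandardTokenizer and leaves out the charFilter and WordDelimiterFilter steps. `re.ASCII` makes `\w` ASCII-only, as in Java's default regex behaviour, which is presumably why the pattern lists æøåÆØÅ explicitly. The function name is illustrative.

```python
import re

# Final PatternReplaceFilter: drop anything outside [\w\d*æøåÆØÅ ].
STRIP = re.compile(r"[^\w\d\*æøåÆØÅ ]", re.ASCII)


def ngram_suggest_tokens(text, min_gram=1, max_gram=20):
    tokens = []
    # Rough stand-in for StandardTokenizer: runs of word characters.
    for word in re.findall(r"\w+", text):
        word = word.lower()  # LowerCaseFilter
        # EdgeNGramFilter: prefixes of length min_gram..max_gram
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(STRIP.sub("", word[:n]))
    return tokens
```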
peaeater / text_suggest_edge.xml
Last active January 4, 2024 14:18
Text suggest edge Solr field type
<fieldType name="text_suggest_edge" class="solr.TextField">
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:_-])" replacement=" " replace="all"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
</analyzer>
<analyzer type="query">
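Unlike the ngram type, this type keeps the whole field value as one token (KeywordTokenizer). Its edge n-grams are therefore prefixes of the entire title, spaces included, which suits phrase-style typeahead. The punctuation step needs a careful regex. In a character class written as `[\.,;:-_]`, the sequence `:-_` is read as a range from `:` to `_`. That range also covers `<`, `=`, `>`, `?`, `@`, `[`, `\`, `]`, `^` and `A`-`Z`. Placing the hyphen last (`[\.,;:_-]`) makes it a literal. The Python sketch below models the chain with the hyphen-last class. It omits the accent-folding charFilter, and the function name is illustrative.

```python
import re

# Hyphen placed last so it is a literal '-', not a range operator.
PUNCT = re.compile(r"[\.,;:_-]")
# Final PatternReplaceFilter; re.ASCII keeps \w ASCII-only, as in Java.
STRIP = re.compile(r"[^\w\d\*æøåÆØÅ ]", re.ASCII)


def edge_suggest_tokens(value, min_gram=1, max_gram=30):
    token = value.lower()          # KeywordTokenizer (whole value) + LowerCaseFilter
    token = PUNCT.sub(" ", token)  # punctuation -> space
    # EdgeNGramFilter over the whole value, capped at max_gram characters
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    return [STRIP.sub("", g) for g in grams]
```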
peaeater / pdf2djvu.ps1
Created November 29, 2013 17:48
Converts a PDF to a 300 dpi DJVU file. Requires pdf2djvu => https://code.google.com/p/pdf2djvu/
# convert pdf to djvu
# accepts a .pdf input, outputs a 300dpi .djvu, returns djvu full name
# requires pdf2djvu
param(
[Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
[ValidateScript({[System.IO.Path]::GetExtension($_) -eq ".pdf"})]
[string]$in,
[Parameter(Mandatory=$false,ValueFromPipeline=$true,Position=1)]
[ValidateScript({[System.IO.Path]::GetExtension($_) -eq ".djvu"})]
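The core of the script is a single pdf2djvu call. The Python sketch below builds that command and mirrors the script's extension checks, case-insensitively as PowerShell's `-eq` compares strings. It assumes pdf2djvu's `-d`/`--dpi` and `-o`/`--output` options, so check them against the pdf2djvu manual for your version. The function name and the default output name are assumptions.

```python
from pathlib import Path


def pdf2djvu_command(pdf, djvu=None, dpi=300):
    """Return the argv list that converts `pdf` to a DjVu file at `dpi`."""
    pdf = Path(pdf)
    # Same checks as the ValidateScript attributes, ignoring case like -eq.
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf input, got {pdf}")
    # Assumed default: same base name with a .djvu extension.
    djvu = Path(djvu) if djvu else pdf.with_suffix(".djvu")
    if djvu.suffix.lower() != ".djvu":
        raise ValueError(f"expected a .djvu output, got {djvu}")
    return ["pdf2djvu", "-d", str(dpi), "-o", str(djvu), str(pdf)]
```

To run it, use `subprocess.run(pdf2djvu_command("scan.pdf"), check=True)`.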
peaeater / djvu2txt.ps1
Created November 29, 2013 17:55
Produce one plain text file per page from a DJVU file. The output name includes the total page count of the input file and the page number of the current page. Requires djvutxt => http://djvu.sourceforge.net/doc/man/djvutxt.html
# extract plain text per page from djvu
# requires djvulibre
param(
[Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
[string]$in
)
process
{
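Per-page extraction comes down to two steps: get the page count from djvulibre, then run djvutxt once per page. The Python sketch below only builds the argument lists. It assumes that `djvused -e n file.djvu` prints the page count and that djvutxt accepts `--page=N` and an optional output file, as the djvulibre man pages describe. The original script's exact file-name format is not shown, so the `page_01_of_12.txt` pattern here is hypothetical.

```python
def parse_page_count(djvused_output):
    """Parse the output of `djvused -e n file.djvu`, which prints the page count."""
    return int(djvused_output.strip())


def djvutxt_commands(djvu, total, stem="page"):
    """One djvutxt call per page; names carry the page number and total (hypothetical scheme)."""
    width = len(str(total))  # zero-pad so the files sort in page order
    return [
        ["djvutxt", f"--page={p}", djvu, f"{stem}_{p:0{width}d}_of_{total}.txt"]
        for p in range(1, total + 1)
    ]
```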
peaeater / djvu2xml.ps1
Created November 29, 2013 17:59
Produce one canvas-structure XML file (djvulibre calls this 'hidden text') per page from a DJVU file. The output name includes the total page count of the input file and the page number of the current page. Requires djvutoxml => http://djvu.sourceforge.net/doc/man/djvuxml.html
# extract hidden text xml per page from djvu
# requires djvulibre
param(
[Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
[string]$in
)
process
{
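The XML variant follows the same per-page loop, with djvutoxml in place of djvutxt. The Python sketch below builds one command per page. The `--with-text` and `--page N` flags come from memory of the djvuxml man page and should be checked against your djvulibre version. The output naming is hypothetical.

```python
def djvutoxml_commands(djvu, total, stem="page"):
    """One djvutoxml call per page, with the hidden text layer included.

    Assumed flags: --with-text, --page N (1-based). Naming is hypothetical.
    """
    width = len(str(total))  # zero-pad so the files sort in page order
    cmds = []
    for p in range(1, total + 1):
        out = f"{stem}_{p:0{width}d}_of_{total}.xml"
        cmds.append(["djvutoxml", "--with-text", "--page", str(p), djvu, out])
    return cmds
```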
peaeater / djvu2tif.ps1
Created November 29, 2013 18:06
Produce a TIF per page from the input DJVU file. The output name is simply the page number of the current page. Requires ddjvu => http://djvu.sourceforge.net/doc/man/ddjvu.html
# extract tif per page from djvu
# requires djvulibre
param(
[Parameter(Mandatory=$true,ValueFromPipeline=$true,Position=0)]
[string]$in
)
process
{
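ddjvu renders a single page when given `-page=N`, and `-format=tiff` selects TIFF output. The minimal Python sketch below covers the per-page loop and names each file after its page number, as the description says. It only builds the commands, and the function name is hypothetical. Run each command with `subprocess.run(cmd, check=True)`.

```python
def ddjvu_tif_commands(djvu, total):
    """One ddjvu call per page; each TIFF is named after its page number, e.g. 7.tif."""
    return [
        ["ddjvu", "-format=tiff", f"-page={p}", djvu, f"{p}.tif"]
        for p in range(1, total + 1)
    ]
```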