
@rogerbraun
Created August 7, 2011 17:03
Partially working search with picky and japanese
# encoding: utf-8
#
# TODO Adapt the generated example
# (a library books finder) to what you need.
#
# Questions? Mail me, IRC #picky, the google group, http://github.com/floere/picky/wiki.
#
class PickySearch < Picky::Application

  # So we don't have to write Picky::
  # in front of everything.
  #
  include Picky

  # How text is indexed. Move to Index block to make it index specific.
  #
  indexing removes_characters: /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i,
           stopwords:          /\b(and|the|of|it|in|for)\b/i,
           splits_text_on:     /[\s\/\-\_\:\"\&\/]/

  # How query text is preprocessed. Move to Search block to make it search specific.
  #
  searching removes_characters: /[^\p{Han}\p{Katakana}\p{Hiragana}a-zA-Z0-9\s\/\-\_\&\.\"\~\*\:\,]/i, # Picky needs control chars *"~:, to pass through.
            stopwords:          /\b(and|the|of|it|in|for)\b/i,
            splits_text_on:     /[\s\/\-\&]+/,
            maximum_tokens: 5, # Amount of tokens used in a search (5 = default).
            substitutes_characters_with: CharacterSubstituters::WestEuropean.new # Normalizes special user input, Ä -> Ae, ñ -> n etc.

  japanese_index = Indexes::Memory.new :japanese do
    source   Sources::CSV.new(:japanese, :german, :file => "data/development/japanese.tab", :col_sep => "\t")
    indexing :removes_characters => /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/,
             :stopwords          => /\b(and|the|of|it|in|for)\b/i,
             :splits_text_on     => /[\s;]/
    category :japanese, :partial => Partial::None.new
  end

  route %r{\A/japanese\Z} => Search.new(japanese_index) do
  end

end
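
To see what the Japanese-specific regexes above let through, they can be tried in plain Ruby without loading Picky. This is only a rough sketch: `rough_tokens` is a hypothetical helper, not a Picky method, and the remove-then-split order is an assumption about how the tokenizer applies these options. The sample strings are made up for illustration.

```ruby
# encoding: utf-8

# Regexes copied from the gist above.
INDEX_REMOVES = /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/
INDEX_SPLITS  = /[\s;]/
QUERY_REMOVES = /[^\p{Han}\p{Katakana}\p{Hiragana}a-zA-Z0-9\s\/\-\_\&\.\"\~\*\:\,]/i

# Hypothetical helper (not part of Picky): remove unwanted characters,
# then split on the separator regex and drop empty pieces.
# The remove-then-split order is an assumption.
def rough_tokens(text, removes, splits)
  text.gsub(removes, '').split(splits).reject(&:empty?)
end

# Index side: everything that is not kanji, kana, whitespace or ";" is dropped.
index_tokens = rough_tokens("日本語;にほんご (Japanese)", INDEX_REMOVES, INDEX_SPLITS)
p index_tokens # → ["日本語", "にほんご"]

# Query side: Japanese, ASCII letters/digits and the control chars *"~:, survive.
cleaned_query = "japanese:日本語!".gsub(QUERY_REMOVES, '')
p cleaned_query # → "japanese:日本語"
```

Running a few real rows from the data file through a helper like this is a quick way to check whether a character the search needs is being stripped before it ever reaches the index.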

floere commented Aug 8, 2011

Cool. I hope to release 3.0.0 soon. But first, a lot of documentation needs to be updated, and any remaining API inconsistencies need to be found.

No worries, I just wanted to point out some improvements you could make. But it seems you already know all this :)
