Created
August 7, 2011 17:03
Partially working search with picky and japanese
# encoding: utf-8
#
# TODO Adapt the generated example
#      (a library books finder) to what you need.
#
# Questions? Mail me, IRC #picky, the google group, http://github.com/floere/picky/wiki.
#
class PickySearch < Picky::Application

  # So we don't have to write Picky::
  # in front of everything.
  #
  include Picky

  # How text is indexed. Move to Index block to make it index specific.
  #
  indexing removes_characters: /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i,
           stopwords:          /\b(and|the|of|it|in|for)\b/i,
           splits_text_on:     /[\s\/\-\_\:\"\&\/]/

  # How query text is preprocessed. Move to Search block to make it search specific.
  #
  searching removes_characters: /[^\p{Han}\p{Katakana}\p{Hiragana}a-zA-Z0-9\s\/\-\_\&\.\"\~\*\:\,]/i, # Picky needs control chars *"~:, to pass through.
            stopwords:          /\b(and|the|of|it|in|for)\b/i,
            splits_text_on:     /[\s\/\-\&]+/,
            maximum_tokens:     5, # Amount of tokens used in a search (5 = default).
            substitutes_characters_with: CharacterSubstituters::WestEuropean.new # Normalizes special user input, Ä -> Ae, ñ -> n etc.

  japanese_index = Indexes::Memory.new :japanese do
    source   Sources::CSV.new(:japanese, :german, :file => "data/development/japanese.tab", :col_sep => "\t")
    indexing :removes_characters => /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/,
             :stopwords          => /\b(and|the|of|it|in|for)\b/i,
             :splits_text_on     => /[\s;]/
    category :japanese, :partial => Partial::None.new
  end

  route %r{\A/japanese\Z} => Search.new(japanese_index) do
  end

end
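To see what the two `removes_characters` patterns do to Japanese text, here is a small plain-Ruby sketch that does not need Picky. It assumes the tokenizer first strips the disallowed characters and then splits on `splits_text_on`; that order, the `tokenize` helper, and the sample strings are my assumptions for illustration, not Picky's actual code.

```ruby
# encoding: utf-8

# The application-wide indexing pattern from the config above.
DEFAULT_REMOVES_CHARACTERS = /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i

# The patterns from the japanese_index block.
JAPANESE_REMOVES_CHARACTERS = /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/
JAPANESE_SPLITS_TEXT_ON     = /[\s;]/

# Hypothetical helper: remove disallowed characters, then split.
# This only approximates what a tokenizer might do with these options.
def tokenize(text, removes, splits)
  text.gsub(removes, '').split(splits).reject(&:empty?)
end

# With the default pattern, every Japanese character is removed.
stripped = "日本語 abc".gsub(DEFAULT_REMOVES_CHARACTERS, '')

# With the index-specific pattern, the Japanese words survive
# and the Latin text in parentheses is dropped.
tokens = tokenize("日本語; にほんご (Japanese) カタカナ",
                  JAPANESE_REMOVES_CHARACTERS,
                  JAPANESE_SPLITS_TEXT_ON)

puts stripped.inspect
puts tokens.inspect
```

This is why the `japanese_index` block carries its own `indexing` options: under the default pattern there would be nothing left of the Japanese column to index.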
Cool. I hope to release 3.0.0 soon. But first, a lot of documentation needs to be updated, and any API inconsistencies need to be found.
No worries, I just wanted to point out some improvements you could make. But it seems you already know all this :)
Hi Florian,
you are right, this is 3.0.0.pre1 with my subtoken generation fix. This is not a minimal example; I just modified the standard config until it worked. Sorry for any confusion caused.