@rogerbraun
Created August 6, 2011 12:16
Picky gives an error when trying to index Japanese
# encoding: utf-8
#
# TODO Adapt the generated example
# (a library books finder) to what you need.
#
# Questions? Mail me, IRC #picky, the google group, http://github.com/floere/picky/wiki.
#
class PickySearch < Application
  # How text is indexed. Move to Index block to make it index specific.
  #
  indexing removes_characters: /[^a-zA-Z0-9\s\/\-\_\:\"\&\.]/i,
           stopwords: /\b(and|the|of|it|in|for)\b/i,
           splits_text_on: /[\s\/\-\_\:\"\&\/]/
  # How query text is preprocessed. Move to Search block to make it search specific.
  #
  searching removes_characters: /[^a-zA-Z0-9\s\/\-\_\&\.\"\~\*\:\,]/i, # Picky needs control chars *"~:, to pass through.
            stopwords: /\b(and|the|of|it|in|for)\b/i,
            splits_text_on: /[\s\/\-\&]+/,
            maximum_tokens: 5, # Amount of tokens used in a search (5 = default).
            substitutes_characters_with: CharacterSubstituters::WestEuropean.new # Normalizes special user input, Ä -> Ae, ñ -> n etc.
  japanese_index = Index::Memory.new :japanese do
    source Sources::CSV.new(:japanese, :german, :file => "data/development/japanese.tab", :col_sep => "\t")
    # Keep only kanji, katakana, hiragana, whitespace and ";" when indexing.
    indexing :removes_characters => /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/,
             :stopwords => /\b(and|the|of|it|in|for)\b/i,
             :splits_text_on => /[\s;]/
    category :japanese
  end
end
roger@roger-MS-7621:~/Dropbox/picky_experiments/japanese_test$ rackup
Loaded picky with environment 'development' in /home/roger/Dropbox/picky_experiments/japanese_test on Ruby 1.9.2.
WARNING: No routes defined for application configuration in Class.
Application PickySearch loaded.
14:13:05: "development:japanese:japanese": Loading index from cache.
/home/roger/.rvm/gems/ruby-1.9.2-p180/gems/yajl-ruby-0.8.2/lib/yajl.rb:37:in `parse': invalid encoding symbol (EncodingError)
        from /home/roger/.rvm/gems/ruby-1.9.2-p180/gems/yajl-ruby-0.8.2/lib/yajl.rb:37:in `parse'
1 日本語; にほんご Japanisch; Die japanische Sprache
2 食べる; たべる Essen.
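To see what the `indexing` options of the `:japanese` index do to these rows, here is a minimal sketch in plain Ruby. The two regular expressions are copied from the index definition; `tokenize` is a hypothetical helper, and the order it uses (remove characters first, then split) is an assumption, not Picky's actual tokenizer.

```ruby
# encoding: utf-8

# Options copied from the :japanese index definition.
REMOVES_CHARACTERS = /[^\p{Han}\p{Katakana}\p{Hiragana}\s;]/
SPLITS_TEXT_ON     = /[\s;]/

# Hypothetical helper, not Picky's tokenizer: strip every character
# outside the allowed set, split the rest, and drop empty tokens.
def tokenize(text)
  text.gsub(REMOVES_CHARACTERS, '').split(SPLITS_TEXT_ON).reject(&:empty?)
end

puts tokenize("日本語; にほんご").inspect
puts tokenize("食べる; たべる").inspect
puts tokenize("Japanisch; Die japanische Sprache").inspect
```

Under these assumptions the Japanese column yields one token per reading, while the German column yields no tokens at all, because every Latin letter is removed before splitting.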