@lukas-vlcek
Created July 19, 2013 13:31
Torturing Word Delimiter TokenFilter in ElasticSearch
#!/bin/sh
curl -X DELETE 'localhost:9200/i/'
curl -X POST 'localhost:9200/i/' -d '{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "crazy" : {
          "type" : "custom",
          "tokenizer" : "keyword",
          "filter" : [ "replace-whitespaces", "strip_underscore" ]
        }
      },
      "filter" : {
        "replace-whitespaces" : {
          "type" : "pattern_replace",
          "pattern" : "\\s+",
          "replacement" : "_"
        },
        "strip_underscore" : {
          "type" : "word_delimiter",
          "split_on_numerics" : false,
          "split_on_case_change" : false,
          "generate_word_parts" : false,
          "generate_number_parts" : false,
          "catenate_all" : true
        }
      }
    }
  }
}'
curl 'localhost:9200/i/_analyze?analyzer=crazy&pretty=true' -d 'Hello-Hello 101 World'
# {"ok":true,"acknowledged":true}{"ok":true,"acknowledged":true}{
#   "tokens" : [ {
#     "token" : "HelloHello101World",
#     "start_offset" : 0,
#     "end_offset" : 21,
#     "type" : "word",
#     "position" : 1
#   } ]
# }
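# The filter chain above can be approximated outside Elasticsearch: the
# pattern_replace step is a whitespace-to-underscore regex substitution, and
# word_delimiter with catenate_all (and the generate/split options disabled)
# effectively drops the '-' and '_' delimiters and concatenates the parts.
# A rough sketch with plain sed/tr, not the actual Lucene filters:

```shell
#!/bin/sh
# 1. pattern_replace: collapse runs of whitespace into a single underscore
# 2. word_delimiter (catenate_all): split on '-'/'_' and join the parts
echo 'Hello-Hello 101 World' \
  | sed -E 's/[[:space:]]+/_/g' \
  | tr -d '_-'
# prints HelloHello101World
```

# Note this mimics the analyzer only for inputs like the one above; the real
# word_delimiter filter also handles case changes, numerics, and offsets.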