@lukas-vlcek lukas-vlcek/gist:5846745
Last active Oct 10, 2018

ICU Folding demo. Assumes Elasticsearch 0.90.0 with the ICU plugin 1.9.0 installed.
#!/bin/sh
curl -X DELETE 'localhost:9200/i/'
curl -X POST 'localhost:9200/i/' -d '{
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0,
    "analysis" : {
      "analyzer" : {
        "icu_folding" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["icu_folding"]
        },
        "ascii_folding" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["asciifolding","lowercase"]
        }
      }
    }
  }
}'
# The index is ready; you can exit here and run the tests below manually...
# ASCII folding and ICU folding give the same result here (except for lowercasing, which has to be added to the ascii_folding analyzer explicitly)
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'Běloučký kůň úpěl ódy!'
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'Běloučký kůň úpěl ódy!'
# ASCII folding works only for some of these characters...
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'dž ¼ № ℃ ™ Æ Ȣ ffi '
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'dž ¼ № ℃ ™ Æ Ȣ ffi '
# ASCII folding is a no-op here... ICU folding rocks!
curl 'localhost:9200/i/_analyze?analyzer=icu_folding&pretty=true' -d 'º o ª a ℹ i ℇ e'
curl 'localhost:9200/i/_analyze?analyzer=ascii_folding&pretty=true' -d 'º o ª a ℹ i ℇ e'
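
The paired curl calls above can be collapsed into one helper that runs a text through both analyzers and prints only the resulting tokens, one line per analyzer. This is a minimal sketch that is not part of the original gist: the function names `compare_folding` and `extract_tokens` are hypothetical, it assumes the index `i` and the two analyzers created above plus the Elasticsearch 0.90-era `_analyze?analyzer=` query-string API (later Elasticsearch releases replaced it with a JSON request body), and it pulls tokens out of the response with `grep`/`sed` instead of a real JSON parser.

```shell
# Hypothetical helpers (not part of the original gist).
# Assumes index "i" with the icu_folding/ascii_folding analyzers defined
# above and the ES 0.90-style `_analyze?analyzer=` query-string API.

# Read an _analyze JSON response on stdin and print its token values on
# one line, separated by spaces. Works for compact and pretty-printed
# output; tokens containing escaped quotes are not handled.
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' \
    | sed 's/^"token" *: *"\(.*\)"$/\1/' \
    | paste -s -d ' ' -
}

# Run the given text through both analyzers, one output line per analyzer.
compare_folding() {
  for analyzer in icu_folding ascii_folding; do
    printf '%s: ' "$analyzer"
    curl -s "localhost:9200/i/_analyze?analyzer=$analyzer" -d "$1" | extract_tokens
  done
}
```

For example, `compare_folding 'º o ª a ℹ i ℇ e'` prints the ICU and ASCII token lists next to each other, which should line up with the differences listed in gondo's comment.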

gondo commented Oct 14, 2013

For those who are lazy or simply can't try this right away, the differences are:
icu_folding: dž -> dz, ¼ -> 1/4, № -> no ...
ascii_folding: dž -> dz, ¼ -> ¼, № -> № ...
icu_folding: º -> o, o -> o, ª -> a, a -> a, ℹ -> i, i -> i, ℇ -> e, e -> e
ascii_folding: º -> º, o -> o, ª -> ª, a -> a, ℹ -> ℹ, i -> i, ℇ -> ℇ, e -> e
