@babadofar
Last active August 29, 2015 14:02
Compound words demo with the Elasticsearch ngram tokenizer
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create the "play" index with a custom analyzer that splits text into
# 2- and 3-character ngrams, and apply it to the "foo" field
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "foo": {
          "type": "string",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}'
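# Sketch (not part of the original gist): a local approximation of the tokens
# an ngram tokenizer with min_gram 2 and max_gram 3 produces for one word.
# The helper name "ngrams" is hypothetical. It slices by character, so run it
# under a UTF-8 locale for words such as "miljø". It only illustrates the idea
# and does not call Elasticsearch.
ngrams() {
  local word="$1" min="$2" max="$3" i n
  for ((i = 0; i < ${#word}; i++)); do
    for ((n = min; n <= max; n++)); do
      # Emit a gram only if it fits inside the word
      if ((i + n <= ${#word})); then
        printf '%s\n' "${word:i:n}"
      fi
    done
  done
}
# Prints, one per line: ar arb rb rbe be bei ei eid id
ngrams "arbeid" 2 3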
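# Index three Norwegian words: "arbeid" (work) and two compounds built on it,
# "arbeidsmiljø" (working environment) and "arbeidsmiljølovgivning"
# (working environment legislation)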
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"foo":"arbeid"}
{"index":{"_index":"play","_type":"type"}}
{"foo":"arbeidsmiljø"}
{"index":{"_index":"play","_type":"type"}}
{"foo":"arbeidsmiljølovgivning"}
'
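# Search for "miljø", which only occurs inside the compound words.
# The query text is analyzed with the same ngram analyzer as the field.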
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
  "query": {
    "match": {
      "foo": "miljø"
    }
  }
}
'
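# Sketch (not part of the original gist): a local check of which indexed words
# share at least one 2- or 3-character gram with the query. With the match
# query's default "or" operator, one shared gram is enough for a hit, so this
# approximates which documents the search above can return. The helper name
# "shares_gram" is hypothetical; scoring and result order are ignored.
shares_gram() {
  local query="$1" word="$2" i n
  for ((i = 0; i < ${#query}; i++)); do
    for ((n = 2; n <= 3; n++)); do
      # Skip grams that would run past the end of the query
      ((i + n <= ${#query})) || continue
      # A substring test stands in for looking the gram up in the index
      [[ "$word" == *"${query:i:n}"* ]] && return 0
    done
  done
  return 1
}
for word in arbeid arbeidsmiljø arbeidsmiljølovgivning; do
  if shares_gram "miljø" "$word"; then
    echo "$word: match"
  else
    echo "$word: no match"
  fi
done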