
@duydo
Created October 17, 2013 09:52
Preserving special characters during tokenization of Twitter messages with Elasticsearch
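# Create a "twitter" index whose "msg" field keeps hashtags and mentions intact.
# By default the word_delimiter filter treats punctuation as a delimiter and drops it,
# so "#elasticsearch" would be indexed as "elasticsearch". The type_table below tells
# it to treat '#' and '@' as letters (ALPHA) instead.
# Note: this is 2013-era (0.90/1.x) syntax. Later Elasticsearch versions require a
# "Content-Type: application/json" header, replace the "string" type with "text",
# and no longer support the "tweet" mapping type.
curl -XPUT 'http://localhost:9200/twitter' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    },
    "analysis" : {
      "filter" : {
        "tweet_filter" : {
          "type" : "word_delimiter",
          "type_table" : ["# => ALPHA", "@ => ALPHA"]
        }
      },
      "analyzer" : {
        "tweet_analyzer" : {
          "type" : "custom",
          "tokenizer" : "whitespace",
          "filter" : ["lowercase", "tweet_filter"]
        }
      }
    }
  },
  "mappings" : {
    "tweet" : {
      "properties" : {
        "msg" : {
          "type" : "string",
          "analyzer" : "tweet_analyzer"
        }
      }
    }
  }
}'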
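To see why the type_table matters without a running cluster, the analyzer chain can be roughly approximated in plain shell. This is only a sketch: the function name `emulate_tweet_analyzer` is hypothetical and not part of Elasticsearch. It lowercases the text and treats every ASCII character other than letters, digits, '#' and '@' as a delimiter. It does not model other word_delimiter behaviour such as stripping English possessives, and it treats non-ASCII letters as delimiters, which the real filter does not.

```sh
#!/bin/sh
# Hypothetical helper: a rough, ASCII-only emulation of tweet_analyzer.
#   - lowercase filter:   tr '[:upper:]' '[:lower:]'
#   - whitespace tokenizer + word_delimiter with "# => ALPHA", "@ => ALPHA":
#     every character except a-z, 0-9, '#' and '@' (whitespace included)
#     becomes a token boundary and is dropped; runs of them are squeezed.
# Prints one token per line.
emulate_tweet_analyzer() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9#@' '\n' \
    | grep -v '^$' || true   # drop empty lines; no tokens is not an error
}

# prints: hello, @duydo, loving, #elasticsearch (one per line)
emulate_tweet_analyzer 'Hello @DuyDo, loving #ElasticSearch!'
```

Removing `#@` from the `tr` set mimics the filter without the type_table: the same message then yields `duydo` and `elasticsearch`, losing the distinction between mentions, hashtags and plain words. On a live cluster, the index's `_analyze` API with `analyzer=tweet_analyzer` shows the actual token stream.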