Zachary Tong polyfractal

{
"test":{
"state":"open",
"settings":{
"index.analysis.filter.filter_shingle.type":"shingle",
"index.number_of_replicas":"0",
"index.analysis.filter.filter_shingle.output_unigrams":"true",
"index.analysis.analyzer.analyzer_shingle.tokenizer":"standard",
"index.analysis.filter.filter_shingle.min_shingle_size":"2",
"index.analysis.analyzer.analyzer_shingle.filter.0":"standard",
<?php
require 'vendor/autoload.php';
use Sherlock\Sherlock;
function pprint($value) {
print_r($value);
echo "\r\n";
@polyfractal
polyfractal / gist:4997040
Created February 20, 2013 16:54
Reroute API breaks Cluster/Node/Stats API. More details
##Listed in order of operations and the resulting output:
## reroute the shards - executed on C1 client node
$ curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands":[
{
"move":{
"index":"test",
"shard":2,
"from_node":"J47gdOIwQMq2GTmzzmzJBA",
polyfractal / gist:4968387
Created February 16, 2013 19:37
Logging.yml to enable a SocketAppender, which will be used to talk to Logstash
rootLogger: INFO, console, file, socketappender
logger:
# log action execution errors for easier debugging
action: DEBUG
# reduce the logging for aws, too much is logged under the default INFO
com.amazonaws: WARN
# gateway
#gateway: DEBUG
#index.gateway: DEBUG
polyfractal / gist:4959909
Last active December 13, 2015 19:08
Mapping for indexing throughput benchmark
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 0,
"index": {
"analysis": {
"analyzer": {
"analyzer_shingle": {
"tokenizer": "standard",
"filter": [
### Changes:
- Added "include_in_root" for each nested object.
- Removed the "nested" params from the last facet.

This copies each nested doc into the root doc. You then reference the root "inner object" rather than the "nested object" to get the data. Be careful, though: this breaks down if you have multiple nested docs that share the same name (e.g. an array of nested docs), since the facet will operate on the entire array instead of on the individual docs.
See this thread for more info: https://groups.google.com/d/topic/elasticsearch/pjoNmosdCPs/discussion
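
The caveat above can be modeled with a minimal sketch in Python, not in Elasticsearch itself. The field names (`comments`, `author`, `stars`) and the helper `copy_into_root` are hypothetical and only imitate how `include_in_root` copies nested values into the root document; this is not the library's implementation.

```python
from collections import defaultdict

# Hypothetical mapping fragment: "comments" and "author" are invented names.
mapping = {
    "properties": {
        "comments": {
            "type": "nested",
            "include_in_root": True,
            "properties": {"author": {"type": "string"}},
        }
    }
}


def copy_into_root(doc, nested_field):
    """Model include_in_root: gather every nested value under a dotted root key."""
    root = defaultdict(list)
    for nested_doc in doc.get(nested_field, []):
        for key, value in nested_doc.items():
            root[f"{nested_field}.{key}"].append(value)
    return dict(root)


# Two nested docs that share the same field names (an array of nested docs).
doc = {"comments": [{"author": "alice", "stars": 5}, {"author": "bob", "stars": 1}]}
flattened = copy_into_root(doc, "comments")
print(flattened)
```

In this model the root copy holds one list per field, so the pairing between an individual author and that author's stars is no longer visible, which is why a facet on the root fields sees the whole array.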
polyfractal / gist:4063964
Created November 13, 2012 04:46
ES Mapping
{
"mappings":{
"post":{
"properties":{
"body":{
"fields":{
"body":{
"type":"string",
"analyzer":"analyzer_term"
},

Ran on my MacBook Air with half a million docs. Single node, 5 primary shards, 0 replicas. The node was restarted between runs to make sure all caches were cleared.

Existing benchmark

$ python loadtester.py --es "http://localhost:9200/speedtest/_search" -i ../data/stoicism.txt -o test1.txt --ns 10000 --nt 3 --nf 10
0 26004 1.36110687256
1000 5561 0.0182199478149
2000 10516 0.0134048461914
3000 42137 0.0833399295807
4000 34922 0.0168430805206
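// Look up the term in the field, requesting payloads with each position.
termInfo = _index[field].get(term,_PAYLOADS);
score = 0;
// Sum the float payload of every position; 0 is used when a payload is missing.
for (pos : termInfo) {
    score = score + pos.payloadAsFloat(0);
}
return score;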
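
As a rough illustration of what the script above computes, here is a Python model of the same loop. The `TermPosition` class and the sample payloads are assumptions made for illustration; in Elasticsearch the positions come from `_index[field].get(term, _PAYLOADS)`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TermPosition:
    """Hypothetical stand-in for one occurrence of a term in a field."""
    payload: Optional[float] = None

    def payload_as_float(self, default: float) -> float:
        # Fall back to the default when this position carries no payload.
        return self.payload if self.payload is not None else default


def payload_score(positions: Iterable[TermPosition]) -> float:
    """Sum the float payload of every position, treating missing payloads as 0."""
    score = 0.0
    for pos in positions:
        score += pos.payload_as_float(0.0)
    return score


# Invented sample data: three occurrences, one without a payload.
positions = [TermPosition(0.5), TermPosition(None), TermPosition(2.0)]
print(payload_score(positions))
```

A term that does not occur in the field yields no positions, so the score in this model stays at zero.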