Duy Do duydo

@duydo
duydo / gist:2402998
Created April 17, 2012 02:24
install a jar file into maven local repository
mvn install:install-file -DgroupId=com.example -DartifactId=example-app -Dversion=1.0 -Dpackaging=jar -Dfile=path/to/jar/file
duydo / gist:2403083
Created April 17, 2012 02:55
Set JAVA_HOME environment on Mac OS
export JAVA_HOME=$(/usr/libexec/java_home)
duydo / elasticsearch_best_practices.txt
Last active December 15, 2021 06:12
Elasticsearch - Index best practices from Shay Banon
If you want, I can try to help with pointers on how to improve the indexing speed you get. It's quite easy to increase it significantly by following some simple guidelines, for example:
- Use create in the index API (assuming you can).
- Relax the real time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things, as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
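The replica and refresh tips above can be sketched as a small helper that builds the body for the update settings API. This is only a sketch under assumptions: the function name `bulk_settings_body`, the host `localhost:9200` and the index name `my_index` are hypothetical, and the body uses the setting names `index.number_of_replicas` and `index.refresh_interval` from current Elasticsearch rather than the older `index.engine.robin.refresh_interval` name quoted above.

```shell
#!/bin/bash
# Hypothetical helper: print a JSON body for the update settings API.
#   $1 = number of replicas, $2 = refresh interval (for example 30s)
bulk_settings_body() {
  printf '{"index":{"number_of_replicas":%d,"refresh_interval":"%s"}}\n' "$1" "$2"
}

# Before the bulk load (assumed host and index name):
#   curl -XPUT 'http://localhost:9200/my_index/_settings' \
#        -H 'Content-Type: application/json' \
#        -d "$(bulk_settings_body 0 30s)"
# After the bulk load, restore the values you actually want:
#   curl -XPUT 'http://localhost:9200/my_index/_settings' \
#        -H 'Content-Type: application/json' \
#        -d "$(bulk_settings_body 1 1s)"
```

Keeping the body in one function makes it easy to apply the relaxed values first and the final values afterwards with the same call.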
duydo / gist:2551254
Created April 29, 2012 15:23
Virtual host on your dev PC
<VirtualHost *:80>
VirtualDocumentRoot /Users/duydo/Sites/%1
ServerName automated_domains
ServerAlias *.localhost.com
</VirtualHost>
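As a rough illustration of what `%1` does in the config above: mod_vhost_alias replaces `%1` with the first dot-separated part of the requested host name, so each `*.localhost.com` name gets its own directory under `Sites`. The helper below is hypothetical (it is not part of Apache) and only mimics that mapping for this config's path.

```shell
#!/bin/bash
# Hypothetical helper: show which directory a host name would be served from
# under "VirtualDocumentRoot /Users/duydo/Sites/%1".
docroot_for_host() {
  # ${1%%.*} strips everything from the first dot onward,
  # leaving the first part of the host name (what %1 expands to).
  printf '/Users/duydo/Sites/%s\n' "${1%%.*}"
}

# Example:
#   docroot_for_host blog.localhost.com
```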
duydo / gist:2603084
Created May 5, 2012 14:57
avoid tar's arguments list too long error
find . -name '*.ext' -print | tar -cvzf archive.tar.gz --files-from -
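A possible variant of the command above, assuming a tar that supports `--null` (GNU tar and bsdtar both do): passing the names NUL-separated keeps the file list intact even when a file name contains a newline. The function name `archive_matching` is made up for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: archive_matching DIR PATTERN ARCHIVE
# Archives every file under DIR whose name matches PATTERN.
# ARCHIVE should be an absolute path, because the function changes into DIR.
archive_matching() (
  cd "$1" || exit 1
  # -print0 and --null pass the names NUL-separated;
  # --null must come before --files-from so it applies to that list.
  find . -name "$2" -print0 | tar -czf "$3" --null --files-from -
)

# Example:
#   archive_matching /path/to/dir '*.ext' /tmp/archive.tar.gz
```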
duydo / gist:2780649
Created May 24, 2012 10:12
ElasticSearch - substring search
You have two options:
- Use ngrams through the ngram token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenfilter.html. See Wikipedia for details: http://en.wikipedia.org/wiki/N-gram
- Use the compound word token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/compound-word-tokenfilter.html (which I have contributed to Lucene, btw.)
The ngrams are useful if you cannot provide a dictionary because you don't know what type of documents you have to index. The downside is that they use a lot of space in your indices. So you need to be careful here.
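To get a feel for why ngrams take so much space, here is a minimal sketch that prints the n-grams of a single word. It is not how Elasticsearch or Lucene implements the filter; the function name `ngrams` and the min/max arguments are assumptions made for illustration, and the output order is arbitrary.

```shell
#!/bin/bash
# Hypothetical helper: ngrams WORD MIN MAX
# Prints every substring of WORD whose length is between MIN and MAX,
# one per line.
ngrams() (
  word=$1 min=$2 max=$3
  len=${#word}
  n=$min
  while [ "$n" -le "$max" ]; do
    i=1
    # Slide a window of length n over the word (cut uses 1-based positions).
    while [ $((i + n - 1)) -le "$len" ]; do
      printf '%s\n' "$word" | cut -c "$i-$((i + n - 1))"
      i=$((i + 1))
    done
    n=$((n + 1))
  done
)

# Example:
#   ngrams search 2 3
```

A six-letter word with lengths 2 to 3 already yields nine tokens instead of one, which is the index growth the note above warns about.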
duydo / install_lily.sh
Created May 26, 2012 02:59
install lily
#! /bin/bash
#
#Simple script to download & install lily with its dependencies
#
LILY_VERSION='1.1.2'
HADOOP_VERSION='1.0.0'
HBASE_VERSION='0.92.1'
ZOOKEEPER_VERSION='3.4.3'
duydo / logstash-template.json
Created September 9, 2012 03:40 — forked from deverton/logstash-template.json
Logstash Elasticsearch Template
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index": {
      "query": { "default_field": "@message" },
      "store": { "compress": { "stored": true, "tv": true } }
    }
  },
duydo / elasticsearch.sh
Created September 15, 2012 15:25
elasticsearch script
#!/bin/bash
NAME=elasticsearch
PREFIX=/usr/local
ES_HOME=$PREFIX/$NAME
install() {
v=$1;
echo "Downloading $NAME $v...";
file="$NAME-$v.tar.gz";
duydo / ByteTokenizer.java
Last active June 16, 2023 22:21
The byte tokenizer class allows an application to break a byte array into tokens.
/**
* @(#)ByteTokenizer.java Sep 23, 2008
* Copyright (C) 2008 Duy Do. All Rights Reserved.
*/
package com.duydo.util;
import java.util.Enumeration;
import java.util.NoSuchElementException;
/**