Duy Do duydo

## gist:2402998
mvn install:install-file -DgroupId=com.example -DartifactId=example-app -Dversion=1.0 -Dpackaging=jar -Dfile=path/to/jar/file

## gist:2403083

export JAVA_HOME=$(/usr/libexec/java_home)

## elasticsearch_best_practices.txt
If you want, I can help with pointers on how to improve the indexing speed you get. It's quite easy to increase it substantially by following some simple guidelines, for example:

- Use create in the index API (assuming you can).
- Relax the real-time aspect from 1 second to something a bit higher (index.engine.robin.refresh_interval).
- Increase the indexing buffer size (indices.memory.index_buffer_size); it defaults to 10%, meaning 10% of the heap.
- Increase the number of dirty operations that trigger an automatic flush (the flush exists so the translog won't get really big, even though it's FS based) by setting index.translog.flush_threshold (defaults to 5000).
- Increase the memory allocated to the elasticsearch node. By default it's 1g.
- Start with a lower replica count (even 0), and then once the bulk loading is done, increase it to the value you want using the update_settings API. This will improve things as possibly fewer shards will be allocated to each machine.
- Increase the number of machines you have so
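
The replica and refresh guidelines above can be sketched as two settings bodies, one applied before a bulk load and one after it. This is only an illustration: the helper names are made up, the `30s` interval is an arbitrary example value, and the keys `number_of_replicas` and `refresh_interval` under `index` are the names used by later Elasticsearch releases (the note above uses the older `index.engine.robin.refresh_interval`), so check them against the version you run. Each body would be sent to the index's `_settings` endpoint.

```python
import json


def bulk_load_settings(refresh_interval="30s"):
    """Settings to apply before a large bulk load (hypothetical helper).

    Replicas are dropped to 0 and the refresh interval is relaxed;
    "30s" is an illustrative value, not a recommendation from the source.
    """
    return {
        "index": {
            "number_of_replicas": 0,
            "refresh_interval": refresh_interval,
        }
    }


def post_load_settings(replicas=1, refresh_interval="1s"):
    """Settings to restore once the bulk load has finished (hypothetical helper)."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }


if __name__ == "__main__":
    # Print the two JSON bodies that would be PUT to <index>/_settings.
    print(json.dumps(bulk_load_settings(), indent=2))
    print(json.dumps(post_load_settings(replicas=2), indent=2))
```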

## gist:2551254
<VirtualHost *:80>
    VirtualDocumentRoot /Users/duydo/Sites/%1
    ServerName automated_domains
    ServerAlias *.localhost.com
</VirtualHost>
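
As a rough model of what the `%1` in `VirtualDocumentRoot` does, the sketch below substitutes the first dot-separated part of the requested host name into the path. It is a simplification that handles only `%1` and none of the other interpolation forms, and the host names `myapp.localhost.com` and `blog.localhost.com` are hypothetical examples.

```python
def resolve_document_root(host, pattern="/Users/duydo/Sites/%1"):
    """Return the directory a request for `host` would be served from.

    Simplified model: only the %1 placeholder (first part of the host
    name) is supported.
    """
    # Drop an optional ":port" suffix from the Host header value.
    hostname = host.split(":", 1)[0]
    # %1 is the first dot-separated part of the name.
    first_part = hostname.split(".")[0]
    return pattern.replace("%1", first_part)


if __name__ == "__main__":
    print(resolve_document_root("myapp.localhost.com"))
    print(resolve_document_root("blog.localhost.com:80"))
```

With this mapping, adding a new site only requires creating a new directory under `Sites`; the virtual host block itself does not change.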

## gist:2603084
find . -name '*.ext' -print | tar -cvzf archive.tar.gz --files-from -
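
For environments without `find` and `tar`, roughly the same result can be had with Python's standard library. This is a sketch, not an exact equivalent: the function name is made up, it archives regular files only (the shell pipeline would also pick up directories whose names match), and it stores paths relative to the root without the leading `./`.

```python
import tarfile
from pathlib import Path


def archive_by_extension(root, suffix, archive_path):
    """Gzip-tar every regular file under `root` whose name ends with `suffix`.

    Returns the archived paths, relative to `root`, in sorted order.
    """
    root = Path(root)
    archive_path = Path(archive_path)
    members = sorted(
        p for p in root.rglob(f"*{suffix}")
        # Skip directories and the archive itself if it lives under root.
        if p.is_file() and p.resolve() != archive_path.resolve()
    )
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for path in members:
            tar.add(str(path), arcname=path.relative_to(root).as_posix())
    return [p.relative_to(root).as_posix() for p in members]


if __name__ == "__main__":
    for name in archive_by_extension(".", ".ext", "archive.tar.gz"):
        print(name)
```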

## gist:2780649
You have two options:

- Use ngrams through the ngram token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/ngram-tokenfilter.html. See Wikipedia for details: http://en.wikipedia.org/wiki/N-gram
- Use the compound token filter: http://www.elasticsearch.org/guide/reference/index-modules/analysis/compound-word-tokenfilter.html (which I have contributed to Lucene btw.)

The ngrams are useful if you cannot provide a dictionary because you don't know what type of documents you have to index. The downside is that they use a lot of space in your indices. So you need to be careful here.
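
To make the space trade-off concrete, here is a small sketch of character n-gram generation in plain Python. It is not the Elasticsearch filter itself: the function name, the `min_gram`/`max_gram` defaults, the emission order, and the sample word `hausboot` are all assumptions chosen for illustration.

```python
def char_ngrams(token, min_gram=2, max_gram=3):
    """Return all character n-grams of `token` with min_gram <= n <= max_gram."""
    grams = []
    for n in range(min_gram, max_gram + 1):
        # Slide a window of width n across the token.
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


if __name__ == "__main__":
    # The 4-grams of a compound word include its parts, so a search for
    # "boot" can match without any dictionary.
    print(char_ngrams("hausboot", 4, 4))
    # A single 8-letter token expands to 18 grams for sizes 2 through 4,
    # which is where the extra index space goes.
    print(len(char_ngrams("hausboot", 2, 4)))
```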

## install_lily.sh
#! /bin/bash

#
#Simple script to download & install lily with its dependencies
#

LILY_VERSION='1.1.2'
HADOOP_VERSION='1.0.0'
HBASE_VERSION='0.92.1'
ZOOKEEPER_VERSION='3.4.3'

## logstash-template.json
{
    "template": "logstash-*",
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0,
        "index" : {
            "query" : { "default_field" : "@message" },
            "store" : { "compress" : { "stored" : true, "tv": true } }
        }
    },

## elasticsearch.sh
#!/bin/bash

NAME=elasticsearch
PREFIX=/usr/local
ES_HOME=$PREFIX/$NAME

install() {
	v=$1;
	echo "Downloading $NAME $v...";
	file="$NAME-$v.tar.gz";

## ByteTokenizer.java
/**
 * @(#)ByteTokenizer.java Sep 23, 2008
 * Copyright (C) 2008 Duy Do. All Rights Reserved.
 */
package com.duydo.util;

import java.util.Enumeration;
import java.util.NoSuchElementException;

/**