Jesse Davis jessedavis

## gist:b545b8edbe72af6c4099
Recipe:

directory '/venv' do
    owner 'root'
    group 'root'
    mode '0777'
    action :create
end

python_virtualenv '/venv' do

## gist:10313884
We're attempting to create a new Elasticsearch cluster for indexing URLs, but have run into a memory leak when turning replication on for our indices.

The current setup is: 5 x m2.2xlarge, 4 TB mounted on EBS per node (not Provisioned IOPS).

We create one index per day, and will keep the past 90 days around for searching.  We have been performing bulk inserts with routing enabled, 1 day at a time, and have been successful in loading all 90 days.  This ended up being approximately 313 million documents.  I had inserted with the number of replicas per index set to 0 to increase our bulk insertion rate.
I then started changing the number of replicas per index to 1, one index at a time.  I was able to successfully create the replicas for about 70 of the shards (i.e. about 65 or 70 days), but then ran out of heap space.
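
For reference, the per-index replica change described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the `urls-YYYY.MM.DD` index naming and the helper names `daily_index_names` and `replica_update_request` are assumptions made for illustration. It only builds the update-settings requests (`PUT /<index>/_settings` with `index.number_of_replicas`) and does not send them.

```ruby
require 'json'
require 'date'

# ASSUMPTION: one index per day named like "urls-2014.04.09".
# The real index names are not given in the original post.
def daily_index_names(last_day, days, prefix = 'urls-')
  (0...days).map { |offset| prefix + (last_day - offset).strftime('%Y.%m.%d') }
end

# Build (but do not send) the update-settings request for one index.
# number_of_replicas is a dynamic index setting, so it can be changed
# on a live index without reindexing.
def replica_update_request(index, replicas)
  {
    method: 'PUT',
    path: "/#{index}/_settings",
    body: JSON.generate('index' => { 'number_of_replicas' => replicas })
  }
end

# One request per daily index, to be applied one index at a time.
requests = daily_index_names(Date.new(2014, 4, 9), 90).map do |name|
  replica_update_request(name, 1)
end

puts requests.first[:path]
puts requests.first[:body]
```

Applying these one at a time, and waiting for the cluster to settle between requests, keeps the number of concurrent shard recoveries small, which is presumably why the change was made index by index.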

We are planning to bulk insert about 2-4 million records per day in 10-minute intervals, so I would appreciate any advice on the validity of our configuration so far.  In particular, we wo

## gist:10286925
[2014-04-09 14:17:28,393][WARN ][cluster.action.shard     ] [esearch16] [.marvel-2014.04.09][0] received shard failed for [.marvel-2014.04.09][0], node[SK5okikgSWSdbrQdZWET8g], [R], s[INITIALIZING], indexUUID [K4IB1Px3RoOqcjPbta-fKw], reason [Failed to start shard, message [RecoveryFailedException[[.marvel-2014.04.09][0]: Recovery failed from [esearch16][EbYQ9HNzQtexkEZ1PgwpnQ][esearch16.tlys.us][inet[/10.145.167.184:9300]] into [esearch13][SK5okikgSWSdbrQdZWET8g][esearch13.tlys.us][inet[ip-10-185-195-69.ec2.internal/10.185.195.69:9300]]]; nested: RemoteTransportException[[esearch16][inet[/10.145.167.184:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[.marvel-2014.04.09][0] Phase[2] Execution failed]; nested: RemoteTransportException[[esearch13][inet[/10.185.195.69:9300]][index/shard/recovery/prepareTranslog]]; nested: EngineCreationFailureException[[.marvel-2014.04.09][0] failed to create engine]; nested: LockObtainFailedException[Lock obtain timed out: NativeFSLock@/ebsmnt

## gist:5077890
Using rbenv, library versions are:
Fog 1.9.0
Ruby 1.9.3-p125
rspec 2.13.0
rake 10.0.3

To replicate:

describe Fog do
  stack_name = 'stack-test'