Janos Matyas (matyix)
matyix / yarn-capacity-scheduler
Created March 13, 2014 13:13
yarn-capacity-scheduler log
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Job PiEstimation into: highPriorityin queue (default) progress M/R: 1.0/1.0
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Tracking URL : sandbox.hortonworks.com:19888/jobhistory/job/job_1394640227686_0004
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Reserved memory : 0, used memory : 0 and used slots : 0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000001, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000007, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000000, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000005, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000009, and progress 1.0
14/03/12
matyix / setjdk
Created March 11, 2014 15:34
Switch between JDK versions (macOS)
# Companion helper (assumed, from the usual version of this snippet):
# strips a given directory from $PATH.
function removeFromPath() {
  export PATH=$(echo "$PATH" | sed -E -e "s;:$1;;" -e "s;$1:?;;")
}

# Switch the active JDK, e.g. `setjdk 1.7`
function setjdk() {
  if [ $# -ne 0 ]; then
    removeFromPath '/System/Library/Frameworks/JavaVM.framework/Home/bin'
    if [ -n "${JAVA_HOME+x}" ]; then
      removeFromPath "$JAVA_HOME"
    fi
    export JAVA_HOME=$(/usr/libexec/java_home -v "$@")
    export PATH="$JAVA_HOME/bin:$PATH"
  fi
}
matyix / RemoteBlockReader2-exception
Created March 6, 2014 10:30
Must not use direct buffers with InputStream API
ERROR SimpleFeatureSelector:67 - Exception {}
java.lang.IllegalStateException: Must not use direct buffers with InputStream API
at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:211)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:170)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:135)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:698)
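The check that trips here (Guava's `Preconditions.checkState`) guards a read path that copies bytes through the buffer's backing `byte[]` array, which a direct `ByteBuffer` does not expose. This is easy to reproduce outside HDFS; a minimal illustration (not HDFS code):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println(heap.hasArray());   // true  - heap buffers wrap a byte[]
        System.out.println(direct.hasArray()); // false - direct buffers live off-heap

        // Any code that reads via buf.array(), as array-based InputStream
        // plumbing does, fails on a direct buffer:
        try {
            direct.array();
        } catch (UnsupportedOperationException e) {
            System.out.println("no backing array");
        }
    }
}
```

In other words, passing a direct buffer into an API that assumes array-backed buffers is an illegal state by construction, which is exactly what `RemoteBlockReader2` is asserting.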
matyix / mvn-to-gradle
Created February 28, 2014 20:22
mvn to gradle
gradle init --type pom
matyix / gist:9188384
Created February 24, 2014 13:26
Leaving the comfort zone of the JVM heap with HBase
If you are an HBase user, a good way to leave the comfort zone of the JVM heap is to start playing with the new BlockCache implementations in the latest HBase versions (this cost me a sleepless night).
Older HBase versions supported a single BlockCache implementation **(LruBlockCache)**, but later versions have introduced different cache implementations *(a block is HBase's smallest unit of I/O: the smallest amount of data it writes to or reads back from an HFile)*.
Since the introduction of **ByteBuffers** in Java, we have had the chance to leverage the operating system's memory manager to move data in and out of memory in a way that is transparent to the program. This is exactly what the new **SlabCache** and **BucketCache** implementations do: they allocate memory outside the JVM heap using direct byte buffers.
```java
// java.nio.ByteBuffer
public static ByteBuffer allocateDirect(int capacity)
```
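To make the off-heap idea concrete, here is a minimal sketch (illustrative only, not HBase code) of allocating a direct buffer and reading and writing through it; the memory backing it is not a Java object and is not moved or scanned by the GC:

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // 1 MiB allocated outside the JVM heap
        ByteBuffer slab = ByteBuffer.allocateDirect(1024 * 1024);

        slab.putLong(0, 42L);                // absolute write at offset 0
        System.out.println(slab.getLong(0)); // read it back: 42
        System.out.println(slab.isDirect()); // true
    }
}
```

This is the same mechanism SlabCache and BucketCache build on: cached blocks live in such slabs, so they neither inflate the heap nor contribute to GC pauses.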
Recently I was working on a Hadoop 2.x/YARN based ApplicationMaster and came across Apache Helix (a generic cluster management framework from LinkedIn).
Usually YARN gives you the framework to allocate containers among nodes based on utilization (CPU, memory) and to monitor, start and restart containers, while leaving state management, fault tolerance, cluster expansion, throttling, replication and partitioning at the discretion of the ApplicationMaster. This is where Helix complements YARN: it handles these concerns in a declarative way, using a finite-state machine (http://en.wikipedia.org/wiki/Finite-state_machine).
I will let you know how this 'mini-project' goes, but back to the topic ...
I had a conversation with a good friend about how they built a distributed system using Apache Zookeeper and how they struggled to model multiple distributed locks at scale. While there are a few frameworks that achieve this, I was thinking of highlighting the differences and the advantages of using Helix.