Janos Matyas (matyix)
matyix / yarn-capacity-scheduler
Created March 13, 2014 13:13
yarn-capacity-scheduler log
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Job PiEstimation into: highPriorityin queue (default) progress M/R: 1.0/1.0
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Tracking URL : sandbox.hortonworks.com:19888/jobhistory/job/job_1394640227686_0004
14/03/12 11:25:19 INFO monitoring.MRJobStatus: Reserved memory : 0, used memory : 0 and used slots : 0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000001, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000007, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000000, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000005, and progress 1.0
14/03/12 11:25:20 INFO monitoring.MRJobStatus: MAP: Status COMPLETE with task ID task_1394640227686_0004_m_000009, and progress 1.0
14/03/12
matyix / setjdk
Created March 11, 2014 15:34
Switch between JDK versions (macOS)
# Companion helper (assumed, from the usual version of this snippet):
# strips a given directory from $PATH.
function removeFromPath() {
  export PATH=$(echo "$PATH" | sed -E -e "s;:$1;;" -e "s;$1:?;;")
}

# Switch the active JDK, e.g. `setjdk 1.7`
function setjdk() {
  if [ $# -ne 0 ]; then
    removeFromPath '/System/Library/Frameworks/JavaVM.framework/Home/bin'
    if [ -n "${JAVA_HOME+x}" ]; then
      removeFromPath "$JAVA_HOME"
    fi
    export JAVA_HOME=$(/usr/libexec/java_home -v "$@")
    export PATH="$JAVA_HOME/bin:$PATH"
  fi
}
matyix / RemoteBlockReader2-exception
Created March 6, 2014 10:30
Must not use direct buffers with InputStream API
ERROR SimpleFeatureSelector:67 - Exception {}
java.lang.IllegalStateException: Must not use direct buffers with InputStream API
at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:211)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:170)
at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:135)
at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:642)
at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:698)
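The check that trips here (Guava's `Preconditions.checkState`) guards a read path that copies bytes through the buffer's backing `byte[]` array, which a direct `ByteBuffer` does not expose. This is easy to reproduce outside HDFS; a minimal illustration (not HDFS code):

```java
import java.nio.ByteBuffer;

public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        System.out.println(heap.hasArray());   // true  - heap buffers wrap a byte[]
        System.out.println(direct.hasArray()); // false - direct buffers live off-heap

        // Any code that reads via buf.array(), as array-based InputStream
        // plumbing does, fails on a direct buffer:
        try {
            direct.array();
        } catch (UnsupportedOperationException e) {
            System.out.println("no backing array");
        }
    }
}
```

In other words, passing a direct buffer into an API that assumes array-backed buffers is an illegal state by construction, which is exactly what `RemoteBlockReader2` is asserting.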
matyix / mvn-to-gradle
Created February 28, 2014 20:22
mvn to gradle
gradle init --type pom
matyix / gist:9188384
Created February 24, 2014 13:26
Leaving the comfort zone of the JVM heap with HBase
If you are an HBase user, a good way to leave the comfort zone of the JVM heap is to start playing with the new BlockCache implementations in the latest HBase versions (this cost me a sleepless night).
Older HBase versions supported a single BlockCache implementation **(LruBlockCache)**, but later versions have introduced different cache implementations *(a block is HBase's smallest unit of I/O: the smallest amount of data it writes to or reads back from an HFile)*.
Since the introduction of **ByteBuffers** in Java, we have had the chance to leverage the operating system's memory manager to move data in and out of memory in a way that is transparent to the program. This is exactly what the new **SlabCache** and **BucketCache** implementations do: they allocate memory outside the JVM heap using direct byte buffers.
```java
// java.nio.ByteBuffer
public static ByteBuffer allocateDirect(int capacity)
```
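To make the off-heap idea concrete, here is a minimal sketch (illustrative only, not HBase code) of allocating a direct buffer and reading and writing through it; the memory backing it is not a Java object and is not moved or scanned by the GC:

```java
import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // 1 MiB allocated outside the JVM heap
        ByteBuffer slab = ByteBuffer.allocateDirect(1024 * 1024);

        slab.putLong(0, 42L);                // absolute write at offset 0
        System.out.println(slab.getLong(0)); // read it back: 42
        System.out.println(slab.isDirect()); // true
    }
}
```

This is the same mechanism SlabCache and BucketCache build on: cached blocks live in such slabs, so they neither inflate the heap nor contribute to GC pauses.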
Recently I was working on a Hadoop 2.x/YARN based ApplicationMaster and came across Apache Helix (a generic cluster management framework from LinkedIn).
Usually YARN gives you the framework to allocate containers among nodes based on utilization (CPU, memory) and to monitor, start and restart containers, while leaving state management, fault tolerance, cluster expansion, throttling, replication and partitioning at the discretion of the ApplicationMaster. This is where Helix complements YARN: it handles these concerns in a declarative way, using a finite-state machine (http://en.wikipedia.org/wiki/Finite-state_machine).
I will let you know how this 'mini-project' goes, but back to the topic ...
I had a conversation with a good friend about how they built a distributed system using Apache Zookeeper and how they struggled to model multiple distributed locks at scale. While there are a few frameworks that achieve this, I was thinking of highlighting the differences and the advantages of using Helix.