Keybase proof

I hereby claim:

  • I am oza on github.
  • I am ozw on keybase.
  • I have a public key ASBEgZ5dBs8y7vz5MdexvhUgBH9vijrAse0P2YodzXZxxQo

To claim this, I am signing this object:

oza / hadoop-shaded-thirdparty
Created May 5, 2017
Contents of hadoop-shaded-thirdparty.jar
$ unzip hadoop-shaded-thirdparty-3.0.0-alpha3-SNAPSHOT.jar
Archive: hadoop-shaded-thirdparty-3.0.0-alpha3-SNAPSHOT.jar
creating: META-INF/
inflating: META-INF/LICENSE.txt
creating: META-INF/maven/
inflating: META-INF/maven/remote-resources.xml
creating: META-INF/maven/org.apache.hadoop/
creating: META-INF/maven/org.apache.hadoop/hadoop-shaded-thirdparty/
inflating: META-INF/maven/org.apache.hadoop/hadoop-shaded-thirdparty/pom.xml
oza /
Last active Apr 19, 2017
How to replace Guava imports
find . -name "*.java" | xargs sed -i -e "s/import\ com\.google\.common\./import"
find . -name "*.java" | xargs sed -i -e "s/import\ static\ com\.google\.common\./import static"

git diff --ignore-space-change > 1.patch
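
The two sed commands above are cut off before their replacement text, so the target package is not known from this page. The following is a minimal sketch of the same find-and-sed approach, not the original commands. It assumes GNU sed (for `-i` without a suffix) and uses a placeholder package name, `NEW_GUAVA_PACKAGE`, which is hypothetical and must be set to wherever the shaded jar actually relocates Guava.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: the replacement package is truncated in the source.
# This default is a placeholder only; override it with the real relocated package.
NEW_GUAVA_PACKAGE="${NEW_GUAVA_PACKAGE:-org.example.shaded.com.google.common}"

# Rewrite both plain and static Guava imports in every .java file under a directory.
# The package name must not contain '/' or '&', which are special in a sed replacement.
rewrite_guava_imports() {
  local root="$1"
  # One expression covers both forms: the optional "static " is captured
  # and put back via \1. "-exec ... {} +" runs nothing if no file matches.
  find "$root" -type f -name '*.java' -exec \
    sed -i -e "s/^import \(static \)\{0,1\}com\.google\.common\./import \1${NEW_GUAVA_PACKAGE}./" {} +
}
```

Calling `rewrite_guava_imports .` from the source root and then running the `git diff` command above would produce a patch of the changed imports for review.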
View gist:39e37a20f5af22b655b6

JDK and Hadoop

  • Two levels of support
    • Runtime-level support
    • Source-level support

Current status

Runtime Support

oza /
Last active Oct 18, 2017
Running Kudu with the MapReduce framework (lightning talk at Cloudera World Tokyo)


What's Kudu?

  • From
    • Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
    • Distributed, insertable/updatable columnar store.
    • Schema on write.
View java-map-perf-result.log
11.44% [.] crc32
4.33% [kernel] [k] isolate_freepages_block
2.04% [kernel] [k] copy_user_enhanced_fast_string
1.55% [kernel] [k] _raw_spin_unlock_irqrestore
1.27% [.] SpinPause
1.18% [.] __memcpy_sse2_unaligned
0.85% [kernel] [k] __reset_isolation_suitable
0.78% [kernel] [k] get_page_from_freelist
0.71% [kernel] [k] clear_page_c_e
0.61% [kernel] [k] compaction_alloc
View mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
oza /
Last active Nov 19, 2019
How to run Spark on YARN with dynamic resource allocation


YARN

  1. General resource management layer on HDFS
  2. A part of Hadoop

Spark

  1. In-memory processing framework

Spark on YARN
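
The body of this gist is not shown here, so the following is only a sketch of the Spark-side settings that dynamic resource allocation on YARN typically needs, not the author's configuration. The property names are standard Spark settings. The output file name and the executor bounds are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output path for this sketch; an existing
# $SPARK_HOME/conf/spark-defaults.conf could hold the same properties.
CONF_FILE="${CONF_FILE:-./spark-dynamic-allocation.conf}"

cat > "$CONF_FILE" <<'EOF'
# Let Spark add and remove executors as the workload changes.
spark.dynamicAllocation.enabled          true
# The external shuffle service keeps shuffle files readable after an executor is removed.
spark.shuffle.service.enabled            true
# Example bounds only (assumed values); tune them for the cluster.
spark.dynamicAllocation.minExecutors     1
spark.dynamicAllocation.maxExecutors     10
EOF

echo "wrote $CONF_FILE"
```

The file can then be passed to `spark-submit --master yarn --properties-file ./spark-dynamic-allocation.conf`. On the YARN side, each NodeManager also needs the Spark shuffle service registered in yarn-site.xml: `spark_shuffle` added to `yarn.nodemanager.aux-services`, with `yarn.nodemanager.aux-services.spark_shuffle.class` set to `org.apache.spark.network.yarn.YarnShuffleService`.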

View tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software