View keybase.md

Keybase proof

I hereby claim:

  • I am oza on github.
  • I am ozw (https://keybase.io/ozw) on keybase.
  • I have a public key ASBEgZ5dBs8y7vz5MdexvhUgBH9vijrAse0P2YodzXZxxQo

To claim this, I am signing this object:

oza / hadoop-shaded-thirdparty
Created May 5, 2017
content of hadoop-shaded-thirdparty.jar
$ unzip hadoop-shaded-thirdparty-3.0.0-alpha3-SNAPSHOT.jar
Archive: hadoop-shaded-thirdparty-3.0.0-alpha3-SNAPSHOT.jar
creating: META-INF/
inflating: META-INF/MANIFEST.MF
inflating: META-INF/LICENSE.txt
creating: META-INF/maven/
inflating: META-INF/maven/remote-resources.xml
creating: META-INF/maven/org.apache.hadoop/
creating: META-INF/maven/org.apache.hadoop/hadoop-shaded-thirdparty/
inflating: META-INF/maven/org.apache.hadoop/hadoop-shaded-thirdparty/pom.xml
oza / HADOOP-14284.md
Last active Apr 19, 2017
how to replace imports of Guava
# Rewrite plain and static Guava imports to the shaded package (NUL-delimited so paths with spaces are safe).
find . -name "*.java" -print0 | xargs -0 sed -i -e "s/import com\.google\.common\./import org.apache.hadoop.shaded.com.google.common./"
find . -name "*.java" -print0 | xargs -0 sed -i -e "s/import static com\.google\.common\./import static org.apache.hadoop.shaded.com.google.common./"

git diff --ignore-space-change > 1.patch
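
The same import rewrite can be sketched in Python, which makes it easy to check the substitution on a sample string before touching a source tree. This is a minimal sketch, not the method used for HADOOP-14284: the function name `shade_guava_imports` is hypothetical, and unlike the `sed` commands it only matches imports at the start of a line and tolerates any amount of whitespace after `import`.

```python
import re

# Shaded prefix taken from the sed commands above.
SHADED_PREFIX = "org.apache.hadoop.shaded."

# Matches "import com.google.common." and "import static com.google.common."
# at the start of a line (leading whitespace allowed).
_GUAVA_IMPORT = re.compile(
    r"^(\s*import\s+(?:static\s+)?)(com\.google\.common\.)",
    re.MULTILINE,
)


def shade_guava_imports(source: str) -> str:
    """Return `source` with Guava imports pointed at the shaded package."""
    return _GUAVA_IMPORT.sub(
        lambda m: m.group(1) + SHADED_PREFIX + m.group(2),
        source,
    )
```

Running the function twice gives the same result as running it once, because an already-shaded import no longer starts with `com.google.common.` after the `import` keyword.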
View gist:39e37a20f5af22b655b6
<configuration>
<property>
<name>tez.am.am-rm.heartbeat.interval-ms.max</name>
<value>250</value>
</property>
<property>
<name>tez.am.container.idle.release-timeout-max.millis</name>
<value>20000</value>
View JDK.md

JDK and Hadoop

  • Two levels of support
    • Runtime-level support
    • Source-level support

Current status

Runtime Support

oza / LT1.md
Last active Oct 18, 2017
Running Kudu with MapReduce framework (Lightning talk in Cloudera World Tokyo)

Kudu

What's Kudu?

  • From http://getkudu.io/
    • Kudu completes Hadoop's storage layer to enable fast analytics on fast data.
    • Distributed Insertable/Updatable columnar store.
    • Schema on write.
View java-map-perf-result.log
11.44% libz.so.1.2.8 [.] crc32
4.33% [kernel] [k] isolate_freepages_block
2.04% [kernel] [k] copy_user_enhanced_fast_string
1.55% [kernel] [k] _raw_spin_unlock_irqrestore
1.27% libjvm.so [.] SpinPause
1.18% libc-2.19.so [.] __memcpy_sse2_unaligned
0.85% [kernel] [k] __reset_isolation_suitable
0.78% [kernel] [k] get_page_from_freelist
0.71% [kernel] [k] clear_page_c_e
0.61% [kernel] [k] compaction_alloc
View mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
oza / SparkOnYARN.md
Last active Nov 19, 2019
How to run Spark on YARN with dynamic resource allocation

YARN

  1. General resource management layer on HDFS
  2. A part of Hadoop

Spark

  1. In-memory processing framework

Spark on YARN
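
As an illustration of the dynamic resource allocation mentioned in the title, the sketch below renders a `spark-defaults.conf` style fragment from a Python dict. The property names are standard Spark settings. The helper `render_spark_defaults` and the executor bounds are assumptions made for this example and do not come from the gist.

```python
# Spark properties commonly set for dynamic allocation on YARN.
# The min/max executor counts are illustrative values only.
DYNAMIC_ALLOCATION = {
    "spark.master": "yarn",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
}


def render_spark_defaults(conf: dict) -> str:
    """Render key/value pairs as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    print(render_spark_defaults(DYNAMIC_ALLOCATION))
```

Setting `spark.shuffle.service.enabled` assumes the external shuffle service is also registered on each NodeManager, which is configured separately on the YARN side.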

View tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software