
source: openkb.info

1. Oozie Launcher Job architecture

The Oozie Launcher job is a map-only job that starts the jobs doing the real work, e.g. Hive, MR, Pig.

(Oozie Launcher MR job [AM/Mapper Container(Hive CLI)])
. -> (MR job-1 spawned by Hive query(stage0) [AM/Mapper/Reducer Container])
. -> (MR job-2 spawned by Hive query(stage1) [AM/Mapper/Reducer Container])

2. How to increase the YARN container size for AM or Mapper of Oozie Hive job?

It is controlled by the following 4 parameters, set in workflow.xml for each Oozie job.

oozie.launcher.mapreduce.map.memory.mb
oozie.launcher.mapreduce.map.java.opts
oozie.launcher.yarn.app.mapreduce.am.resource.mb
oozie.launcher.yarn.app.mapreduce.am.command-opts

The algorithm is in Oozie source code:
core/src/main/java/org/apache/oozie/action/hadoop/JavaActionExecutor.java

// memory.mb
int launcherMapMemoryMB = launcherConf.getInt(HADOOP_MAP_MEMORY_MB, 1536);
int amMemoryMB = launcherConf.getInt(YARN_AM_RESOURCE_MB, 1536);
// YARN_MEMORY_MB_MIN to provide buffer.
// suppose launcher map aggressively use high memory, need some
// headroom for AM
int memoryMB = Math.max(launcherMapMemoryMB, amMemoryMB) + YARN_MEMORY_MB_MIN;
// limit to 4096 in case of 32 bit
if (launcherMapMemoryMB < 4096 && amMemoryMB < 4096 && memoryMB > 4096) {
    memoryMB = 4096;
}
launcherConf.setInt(YARN_AM_RESOURCE_MB, memoryMB);
 
// We already made mapred.child.java.opts and
// mapreduce.map.java.opts equal, so just start with one of them
String launcherMapOpts = launcherConf.get(HADOOP_MAP_JAVA_OPTS, "");
String amChildOpts = launcherConf.get(YARN_AM_COMMAND_OPTS);
StringBuilder optsStr = new StringBuilder();
int heapSizeForMap = extractHeapSizeMB(launcherMapOpts);
int heapSizeForAm = extractHeapSizeMB(amChildOpts);
int heapSize = Math.max(heapSizeForMap, heapSizeForAm) + YARN_MEMORY_MB_MIN;
// limit to 3584 in case of 32 bit
if (heapSizeForMap < 4096 && heapSizeForAm < 4096 && heapSize > 3584) {
    heapSize = 3584;
}
if (amChildOpts != null) {
    optsStr.append(amChildOpts);
}
optsStr.append(" ").append(launcherMapOpts.trim());
if (heapSize > 0) {
    // append calculated total heap size to the end
    optsStr.append(" ").append("-Xmx").append(heapSize).append("m");
}
launcherConf.set(YARN_AM_COMMAND_OPTS, optsStr.toString().trim());

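In the above code, YARN_MEMORY_MB_MIN=512.
For memory.mb:
max(oozie.launcher.mapreduce.map.memory.mb, oozie.launcher.yarn.app.mapreduce.am.resource.mb)+512
For the Java heap (-Xmx):
max(-Xmx of oozie.launcher.mapreduce.map.java.opts, -Xmx of oozie.launcher.yarn.app.mapreduce.am.command-opts)+512
When both inputs are below 4096, the results are capped at 4096 (memory.mb) and 3584 (-Xmx).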
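
The two formulas can be tried out with plain integers. The following is a minimal sketch, not the Oozie code itself: the class name, the method names and the helper heapMbFromOpts are hypothetical, and heapMbFromOpts is only a simplified stand-in for Oozie's extractHeapSizeMB that understands "-Xmx<N>m" and "-Xmx<N>g".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LauncherMemorySketch {
    // Value stated in the text above.
    static final int YARN_MEMORY_MB_MIN = 512;

    // Assumption: only "-Xmx<N>m" and "-Xmx<N>g" are recognized here.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([mMgG])");

    // Hypothetical stand-in for extractHeapSizeMB; returns 0 when no -Xmx is found.
    static int heapMbFromOpts(String opts) {
        if (opts == null) {
            return 0;
        }
        int heapMb = 0;
        Matcher m = XMX.matcher(opts);
        while (m.find()) { // if -Xmx is repeated, keep the last one
            int n = Integer.parseInt(m.group(1));
            heapMb = m.group(2).equalsIgnoreCase("g") ? n * 1024 : n;
        }
        return heapMb;
    }

    // memory.mb requested for the launcher AM, before any YARN rounding.
    static int requestedAmMemoryMb(int mapMemoryMb, int amResourceMb) {
        int memoryMb = Math.max(mapMemoryMb, amResourceMb) + YARN_MEMORY_MB_MIN;
        if (mapMemoryMb < 4096 && amResourceMb < 4096 && memoryMb > 4096) {
            memoryMb = 4096;
        }
        return memoryMb;
    }

    // Heap size (MB) of the -Xmx that gets appended to the AM command opts.
    static int appendedHeapMb(String mapJavaOpts, String amCommandOpts) {
        int heapMap = heapMbFromOpts(mapJavaOpts);
        int heapAm = heapMbFromOpts(amCommandOpts);
        int heap = Math.max(heapMap, heapAm) + YARN_MEMORY_MB_MIN;
        if (heapMap < 4096 && heapAm < 4096 && heap > 3584) {
            heap = 3584;
        }
        return heap;
    }

    public static void main(String[] args) {
        System.out.println(requestedAmMemoryMb(1024, 2048));          // 2560
        System.out.println(appendedHeapMb("-Xmx777m", "-Xmx1111m"));  // 1623
    }
}
```

Note that the memory.mb value computed here is only what Oozie requests; the container YARN actually grants may be larger.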

Examples:

  1. Set the following in workflow.xml:
<property>
    <name>oozie.launcher.mapreduce.map.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>oozie.launcher.mapreduce.map.java.opts</name>
    <value>-Xmx777m</value>
</property>
 
 <property>
    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
</property>
<property>
    <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1111m</value>
</property>

The actual container size for the Oozie Launcher job is (3072mb, -Xmx1623m).
memory.mb=3072 because max(1024,2048)+512=2560, which YARN rounds up to 3072 since yarn.scheduler.minimum-allocation-mb=1024. The heap is max(777,1111)+512=1623.
  2. Set the following in workflow.xml:

<property>
    <name>oozie.launcher.mapreduce.map.memory.mb</name>
    <value>3072</value>
</property>
<property>
    <name>oozie.launcher.mapreduce.map.java.opts</name>
    <value>-Xmx777m</value>
</property>
 
 <property>
    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
</property>
<property>
    <name>oozie.launcher.yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1111m</value>
</property>

The actual container size for the Oozie Launcher job is (4096mb, -Xmx1623m): max(3072,2048)+512=3584, which YARN rounds up to 4096.
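
The rounding step in both examples can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the scheduler rounds each request up to the next multiple of yarn.scheduler.minimum-allocation-mb, as described in example 1; other scheduler settings may round differently. The class and method names are hypothetical.

```java
public class YarnRoundingSketch {
    // Round a request up to the next multiple of the minimum allocation.
    static int roundUp(int requestedMb, int minAllocationMb) {
        return ((requestedMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    public static void main(String[] args) {
        int minAllocationMb = 1024; // yarn.scheduler.minimum-allocation-mb in the examples

        // Example 1: max(1024, 2048) + 512 = 2560
        System.out.println(roundUp(Math.max(1024, 2048) + 512, minAllocationMb)); // 3072

        // Example 2: max(3072, 2048) + 512 = 3584
        System.out.println(roundUp(Math.max(3072, 2048) + 512, minAllocationMb)); // 4096
    }
}
```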

3. How to verify the Oozie Launcher Container Size?

Do not blindly trust the configuration page, because multiple sources can control the same setting. Using example #2 above:
To check the actual memory.mb, start with the RM log:

Assigned container container_e04_1468279966583_0020_01_000001 of capacity <memory:4096, vCores:1, disks:0.0>
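
When many such lines need checking, the capacity can be pulled out with a regular expression. This is a minimal sketch that assumes the RM log line has exactly the shape shown above; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RmLogCapacitySketch {
    // Matches lines shaped like the RM log sample shown above.
    private static final Pattern ASSIGNED = Pattern.compile(
            "Assigned container (\\S+) of capacity <memory:(\\d+), vCores:(\\d+)");

    // Returns the assigned memory in MB, or -1 if the line does not match.
    static int assignedMemoryMb(String logLine) {
        Matcher m = ASSIGNED.matcher(logLine);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        String line = "Assigned container container_e04_1468279966583_0020_01_000001 "
                + "of capacity <memory:4096, vCores:1, disks:0.0>";
        System.out.println(assignedMemoryMb(line)); // 4096
    }
}
```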

To check the actual java opts, do "ps -ef" on the NM when the Oozie Launcher job is running:

v7: mapr     18959 18948 99 19:36 ?        00:00:04 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-1.b14.el6.x86_64/jre/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1468279966583_0020/container_e04_1468279966583_0020_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m -Xmx200m -Xmx1111m -Xmx1623m -Djava.io.tmpdir=./tmp org.apache.hadoop.mapreduce.v2.app.MRAppMaster
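
The command line above carries four -Xmx flags. The Oozie code appends its calculated value to the end, and HotSpot JVMs generally honor the last occurrence of a repeated option (an assumption worth confirming on your own JVM). A minimal sketch for listing the flags from a ps line follows; the class and method names are hypothetical, and the command line is abbreviated from the output above.

```java
import java.util.ArrayList;
import java.util.List;

public class XmxFlagsSketch {
    // Collect every -Xmx token, in command-line order.
    static List<String> xmxFlags(String commandLine) {
        List<String> flags = new ArrayList<>();
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith("-Xmx")) {
                flags.add(token);
            }
        }
        return flags;
    }

    // Assumption: the last -Xmx on the command line is the one that takes effect.
    static String lastXmx(String commandLine) {
        List<String> flags = xmxFlags(commandLine);
        return flags.isEmpty() ? null : flags.get(flags.size() - 1);
    }

    public static void main(String[] args) {
        // Abbreviated from the ps output above.
        String cmd = "java -Dhadoop.root.logfile=syslog -Xmx1024m -Xmx200m -Xmx1111m "
                + "-Xmx1623m -Djava.io.tmpdir=./tmp org.apache.hadoop.mapreduce.v2.app.MRAppMaster";
        System.out.println(xmxFlags(cmd)); // [-Xmx1024m, -Xmx200m, -Xmx1111m, -Xmx1623m]
        System.out.println(lastXmx(cmd));  // -Xmx1623m
    }
}
```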

Key Takeaways:

  1. When an Oozie job fails with "OutOfMemory", figure out whether the failure is in the Oozie Launcher job or in the MR job spawned by the Hadoop component.
  2. Know how to verify the memory.mb and Java opts of the Oozie Launcher job at runtime.
@wangxujin1221

Actually, it works for me after removing the oozie.launcher prefix. For example, I use 'mapreduce.map.java.opts' instead of oozie.launcher.mapreduce.map.java.opts.
