@airawat
Last active March 18, 2021 08:34
Oozie workflow with a shell action and capture-output: counts the lines in the files matching a provided glob and writes the count to standard output. A subsequent email action emails the output of the shell action.
This gist includes the components of an Oozie workflow: scripts/code, sample data,
and commands. Oozie actions covered: shell action, email action.
Action 1: The shell action executes a shell script that counts the lines in the files
matching a provided glob and writes the line count to standard output.
Action 2: The email action emails the output of action 1.
Pictorial overview of job:
--------------------------
<<To be added>>
Includes:
---------
Data and script download:         01-DataAndScriptDownload
Data load commands:               02-HdfsLoadCommands
Shell script:                     03-ShellScript
Oozie job properties file:        04-OozieJobProperties
Oozie workflow file:              05-OozieWorkflowXML
Oozie SMTP configuration:         06-OozieSMTPConfig
Oozie commands:                   07-OozieJobExecutionCommands
Output email:                     08-OutputOfProgram
Oozie web console - screenshots:  09-OozieWebConsoleScreenshots
01. Data and script download
-----------------------------
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you encounter any issues
Directory structure
-------------------
oozieProject
    data
        airawat-syslog
            <<Node-Name>>
                <<Year>>
                    <<Month>>
                        messages
    workflowShellAction
        workflow.xml
        job.properties
        lineCount.sh
02. HDFS load commands
----------------------
$ hadoop fs -mkdir oozieProject
$ hadoop fs -put oozieProject/* oozieProject/
#*************************************************
#lineCount.sh
#*************************************************
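#!/bin/bash -e
# Count the lines in all HDFS files matching the glob in $1 and print the
# result as key=value so that <capture-output/> can pick it up.
echo "NumberOfLines=$(hadoop fs -cat "$1" | wc -l)"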
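
For <capture-output/> to work, Oozie expects the script's standard output in Java properties format (key=value lines), and by default the captured output is capped at 2 KB. The following is a minimal local sketch of that output contract, with no cluster required. The helper names count_lines and format_count are hypothetical and not part of the gist.

```shell
#!/bin/bash
# Hypothetical helper: count the lines arriving on stdin.
# tr strips the padding that some wc implementations add.
count_lines() {
  wc -l | tr -d '[:space:]'
}

# Hypothetical helper: format a count as the key=value line
# that the email action later looks up under NumberOfLines.
format_count() {
  printf 'NumberOfLines=%s\n' "$1"
}
```

Running format_count "$(printf 'a\nb\nc\n' | count_lines)" prints NumberOfLines=3, which is the same shape of line that lineCount.sh produces from the HDFS data.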
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appPath=${oozieProjectRoot}/workflowShellAction
oozie.wf.application.path=${appPath}
inputDir=${oozieProjectRoot}/data/*/*/*/*/*
lineCountShScriptPath=${appPath}/lineCount.sh
lineCountShellScript=lineCount.sh
emailToAddress=akhanolk@cdh-dev01
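
The workflow below refers to several of these properties by name (jobTracker, nameNode, queueName, lineCountShellScript, lineCountShScriptPath, inputDir, emailToAddress). A small pre-submit check can catch a missing one before Oozie does. This is only a sketch: missing_props is a hypothetical helper, and it assumes simple key=value lines with no continuation lines.

```shell
#!/bin/bash
# Hypothetical helper: report which of the given keys are not defined
# in a properties file. Usage: missing_props FILE KEY...
missing_props() {
  local file=$1
  shift
  local key missing=0
  for key in "$@"; do
    # Compare the text before the first '=' against the key, exactly.
    if ! cut -d= -f1 "$file" | grep -qxF -- "$key"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, missing_props job.properties jobTracker nameNode queueName inputDir emailToAddress prints nothing and returns 0 when all five keys are present.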
<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForShellActionWithCaptureOutput" xmlns="uri:oozie:workflow:0.1">
    <start to="shellAction"/>
    <action name="shellAction">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${lineCountShellScript}</exec>
            <argument>${inputDir}</argument>
            <file>${lineCountShScriptPath}#${lineCountShellScript}</file>
            <capture-output/>
        </shell>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Output of workflow ${wf:id()}</subject>
            <body>Results from line count: ${wf:actionData('shellAction')['NumberOfLines']}</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>
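
The email body uses wf:actionData('shellAction')['NumberOfLines'] to pull one value out of the key=value lines captured from the shell action. The lookup can be mimicked locally as a rough illustration. lookup_action_data is a hypothetical stand-in written for this note, not an Oozie command, and it ignores properties-file details such as escapes and comments.

```shell
#!/bin/bash
# Hypothetical stand-in for wf:actionData(...)[key]: read key=value lines
# from stdin and print the value stored under the requested key.
lookup_action_data() {
  local key=$1 name value
  while IFS='=' read -r name value; do
    if [ "$name" = "$key" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  # Key not found in the captured output.
  return 1
}
```

With the output from the sample run, printf 'NumberOfLines=5207\n' | lookup_action_data NumberOfLines prints 5207, the number that appears in the email body.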
06. Oozie SMTP configuration
----------------------------
Add the following properties to oozie-site.xml and restart Oozie.
Replace the values with ones specific to your environment.
<!-- SMTP params-->
<property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
07. Oozie commands
------------------
Note: Replace the Oozie server host and port with values specific to your cluster, and use the job ID returned by the submit command in the commands that follow.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowShellAction/job.properties -submit
job: 0000012-130712212133144-oozie-oozi-W
2) Run job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000014-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000014-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000014-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000014-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowShellAction/job.properties -rerun 0000014-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000014-130712212133144-oozie-oozi-W
8) View server logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000014-130712212133144-oozie-oozi-W
Logs are also available under /var/log/oozie on the Oozie server.
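
The submit and start steps above can also be combined: the Oozie CLI accepts -run, which submits and starts a job in one call, and it falls back to the OOZIE_URL environment variable when -oozie is omitted. The sketch below wraps that and extracts the job ID from the "job: <id>" line the CLI prints. run_workflow and extract_job_id are hypothetical helper names, and the default URL is simply the one used in the examples above.

```shell
#!/bin/bash
# Assumed server URL, taken from the examples above; override via the environment.
OOZIE_URL=${OOZIE_URL:-http://cdh-dev01:11000/oozie}

# Hypothetical helper: extract the workflow ID from CLI output
# of the form "job: <id>".
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Hypothetical wrapper: submit and start the workflow in one step,
# then print only the job ID. Usage: run_workflow PATH_TO_JOB_PROPERTIES
run_workflow() {
  oozie job -oozie "$OOZIE_URL" -config "$1" -run | extract_job_id
}
```

A typical use would be JOB_ID=$(run_workflow oozieProject/workflowShellAction/job.properties), after which $JOB_ID can be passed to -info, -suspend, -resume, -kill, or -logs.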
########################
#Program output
########################
From akhanolk@cdh-dev01.localdomain Thu Sep 12 00:51:00 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Output of workflow 0000009-130911235633916-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Thu, 12 Sep 2013 00:51:00 -0500 (CDT)
Status: R
Results from line count: 5207
dpraous commented Feb 28, 2015

Hi,
Thanks for the nice write-up. I tried to run a shell script through an Oozie workflow and was not successful.
First, it says it cannot run the shell script because access is denied.
I had to try this because sqoop export is not working properly through Oozie,
so I wanted to run the sqoop export command through a shell script instead.

Please help me with this.

Thanks
