@airawat
Last active November 21, 2018 14:33
Oozie workflow application with FS and email actions; Includes sample data, workflow components, commands.
This gist includes components of a simple workflow application that creates a directory and moves files within
HDFS to this directory.
Emails are sent out to notify designated users of the success or failure of the workflow. There is a prepare section
to allow re-run of the action; the prepare essentially negates the move done by a potential prior run
of the action. Sample data is also included.
The sample application includes:
--------------------------------
1. Oozie actions: hdfs action and email action
2. Oozie workflow controls: start, end, and kill.
3. Workflow components: job.properties and workflow.xml
4. Sample data
5. Commands to deploy workflow, submit and run workflow
6. Oozie web console - screenshots from sample program execution
Pictorial overview of workflow:
-------------------------------
Available at:
http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html
Workflow Components:
--------------------
1. job.properties
File containing:
a) parameter and value declarations that are referenced in the workflows, and
b) environment information that Oozie needs to run the workflow, including the name node, job tracker, and workflow application path
2. workflow.xml
Workflow definition file
Download location:
------------------
GitHub - https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you have access issues.
Directory structure applicable for this post/gist/blog:
-------------------------------------------------------
oozieProject
    logs
        airawat-syslog
            <<node>>
                <<year>>
                    <<month>>
                        messages
    workflowHdfsAndEmailActions
        job.properties
        workflow.xml
Oozie SMTP configuration
------------------------
Add the following to oozie-site.xml and restart Oozie.
Replace the values with those specific to your environment.
<!-- SMTP params -->
<property>
  <name>oozie.email.smtp.host</name>
  <value>cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.port</name>
  <value>25</value>
</property>
<property>
  <name>oozie.email.from.address</name>
  <value>oozie@cdh-dev01</value>
</property>
<property>
  <name>oozie.email.smtp.auth</name>
  <value>false</value>
</property>
<property>
  <name>oozie.email.smtp.username</name>
  <value></value>
</property>
<property>
  <name>oozie.email.smtp.password</name>
  <value></value>
</property>
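Before restarting Oozie, it can help to confirm that the SMTP properties actually made it into the file. The following is a minimal sketch, not part of the original gist: check_smtp_props is a hypothetical helper name, and the oozie-site.xml path in the usage comment is an assumption that varies by distribution.

```shell
#!/bin/sh
# Sketch: report which Oozie SMTP properties are absent from an oozie-site.xml.
# check_smtp_props is a hypothetical helper name, not an Oozie command.
check_smtp_props() {
  site_file="$1"
  missing=0
  for prop in oozie.email.smtp.host oozie.email.smtp.port \
              oozie.email.from.address oozie.email.smtp.auth; do
    # -F matches the property name literally, so dots are not wildcards.
    if ! grep -qF "<name>${prop}</name>" "$site_file"; then
      echo "missing: ${prop}"
      missing=1
    fi
  done
  # Exit status 0 only when every property was found.
  return "$missing"
}

# Usage (the path below is an assumption; adjust for your installation):
#   check_smtp_props /etc/oozie/conf/oozie-site.xml
```

The check only looks for the property names; it does not validate the values or confirm that the SMTP host is reachable.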
#*****************************
# job.properties
#*****************************
nameNode=hdfs://cdh-nn01.hadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
oozie.wf.application.path=${oozieProjectRoot}/workflowHdfsAndEmailActions
dataInputDirectoryAbsPath=${oozieProjectRoot}/logs/airawat-syslog
makeDirectoryAbsPath=${oozieProjectRoot}/dataDump
dataDestinationDirectoryRelativePath=oozieProject/dataDump
emailToAddress=akhanolk@cdh-dev01
#*******End************************
Note: The line "oozie.wf.rerun.failnodes=true" is needed if you want to re-run the workflow from its failed nodes. Alternatively, the property oozie.wf.rerun.skip.nodes lists the nodes to skip on re-run; define only one of the two. Review the Apache Oozie documentation for details.
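Since a re-run expects exactly one of the two re-run properties (oozie.wf.rerun.failnodes or oozie.wf.rerun.skip.nodes), a quick check of job.properties before issuing -rerun can save a failed attempt. This is a minimal sketch; check_rerun_props is a hypothetical helper name, and it assumes each property sits at the start of its own line, as in the file above.

```shell
#!/bin/sh
# Sketch: confirm job.properties defines exactly one re-run property.
# check_rerun_props is a hypothetical helper name, not an Oozie command.
check_rerun_props() {
  props_file="$1"
  # Count lines that set either re-run property; grep -c prints 0 (and exits
  # non-zero) when nothing matches, hence the "|| true".
  count=$(grep -cE '^(oozie\.wf\.rerun\.failnodes|oozie\.wf\.rerun\.skip\.nodes)=' "$props_file" || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one re-run property, found ${count}" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_rerun_props oozieProject/workflowHdfsAndEmailActions/job.properties
```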
<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForHDFSAndEmailActions" xmlns="uri:oozie:workflow:0.1">
  <start to="hdfsCommands"/>
  <action name="hdfsCommands">
    <fs>
      <mkdir path='${makeDirectoryAbsPath}'/>
      <move source='${dataInputDirectoryAbsPath}' target='${dataDestinationDirectoryRelativePath}'/>
    </fs>
    <ok to="sendEmailSuccess"/>
    <error to="sendEmailKill"/>
  </action>
  <action name="sendEmailSuccess">
    <email xmlns="uri:oozie:email-action:0.1">
      <to>${emailToAddress}</to>
      <subject>Status of workflow ${wf:id()}</subject>
      <body>The workflow ${wf:id()} completed successfully</body>
    </email>
    <ok to="end"/>
    <error to="end"/>
  </action>
  <action name="sendEmailKill">
    <email xmlns="uri:oozie:email-action:0.1">
      <to>${emailToAddress}</to>
      <subject>Status of workflow ${wf:id()}</subject>
      <body>The workflow ${wf:id()} had issues and was killed. The error message is: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="killJobFSAction"/>
    <error to="killJobFSAction"/>
  </action>
  <kill name="killJobFSAction">
    <message>"Killed job due to error in FS Action"</message>
  </kill>
  <end name="end"/>
</workflow-app>
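Every to="..." transition in the workflow above must name a node that exists (an action, kill, or end node); a mistyped target only surfaces when the job is submitted. The sketch below is a rough local check under that assumption. check_transitions is a hypothetical helper name, and the check is purely textual, so it is no substitute for validating the file against the Oozie schema.

```shell
#!/bin/sh
# Sketch: flag transitions in a workflow.xml that point at undefined nodes.
# check_transitions is a hypothetical helper name, not an Oozie command.
check_transitions() {
  wf_file="$1"
  rc=0
  # Collect the distinct values of every to="..." attribute.
  for target in $(grep -o 'to="[^"]*"' "$wf_file" | sed 's/^to="//; s/"$//' | sort -u); do
    # A valid target appears somewhere as name="<target>".
    if ! grep -q "name=\"${target}\"" "$wf_file"; then
      echo "dangling transition: ${target}"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   check_transitions oozieProject/workflowHdfsAndEmailActions/workflow.xml
```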
Commands to load data
----------------------
a) Load data
$ hadoop fs -mkdir oozieProject
$ hadoop fs -put oozieProject/* oozieProject/
b) Validate load
$ hadoop fs -ls -R oozieProject | awk '{print $8}'
You should see...
oozieProject/logs/airawat-syslog/<<node>>/<<year>>/<<month>>/messages
oozieProject/workflowHdfsAndEmailActions/job.properties
oozieProject/workflowHdfsAndEmailActions/workflow.xml
$ hadoop fs -rm -R oozieProject/data
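The load validation above can also be scripted rather than checked by eye. This is a minimal sketch that relies on `hadoop fs -test -e`, which exits 0 when a path exists; verify_load is a hypothetical helper name, and the paths in the usage comment are the ones from this gist.

```shell
#!/bin/sh
# Sketch: confirm that each expected HDFS path exists after the load.
# verify_load is a hypothetical helper name, not a Hadoop command.
verify_load() {
  rc=0
  for path in "$@"; do
    # `hadoop fs -test -e` returns 0 if the path exists, non-zero otherwise.
    if hadoop fs -test -e "$path"; then
      echo "ok: $path"
    else
      echo "MISSING: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   verify_load oozieProject/workflowHdfsAndEmailActions/job.properties \
#               oozieProject/workflowHdfsAndEmailActions/workflow.xml
```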
Oozie commands
--------------
Note: Replace the Oozie server and port with the values specific to your cluster.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -submit
job: 0000001-130712212133144-oozie-oozi-W
2) Run job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000001-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000001-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000001-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000001-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -rerun 0000001-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000001-130712212133144-oozie-oozi-W
8) View job logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -log 0000001-130712212133144-oozie-oozi-W
Logs are available at /var/log/oozie on the Oozie server.
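Rather than re-running the -info command by hand, the status check can be wrapped in a small polling loop. The sketch below makes two assumptions: that OOZIE_URL is exported (for example, export OOZIE_URL=http://cdh-dev01:11000/oozie) so the -oozie option can be omitted, and that the -info output contains a line of the form "Status : RUNNING". job_status and wait_for_job are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: poll an Oozie workflow job until it reaches a terminal state.
# job_status and wait_for_job are hypothetical helpers, not Oozie commands.
# Assumes OOZIE_URL is exported and that `oozie job -info` prints a
# "Status : <STATE>" line near the top of its output.
job_status() {
  oozie job -info "$1" | awk -F': *' '/^Status/ {print $2; exit}'
}

wait_for_job() {
  job_id="$1"
  interval="${2:-30}"   # seconds between polls
  while :; do
    state=$(job_status "$job_id")
    case "$state" in
      SUCCEEDED)    echo "$state"; return 0 ;;
      KILLED|FAILED) echo "$state"; return 1 ;;
      "")           echo "could not read job status" >&2; return 2 ;;
    esac
    # Any other state (for example PREP, RUNNING, SUSPENDED): keep waiting.
    sleep "$interval"
  done
}

# Usage:
#   wait_for_job 0000001-130712212133144-oozie-oozi-W 60
```

A suspended job keeps the loop waiting until it is resumed or killed, so an overall timeout may be worth adding for unattended use.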
Program output:
---------------
Expected result:
1) The data from the logs directory should now be in the dataDump directory under oozieProject.
2) The directory 'logs' should be deleted.
3) An email indicating success/failure of the application should be received.
Output for 1):
$ hadoop fs -ls -R oozieProject | awk '{print $8}'
oozieProject/dataDump/airawat-syslog
oozieProject/dataDump/airawat-syslog/cdh-dev01
oozieProject/dataDump/airawat-syslog/cdh-dev01/2013
oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/04
oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/04/messages
oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/05
oozieProject/dataDump/airawat-syslog/cdh-dev01/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-dn01
oozieProject/dataDump/airawat-syslog/cdh-dn01/2013
oozieProject/dataDump/airawat-syslog/cdh-dn01/2013/05
oozieProject/dataDump/airawat-syslog/cdh-dn01/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-dn02
oozieProject/dataDump/airawat-syslog/cdh-dn02/2013
oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/04
oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/04/messages
oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/05
oozieProject/dataDump/airawat-syslog/cdh-dn02/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-dn03
oozieProject/dataDump/airawat-syslog/cdh-dn03/2013
oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/04
oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/04/messages
oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/05
oozieProject/dataDump/airawat-syslog/cdh-dn03/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-jt01
oozieProject/dataDump/airawat-syslog/cdh-jt01/2013
oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/04
oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/04/messages
oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/05
oozieProject/dataDump/airawat-syslog/cdh-jt01/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-nn01
oozieProject/dataDump/airawat-syslog/cdh-nn01/2013
oozieProject/dataDump/airawat-syslog/cdh-nn01/2013/05
oozieProject/dataDump/airawat-syslog/cdh-nn01/2013/05/messages
oozieProject/dataDump/airawat-syslog/cdh-vms
oozieProject/dataDump/airawat-syslog/cdh-vms/2013
oozieProject/dataDump/airawat-syslog/cdh-vms/2013/05
oozieProject/dataDump/airawat-syslog/cdh-vms/2013/05/messages
oozieProject/workflowHdfsAndEmailActions/job.properties
oozieProject/workflowHdfsAndEmailActions/workflow.xml
Email from the program
-----------------------
From akhanolk@cdh-dev01.localdomain Sun Jul 14 23:08:46 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Status of workflow 0000006-130712212133144-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Sun, 14 Jul 2013 23:08:46 -0500 (CDT)
Status: R
The workflow 0000006-130712212133144-oozie-oozi-W completed successfully
Screenshots of the Oozie web console are available at:
------------------------------------------------------
http://hadooped.blogspot.com/2013/06/apache-oozie-part-1-workflow-with-hdfs.html