Oozie SSH action
Sample Oozie workflow that demonstrates the SSH action to move files from a specific node to HDFS
This gist covers the Oozie SSH action.
It includes the components of a sample Oozie workflow application (scripts/code,
sample data and commands). Oozie actions covered: secure shell (SSH) action, email
action.
My blog has documentation and highlights of a very basic sample program:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
This gist includes:
--------------------
Data and script download
Data load commands
Shell script
Oozie job properties file
Oozie workflow file
Oozie SMTP configuration
Oozie commands
Output in HDFS
Output email
Oozie web console - screenshots
My other blogs on Oozie:
------------------------
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action + passing output from one action to another
************************************
*Data and code/application download
************************************
Data and code:
--------------
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you encounter any issues
Directory structure of application download
--------------------------------------------
oozieProject
workflowSshAction
job.properties
workflow.xml
scripts
uploadFile.sh
data
employees_data
************************************
*Intent of sample application
************************************
The application includes an Oozie workflow that runs an operation on a specific node
by leveraging the Oozie SSH action.
Specifically:
It SSHs to the node specified in the workflow (cdh-dn01) and executes the command
specified in the workflow (uploadFile.sh). The command is a shell script local to
the remote node; it copies the local data files (~/data/*) to HDFS.
The SSH action accepts arguments (<args> elements); for simplicity, none are passed
in this sample program.
A pictorial representation of the workflow is at:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appPath=${oozieProjectRoot}/workflowSshAction
oozie.wf.application.path=${appPath}
inputDir=${oozieProjectRoot}/data
focusNodeLogin=akhanolk@cdh-dn01
shellScriptPath=~/scripts/uploadFile.sh
emailToAddress=akhanolk@cdh-dev01
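Before submitting, it can help to confirm that job.properties defines every property the workflow references (focusNodeLogin, shellScriptPath, emailToAddress, and so on). The bash sketch below is one way to do that. check_job_props is a hypothetical helper, not part of Oozie, and it assumes plain key=value lines without line continuations.

```shell
#!/bin/bash
# Hypothetical pre-submit check: report any required key that job.properties
# does not define. Assumes plain key=value lines; comment lines start with '#'.
check_job_props() {
  local props_file="$1"; shift
  local defined key missing=0

  # Collect the key part of every non-comment key=value line
  defined=$(sed -n 's/^[[:space:]]*\([^#=[:space:]][^=[:space:]]*\)[[:space:]]*=.*/\1/p' "$props_file")

  for key in "$@"; do
    # -x whole-line, -F fixed-string match, so dots in keys are taken literally
    if ! printf '%s\n' "$defined" | grep -qxF -- "$key"; then
      echo "MISSING: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_job_props ~/oozieProject/workflowSshAction/job.properties nameNode jobTracker focusNodeLogin shellScriptPath emailToAddress. It prints nothing and returns 0 when all keys are present.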
<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sshAction"/>
    <action name="sshAction">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${focusNodeLogin}</host>
            <command>${shellScriptPath}</command>
            <capture-output/>
        </ssh>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Output of workflow ${wf:id()}</subject>
            <body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>
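<capture-output/> tells Oozie to read the remote command's stdout, which per the Oozie documentation must be in Java Properties format; that is why ${wf:actionData('sshAction')['STATUS']} resolves to SUCCESS or FAIL. To check locally what a script will expose before wiring it into the workflow, the hypothetical helper below approximates that lookup. It is a simplification, not Oozie's parser: it handles only key=value lines.

```shell
#!/bin/bash
# Hypothetical helper approximating how a captured value is looked up:
# stdout is read as key=value lines and the requested key's value is printed.
# Simplified compared with java.util.Properties (no escapes, ':' separators or
# line continuations); as with Properties, the last occurrence of a key wins.
capture_value() {
  awk -F= -v k="$1" '
    $1 == k { v = substr($0, length($1) + 2); found = 1 }
    END { if (found) print v; else exit 1 }
  '
}
```

For example, running uploadFile.sh on cdh-dn01 and piping its stdout through capture_value STATUS should print SUCCESS or FAIL; a non-zero exit means the key never appeared.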
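#!/bin/bash
#################################
# Name: uploadFile.sh
# Location: remote node where we
#           want to run an operation
#################################
# Remove the previous results. The glob is quoted so that HDFS, not the local
# shell, expands it; stdout is discarded so only the STATUS line is captured.
hadoop fs -rm -R 'oozieProject/results-sshAction/*' > /dev/null
# Copy the local data files to HDFS
hadoop fs -put ~/data/* oozieProject/results-sshAction/
status=$?
# <capture-output/> reads key=value pairs from stdout
if [ "$status" -eq 0 ]; then
  echo "STATUS=SUCCESS"
else
  echo "STATUS=FAIL"
fi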
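To exercise the script's status logic without a cluster, one option is to make the hadoop binary overridable. The sketch below is a hypothetical refactor, not the script used in this sample: HADOOP_CMD, upload_data and its two optional arguments are assumptions introduced for illustration. The arguments also show one way values passed through the SSH action's <args> could be consumed.

```shell
#!/bin/bash
# Hypothetical, testable variant of uploadFile.sh (not the script used above).
# HADOOP_CMD defaults to the real CLI; a stub such as "true" or "false" lets
# you exercise the status logic without a cluster.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# upload_data [local_dir] [hdfs_dir]; defaults mirror the original paths
upload_data() {
  local local_dir="${1:-$HOME/data}"
  local hdfs_dir="${2:-oozieProject/results-sshAction}"

  # Clear old results; HDFS expands the quoted glob. Output and errors
  # (e.g. an already-empty directory) are discarded to keep stdout clean.
  $HADOOP_CMD fs -rm -R "$hdfs_dir/*" >/dev/null 2>&1

  # Only the upload step decides the STATUS reported to <capture-output/>
  if $HADOOP_CMD fs -put "$local_dir"/* "$hdfs_dir/"; then
    echo "STATUS=SUCCESS"
  else
    echo "STATUS=FAIL"
  fi
}
```

Used as the SSH action's command, the file would end with a line calling upload_data "$@" so any arguments are passed through; HADOOP_CMD=true upload_data prints STATUS=SUCCESS without touching HDFS.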
*****************************************
Location of files/scripts & commands
*****************************************
I have pasted information specific to my environment; modify as required.
1) Node (cdh-dev01) where the Oozie CLI will be used to submit/run Oozie workflow:
Structure/Path:
~/oozieProject/workflowSshAction/job.properties
2) HDFS:
Workflow directory structure:
/user/akhanolk/oozieProject/workflowSshAction/workflow.xml
Commands to load:
hadoop fs -mkdir oozieProject
hadoop fs -mkdir oozieProject/workflowSshAction
hadoop fs -put ~/oozieProject/workflowSshAction/workflow.xml oozieProject/workflowSshAction
Output directory structure:
/user/akhanolk/oozieProject/results-sshAction
Command:
hadoop fs -mkdir oozieProject/results-sshAction
3) Remote node (cdh-dn01) where we want to run a shell script:
Directory structure/Path:
~/scripts/uploadFile.sh
~/data/employees_data
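The HDFS commands in item 2 can be collected into one function. A minimal sketch, assuming the paths from my environment; stage_hdfs and the HADOOP_CMD override are hypothetical names introduced here, and setting HADOOP_CMD=echo turns it into a dry run that only prints the commands.

```shell
#!/bin/bash
# Hypothetical wrapper around the HDFS load commands listed above.
# HADOOP_CMD defaults to the real CLI; set HADOOP_CMD=echo for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# stage_hdfs [local_app_dir]: create the HDFS directories and upload workflow.xml
stage_hdfs() {
  local app_dir="${1:-$HOME/oozieProject/workflowSshAction}"
  local rc=0
  # Relative HDFS paths resolve under the user's HDFS home (/user/<name>)
  $HADOOP_CMD fs -mkdir oozieProject || rc=1
  $HADOOP_CMD fs -mkdir oozieProject/workflowSshAction || rc=1
  $HADOOP_CMD fs -put "$app_dir/workflow.xml" oozieProject/workflowSshAction || rc=1
  $HADOOP_CMD fs -mkdir oozieProject/results-sshAction || rc=1
  # Non-zero if any step failed (e.g. a directory already existed)
  return "$rc"
}
```

Each command still runs if an earlier one fails, matching running them by hand one after another; the return code just tells you something needs a look.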
*********************************
Expected Result
*********************************
The data file, employees_data, at ~/data on cdh-dn01 should get loaded to HDFS at:
/user/akhanolk/oozieProject/results-sshAction
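The expected result can also be checked from the command line, since hadoop fs -test -e exits with 0 when a path exists. A sketch; verify_upload, HADOOP_CMD and the default path built from $USER are assumptions (on my cluster the file is /user/akhanolk/oozieProject/results-sshAction/employees_data).

```shell
#!/bin/bash
# Hypothetical check that the upload landed in HDFS.
# HADOOP_CMD defaults to the real CLI; "hadoop fs -test -e" exits 0 if the path exists.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# verify_upload [hdfs_path]; the default assumes your HDFS home is /user/$USER
verify_upload() {
  local target="${1:-/user/$USER/oozieProject/results-sshAction/employees_data}"
  if $HADOOP_CMD fs -test -e "$target"; then
    echo "found: $target"
  else
    echo "missing: $target" >&2
    return 1
  fi
}
```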
Oozie SMTP configuration
------------------------
Add the following to oozie-site.xml and restart Oozie.
Replace the values with those specific to your environment.
<!-- SMTP params-->
<property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
************************
SSH setup
************************
Issues:
Review the "Issues encountered" section below for all the issues I ran into and the fixes
I had to make to get the workflow application to work.
------------------------------------------------------------------------------------------------------
Oozie documentation:
To run SSH Testcases and for easier Hadoop start/stop configure SSH to localhost to be passphrase-less.
Create your SSH keys without a passphrase and add the public key to the authorized file:
$ ssh-keygen -t dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2
Test that you can ssh without password:
$ ssh localhost
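------------------------------------------------------------------------------------------------------
Note: OpenSSH 7.0 and later disable DSA keys by default, so on current systems an RSA or
Ed25519 key (ssh-keygen -t rsa or ssh-keygen -t ed25519) added to ~/.ssh/authorized_keys
is the safer choice.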
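Adding a public key to the authorized keys file (here, and again in the fix under "Issues encountered") can be made idempotent. A sketch, assuming the key belongs in ~/.ssh/authorized_keys rather than the older authorized_keys2; append_authorized_key is a hypothetical helper, and it does not generate keys, so run ssh-keygen first.

```shell
#!/bin/bash
# Hypothetical helper: add a public key to authorized_keys once, with the
# permissions sshd expects (with its default StrictModes it ignores files
# that are group- or world-writable).
append_authorized_key() {
  local pubkey_file="$1"
  local ssh_dir="${2:-$HOME/.ssh}"
  local auth_file="$ssh_dir/authorized_keys"
  local key

  # A .pub file holds the key on a single line
  key=$(head -n 1 "$pubkey_file") || return 1
  [ -n "$key" ] || return 1

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  # -x whole-line, -F fixed-string: skip the append if the key is already there
  if ! grep -qxF -- "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
}
```

Running it twice with the same .pub file leaves a single copy of the key. For the Oozie fix, it would be run as akhanolk on each node with the oozie user's public key copied over first (the file name you copy it to is up to you).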
SSH tutorial:
Setup ssh - https://www.digitalocean.com/community/articles/how-to-set-up-ssh-keys--2
Oozie commands
---------------
Note: Replace the Oozie server and port with those specific to your cluster, and use the job ID
returned by the submit command in the commands that follow.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -submit
job: 0000012-130712212133144-oozie-oozi-W
2) Run job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000014-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000014-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000014-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000014-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -rerun 0000014-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000014-130712212133144-oozie-oozi-W
8) View server logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -log 0000014-130712212133144-oozie-oozi-W
Logs are available at:
/var/log/oozie on the Oozie server.
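Since each -submit returns a new job ID, scripting the submit and start steps avoids copying IDs by hand. extract_job_id is a hypothetical helper that assumes the "job: <id>" output format shown in step 1.

```shell
#!/bin/bash
# Hypothetical helper: pull the workflow ID out of the "job: <id>" line that
# "oozie job ... -submit" prints, so later commands can reuse it.
extract_job_id() {
  sed -n 's/^job:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}
```

For example: job_id=$(oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -submit | extract_job_id), followed by oozie job -oozie http://cdh-dev01:11000/oozie -start "$job_id". The oozie CLI also reads the OOZIE_URL environment variable when -oozie is omitted, and its -run option submits and starts a job in one step.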
************************
Output
************************
[akhanolk@cdh-dev01 ~]$ hadoop fs -ls oozieProject/res*
Found 1 items
-rw-r--r-- 3 akhanolk akhanolk 13821993 2013-10-30 20:59 oozieProject/results-sshAction/employees_data
********************
Output email
********************
From akhanolk@cdh-dev01.localdomain Wed Oct 30 22:59:16 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Output of workflow 0000003-131029234028597-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Wed, 30 Oct 2013 22:59:16 -0500 (CDT)
Status: R
Status of the file move: SUCCESS
************************************
Oozie web console screenshots
************************************
Available at:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
*************************
Issues encountered
*************************
Permissions denied error:
-------------------------
....
2013-10-29 16:13:25,949 WARN org.apache.oozie.command.wf.ActionStartXCommand:
USER[akhanolk] GROUP[-] TOKEN[] APP[WorkFlowForSshAction] JOB[0000002-131029144918199-oozie-oozi-W]
ACTION[0000002-131029144918199-oozie-oozi-W@sshAction] Error starting action [sshAction].
ErrorType [NON_TRANSIENT], ErrorCode [AUTH_FAILED], Message [AUTH_FAILED: Not able to perform operation
[ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01
mkdir -p oozie-oozi/0000002-131029144918199-oozie-oozi-W/sshAction--ssh/ ]
| ErrorStream: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Steps taken to resolve:
-----------------------
a)
Tried running the command in square brackets above manually from cdh-dev01 (the Oozie server)
while logged in as akhanolk. It worked, but the workflow in Oozie didn't.
b)
Tried running it as the oozie user:
sudo -u oozie ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01 mkdir -p oozie-oozi/0000001-1310081859355-oozie-oozi-W/action1--ssh/
Got the error:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
c)
Googled, and chanced upon this:
http://stackoverflow.com/questions/19272430/oozie-ssh-action
So I performed the actions detailed below to allow the oozie user to SSH to cdh-dn01 as akhanolk:
On cdh-dev01 (my Oozie server), located the oozie user's home directory and ran ssh-keygen.
Appended the resulting public key to the authorized_keys file home/akhanolk/.ssh/authorized_keys on cdh-dev01.
Appended the same public key to the authorized_keys file on cdh-dn01 (remote node) at
home/akhanolk/.ssh/authorized_keys.
Issue resolved!
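Step b) doubles as a check that the fix worked: once the oozie user's key is authorized, the same ssh call should succeed without a password. The sketch below wraps it using the options from the error log above; probe_as_oozie and DRY_RUN are hypothetical names, and it runs a harmless true on the remote node instead of the mkdir that Oozie issues.

```shell
#!/bin/bash
# Hypothetical check: repeat Oozie's non-interactive ssh call as the oozie user.
# The -o options are the ones shown in the AUTH_FAILED log entry above.
# Set DRY_RUN=1 to print the command instead of running it.
probe_as_oozie() {
  local target="$1"   # e.g. akhanolk@cdh-dn01
  set -- sudo -u oozie ssh \
    -o PasswordAuthentication=no -o KbdInteractiveDevices=no \
    -o StrictHostKeyChecking=no -o ConnectTimeout=20 \
    "$target" true
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    # Exit status 0 means key-based login works for the oozie user
    "$@"
  fi
}
```

Run on cdh-dev01, probe_as_oozie akhanolk@cdh-dn01 && echo OK should print OK once the keys are in place, and fail with the same Permission denied message before.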