@airawat
Last active July 25, 2023 04:43
Oozie SSH action
Sample Oozie workflow that demonstrates the SSH action to move files from a specific node to HDFS
This gist covers the Oozie SSH action.
It includes the components of a sample Oozie workflow application: scripts/code,
sample data and commands. Oozie actions covered: secure shell (SSH) action, email
action.
My blog has documentation and highlights of a very basic sample program:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
This gist includes:
--------------------
Data and script download
Data load commands
Shell script
Oozie job properties file
Oozie workflow file
Oozie SMTP configuration
Oozie commands
Output in HDFS
Output email
Oozie web console - screenshots
My other blogs on Oozie:
------------------------
Blog 1: Oozie workflow - hdfs and email actions
Blog 2: Oozie workflow - hdfs, email and hive actions
Blog 3: Oozie workflow - sqoop action (Hive-mysql; sqoop export)
Blog 4: Oozie workflow - java map-reduce (new API) action
Blog 5: Oozie workflow - streaming map-reduce (python) action
Blog 6: Oozie workflow - java main action
Blog 7: Oozie workflow - Pig action
Blog 8: Oozie sub-workflow
Blog 9a: Oozie coordinator job - time-triggered sub-workflow, fork-join control and decision control
Blog 9b: Oozie coordinator jobs - file triggered
Blog 9c: Oozie coordinator jobs - dataset availability triggered
Blog 10: Oozie bundle jobs
Blog 11: Oozie Java API for interfacing with oozie workflows
Blog 12: Oozie workflow - shell action +passing output from one action to another
************************************
*Data and code/application download
************************************
Data and code:
--------------
Github:
https://github.com/airawat/OozieSamples
Email me at airawat.blog@gmail.com if you encounter any issues
Directory structure of application download
--------------------------------------------
oozieProject
workflowSshAction
job.properties
workflow.xml
scripts
uploadFile.sh
data
employees_data
************************************
*Intent of sample application
************************************
The application includes an Oozie workflow that runs an operation on a specific node
using the Oozie SSH action.
Specifically:
It SSHs to a node specified in the workflow (cdh-dn01), and executes the command
specified in the workflow (uploadFile.sh). The command specified is a shell script
local to the remote node - and it essentially copies a local file to HDFS.
The ssh action also accepts arguments for the command. For simplicity, none are passed
in the sample program.
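If you do want to pass arguments, the ssh action's `<args>` element hands them to the remote command as positional parameters. Below is a minimal sketch of how uploadFile.sh could pick them up. The function name `upload_files` and the argument order (source directory, then HDFS target directory) are my assumptions for illustration; the defaults mirror the paths used in this gist.

```shell
#!/bin/bash
# Hypothetical variant of uploadFile.sh: the directories come from arguments
# instead of being hard-coded. upload_files is an illustrative name.
upload_files() {
  local src_dir="${1:-$HOME/data}"                         # local directory to copy from
  local target_dir="${2:-oozieProject/results-sshAction}"  # HDFS directory to copy into
  if hadoop fs -put "$src_dir"/* "$target_dir"/; then
    echo "STATUS=SUCCESS"
  else
    echo "STATUS=FAIL"
  fi
}

# In the real script, the last line would forward the ssh action's arguments:
# upload_files "$@"
```

In workflow.xml, each value would then go in its own `<args>` element placed after `<command>`.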
A pictorial representation of the workflow is at:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
#*************************************************
# job.properties
#*************************************************
nameNode=hdfs://cdh-nn01.chuntikhadoop.com:8020
jobTracker=cdh-jt01:8021
queueName=default
oozie.libpath=${nameNode}/user/oozie/share/lib
oozie.use.system.libpath=true
oozie.wf.rerun.failnodes=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
appPath=${oozieProjectRoot}/workflowSshAction
oozie.wf.application.path=${appPath}
inputDir=${oozieProjectRoot}/data
focusNodeLogin=akhanolk@cdh-dn01
shellScriptPath=~/scripts/uploadFile.sh
emailToAddress=akhanolk@cdh-dev01
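Since workflow.xml resolves ${focusNodeLogin}, ${shellScriptPath} and ${emailToAddress} from this file, it can help to check that they are all defined before submitting. A small sketch, assuming plain `key=value` lines with no spaces around `=` (as in the file above); `missing_props` is a hypothetical helper, not part of Oozie.

```shell
#!/bin/bash
# Print every requested property name that is not defined in the file.
# missing_props is an illustrative helper, not an Oozie command.
missing_props() {
  local props_file="$1"
  shift
  local key
  for key in "$@"; do
    # Exact match on the text before the first "=" of any line.
    awk -F= -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$props_file" \
      || echo "$key"
  done
}

# Example: the three properties that workflow.xml in this gist references
# missing_props job.properties focusNodeLogin shellScriptPath emailToAddress
```

An empty result means all the listed properties are present.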
<!--******************************************-->
<!--workflow.xml -->
<!--******************************************-->
<workflow-app name="WorkFlowForSshAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sshAction"/>
    <action name="sshAction">
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>${focusNodeLogin}</host>
            <command>${shellScriptPath}</command>
            <capture-output/>
        </ssh>
        <ok to="sendEmail"/>
        <error to="killAction"/>
    </action>
    <action name="sendEmail">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${emailToAddress}</to>
            <subject>Output of workflow ${wf:id()}</subject>
            <body>Status of the file move: ${wf:actionData('sshAction')['STATUS']}</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>
    <kill name="killAction">
        <message>"Killed job due to error"</message>
    </kill>
    <end name="end"/>
</workflow-app>
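#!/bin/bash
#################################
# Name: uploadFile.sh
# Location: remote node where we
#           want to run an
#           operation
#################################
# Clear results from any previous run, then copy the local data to HDFS.
hadoop fs -rm -R oozieProject/results-sshAction/*
hadoop fs -put ~/data/* oozieProject/results-sshAction/
status=$?
# With <capture-output/>, Oozie reads stdout as key=value pairs; the workflow
# picks up STATUS through wf:actionData('sshAction')['STATUS'].
if [ "$status" -eq 0 ]; then
    echo "STATUS=SUCCESS"
else
    echo "STATUS=FAIL"
fi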
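Before wiring the script into the workflow, it is worth running it by hand on the remote node and confirming that it prints the STATUS key the email action expects. A sketch of a small filter for that, assuming output in `key=value` lines; `get_output_value` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the value of one key from key=value lines read on standard input.
# get_output_value is an illustrative helper name.
get_output_value() {
  local key="$1"
  # Split each line at its first "=" and print what follows a matching key.
  awk -v k="$key" '{
    i = index($0, "=")
    if (i > 0 && substr($0, 1, i - 1) == k) { print substr($0, i + 1); exit }
  }'
}

# Example (on the remote node):
# ~/scripts/uploadFile.sh | get_output_value STATUS
```

Lines without an `=` (for example, messages printed by hadoop itself) are simply skipped by this filter.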
*****************************************
Location of files/scripts & commands
*****************************************
I have pasted information specific to my environment; modify as required.
1) Node (cdh-dev01) where the Oozie CLI will be used to submit/run Oozie workflow:
Structure/Path:
~/oozieProject/workflowSshAction/job.properties
2) HDFS:
Workflow directory structure:
/user/akhanolk/oozieProject/workflowSshAction/workflow.xml
Commands to load:
hadoop fs -mkdir oozieProject
hadoop fs -mkdir oozieProject/workflowSshAction
hadoop fs -put ~/oozieProject/workflowSshAction/workflow.xml oozieProject/workflowSshAction
Output directory structure:
/user/akhanolk/oozieProject/results-sshAction
Command:
hadoop fs -mkdir oozieProject/results-sshAction
3) Remote node (cdh-dn01) where we want to run a shell script:
Directory structure/Path:
~/scripts/uploadFile.sh
~/data/employees_data
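The HDFS commands from item 2 can be collected into one helper so the layout is created the same way each time. This is only a sketch: `setup_hdfs_layout` and the `HADOOP_BIN` variable are names I made up for illustration, and setting `HADOOP_BIN="echo hadoop"` gives a dry run that prints the commands instead of executing them.

```shell
#!/bin/bash
# Create the HDFS directories used by this gist and upload workflow.xml.
# HADOOP_BIN is a hypothetical override; "echo hadoop" turns this into a dry run.
setup_hdfs_layout() {
  local local_app_dir="${1:-$HOME/oozieProject/workflowSshAction}"
  local hadoop_bin="${HADOOP_BIN:-hadoop}"
  local dir
  for dir in oozieProject oozieProject/workflowSshAction oozieProject/results-sshAction; do
    # Left unquoted on purpose so "echo hadoop" splits into two words.
    $hadoop_bin fs -mkdir "$dir"
  done
  $hadoop_bin fs -put "$local_app_dir/workflow.xml" oozieProject/workflowSshAction
}

# Dry run:
# HADOOP_BIN="echo hadoop" setup_hdfs_layout
```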
*********************************
Expected Result
*********************************
The script uploadFile.sh, at ~/scripts on cdh-dn01, should run and load the data file
(employees_data, at ~/data on cdh-dn01) to HDFS at -
/user/akhanolk/oozieProject/results-sshAction
Oozie SMTP configuration
------------------------
Add the following to oozie-site.xml, and restart Oozie.
Replace the values with those specific to your environment.
<!-- SMTP params-->
<property>
    <name>oozie.email.smtp.host</name>
    <value>cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@cdh-dev01</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>false</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value></value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value></value>
</property>
************************
SSH setup
************************
Issues:
Review my section on issues encountered to see all the issues and fixes I had to make
to get the workflow application to work.
------------------------------------------------------------------------------------------------------
Oozie documentation:
To run SSH Testcases and for easier Hadoop start/stop configure SSH to localhost to be passphrase-less.
Create your SSH keys without a passphrase and add the public key to the authorized file:
$ ssh-keygen -t dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys2
Test that you can ssh without password:
$ ssh localhost
------------------------------------------------------------------------------------------------------
SSH tutorial:
Setup ssh - https://www.digitalocean.com/community/articles/how-to-set-up-ssh-keys--2
Oozie commands
---------------
Note: Replace the Oozie server and port with your cluster-specific values.
1) Submit job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -submit
job: 0000012-130712212133144-oozie-oozi-W
2) Run job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -start 0000014-130712212133144-oozie-oozi-W
3) Check the status:
$ oozie job -oozie http://cdh-dev01:11000/oozie -info 0000014-130712212133144-oozie-oozi-W
4) Suspend workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -suspend 0000014-130712212133144-oozie-oozi-W
5) Resume workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -resume 0000014-130712212133144-oozie-oozi-W
6) Re-run workflow:
$ oozie job -oozie http://cdh-dev01:11000/oozie -config oozieProject/workflowSshAction/job.properties -rerun 0000014-130712212133144-oozie-oozi-W
7) Should you need to kill the job:
$ oozie job -oozie http://cdh-dev01:11000/oozie -kill 0000014-130712212133144-oozie-oozi-W
8) View server logs:
$ oozie job -oozie http://cdh-dev01:11000/oozie -logs 0000014-130712212133144-oozie-oozi-W
Logs are available at:
/var/log/oozie on the Oozie server.
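Steps 1 and 2 can be chained by capturing the job ID that -submit prints. The sketch below assumes the `job: <id>` output format shown in step 1; `extract_job_id` is a hypothetical helper. The usage comments also rely on the Oozie CLI reading the OOZIE_URL environment variable when -oozie is omitted.

```shell
#!/bin/bash
# Extract the workflow job ID from the output of "oozie job ... -submit",
# which prints a line of the form "job: <id>".
extract_job_id() {
  sed -n 's/^job: *//p' | head -n 1
}

# Example, with OOZIE_URL pointing at your Oozie server:
# export OOZIE_URL=http://cdh-dev01:11000/oozie
# job_id=$(oozie job -config oozieProject/workflowSshAction/job.properties -submit | extract_job_id)
# oozie job -start "$job_id"
```

Keeping the ID in a variable avoids pasting the wrong job ID into the later -info, -suspend or -kill commands.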
************************
Output
************************
[akhanolk@cdh-dev01 ~]$ hadoop fs -ls oozieProject/res*
Found 1 items
-rw-r--r-- 3 akhanolk akhanolk 13821993 2013-10-30 20:59 oozieProject/results-sshAction/employees_data
********************
Output email
********************
From akhanolk@cdh-dev01.localdomain Wed Oct 30 22:59:16 2013
Return-Path: <akhanolk@cdh-dev01.localdomain>
X-Original-To: akhanolk@cdh-dev01
Delivered-To: akhanolk@cdh-dev01.localdomain
From: akhanolk@cdh-dev01.localdomain
To: akhanolk@cdh-dev01.localdomain
Subject: Output of workflow 0000003-131029234028597-oozie-oozi-W
Content-Type: text/plain; charset=us-ascii
Date: Wed, 30 Oct 2013 22:59:16 -0500 (CDT)
Status: R
Status of the file move: SUCCESS
************************************
Oozie web console screenshots
************************************
Available at:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
*************************
Issues encountered
*************************
Permission denied error:
-------------------------
....
2013-10-29 16:13:25,949 WARN org.apache.oozie.command.wf.ActionStartXCommand:
USER[akhanolk] GROUP[-] TOKEN[] APP[WorkFlowForSshAction] JOB[0000002-
131029144918199-oozie-oozi-W] ACTION[0000002-131029144918199-oozie-oozi-
W@sshAction] Error starting action [sshAction]. ErrorType [NON_TRANSIENT],
ErrorCode [AUTH_FAILED], Message [AUTH_FAILED: Not able to perform operation
[ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01
mkdir -p oozie-oozi/0000002-131029144918199-oozie-oozi-W/sshAction--ssh/ ]
| ErrorStream: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Steps taken to resolve:
-----------------------
a)
Tried running the command in square brackets, above, manually from cdh-dev01 (Oozie server),
when logged in as akhanolk. It worked! But the workflow in Oozie didn't.
b)
Tried running the same command as the oozie user:
sudo -u oozie ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o
StrictHostKeyChecking=no -o ConnectTimeout=20 akhanolk@cdh-dn01 mkdir
-p oozie-oozi/0000001-1310081859355-oozie-oozi-W/action1--ssh/
Got the error
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
c)
Googled, and chanced upon this:
http://stackoverflow.com/questions/19272430/oozie-ssh-action
So, performed the actions detailed below to allow the oozie user to ssh to cdh-dn01 as akhanolk:
On cdh-dev01 (my Oozie server), located the oozie home directory and ran ssh-keygen
Appended the public key to the authorized_keys file /home/akhanolk/.ssh/authorized_keys on cdh-dev01
Appended the same public key to the authorized_keys file on cdh-dn01 (remote node) at
/home/akhanolk/.ssh/authorized_keys
Issue resolved!!
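The "append the public key" steps are easy to repeat by accident, which leaves duplicate entries in authorized_keys. Here is a minimal sketch of an append that is safe to re-run; `append_key_once` is a hypothetical helper, and the two file paths are whatever you pass in (for example, a copy of the oozie user's public key and /home/akhanolk/.ssh/authorized_keys).

```shell
#!/bin/bash
# Append a public key to an authorized_keys file unless it is already there.
# append_key_once is an illustrative helper; the caller supplies both paths.
append_key_once() {
  local pub_key_file="$1"
  local auth_keys_file="$2"
  mkdir -p "$(dirname "$auth_keys_file")"
  touch "$auth_keys_file"
  # -x matches whole lines, -F treats the key as a fixed string,
  # -f reads the key to look for from the public key file.
  if ! grep -qxFf "$pub_key_file" "$auth_keys_file"; then
    cat "$pub_key_file" >> "$auth_keys_file"
  fi
  chmod 600 "$auth_keys_file"   # keep the file private to its owner
}

# Example (run as akhanolk on the remote node, after copying the key over):
# append_key_once /tmp/oozie_id.pub ~/.ssh/authorized_keys
```

After this, re-running the `sudo -u oozie ssh ...` command from step b) is a quick way to confirm that key-based login works before re-running the workflow.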
@tommy1505

That is very helpful. Thank for your guide :) :) :)
