@rajkrrsingh
Last active April 24, 2019 14:39
oozie spark action example

directory structure on HDFS

[oozie@rk253 ~]$ hadoop fs -lsr /tmp/sparkOozieAction
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxrwxrwx   3 oozie hdfs        167 2017-05-08 05:01 /tmp/sparkOozieAction/job.properties
drwxrwxrwx   - oozie hdfs          0 2017-05-08 05:04 /tmp/sparkOozieAction/lib
-rwxrwxrwx   3 oozie hdfs  110488188 2017-05-08 04:58 /tmp/sparkOozieAction/lib/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
-rw-r--r--   3 oozie hdfs       1571 2017-05-08 05:46 /tmp/sparkOozieAction/workflow.xml

oozie share lib

[oozie@rk253 ~]$ hadoop fs -ls /user/oozie/share/lib/lib_20170508043956/spark
Found 8 items
-rw-r--r--   3 oozie hdfs     339666 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-api-jdo-3.2.6.jar
-rw-r--r--   3 oozie hdfs    1890075 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-core-3.2.10.jar
-rw-r--r--   3 oozie hdfs    1809447 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/datanucleus-rdbms-3.2.9.jar
-rw-r--r--   3 oozie hdfs        167 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/hive-site.xml
-rw-r--r--   3 oozie hdfs      22440 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/oozie-sharelib-spark-4.2.0.2.5.3.0-37.jar
-rw-r--r--   3 oozie hdfs      44846 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/py4j-0.9-src.zip
-rw-r--r--   3 oozie hdfs     357563 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/pyspark.zip
-rw-r--r--   3 oozie hdfs  188897932 2017-05-08 04:42 /user/oozie/share/lib/lib_20170508043956/spark/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar

job.properties

[oozie@rk253 ~]$ cat job.properties 
nameNode=hdfs://rk253.openstack:8020
jobTracker=rk253.openstack:8050
oozie.wf.application.path=/tmp/sparkOozieAction/
oozie.use.system.libpath=true
master=yarn-client

workflow.xml

[oozie@rk253 ~]$ cat workflow.xml 
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5"> 
        <start to="spark-action"/> 
        <action name="spark-action"> 
                <spark xmlns="uri:oozie:spark-action:0.1"> 
                        <job-tracker>${jobTracker}</job-tracker> 
                        <name-node>${nameNode}</name-node> 
                        <configuration> 
                        </configuration> 
                        <master>${master}</master> 
                        <name>spark pi job</name> 
                        <class>org.apache.spark.examples.SparkPi</class> 
                        <jar>${nameNode}/tmp/sparkOozieAction/lib/spark-examples-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar</jar> 
                        <spark-opts>--driver-memory 512m --executor-memory 512m --num-executors 1</spark-opts> 
                        <arg>10</arg> 
                </spark> 
                <ok to="end"/> 
                <error to="kill"/> 
        </action> 
        <kill name="kill"> 
                <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 
        </kill> 
        <end name="end"/> 
</workflow-app> 
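
The workflow above takes ${jobTracker}, ${nameNode} and ${master} from job.properties, so a missing key only shows up once the job is submitted. A minimal sketch of a pre-submit check, assuming both files are in the current directory; the function name check_params is hypothetical and not part of Oozie:

```shell
#!/bin/sh
# check_params is a hypothetical helper, not an Oozie command.
# It lists every plain ${name} reference in the workflow file that has
# no "name=" line in the properties file.
check_params() {
    wf="$1"
    props="$2"
    missing=0
    # Only plain ${name} references match here; EL function calls such as
    # ${wf:errorMessage(...)} contain ':' and are skipped.
    for name in $(grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$wf" | tr -d '${}' | sort -u); do
        if ! grep -q "^[[:space:]]*$name[[:space:]]*=" "$props"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return $missing
}

# Example call (run it from the directory that holds both files):
#   check_params workflow.xml job.properties
```

The function returns a non-zero status when at least one parameter is undefined, so it can gate the submit command in a script.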

run

oozie job -oozie http://rk253:11000/oozie/ -config job.properties -run
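
After submission the job can be followed with the -info and -log subcommands of the same CLI. The sketch below assumes the -run command prints the new ID on a line of the form "job: <id>"; the helper name extract_job_id is hypothetical.

```shell
#!/bin/sh
# extract_job_id is a hypothetical helper: it reads the output of
# "oozie job ... -run" on stdin and prints the ID from the "job: <id>" line
# (that output format is an assumption, check it against your Oozie version).
extract_job_id() {
    sed -n 's/^job:[[:space:]]*//p'
}

# Example usage against the server from this gist:
#   job_id=$(oozie job -oozie http://rk253:11000/oozie/ -config job.properties -run | extract_job_id)
#   oozie job -oozie http://rk253:11000/oozie/ -info "$job_id"
#   oozie job -oozie http://rk253:11000/oozie/ -log "$job_id"
```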