@airawat
airawat / 00-OozieWorkflowHdfsAndEmailActions
Last active November 21, 2018 14:33
Oozie workflow application with FS and email actions. Includes sample data, workflow components, and commands.
This gist includes components of a simple workflow application that creates a directory and moves files within
HDFS to this directory.
Emails are sent out to notify designated users of the success or failure of the workflow. There is a prepare section
to allow a re-run of the action; the prepare essentially negates the move done by a potential prior run
of the action. Sample data is also included.
The sample application includes:
--------------------------------
1. Oozie actions: hdfs action and email action
2. Oozie workflow controls: start, end, and kill.
@airawat
airawat / 00-OozieWorkflowSqoopAction
Last active January 20, 2024 07:08
Oozie workflow application with a sqoop action. Pipes data from a Hive table to a MySQL database table. Oozie 3.3.0; Sqoop (1.4.2) with MySQL (5.1.69)
This gist includes components of a simple workflow application (Oozie 3.3.0) that
pipes data from a Hive table to MySQL.
The sample application includes:
--------------------------------
1. Oozie actions: sqoop action
2. Oozie workflow controls: start, end, and kill.
3. Workflow components: job.properties and workflow.xml
4. Sample data
5. Prep tasks in Hive
@airawat
airawat / 00-OozieWorkflowJavaMapReduceAction
Last active February 23, 2023 20:19
Oozie workflow application with a Java MapReduce action that parses syslog-generated log files and generates a report. Gist includes sample data, all workflow components, Java MapReduce program code, and commands (HDFS and Oozie).
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: java mapreduce action; Oozie controls
covered: start, kill, end; The java program uses regex to parse the logs, and
also extracts the path of the mapper input directory and includes it in the
key emitted.
Note: The reducer can be specified as a combiner as well.
Usecase
-------
@airawat
airawat / 00-OozieWorkflowJavaMainAction
Last active December 19, 2015 18:59
Oozie workflow application with a java main action. The java program parses log files and generates a report. Sample data, code, workflow components, and commands are provided.
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: java main action; Oozie controls
covered: start, kill, end; The java program uses regex to parse the logs, and
also extracts part of the mapper input directory path and includes it in the key
emitted.
Usecase
-------
Parse Syslog generated log files to generate reports;
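The regex-based parsing described above can be sketched roughly as follows. This is a minimal illustration and not the gist's actual program: the class name `SyslogKeySketch`, the assumed line layout (`<Mon> <day> <HH:mm:ss> <host> <process>[<pid>]: <message>`), and the key format (parent directory, month, process) are all assumptions made for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a report key from one syslog line plus
// part of the path of the file the line came from.
final class SyslogKeySketch {

    // Assumed layout: "<Mon> <day> <HH:mm:ss> <host> <process>[<pid>]: <message>"
    // The "[<pid>]" part is optional.
    private static final Pattern SYSLOG = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
            + "([^\\s\\[:]+)(?:\\[\\d+\\])?:\\s*(.*)$");

    // Returns "<parentDir>-<month>-<process>", or empty if the line
    // does not match the assumed layout.
    static Optional<String> keyFor(String inputPath, String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(parentDirName(inputPath) + "-" + m.group(1) + "-" + m.group(5));
    }

    // Name of the directory that directly contains the input file;
    // empty string when the path has no directory component.
    static String parentDirName(String inputPath) {
        String[] parts = inputPath.split("/");
        return parts.length >= 2 ? parts[parts.length - 2] : "";
    }

    public static void main(String[] args) {
        String line = "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal";
        System.out.println(keyFor("/logs/2013/04/messages", line).orElse("(no match)"));
    }
}
```

In a mapper, the input path would come from the input split rather than a string argument; here it is passed in directly so the sketch runs with the JDK alone.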
@airawat
airawat / 00-OozieBundleApplication
Last active June 14, 2021 13:57
Oozie bundle application sample. The sample bundle application is time triggered. The start time is defined in the bundle job.properties file. The bundle application starts two coordinator applications- as defined in the bundle definition file - bundleConfirguration.xml. The first coordinator job is time triggered. The start time is defined in t…
Introduction
-------------
This gist includes sample data, application components, and components to execute a bundle application.
The sample bundle application is time triggered. The start time is defined in the bundle job.properties
file. The bundle application starts two coordinator applications, as defined in the bundle definition file,
bundleConfirguration.xml.
The first coordinator job is time triggered. The start time is defined in the bundle job.properties file.
It runs a workflow that includes a java main action. The java program parses some log files and generates
@airawat
airawat / 00-OozieWorkflowWithSubworkflow
Last active January 3, 2019 18:08
Oozie workflow application with a subworkflow. Includes sample data, workflow components, HDFS and Oozie commands, and application output.
This gist includes components of an Oozie workflow application - scripts/code, sample data
and commands; Oozie actions covered: sub-workflow, email, java main action,
sqoop action (to MySQL); Oozie controls covered: decision;
Pictorial overview:
--------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-8-subworkflow.html
Usecase:
--------
@airawat
airawat / 00-OozieWorkflowCallWithJavaAPI
Last active September 4, 2016 17:59
Oozie workflow - invoked from Java using Oozie Java API
import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class myOozieWorkflowJavaAPICall {
    public static void main(String[] args) {
        // Client pointed at the Oozie server URL
        OozieClient wc = new OozieClient("http://cdh-dev01:11000/oozie");
        // ...
    }
}
@airawat
airawat / 00-OozieWorkflowShellAction
Last active March 18, 2021 08:34
Oozie workflow with a shell action, with CaptureOutput. Counts lines in a glob provided and writes the count to standard output. A subsequent email action emails the output of the shell action.
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: shell action, email action
Action 1: The shell action executes a shell script that does a line count for files in a
glob provided, and writes the line count to standard output
Action 2: The email action emails the output of action 1
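The logic of action 1 can be illustrated with a small sketch. The gist itself uses a shell script; the Java version below is a substitute for illustration only, and it reads from the local filesystem rather than HDFS. The class name `GlobLineCountSketch` and the output key `lineCount` are assumptions. The output is printed as `key=value` because Oozie's capture-output expects the action's standard output in Java properties format.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical stand-in for the line-counting script: sums the line
// counts of all regular files in a directory that match a glob.
final class GlobLineCountSketch {

    static long countLines(Path dir, String glob) throws IOException {
        long total = 0;
        // newDirectoryStream applies the glob to file names inside dir
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path file : matches) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (Stream<String> lines = Files.lines(file)) {
                    total += lines.count();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = directory, args[1] = glob such as "*.log"
        long n = countLines(Paths.get(args[0]), args[1]);
        // key=value form so that a later action can look the value up by key
        System.out.println("lineCount=" + n);
    }
}
```

A later action, such as the email in action 2, would then refer to the captured value by its key.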
Pictorial overview of job:
--------------------------
@airawat
airawat / 00-CreatingMapFile
Last active December 22, 2015 22:18
Creating a Map file in Hadoop. This gist covers reading a text file in HDFS, and creating a map file
This gist demonstrates how to create a map file from a text file.
Includes:
---------
1. Input data and script download
2. Input data-review
3. Data load commands
4. Java program to create the map file out of a text file in HDFS
5. Command to run Java program
6. Results of the program run to create map file
@airawat
airawat / 00-CreatingSequenceFile
Last active March 19, 2019 18:35
Hadoop Sequence File - Sample program to create a sequence file (compressed and uncompressed) from a text file, and another to read the sequence file.
This gist demonstrates how to create a sequence file (compressed and uncompressed) from a text file.
Includes:
---------
1. Input data and script download
2. Input data-review
3. Data load commands
4. Mapper code
5. Driver code to create the sequence file out of a text file in HDFS
6. Command to run Java program