@airawat
airawat / 00-OozieWorkflowStreamingMRAction-Python
Last active November 21, 2018 06:24
Sample of an Oozie workflow with a streaming action that parses Syslog-generated log files using Python regex
This gist includes Oozie workflow components (a streaming MapReduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of logged processes, by month and process.
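The mapper and reducer logic described above could look roughly like the sketch below. It is not the gist's actual code: the syslog line layout, the regex, the "month-process" key format, and the function names map_line and reduce_records are all assumptions made for illustration.

```python
import re
from itertools import groupby

# Assumed syslog layout: "<Mon> <day> <hh:mm:ss> <host> <process>[pid]: <message>"
SYSLOG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"(?P<process>[^\s\[:]+)(?:\[\d+\])?:"
)


def map_line(line):
    """Return a 'month-process<TAB>1' record, or None if the line does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    return "%s-%s\t1" % (match.group("month"), match.group("process"))


def reduce_records(records):
    """Sum the counts per key; records must arrive sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (key, sum(int(count) for _, count in group))
```

In a streaming job, the mapper script would call map_line on each line of standard input and print the non-None results, and the reducer script would pass standard input to reduce_records and print each line it yields. Hadoop streaming sorts mapper output by key before the reduce step, which is what lets the reducer group adjacent records.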
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
@airawat
airawat / 00-OozieCoordinatorJobWithTimeAsTrigger
Last active October 21, 2017 15:40
Oozie coordinator job example with time as trigger
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code, sample data,
and commands. Oozie actions covered: HDFS action, email action, Java main action,
Hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two Hive actions concurrently. The Hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Use case: Parse Syslog-generated log files to generate reports.
Pictorial overview of job: