Anagha Khanolkar airawat

  • Microsoft
airawat / 00-OozieWorkflowStreamingMRAction-Python
Last active Nov 21, 2018
Sample Oozie workflow with a streaming action that parses syslog-generated log files using Python regex
This gist includes Oozie workflow components (streaming MapReduce action) that run
Python mapper and reducer scripts to parse syslog-generated log files using regex.
Use case: Count the number of occurrences of each logged process, by month and process.
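A streaming mapper and reducer for this use case might look roughly like the sketch below. It is not the gist's actual code: the regex, the assumed syslog layout ("Mon DD HH:MM:SS host process[pid]: message"), the `month-process` key format, and the names `SYSLOG_RE`, `map_lines`, and `reduce_lines` are all illustrative assumptions.

```python
import re
import sys
from itertools import groupby

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: message".
# This pattern is an illustration, not the regex used in the gist.
SYSLOG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"(?P<process>[^\s\[:]+)(?:\[\d+\])?:"
)


def map_lines(lines):
    """Emit 'month-process<TAB>1' for each line that matches the pattern."""
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:  # lines that do not look like syslog entries are skipped
            yield f"{match.group('month')}-{match.group('process')}\t1"


def reduce_lines(lines):
    """Sum counts per key; input must be sorted by key and tab-separated,
    which is how Hadoop streaming delivers mapper output to a reducer."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: "script.py map" or "script.py reduce".
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

In a real streaming job the two functions would normally live in separate mapper and reducer scripts; they share one file here only to keep the sketch compact.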
Pictorial overview of workflow:
airawat / 00-OozieCoordinatorJobWithTimeAsTrigger
Last active Oct 21, 2017
Oozie coordinator job example with time as trigger
This gist includes the components of a time-triggered Oozie coordinator application: scripts/code, sample data,
and commands. Oozie actions covered: HDFS action, email action, Java main action,
Hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two Hive actions concurrently. The Hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. The Java mapper also reads the input
directory path and includes part of it in the key.
Use case: Parse syslog-generated log files to generate reports.
Pictorial overview of job: