Anagha Khanolkar airawat

  • Microsoft
airawat / 00-OozieWorkflowStreamingMRAction-Python
Last active Nov 21, 2018
Sample of an Oozie workflow with a streaming action that parses Syslog-generated log files using Python and regex
This gist includes Oozie workflow components (streaming MapReduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of logged processes, by month and process.
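A minimal sketch of what such a mapper and reducer pair might look like, assuming the common Syslog layout "Mon DD HH:MM:SS host process[pid]: message". The regex, the "month-process" key format, and the function names map_lines and reduce_lines are illustrative assumptions, not the gist's actual scripts.

```python
import re
from itertools import groupby

# Assumed Syslog layout: "Jul  3 10:00:01 host1 sshd[123]: message".
# The pattern is an illustration, not the regex used in the gist.
SYSLOG_LINE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"(?P<process>[^\s\[:]+)(?:\[\d+\])?:"
)


def map_lines(lines):
    """Mapper step: emit 'month-process<TAB>1' for each parsable line."""
    for line in lines:
        match = SYSLOG_LINE.match(line)
        if match:  # lines that do not match the pattern are skipped
            yield f"{match.group('month')}-{match.group('process')}\t1"


def reduce_lines(sorted_lines):
    """Reducer step: sum the counts per key.

    Relies on the input being sorted by key, which Hadoop Streaming
    guarantees for reducer input.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"
```

In a real streaming job each function would live in its own script that reads sys.stdin and prints to stdout. Locally, the shuffle can be simulated by passing sorted(map_lines(lines)) to reduce_lines.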
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
airawat / 00-OozieCoordinatorJobWithTimeAsTrigger
Last active Oct 21, 2017
Oozie coordinator job example with time as trigger
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code, sample data,
and commands. Oozie actions covered: hdfs action, email action, java main action,
hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two hive actions concurrently. The hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Use case: Parse Syslog-generated log files to generate reports.
Pictorial overview of job: