This gist includes Oozie workflow components (a streaming map-reduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Usecase: Count the number of occurrences of logged processes, by month and process.
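As a rough illustration of the use case above, the following is a minimal Python sketch of a mapper and reducer that count log entries by month and process. It is not the gist's actual scripts: the regex, the `month-process` key layout, and the names `SYSLOG_RE`, `map_lines`, `reduce_pairs`, and `run_local` are assumptions made for this example, and the log line format is assumed to be the classic `Mon DD HH:MM:SS host process[pid]: message` layout.

```python
import re
from itertools import groupby

# Assumed syslog layout: "May  3 11:52:54 host process[pid]: message".
# The pattern is illustrative, not the one used in the gist.
SYSLOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^\s\[:]+)(?:\[\d+\])?:"
)


def map_lines(lines):
    """Mapper: emit 'month-process<TAB>1' for every line that parses."""
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:  # silently skip lines that do not look like syslog
            yield f"{match.group('month')}-{match.group('process')}\t1"


def reduce_pairs(sorted_pairs):
    """Reducer: sum the counts of consecutive records sharing a key.

    Like a streaming reducer, this expects its input sorted by key.
    """
    parsed = (pair.rstrip("\n").split("\t") for pair in sorted_pairs)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce in memory for quick local testing."""
    return list(reduce_pairs(sorted(map_lines(lines))))
```

In a streaming job the two functions would live in separate scripts that read `sys.stdin` and print each record; `run_local` only stands in for the shuffle-and-sort step so the logic can be checked without a cluster.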
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code, sample data,
and commands. Oozie actions covered: hdfs action, email action, java main action,
hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two hive actions concurrently. The hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Usecase: Parse Syslog-generated log files to generate reports.
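The idea of including part of the input directory path in the key can be sketched as follows. The gist's mapper is written in Java; this Python version only illustrates the key construction. The helper names `path_component` and `make_key`, the sample paths, and the choice of the file's parent directory as the interesting path component are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def path_component(input_path, levels_up=1):
    """Return the directory name `levels_up` levels above the input file.

    Accepts a plain path or a URI such as hdfs://host:port/path.
    """
    path_only = urlparse(input_path).path  # drop scheme and authority, if any
    parts = PurePosixPath(path_only).parts
    return parts[-(levels_up + 1)]


def make_key(input_path, month, process):
    """Build a composite key: <path component>-<month>-<process>."""
    return f"{path_component(input_path)}-{month}-{process}"


# Hypothetical layout: one directory per source node.
print(make_key("/user/demo/logs/node1/messages", "May", "sshd"))
```

Prefixing the key with a path component keeps counts from different input directories separate in the reducer output, at the cost of a larger key space.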
Pictorial overview of job: |