This gist includes HiveQL scripts to create an external partitioned table for Syslog-generated
log files using the regex SerDe.
Use case: Count the number of occurrences of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
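Hive's RegexSerDe maps each capturing group of its `input.regex` table property to a column, in order, and Hive uses Java regex syntax. As a rough sketch of what such a pattern could look like, the Java snippet below splits one line into month, day, time, host, process and message. The line layout, the pattern and the sample line are assumptions for illustration, not the regex used in the gist.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogLineParser {
    // Assumed layout: "<Mon> <day> <HH:mm:ss> <host> <process>[<pid>]: <message>".
    // Capturing groups, in order: month, day, time, host, process, message.
    // The optional "[pid]" suffix is matched but not captured.
    static final Pattern SYSLOG = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+(\\d{2}:\\d{2}:\\d{2})\\s+(\\S+)\\s+"
        + "([^\\[:\\s]+)(?:\\[\\d+\\])?:\\s*(.*)$");

    /** Returns the captured fields, or null when the whole line does not match. */
    static String[] parse(String line) {
        Matcher m = SYSLOG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the assumed layout.
        String line = "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

Requiring the whole line to match (rather than searching for a substring) mirrors the idea of one row per line: a line that does not fit the layout yields no fields at all.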
This gist includes a Pig Latin script to parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript
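The counting step of this use case (group by month, day and process, then count) can be sketched in plain Java. This is not Pig; it is a hypothetical in-memory equivalent that assumes the lines have already been parsed, and the `LogEvent` record is introduced here purely for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ProcessCounts {
    // Hypothetical parsed record; only the grouping fields are kept.
    record LogEvent(String month, String day, String process) {}

    /** Counts events per "month day process" key, similar in spirit to a GROUP BY followed by COUNT. */
    static Map<String, Long> countByMonthDayProcess(List<LogEvent> events) {
        return events.stream()
            .collect(Collectors.groupingBy(
                e -> e.month() + " " + e.day() + " " + e.process(),
                TreeMap::new,              // lexicographically sorted keys for stable output
                Collectors.counting()));
    }

    public static void main(String[] args) {
        List<LogEvent> events = List.of(
            new LogEvent("May", "3", "init"),
            new LogEvent("May", "3", "init"),
            new LogEvent("May", "3", "kernel"),
            new LogEvent("May", "4", "init"));
        countByMonthDayProcess(events).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Building a single composite key keeps the sketch short; a real job would more likely keep month, day and process as separate output columns.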
This gist includes a Pig Latin script that parses Syslog-generated log files through a
Java MapReduce program that uses regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Related gist that covers the Java code: https://gist.github.com/airawat/5915374
Pig version: 0.10.0
This gist includes Oozie workflow components to run a Pig Latin script that parses
(Syslog-generated) log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
Includes:
package com.jteso.hadoop.contrib.inputformat;
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
This gist covers a simple Hive GenericUDF in Java that mimics the NVL2 function in Oracle.
NVL2 handles nulls by conditionally substituting values: NVL2(expr1, expr2, expr3) returns
expr2 if expr1 is not null, and expr3 otherwise.
Included:
1. Input data
2. Expected results
3. UDF code in Java
4. Hive query to demo the UDF
5. Output
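The NVL2 rule itself fits in one line of Java. The sketch below shows only that rule as a plain generic method (the `Nvl2Demo` class and `nvl2` method are hypothetical names); it is not a Hive UDF. A real GenericUDF would additionally implement `initialize`, `evaluate(DeferredObject[])` and `getDisplayString`, and would work through ObjectInspectors.

```java
public class Nvl2Demo {
    /**
     * NVL2 semantics: if expr1 is not null, return expr2; otherwise return expr3.
     * Plain-Java illustration only, not the Hive GenericUDF from the gist.
     */
    static <T> T nvl2(Object expr1, T expr2, T expr3) {
        return expr1 != null ? expr2 : expr3;
    }

    public static void main(String[] args) {
        System.out.println(nvl2("x", "has value", "was null"));  // has value
        System.out.println(nvl2(null, "has value", "was null")); // was null
    }
}
```

Note that only the null-ness of expr1 matters; its actual value is never returned, which is the key difference from NVL.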
About this gist:
================
This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python...
This one covers a log parser in Cascading.
It reads syslogs in HDFS and:
a) Parses them based on a regex pattern and writes the parsed files to HDFS
b) Writes records that don't match the pattern to HDFS
c) Writes a report to HDFS that contains the count of distinct processes logged.
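The three outputs above amount to routing each line by whether it matches the pattern, plus tallying processes among the matches. The plain-Java sketch below shows that routing with in-memory collections standing in for the HDFS sinks; it does not use the Cascading API, and the pattern, the `LogSplitter` class and the sample lines are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSplitter {
    // Assumed layout: "<Mon> <day> <HH:mm:ss> <host> <process>[<pid>]: <message>"; group 1 = process.
    static final Pattern LINE = Pattern.compile(
        "^\\w{3}\\s+\\d{1,2}\\s+\\d{2}:\\d{2}:\\d{2}\\s+\\S+\\s+([^\\[:\\s]+)(?:\\[\\d+\\])?:.*$");

    final List<String> parsed = new ArrayList<>();              // stands in for the parsed-output sink
    final List<String> rejected = new ArrayList<>();            // stands in for the "no match" sink
    final Map<String, Integer> processCounts = new TreeMap<>(); // report: matching lines per process

    /** Routes one line to parsed or rejected and updates the per-process counts. */
    void accept(String line) {
        Matcher m = LINE.matcher(line);
        if (m.matches()) {
            parsed.add(line);
            processCounts.merge(m.group(1), 1, Integer::sum);
        } else {
            rejected.add(line);
        }
    }

    public static void main(String[] args) {
        LogSplitter splitter = new LogSplitter();
        List<String> lines = List.of(
            "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed by TERM signal",
            "Jul 14 09:01:02 host1 sshd[4321]: Accepted password",
            "not a syslog line");
        lines.forEach(splitter::accept);
        System.out.println("parsed=" + splitter.parsed.size()
            + " rejected=" + splitter.rejected.size()
            + " distinct processes=" + splitter.processCounts.size()
            + " counts=" + splitter.processCounts);
    }
}
```

Keeping the rejected lines instead of silently dropping them makes it easy to refine the pattern later, which is the point of the separate "no match" output.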
Other gists/blogs:
This gist includes Oozie workflow components (a streaming MapReduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month and process.
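In Hadoop Streaming, the reducer reads tab-separated key/value lines from stdin, already sorted by key, so it must detect key boundaries itself. The gist's reducer is written in Python; the sketch below shows the same summing logic in Java over an in-memory list instead of stdin. The "month process" key format and the `StreamingStyleReducer` class are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingStyleReducer {
    /**
     * Sums the counts of consecutive identical keys in sorted "key\tcount" lines,
     * the way a streaming reducer sees its input: grouped by key, one line at a time.
     */
    static List<String> reduce(List<String> sortedLines) {
        List<String> out = new ArrayList<>();
        String currentKey = null;
        long total = 0;
        for (String line : sortedLines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip malformed lines that have no key/value separator
            }
            String key = line.substring(0, tab);
            long value = Long.parseLong(line.substring(tab + 1).trim());
            if (currentKey != null && !currentKey.equals(key)) {
                out.add(currentKey + "\t" + total); // key changed: emit the finished group
                total = 0;
            }
            currentKey = key;
            total += value;
        }
        if (currentKey != null) {
            out.add(currentKey + "\t" + total); // emit the last group
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical mapper output, already sorted by key as the framework would deliver it.
        List<String> sorted = List.of("May init\t1", "May init\t1", "May kernel\t1");
        reduce(sorted).forEach(System.out::println);
    }
}
```

Because the input is sorted, the reducer needs only the current key and a running total rather than a map of all keys, which keeps memory use constant.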
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code,
sample data and commands. Oozie actions covered: HDFS action, email action, Java main action,
Hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two Hive actions concurrently. The Hive table is partitioned.
Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
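In a new-API Hadoop mapper, the current input file is typically obtained with `((FileSplit) context.getInputSplit()).getPath()`. The sketch below covers only the string handling after that point: it takes the two parent directory names from a path and prefixes them to a parsed field. The `<year>/<month>/<file>` directory layout and the `DirectoryKey.keyFor` helper are hypothetical, not taken from the gist.

```java
public class DirectoryKey {
    /**
     * Builds a key such as "2013-05 init" from a path laid out as .../<year>/<month>/<file>.
     * In a Hadoop mapper, inputFilePath would come from the FileSplit's Path.toString().
     */
    static String keyFor(String inputFilePath, String process) {
        // Hadoop paths always use '/', so a plain split is enough here.
        String[] parts = inputFilePath.split("/");
        if (parts.length < 3) {
            return "unknown " + process; // path too short for the assumed layout
        }
        String year = parts[parts.length - 3];
        String month = parts[parts.length - 2];
        return year + "-" + month + " " + process;
    }

    public static void main(String[] args) {
        System.out.println(keyFor("hdfs://namenode:8020/data/syslog/2013/05/messages", "init"));
    }
}
```

Encoding the directory in the key lets one job process several partitions at once while still reporting per partition.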
Use case: Parse Syslog-generated log files to generate reports.
Pictorial overview of job:
This gist includes components of an Oozie (trigger-file-initiated) coordinator job:
scripts/code, sample data and commands. Oozie actions covered: HDFS action, email action,
Java main action, Hive action. Oozie controls covered: decision, fork-join. The workflow
includes a sub-workflow that runs two Hive actions concurrently. The Hive table is
partitioned. Parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets
the input directory path and includes part of it in the key.
Use case
--------
Parse Syslog-generated log files to generate reports.