Skip to content

Instantly share code, notes, and snippets.

Anagha Khanolkar airawat

  • Microsoft
Block or report user

Report or block airawat

Hide content and notifications from this user.

Learn more about blocking users

Contact Support about this user’s behavior.

Learn more about reporting abuse

Report abuse
View GitHub Profile
@airawat
airawat / 00-LogParser-PythonMR-UsingRegex
Last active Dec 19, 2015
Mapper and Reducer in python for log parsing using python regex
View 00-LogParser-PythonMR-UsingRegex
This gist includes a mapper and reducer in python that can parse log files using
regex; Usecase: Count the number of occurances of processes that got logged by month.
Includes:
---------
Sample data
Review of log data structure
Sample data and scripts for download
Mapper
Reducer
@airawat
airawat / 00-LogParser-JavaMapReduce-Regex
Last active Sep 18, 2016
00-JavaMapperReducerUsingRegex
View 00-LogParser-JavaMapReduce-Regex
This gist includes a mapper, reducer and driver in java that can parse log files using
regex; The code for combiner is the same as reducer;
Usecase: Count the number of occurances of processes that got logged, inception to date.
Includes:
---------
Sample data and scripts for download:01-ScriptAndDataDownload
Sample data and structure: 02-SampleDataAndStructure
Mapper: 03-LogEventCountMapper.java
Reducer: 04-LogEventCountReducer.java
@airawat
airawat / 00-LogParser-Hive-Regex
Last active Sep 13, 2018
Log parser in Hive using regex serde
View 00-LogParser-Hive-Regex
This gist includes hive ql scripts to create an external partitioned table for Syslog
generated log files using regex serde;
Usecase: Count the number of occurances of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
View 00-LogParser-PigLatin-UsingRegex
This gist includes a pig latin script to parse Syslog generated log files using regex;
Usecase: Count the number of occurances of processes that got logged, by month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript
@airawat
airawat / 00-LogParserPigLatinNativeMapReduce
Last active Dec 19, 2015
There might be situations were you may have to reuse java map reduce programs within a pig program. This blog includes a sample pig script, with associated jars and sample data. The input is Syslog generated log files, and the output is a count of occurrences of processes logged inception to date.
View 00-LogParserPigLatinNativeMapReduce
This gist includes a pig latin script to parse Syslog generated log files through a
java mapreduce program that uses regex;
Usecase: Count the number of occurances of processes that got logged, by month,
day and process.
Related gist that covers the java code - https://gist.github.com/airawat/5915374
Pig version: version 0.10.0
@airawat
airawat / 00-OozieWorkflowWithPigAction
Last active Aug 6, 2018
Sample of an Oozie workflow with pig action - parses Syslog generated log files using regex.
View 00-OozieWorkflowWithPigAction
This gist includes oozie workflow components to run a pig latin script to parse
(Syslog generated) log files using regex;
Usecase: Count the number of occurances of processes that got logged, by month,
day and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
Includes:
@airawat
airawat / 00-OozieWorkflowStreamingMRAction-Python
Last active Nov 21, 2018
Sample of an Oozie workflow with streaming action - parses Syslog generated log files using python -regex
View 00-OozieWorkflowStreamingMRAction-Python
This gist includes oozie workflow components (streaming map reduce action) to execute
python mapper and reducer scripts to parse Syslog generated log files using regex;
Usecase: Count the number of occurances of processes that got logged, by month, and process.
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
Includes:
---------
@airawat
airawat / 00-OozieCoordinatorJobWithTimeAsTrigger
Last active Oct 21, 2017
Oozie coordinator job example with time as trigger
View 00-OozieCoordinatorJobWithTimeAsTrigger
This gist includes components of a oozie (time initiated) coordinator application - scripts/code, sample data
and commands; Oozie actions covered: hdfs action, email action, java main action,
hive action; Oozie controls covered: decision, fork-join; The workflow includes a
sub-workflow that runs two hive actions concurrently. The hive table is partitioned;
Parsing uses hive-regex serde, and Java-regex. Also, the java mapper, gets the input
directory path and includes part of it in the key.
Usecase: Parse Syslog generated log files to generate reports;
Pictorial overview of job:
@airawat
airawat / 00-OozieCoordinatorJobWithFileAsTrigger
Last active Feb 12, 2018
Oozie coordinator job example with trigger file as trigger
View 00-OozieCoordinatorJobWithFileAsTrigger
This gist includes components of a oozie (trigger file initiated) coordinator job -
scripts/code, sample data and commands; Oozie actions covered: hdfs action, email action,
java main action, hive action; Oozie controls covered: decision, fork-join; The workflow
includes a sub-workflow that runs two hive actions concurrently. The hive table is
partitioned; Parsing uses hive-regex serde, and Java-regex. Also, the java mapper, gets
the input directory path and includes part of it in the key.
Usecase
-------
Parse Syslog generated log files to generate reports;
@airawat
airawat / 00-OozieCoordinatorJobWithDatasetCreationAsTrigger
Last active Oct 3, 2017
Sample Oozie coordinator job that executes upon availability of a specified dataset. Includes scripts/code, sample data, commands.
View 00-OozieCoordinatorJobWithDatasetCreationAsTrigger
This gist includes components of a oozie, dataset availability initiated, coordinator job -
scripts/code, sample data and commands; Oozie actions covered: hdfs action, email action,
sqoop action (mysql database); Oozie controls covered: decision;
Usecase
-------
Pipe report data available in HDFS, to mysql database;
Pictorial overview of job:
--------------------------
You can’t perform that action at this time.