
Anagha Khanolkar airawat

  • Microsoft
airawat / 00-LogParser-PythonMR-UsingRegex
Last active Dec 19, 2015
Mapper and reducer in Python for log parsing using regex
This gist includes a mapper and reducer in Python that parse log files using
regex. Use case: Count the number of occurrences of processes that got logged, by month.
Includes:
---------
Sample data
Review of log data structure
Sample data and scripts for download
Mapper
Reducer
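A minimal sketch of the map and reduce steps, assuming a standard syslog line layout ("Mon dd hh:mm:ss host process[pid]: message"). The regex, the "month-process" key format and the run_local helper are illustrative assumptions, not the gist's actual scripts. The reducer relies on its input being sorted by key, which Hadoop Streaming provides between the map and reduce phases.

```python
import re
from itertools import groupby

# Assumed syslog layout: "Mon dd hh:mm:ss host process[pid]: message" (pid optional).
SYSLOG_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^\[:\s]+)(?:\[\d+\])?:\s*(.*)$"
)


def mapper(lines):
    """Emit 'month-process<TAB>1' for each line the regex recognises; skip the rest."""
    for line in lines:
        match = SYSLOG_RE.match(line)
        if match:
            month, process = match.group(1), match.group(5)
            yield f"{month}-{process}\t1"


def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing a key (input must be sorted by key)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Hypothetical test helper: imitate map -> sort -> reduce in one process."""
    return reducer(sorted(mapper(lines)))
```

As streaming scripts, each function would read sys.stdin and print what it yields; run_local only imitates the sort Hadoop performs between the two phases.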
airawat / 00-LogParser-JavaMapReduce-Regex
Last active Sep 18, 2016
00-JavaMapperReducerUsingRegex
This gist includes a mapper, reducer and driver in Java that parse log files using
regex; the combiner code is the same as the reducer's.
Use case: Count the number of occurrences of processes that got logged, inception to date.
Includes:
---------
Sample data and scripts for download: 01-ScriptAndDataDownload
Sample data and structure: 02-SampleDataAndStructure
Mapper: 03-LogEventCountMapper.java
Reducer: 04-LogEventCountReducer.java
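The gist reuses the reducer as the combiner. A Python sketch (not the gist's Java code) of why that is safe for counting: summing partial counts per input split and then summing again gives the same totals as summing every (process, 1) pair at once, while fewer pairs cross the shuffle. The split contents and function names are hypothetical.

```python
from collections import Counter


def map_split(processes):
    """Mapper stand-in: emit (process, 1) for each parsed log event in one input split."""
    return [(process, 1) for process in processes]


def reduce_counts(pairs):
    """Sum counts per process; also valid as a combiner because addition is
    associative and commutative."""
    totals = Counter()
    for process, count in pairs:
        totals[process] += count
    return sorted(totals.items())


# Two hypothetical input splits, each processed by its own map task.
splits = [["init", "kernel", "init"], ["kernel", "sshd", "init"]]

# Without a combiner, every (process, 1) pair is shuffled to the reducer.
shuffled_plain = [pair for split in splits for pair in map_split(split)]
totals_plain = reduce_counts(shuffled_plain)

# With the reducer reused as a combiner, each split is pre-summed before the shuffle.
shuffled_combined = [pair for split in splits for pair in reduce_counts(map_split(split))]
totals_combined = reduce_counts(shuffled_combined)
```

In a Java driver this typically corresponds to registering the reducer class as the combiner (for example via setCombinerClass on the job).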
airawat / 00-LogParser-Hive-Regex
Last active Sep 13, 2018
Log parser in Hive using the regex SerDe
This gist includes HiveQL scripts to create an external partitioned table for Syslog-generated
log files using the regex SerDe.
Use case: Count the number of occurrences of processes that got logged, by year, month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data download: 02-DataDownload
Data load commands: 03-DataLoadCommands
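A Python sketch of how a regex SerDe turns a line into a row: the whole line must match the pattern, capturing group i fills column i, and partition columns come from the partition's metadata rather than from the file. In Hive the pattern would be supplied as the input.regex SerDe property; the pattern, column names and partition values below are assumptions for illustration, and mapping unmatched lines to all-None rows is a simplification.

```python
import re
from collections import Counter

# Hypothetical input.regex: one capturing group per non-partition column, in column order.
INPUT_REGEX = re.compile(
    r"(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^\[:\s]+)(?:\[\d+\])?:\s*(.*)"
)
COLUMNS = ["month_name", "day", "time", "host", "process", "log_msg"]  # hypothetical names


def deserialize(line, partition):
    """Turn one line into a row: the whole line must match, group i fills column i.
    Partition columns are added from partition metadata, not parsed from the line.
    Unmatched lines become all-None rows here (a simplification)."""
    match = INPUT_REGEX.fullmatch(line)
    values = match.groups() if match else (None,) * len(COLUMNS)
    row = dict(zip(COLUMNS, values))
    row.update(partition)
    return row


def count_by_year_month_day_process(rows):
    """Roughly: SELECT year, month, day, process, COUNT(*) ...
    WHERE process IS NOT NULL GROUP BY year, month, day, process."""
    return Counter(
        (row["year"], row["month"], row["day"], row["process"])
        for row in rows
        if row["process"] is not None
    )
```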
airawat / 00-LogParser-PigLatin-UsingRegex
This gist includes a Pig Latin script to parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript
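A Pig script for this use case typically extracts fields with a regex, drops lines that did not match, groups by (month, day, process) and counts. A Python mirror of those steps, as a sketch: Pig's REGEX_EXTRACT_ALL yields null when the whole line does not match, which regex_extract_all imitates with re.fullmatch. The pattern is an assumption about the Syslog layout, not the gist's own regex.

```python
import re
from collections import Counter

# Assumed Syslog layout; as with REGEX_EXTRACT_ALL, the pattern must match the whole line.
LOG_PATTERN = re.compile(
    r"(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+([^\[:\s]+)(?:\[\d+\])?:\s*(.*)"
)


def regex_extract_all(line):
    """Imitate REGEX_EXTRACT_ALL: a tuple of all groups on a full match, otherwise None."""
    match = LOG_PATTERN.fullmatch(line)
    return match.groups() if match else None


def count_events(lines):
    """Extract fields, drop non-matching lines, then count per (month, day, process)."""
    parsed = (regex_extract_all(line) for line in lines)            # Pig: extract with regex
    matched = (fields for fields in parsed if fields is not None)   # Pig: filter out nulls
    return Counter((f[0], f[1], f[4]) for f in matched)             # Pig: group, then COUNT
```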
airawat / 00-LogParserPigLatinNativeMapReduce
Last active Dec 19, 2015
There might be situations where you have to reuse Java MapReduce programs within a Pig program. This blog includes a sample Pig script, with associated jars and sample data. The input is Syslog-generated log files, and the output is a count of occurrences of processes logged, inception to date.
This gist includes a Pig Latin script that parses Syslog-generated log files through a
Java MapReduce program that uses regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Related gist that covers the Java code: https://gist.github.com/airawat/5915374
Pig version: 0.10.0
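Pig's MAPREDUCE operator hands data to a native job through files: it stores a relation into a directory, runs the jar with its arguments, then loads the job's output directory back as a relation. The sketch below imitates that store, run, load contract in Python with temporary local directories in place of HDFS; native_job is a hypothetical stand-in for the Java jar and simply counts the first column.

```python
import os
import tempfile
from collections import Counter


def store(records, directory):
    """Pig side: STORE the relation as tab-separated text into the job's input directory."""
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, "part-m-00000"), "w") as out:
        for record in records:
            out.write("\t".join(record) + "\n")


def native_job(input_dir, output_dir):
    """Hypothetical stand-in for the Java MapReduce jar: count column 0, inception to date."""
    counts = Counter()
    for name in sorted(os.listdir(input_dir)):
        with open(os.path.join(input_dir, name)) as src:
            for line in src:
                counts[line.rstrip("\n").split("\t")[0]] += 1
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, "part-r-00000"), "w") as out:
        for process, total in sorted(counts.items()):
            out.write(f"{process}\t{total}\n")


def load(directory):
    """Pig side: LOAD the job's output directory back as (process, count) tuples."""
    rows = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as src:
            for line in src:
                process, total = line.rstrip("\n").split("\t")
                rows.append((process, int(total)))
    return rows


def mapreduce(records, job):
    """Store -> run native job -> load, using throwaway local directories."""
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = os.path.join(tmp, "in")
        output_dir = os.path.join(tmp, "out")
        store(records, input_dir)
        job(input_dir, output_dir)
        return load(output_dir)
```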
airawat / 00-OozieWorkflowWithPigAction
Last active Aug 6, 2018
Sample of an Oozie workflow with a Pig action; parses Syslog-generated log files using regex.
This gist includes Oozie workflow components to run a Pig Latin script that parses
(Syslog-generated) log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month,
day and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
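For orientation, a minimal workflow.xml with a single Pig action could look like the output of this Python sketch: a start node, the Pig action, a kill node reached on error, and an end node. The app name, script path, parameter names and the uri:oozie:workflow:0.2 schema version are assumptions; the gist's actual workflow may differ.

```python
import xml.etree.ElementTree as ET


def pig_workflow(app_name, script, params):
    """Build a minimal workflow: start -> Pig action -> end, with a kill node on error."""
    wf = ET.Element("workflow-app", {"name": app_name, "xmlns": "uri:oozie:workflow:0.2"})
    ET.SubElement(wf, "start", {"to": "pigAction"})

    action = ET.SubElement(wf, "action", {"name": "pigAction"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"
    ET.SubElement(pig, "script").text = script
    for name, value in params.items():
        # Each <param> is passed to the Pig script as NAME=VALUE.
        ET.SubElement(pig, "param").text = f"{name}={value}"
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "killJob"})

    kill = ET.SubElement(wf, "kill", {"name": "killJob"})
    ET.SubElement(kill, "message").text = "Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    ET.SubElement(wf, "end", {"name": "end"})
    return ET.tostring(wf, encoding="unicode")
```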
Includes:
airawat / 00-OozieWorkflowStreamingMRAction-Python
Last active Nov 21, 2018
Sample of an Oozie workflow with a streaming action; parses Syslog-generated log files using Python regex
This gist includes Oozie workflow components (streaming MapReduce action) to execute
Python mapper and reducer scripts that parse Syslog-generated log files using regex.
Use case: Count the number of occurrences of processes that got logged, by month and process.
Pictorial overview of workflow:
--------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-5-oozie-workflow-with.html
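A Python sketch of the action element for a streaming map-reduce action: the streaming block names the mapper and reducer commands, the file entries ship the scripts with the job, and the input and output directories are passed as configuration properties. The script names, the ${...} parameters and the MR1-era property names mapred.input.dir and mapred.output.dir are assumptions.

```python
import xml.etree.ElementTree as ET


def streaming_action(name, mapper_cmd, reducer_cmd, files, ok_to="end", error_to="killJob"):
    """Build an Oozie <action> that runs a Hadoop Streaming job with Python scripts."""
    action = ET.Element("action", {"name": name})
    mr = ET.SubElement(action, "map-reduce")
    ET.SubElement(mr, "job-tracker").text = "${jobTracker}"
    ET.SubElement(mr, "name-node").text = "${nameNode}"
    # Remove a previous output directory so reruns do not fail on an existing path.
    prepare = ET.SubElement(mr, "prepare")
    ET.SubElement(prepare, "delete", {"path": "${outputDir}"})
    streaming = ET.SubElement(mr, "streaming")
    ET.SubElement(streaming, "mapper").text = mapper_cmd
    ET.SubElement(streaming, "reducer").text = reducer_cmd
    conf = ET.SubElement(mr, "configuration")
    for key, value in [("mapred.input.dir", "${inputDir}"), ("mapred.output.dir", "${outputDir}")]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    for path in files:
        # "path#link" ships the script and exposes it to tasks under the link name.
        ET.SubElement(mr, "file").text = path
    ET.SubElement(action, "ok", {"to": ok_to})
    ET.SubElement(action, "error", {"to": error_to})
    return action
```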
Includes:
---------
airawat / 00-OozieCoordinatorJobWithTimeAsTrigger
Last active Oct 21, 2017
Oozie coordinator job example with time as trigger
This gist includes components of an Oozie (time-initiated) coordinator application: scripts/code, sample data
and commands. Oozie actions covered: HDFS action, email action, Java main action,
Hive action. Oozie controls covered: decision, fork-join. The workflow includes a
sub-workflow that runs two Hive actions concurrently. The Hive table is partitioned;
parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets the input
directory path and includes part of it in the key.
Use case: Parse Syslog-generated log files to generate reports.
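A time-triggered coordinator materializes one workflow run per nominal time, starting at the job's start time and stepping by its frequency until the end time. A minimal sketch of that schedule, assuming a fixed-length frequency (for example one day, as ${coord:days(1)} would give) and an exclusive end time; the dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def nominal_times(start: datetime, end: datetime, frequency: timedelta) -> List[datetime]:
    """List the nominal times a time-triggered coordinator would materialize,
    assuming a fixed-length frequency and an exclusive end time."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


def oozie_utc(t: datetime) -> str:
    """Format a datetime in Oozie's UTC notation, e.g. 2013-07-01T00:00Z."""
    return t.strftime("%Y-%m-%dT%H:%MZ")
```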
Pictorial overview of job:
airawat / 00-OozieCoordinatorJobWithFileAsTrigger
Last active Feb 12, 2018
Oozie coordinator job example with a trigger file as the trigger
This gist includes components of an Oozie (trigger-file-initiated) coordinator job:
scripts/code, sample data and commands. Oozie actions covered: HDFS action, email action,
Java main action, Hive action. Oozie controls covered: decision, fork-join. The workflow
includes a sub-workflow that runs two Hive actions concurrently. The Hive table is
partitioned; parsing uses the Hive regex SerDe and Java regex. Also, the Java mapper gets
the input directory path and includes part of it in the key.
Use case
--------
Parse Syslog-generated log files to generate reports.
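With a file as the trigger, the coordinator waits for a dataset instance to become available before running the workflow. A sketch of the readiness check, assuming a dataset whose uri-template uses ${YEAR}/${MONTH}/${DAY} and whose done-flag names the trigger file: the instance counts as ready when the flag file exists in the resolved directory, or, with an empty done-flag, when the directory itself exists. The paths and the trigger.dat name are hypothetical, and local directories stand in for HDFS.

```python
import os
import tempfile
from datetime import datetime


def resolve_uri(template, t):
    """Fill Oozie-style ${YEAR}, ${MONTH}, ${DAY}, ${HOUR} variables for nominal time t."""
    return (template.replace("${YEAR}", f"{t.year:04d}")
                    .replace("${MONTH}", f"{t.month:02d}")
                    .replace("${DAY}", f"{t.day:02d}")
                    .replace("${HOUR}", f"{t.hour:02d}"))


def is_ready(template, t, done_flag="_SUCCESS"):
    """Ready when the flag file exists in the instance directory;
    an empty done_flag means the directory's existence is enough."""
    directory = resolve_uri(template, t)
    if done_flag == "":
        return os.path.isdir(directory)
    return os.path.isfile(os.path.join(directory, done_flag))


def demo():
    """Walk one instance from not-ready to ready using a temporary local directory."""
    with tempfile.TemporaryDirectory() as root:
        template = os.path.join(root, "logs", "${YEAR}", "${MONTH}", "${DAY}")
        t = datetime(2013, 7, 3)
        states = [is_ready(template, t, "trigger.dat")]        # directory missing
        os.makedirs(resolve_uri(template, t))
        states.append(is_ready(template, t, "trigger.dat"))    # directory only
        open(os.path.join(resolve_uri(template, t), "trigger.dat"), "w").close()
        states.append(is_ready(template, t, "trigger.dat"))    # trigger file present
        return states
```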
airawat / 00-OozieCoordinatorJobWithDatasetCreationAsTrigger
Last active Jul 1, 2020
Sample Oozie coordinator job that executes upon availability of a specified dataset. Includes scripts/code, sample data, commands.
This gist includes components of an Oozie coordinator job triggered by dataset availability:
scripts/code, sample data and commands. Oozie actions covered: HDFS action, email action,
Sqoop action (MySQL database). Oozie controls covered: decision.
Use case
--------
Pipe report data available in HDFS to a MySQL database.
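Which dataset instance a coordinator waits for is usually expressed with ${coord:current(n)}. A simplified Python model, assuming a fixed-frequency dataset: instances occur at initial-instance + k * frequency, current(0) is the latest instance at or before the action's nominal time, and current(n) is n periods away from it. Instances earlier than initial-instance are treated as missing (None) in this sketch. Dates and frequency are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def current(n: int, nominal_time: datetime, initial_instance: datetime,
            frequency: timedelta) -> Optional[datetime]:
    """Simplified ${coord:current(n)} for a fixed-frequency dataset.

    current(0) is the latest dataset instance at or before nominal_time;
    current(n) is n periods away from it. Instances before initial_instance
    are treated as missing and returned as None."""
    periods = (nominal_time - initial_instance) // frequency  # whole periods elapsed
    instance = initial_instance + (periods + n) * frequency
    return instance if instance >= initial_instance else None
```

Once every instance named in the coordinator's input events is available, the action (here, the Sqoop export to MySQL) can run.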
Pictorial overview of job:
--------------------------