This gist includes a mapper and reducer in Python that parse log files using
regex. Use case: count the number of occurrences of logged processes, by month.
Includes:
---------
Sample data
Review of log data structure
Sample data and scripts for download
Mapper
Reducer

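The mapper/reducer pairing described above can be sketched as follows. This is a minimal illustration, not the gist's actual code: the syslog layout encoded in `LOG_PATTERN`, the `month-process` key format, and the sample lines are all assumptions made for the example.

```python
import re
from itertools import groupby

# Assumed syslog layout (not taken from the gist):
#   "<Mon> <day> <hh:mm:ss> <host> <process>[<pid>]: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<process>[^\s\[:]+)(?:\[\d+\])?:\s+(?P<message>.*)$"
)


def map_lines(lines):
    """Mapper: emit 'month-process<TAB>1' for every line that matches."""
    for line in lines:
        match = LOG_PATTERN.match(line.rstrip("\n"))
        if match:  # lines that do not match are silently skipped
            yield f"{match.group('month')}-{match.group('process')}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of consecutive records that share a key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Hypothetical sample lines, made up for this sketch.
SAMPLE = [
    "May  3 11:52:54 cdh-dn03 init: tty (/dev/tty6) main process (1208) killed",
    "May  3 11:53:31 cdh-dn03 kernel: registered taskstats version 1",
    "May  4 01:02:03 cdh-dn03 kernel: another kernel message",
    "Apr 27 07:01:01 cdh-dn02 sudo[2231]: session opened",
    "this line is not a syslog record",
]

if __name__ == "__main__":
    # sorted() stands in for the shuffle/sort phase between map and reduce.
    for record in reduce_lines(sorted(map_lines(SAMPLE))):
        print(record)
```

In a streaming job the two functions would typically live in separate scripts that read `sys.stdin` and print to standard output; they are kept together here only so the sketch runs on its own.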
This gist includes a Pig Latin script that parses syslog-generated log files using regex.
Use case: count the number of occurrences of logged processes, by month,
day and process.
Includes:
---------
Sample data and structure: 01-SampleDataAndStructure
Data and script download: 02-DataAndScriptDownload
Data load commands: 03-HdfsLoadCommands
Pig script: 04-PigLatinScript

This gist includes a Pig Latin script that parses syslog-generated log files through a
Java MapReduce program that uses regex.
Use case: count the number of occurrences of logged processes, by month,
day and process.
Related gist that covers the Java code - https://gist.github.com/airawat/5915374
Pig version: 0.10.0

This gist includes components of an Oozie workflow: scripts/code, sample data,
and commands. Oozie actions covered: java main action. Oozie controls
covered: start, kill, end. The Java program uses regex to parse the logs and
also extracts part of the mapper input directory path and includes it in the key
emitted.
Use case
--------
Parse syslog-generated log files to generate reports.

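The idea of folding part of the input directory path into the emitted key can be illustrated with a small Python sketch. The gist itself does this in Java; the function name `key_with_path_part`, the `levels` parameter, the `-` separator, and the example path are all hypothetical and chosen only for illustration.

```python
import posixpath


def key_with_path_part(input_path, process, levels=1):
    """Build a key from the last `levels` directory names of input_path
    followed by the process name, joined with '-'."""
    directory = posixpath.dirname(input_path)
    parts = [part for part in directory.split("/") if part]
    # max(..., 0) keeps the slice valid when levels is 0 or exceeds the depth.
    tail = parts[max(len(parts) - levels, 0):] if levels > 0 else []
    return "-".join(tail + [process])


if __name__ == "__main__":
    # Hypothetical layout: logs partitioned into <year>/<month> directories.
    print(key_with_path_part("/logs/syslog/2013/04/messages", "sudo", levels=2))
```

Including the directory components in the key lets a single job report per partition (for example per year and month) without parsing that information out of each log line.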
This gist demonstrates how to create a map file from a text file.
Includes:
---------
1. Input data and script download
2. Input data review
3. Data load commands
4. Java program to create the map file out of a text file in HDFS
5. Command to run Java program
6. Results of the program run to create map file

This gist demonstrates how to do a map-side join, joining a MapFile from the distributed cache
with a larger dataset in HDFS.
Includes:
---------
1. Input data and script download
2. Dataset structure review
3. Expected results
4. Mapper code
5. Driver code

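Conceptually, a map-side join looks up each record of the large dataset in a small keyed dataset while mapping, so no reduce phase is needed. The sketch below shows that idea in Python under loud assumptions: an in-memory `dict` stands in for the MapFile from the distributed cache, and the tab-separated employee and department records are made up for the example rather than taken from the gist.

```python
def load_lookup(lines):
    """Build the small side of the join: dept_id -> dept_name.
    A dict is a stand-in here for the keyed lookups a MapFile provides."""
    lookup = {}
    for line in lines:
        dept_id, dept_name = line.rstrip("\n").split("\t")
        lookup[dept_id] = dept_name
    return lookup


def map_side_join(records, lookup, missing="NOT-FOUND"):
    """Mapper: append the department name to each employee record."""
    for line in records:
        emp_id, name, dept_id = line.rstrip("\n").split("\t")
        yield "\t".join([emp_id, name, dept_id, lookup.get(dept_id, missing)])


# Hypothetical data for illustration only.
DEPARTMENTS = ["d001\tMarketing", "d002\tFinance"]
EMPLOYEES = ["10001\tFacello\td001", "10002\tSimmel\td002", "10003\tBamford\td009"]

if __name__ == "__main__":
    for joined in map_side_join(EMPLOYEES, load_lookup(DEPARTMENTS)):
        print(joined)
```

The join stays cheap only while the lookup side is small enough to ship to every mapper; records with no match are kept and flagged rather than dropped, which is a design choice of this sketch.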
This gist covers a simple Pig eval UDF in Java that mimics NVL2 functionality in Oracle.
Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output

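For readers unfamiliar with NVL2: it returns its second argument when the first is not null, and its third argument otherwise. A minimal Python sketch of those semantics follows; it is not the gist's Java UDF, and it treats only `None` as null, which is an assumption of this example.

```python
def nvl2(expr, if_not_null, if_null):
    """Return if_not_null when expr is not None, otherwise if_null."""
    return if_not_null if expr is not None else if_null


if __name__ == "__main__":
    print(nvl2("d001", "has department", "no department"))
    print(nvl2(None, "has department", "no department"))
```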
......
List<String> artifactList = new List<String>();
var scanOpts = new ScanOptions();
// Regex that matches every row ID starting with rowID
String rowRegex = rowID + ".*";
// Configure a server-side RegExFilter iterator for the scan
IteratorSetting iterSttng = new IteratorSetting();
iterSttng.Priority = 15;
iterSttng.Name = "rowIDRegexFilter";
iterSttng.IteratorClass = "org.apache.accumulo.core.iterators.user.RegExFilter";

About this gist:
================
This gist is part of a series of log parsers in Java MapReduce, Pig, Hive, Python...
This one covers a log parser in Cascading.
It reads syslogs in HDFS and:
a) Parses them based on a regex pattern and writes the parsed files to HDFS
b) Writes records that don't match the pattern to HDFS
c) Writes a report to HDFS that contains the count of distinct processes logged.
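The three outputs listed above (parsed records, rejected records, and a process report) can be sketched in plain Python to show the data flow. This is not Cascading code: the `SYSLOG` pattern, the function name `split_and_count`, and the sample lines are assumptions, and the report is read here as a per-process record count.

```python
import re
from collections import Counter

# Assumed layout: "<Mon> <day> <hh:mm:ss> <host> <process>[<pid>]: <message>"
SYSLOG = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+"
    r"([^\s\[:]+)(?:\[\d+\])?:\s+(.*)$"
)


def split_and_count(lines):
    """Return (parsed tuples, rejected raw lines, per-process counts)."""
    parsed, rejected, counts = [], [], Counter()
    for line in lines:
        line = line.rstrip("\n")
        match = SYSLOG.match(line)
        if match is None:
            rejected.append(line)  # output (b): records that do not match
            continue
        parsed.append(match.groups())  # output (a): parsed fields
        counts[match.group(5)] += 1  # output (c): count per process
    return parsed, rejected, counts


if __name__ == "__main__":
    # Hypothetical sample lines, made up for this sketch.
    sample = [
        "May  3 11:52:54 cdh-dn03 init: tty killed",
        "not a syslog line",
        "May  3 11:53:31 cdh-dn03 init: again",
    ]
    parsed, rejected, counts = split_and_count(sample)
    print(len(parsed), rejected, dict(counts))
```

Keeping the rejected lines as a separate output, rather than discarding them, makes it possible to check later whether the pattern is missing legitimate record formats.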
Other gists/blogs:

The sample programs for Cascading (2.5.1) and Accumulo (1.5.0) are on GitHub -
https://github.com/airawat/cascading.accumulo.examples
The source code for the extensions is at -
https://github.com/airawat/cascading.accumulo