*************************
Gist
*************************
One more gist related to controlling the number of mappers in a MapReduce task.
Background on InputSplits
--------------------------
An InputSplit is a chunk of the input data allocated to a map task for processing. FileInputFormat
generates InputSplits (and divides them into records) - one InputSplit for each file, unless the
This gist includes components of a simple workflow application (Oozie 3.3.0) that
pipes data from a Hive table to MySQL.
The sample application includes:
--------------------------------
1. Oozie actions: sqoop action
2. Oozie workflow controls: start, end, and kill.
3. Workflow components: job.properties and workflow.xml
4. Sample data
5. Prep tasks in Hive
This gist covers the Oozie SSH action.
It includes components of a sample Oozie workflow application: scripts/code,
sample data and commands. Oozie actions covered: secure shell action, email
action.
My blog has documentation and highlights of a very basic sample program.
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
This gist includes:
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands. Oozie actions covered: java mapreduce action. Oozie controls
covered: start, kill, end. The Java program uses regex to parse the logs, and
also extracts the mapper input directory path and includes it in the
emitted key.
Note: The reducer can be specified as a combiner as well.
Usecase
-------
This gist covers a simple Hive GenericUDF in Java that mimics the NVL2 function in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.
Included:
1. Input data
2. Expected results
3. UDF code in Java
4. Hive query to demo the UDF
5. Output
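To make the NVL2 semantics concrete, here is a minimal plain-Java sketch. It is not the gist's GenericUDF code and has no Hive dependency; the class name Nvl2Sketch and the method nvl2 are hypothetical names chosen for illustration. It only shows the rule that NVL2(expr1, expr2, expr3) returns expr2 when expr1 is not null, and expr3 otherwise.

```java
final class Nvl2Sketch {
    // Hypothetical helper mirroring Oracle's NVL2(expr1, expr2, expr3):
    // returns ifNotNull when expr1 is non-null, otherwise ifNull.
    static <T> T nvl2(Object expr1, T ifNotNull, T ifNull) {
        return expr1 != null ? ifNotNull : ifNull;
    }

    public static void main(String[] args) {
        // First argument is non-null, so the second argument is returned.
        System.out.println(nvl2("some value", "has value", "is null"));
        // First argument is null, so the third argument is returned.
        System.out.println(nvl2(null, "has value", "is null"));
    }
}
```

A Hive GenericUDF would wrap the same decision in its evaluate step and additionally validate argument counts and types, which this sketch leaves out.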
This gist includes Oozie workflow components to run a Pig Latin script that parses
(syslog-generated) log files using regex.
Usecase: Count the number of occurrences of processes that got logged, by month,
day and process.
Pictorial overview of workflow:
-------------------------------
http://hadooped.blogspot.com/2013/07/apache-oozie-part-7-oozie-workflow-with_3.html
Includes:
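The counting logic in this use case can be sketched in plain Java. This is an illustration only, not the gist's Pig Latin script: the class name SyslogCountSketch, the regex, and the assumed line layout ("Mon D HH:MM:SS host process[pid]: message", with an optional [pid]) are assumptions, and real syslog files may need a different pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SyslogCountSketch {
    // Assumed layout: "Mon D HH:MM:SS host process[pid]: message" ([pid] optional).
    // Groups: 1 = month, 2 = day, 3 = process name.
    static final Pattern SYSLOG_LINE = Pattern.compile(
        "^(\\w{3})\\s+(\\d{1,2})\\s+\\d{2}:\\d{2}:\\d{2}\\s+\\S+\\s+([^\\s\\[:]+)(?:\\[\\d+\\])?:");

    // Counts matching lines per "month-day-process" key; non-matching lines are skipped.
    static Map<String, Integer> countByMonthDayProcess(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = SYSLOG_LINE.matcher(line);
            if (!m.find()) {
                continue;
            }
            String key = m.group(1) + "-" + m.group(2) + "-" + m.group(3);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
            "May  3 11:52:54 host01 init: first message",
            "May  3 11:53:31 host01 init: second message",
            "May  3 11:53:31 host01 sudo[2215]: third message",
            "not a syslog line");
        System.out.println(countByMonthDayProcess(sample));
    }
}
```

In a MapReduce or Pig job the same key (month, day, process) would be emitted per line and summed in the reduce or GROUP step; the in-memory map here stands in for that aggregation.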
Introduction
-------------
This gist includes sample data, application components, and components to execute a bundle application.
The sample bundle application is time triggered. The start time is defined in the bundle job.properties
file. The bundle application starts two coordinator applications, as defined in the bundle definition file -
bundleConfirguration.xml.
The first coordinator job is time triggered. The start time is defined in the bundle job.properties file.
It runs a workflow that includes a java main action. The Java program parses some log files and generates
This gist includes components of an Oozie coordinator job triggered by dataset availability -
scripts/code, sample data and commands. Oozie actions covered: hdfs action, email action,
sqoop action (MySQL database). Oozie controls covered: decision.
Usecase
-------
Pipe report data available in HDFS to a MySQL database.
Pictorial overview of job:
--------------------------
Secondary sort in MapReduce
With the MapReduce framework, the keys are sorted but the values associated with each key
are not. In order for the values to be sorted, we need to write code to perform what is
referred to as a secondary sort. The sample code in this gist demonstrates such a sort.
The input to the program is a set of employee attributes.
The output required is department number (deptNo) in ascending order, and the employee last name,
first name and employee ID in descending order.
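The ordering described above can be sketched as a composite-key comparison in plain Java. This is not the gist's code and has no Hadoop dependency; the names SecondarySortSketch, Employee and compareKeys are hypothetical, and the sketch assumes that last name, first name and employee ID are all sorted in descending order. In an actual MapReduce job, the same comparison would typically live in the composite key's compareTo (or a sort comparator), alongside a partitioner and grouping comparator keyed on deptNo.

```java
import java.util.ArrayList;
import java.util.List;

final class SecondarySortSketch {
    // Hypothetical stand-in for a composite key (deptNo + employee attributes).
    static final class Employee {
        final int deptNo;
        final String lastName;
        final String firstName;
        final int empId;

        Employee(int deptNo, String lastName, String firstName, int empId) {
            this.deptNo = deptNo;
            this.lastName = lastName;
            this.firstName = firstName;
            this.empId = empId;
        }

        @Override
        public String toString() {
            return deptNo + "\t" + lastName + "\t" + firstName + "\t" + empId;
        }
    }

    // deptNo ascending; then lastName, firstName, empId descending (assumed).
    static int compareKeys(Employee a, Employee b) {
        int c = Integer.compare(a.deptNo, b.deptNo);
        if (c != 0) return c;
        c = b.lastName.compareTo(a.lastName);   // arguments swapped: descending
        if (c != 0) return c;
        c = b.firstName.compareTo(a.firstName); // descending
        if (c != 0) return c;
        return Integer.compare(b.empId, a.empId); // descending
    }

    // Returns a sorted copy, leaving the input list untouched.
    static List<Employee> sorted(List<Employee> input) {
        List<Employee> copy = new ArrayList<>(input);
        copy.sort(SecondarySortSketch::compareKeys);
        return copy;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<Employee> employees = new ArrayList<>();
        employees.add(new Employee(20, "Adams", "Beth", 7));
        employees.add(new Employee(10, "Baker", "Carl", 3));
        employees.add(new Employee(10, "Young", "Dana", 5));
        for (Employee e : sorted(employees)) {
            System.out.println(e);
        }
    }
}
```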
This gist includes components of an Oozie workflow - scripts/code, sample data
and commands. Oozie actions covered: shell action, email action.
Action 1: The shell action executes a shell script that does a line count for files in a
glob provided, and writes the line count to standard output.
Action 2: The email action emails the output of action 1.
Pictorial overview of job:
--------------------------