Anagha Khanolkar airawat

  • Microsoft
@airawat
airawat / 00-CustomGenericUDFHive-NVL2
Last active Aug 7, 2017
Custom GenericUDF in Hive: demonstrates NVL2 functionality
This gist covers a simple Hive GenericUDF in Java that mimics the NVL2 function in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.
Included:
1. Input data
2. Expected results
3. UDF code in java
4. Hive query to demo the UDF
5. Output
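As a rough illustration of the NVL2 semantics described above, here is a minimal plain-Java sketch. It is not the UDF code from the gist and does not use the Hive API; the class name `Nvl2Sketch` and the method `nvl2` are hypothetical names chosen for this example. It assumes the usual Oracle behavior: return the second argument when the first is not null, otherwise the third.

```java
public class Nvl2Sketch {

    // NVL2(expr1, expr2, expr3): returns expr2 when expr1 is not null,
    // otherwise returns expr3. (Hypothetical helper, not the Hive UDF itself.)
    public static <T> T nvl2(Object expr1, T ifNotNull, T ifNull) {
        return expr1 != null ? ifNotNull : ifNull;
    }

    public static void main(String[] args) {
        // First argument is present, so the second argument is returned.
        System.out.println(nvl2("x", "has value", "no value"));
        // First argument is null, so the third argument is returned.
        System.out.println(nvl2(null, "has value", "no value"));
    }
}
```

A Hive UDF would wrap the same decision in the UDF interfaces and work on Hive's object types rather than plain Java objects.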
airawat / 00-CusomHiveEvalUDF-NVL2
Last active Aug 7, 2017
Custom Hive eval UDF: NVL2
This gist covers a simple Hive eval UDF in Java that mimics the NVL2 function in Oracle.
NVL2 is used to handle nulls and conditionally substitute values.
Included:
1. Input data
2. Expected results
3. UDF code in java
4. Hive query to demo the UDF
5. Output
View 00-CustomPigEvalUDF-NVL2
This gist covers a simple Pig eval UDF in Java that mimics the NVL2 function in Oracle.
Included:
1. Input data
2. UDF code in java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
airawat / 00-OozieSSHAction
Last active Jan 26, 2018
Oozie SSH action: sample Oozie workflow that demonstrates the SSH action to move files from a specific node to HDFS
This gist covers the Oozie SSH action.
It includes the components of a sample Oozie workflow application: scripts/code,
sample data, and commands. Oozie actions covered: secure shell action, email
action.
My blog has documentation and highlights of a very basic sample program:
http://hadooped.blogspot.com/2013/10/apache-oozie-part-13-oozie-ssh-action_30.html
This gist includes:
airawat / 00-ReduceSideJoin
Last active Dec 21, 2017
ReduceSideJoin - Sample Java MapReduce program for joining datasets with a cardinality of 1..1 and 1..many on the join key
My blog has an introduction to reduce-side joins in Java MapReduce:
http://hadooped.blogspot.com/2013/09/reduce-side-join-options-in-java-map.html
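To make the reduce-side join idea concrete, here is a small in-memory sketch in plain Java. It is not the gist's program and does not use the Hadoop API; it only imitates the phases. The class `ReduceSideJoinSketch`, the `"L"`/`"R"` source tags, and the sample department and employee values are all assumptions made for illustration. Each record is a two-element array of join key and value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // Inner join of two datasets on the first field of each record.
    public static List<String> join(List<String[]> left, List<String[]> right) {
        // "Map" and "shuffle": tag every value with its source dataset and
        // group the tagged values by join key. TreeMap keeps keys sorted.
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] rec : left) {
            grouped.computeIfAbsent(rec[0], k -> new ArrayList<>())
                   .add(new String[] {"L", rec[1]});
        }
        for (String[] rec : right) {
            grouped.computeIfAbsent(rec[0], k -> new ArrayList<>())
                   .add(new String[] {"R", rec[1]});
        }

        // "Reduce": for each key, separate the values by tag and pair every
        // left value with every right value (covers 1..1 and 1..many).
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            List<String> leftValues = new ArrayList<>();
            List<String> rightValues = new ArrayList<>();
            for (String[] tagged : e.getValue()) {
                ("L".equals(tagged[0]) ? leftValues : rightValues).add(tagged[1]);
            }
            for (String lv : leftValues) {
                for (String rv : rightValues) {
                    out.add(e.getKey() + "," + lv + "," + rv);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> departments = Arrays.asList(
            new String[] {"d10", "Sales"}, new String[] {"d20", "Ops"});
        List<String[]> employees = Arrays.asList(
            new String[] {"d10", "Ann"}, new String[] {"d10", "Bob"},
            new String[] {"d20", "Cy"});
        join(departments, employees).forEach(System.out::println);
    }
}
```

Keys that appear in only one dataset produce no output, which is the inner-join behavior.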
airawat / 00-MapSideJoinLargeDatasets
Last active Dec 23, 2017
MapsideJoinOfTwoLargeDatasets(Old API) - Joining (inner join) two large datasets on the map side
**********************
**Gist
**********************
This gist details how to inner join two large datasets on the map side, leveraging the join capability
in MapReduce. Such a join makes sense when both input datasets are too large to distribute
through the DistributedCache. It can be implemented only if both input datasets can be joined on the join key
and both are sorted in the same order by that key.
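The sort-order requirement above is easier to see with a merge-style join. The following plain-Java sketch is not the gist's code and does not use MapReduce's join classes; `MergeJoinSketch` and its sample records are hypothetical. It assumes both inputs are sorted ascending by key and, to keep it short, that each key appears at most once per input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {

    // Inner join of two lists of {key, value} records that are already
    // sorted ascending by key. A single forward pass over each input is
    // enough, which is only possible because of the shared sort order.
    public static List<String> mergeJoin(List<String[]> a, List<String[]> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i)[0].compareTo(b.get(j)[0]);
            if (cmp < 0) {
                i++;                 // key only in a so far: advance a
            } else if (cmp > 0) {
                j++;                 // key only in b so far: advance b
            } else {
                out.add(a.get(i)[0] + "," + a.get(i)[1] + "," + b.get(j)[1]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(
            new String[] {"k1", "a"}, new String[] {"k2", "b"}, new String[] {"k4", "d"});
        List<String[]> b = Arrays.asList(
            new String[] {"k2", "x"}, new String[] {"k3", "y"}, new String[] {"k4", "z"});
        mergeJoin(a, b).forEach(System.out::println);
    }
}
```

Because neither input is buffered in memory, this style of join suits datasets too large for cache-based distribution.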
There are two critical pieces to engaging the join behavior:
airawat / 00-CombineFileInputFornat
Last active May 19, 2018
CombineFileInputFormat - a solution to efficient map reduce processing of small files
*************************
Gist
*************************
One more gist related to controlling the number of mappers in a mapreduce task.
Background on Inputsplits
--------------------------
An InputSplit is a chunk of the input data allocated to a map task for processing. FileInputFormat
generates InputSplits (and divides them into records): one InputSplit for each file, unless the
airawat / 00-NLineInputFormat
Last active Aug 22, 2018
NLineInputFormat - About NLineInputFormat, uses, and a sample program
**********************
Gist
**********************
A common interview question for a Hadoop developer position is whether we can control the number of
mappers for a job. We can - there are a few ways of controlling the number of mappers, as needed.
Using NLineInputFormat is one way.
About NLineInputFormat
----------------------
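As a back-of-the-envelope illustration of how NLineInputFormat controls the number of mappers: it hands each map task a split of N input lines, so the mapper count for a single input file is roughly the line count divided by N, rounded up. The sketch below is plain Java arithmetic, not Hadoop code; `NLineSplitSketch` and `numberOfSplits` are hypothetical names, and the figures in `main` are made up.

```java
public class NLineSplitSketch {

    // Number of N-line splits (and therefore map tasks) needed to cover
    // totalLines lines: the ceiling of totalLines / linesPerSplit.
    public static int numberOfSplits(long totalLines, int linesPerSplit) {
        if (totalLines < 0 || linesPerSplit <= 0) {
            throw new IllegalArgumentException("totalLines must be >= 0 and linesPerSplit > 0");
        }
        return (int) ((totalLines + linesPerSplit - 1) / linesPerSplit);
    }

    public static void main(String[] args) {
        // 10 lines with 3 lines per split: three full splits plus one partial.
        System.out.println(numberOfSplits(10, 3));
        // 9 lines with 3 lines per split: exactly three splits.
        System.out.println(numberOfSplits(9, 3));
    }
}
```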
airawat / 00-MultipleOutputs
Last active Jul 17, 2019
MultipleOutputs sample program - A program that demonstrates how to generate an output file for each key
********************************
Gist
********************************
Motivation
-----------
The typical MapReduce job creates files named with the prefix "part-", followed by "m" or "r" depending
on whether it is a map or a reduce output, and then the part number (for example, part-r-00000). There
are scenarios where we may want to create separate files based on criteria such as data keys and/or
values. Enter the "MultipleOutputs" functionality.
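The naming pattern described above can be sketched in a few lines of plain Java. This is only a string-formatting illustration, not Hadoop code; `PartFileNameSketch` and `outputFileName` are hypothetical names. It assumes that when a custom base name is supplied (for instance one derived from a key), it takes the place of "part" and the "-m-"/"-r-" marker and five-digit part number are still appended.

```java
import java.util.Locale;

public class PartFileNameSketch {

    // Builds "<base>-<m|r>-<5-digit part number>", e.g. part-r-00000.
    public static String outputFileName(String baseName, boolean mapOutput, int partNumber) {
        String taskType = mapOutput ? "m" : "r";
        return String.format(Locale.ROOT, "%s-%s-%05d", baseName, taskType, partNumber);
    }

    public static void main(String[] args) {
        // Default-style name for the first reducer's output.
        System.out.println(outputFileName("part", false, 0));
        // Hypothetical per-key base name in place of "part".
        System.out.println(outputFileName("sales", false, 3));
    }
}
```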
airawat / 00-SecondarySortJavaMapReduce
Last active Dec 8, 2018
Secondary sort in MapReduce - Includes code for a simple program that sorts employee information by department ascending and employee name descending.
Secondary sort in Mapreduce
In the MapReduce framework, the keys are sorted, but the values associated with each key
are not. For the values to be sorted as well, we need to write code to perform what is
referred to as a secondary sort. The sample code in this gist demonstrates such a sort.
The input to the program is a bunch of employee attributes.
The output required is department number (deptNo) in ascending order, and the employee last name,
first name and employee ID in descending order.
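The required ordering can be expressed as a single comparator. The sketch below is an in-memory analogue in plain Java, not the gist's MapReduce code (which would typically encode the same ordering in a composite key and its comparators). The `Employee` class, its field names other than `deptNo`, and the sample rows are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {

    public static final class Employee {
        public final int deptNo;
        public final String lastName;
        public final String firstName;
        public final int empId;

        public Employee(int deptNo, String lastName, String firstName, int empId) {
            this.deptNo = deptNo;
            this.lastName = lastName;
            this.firstName = firstName;
            this.empId = empId;
        }
    }

    // deptNo ascending, then last name, first name and employee ID descending.
    public static final Comparator<Employee> ORDER =
        Comparator.comparingInt((Employee e) -> e.deptNo)
            .thenComparing(Comparator.comparing((Employee e) -> e.lastName).reversed())
            .thenComparing(Comparator.comparing((Employee e) -> e.firstName).reversed())
            .thenComparing(Comparator.comparingInt((Employee e) -> e.empId).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    public static List<Employee> sorted(List<Employee> input) {
        List<Employee> copy = new ArrayList<>(input);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
            new Employee(20, "Smith", "Al", 1),
            new Employee(10, "Jones", "Bo", 2),
            new Employee(10, "Young", "Cy", 3),
            new Employee(20, "Adams", "Di", 4));
        for (Employee e : sorted(employees)) {
            System.out.println(e.deptNo + "\t" + e.lastName + "\t" + e.firstName + "\t" + e.empId);
        }
    }
}
```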