NarsiReddy Avua navula1

  • LegatoHealth Technologies
  • Hyderabad
navula1 / routehl7.xml
Created August 29, 2020 03:29 — forked from alopresto/routehl7.xml
An Apache NiFi template which generates HL7 data and routes it based on a simple comparison.
<?xml version="1.0" ?>
<template encoding-version="1.1">
<description></description>
<groupId>3b737254-015b-1000-aee2-fe3d19b02179</groupId>
<name>RouteHL7</name>
<snippet>
<connections>
<id>015b1002-e563-1455-0000-000000000000</id>
<parentGroupId>3b737254-015b-1000-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
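As a rough illustration of routing HL7 data on a simple comparison, here is a plain-Java sketch that extracts one field from an HL7 v2 message and sends the message to a `matched` or `unmatched` route. The class name `Hl7Router`, the two route names, and the rule itself (PID-8, administrative sex, equals `F`) are hypothetical choices for this example and are not taken from the NiFi template.

```java
import java.util.Optional;

/**
 * Sketch only: routes an HL7 v2 message by comparing one field to an expected value.
 * Hypothetical rule for the demo: PID-8 (administrative sex) equals "F".
 */
final class Hl7Router {

    /** Returns field {@code fieldIndex} of the first segment named {@code segmentId}, if present. */
    static Optional<String> field(String message, String segmentId, int fieldIndex) {
        // HL7 v2 separates segments with carriage returns; newlines are accepted too for convenience.
        for (String segment : message.split("[\r\n]+")) {
            // Fields are separated by '|'; limit -1 keeps empty trailing fields.
            // Caveat: in MSH the '|' itself is MSH-1, so parts[n] is MSH-(n+1); not adjusted here.
            String[] parts = segment.split("\\|", -1);
            if (parts[0].equals(segmentId) && fieldIndex < parts.length) {
                return Optional.of(parts[fieldIndex]);
            }
        }
        return Optional.empty();
    }

    /** Hypothetical routing rule: "matched" when the field equals the expected value, else "unmatched". */
    static String route(String message, String segmentId, int fieldIndex, String expected) {
        return field(message, segmentId, fieldIndex)
                .filter(expected::equals)
                .map(v -> "matched")
                .orElse("unmatched");
    }

    public static void main(String[] args) {
        String msg = "MSH|^~\\&|SENDER|FAC|RECV|FAC|20200829||ADT^A01|1|P|2.5\r"
                   + "PID|1||12345||Doe^Jane||19800101|F\r";
        System.out.println(route(msg, "PID", 8, "F"));
    }
}
```

A real flow would also need to deal with MSH field numbering, repeating fields, and HL7 escape sequences, which this sketch ignores.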
navula1 / gist:ca7da824697e312bedcc
Last active August 29, 2015 14:25 — forked from need4spd/gist:4584416
Hadoop multiple-outputs MapReduce sample
//mapper
package com.tistory.devyongsik.hadoop.mapre;
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
My blog has an introduction to reduce-side joins in Java MapReduce:
http://hadooped.blogspot.com/2013/09/reduce-side-join-options-in-java-map.html
**********************
**Gist
**********************
This gist details how to inner-join two large datasets on the map side, leveraging the join capability
in MapReduce. Such a join makes sense when both input datasets are too large to be distributed
through the DistributedCache, and it can be implemented only if both input datasets share the join key
and are sorted in the same order by that key.
There are two critical pieces to engaging the join behavior:
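Independently of how the job is configured, the core of a map-side inner join is a merge over two inputs that are already sorted by the join key: both are scanned once, in step, with no hash table and no reduce phase. A minimal plain-Java sketch of that merge follows; `SortedMergeJoin`, `innerJoin`, and the sample department data are hypothetical, and in Hadoop this logic would run inside each map task over one pair of matching input partitions rather than over in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: inner sort-merge join of two inputs that are BOTH sorted ascending by key. */
final class SortedMergeJoin {

    /** Emits "key \t leftValue \t rightValue" for every pair of records sharing a key. */
    static List<String> innerJoin(List<Map.Entry<String, String>> left,
                                  List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            String lk = left.get(i).getKey();
            String rk = right.get(j).getKey();
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                i++; // left key is smaller: it has no partner on the right, advance left
            } else if (cmp > 0) {
                j++; // right key is smaller: it has no partner on the left, advance right
            } else {
                // Equal keys: find the run of this key on each side, then emit the cross product.
                int iEnd = i, jEnd = j;
                while (iEnd < left.size() && left.get(iEnd).getKey().equals(lk)) iEnd++;
                while (jEnd < right.size() && right.get(jEnd).getKey().equals(rk)) jEnd++;
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(lk + "\t" + left.get(a).getValue() + "\t" + right.get(b).getValue());
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, each list already sorted by department key.
        List<Map.Entry<String, String>> employees = List.of(
                Map.entry("d001", "Smith"), Map.entry("d002", "Jones"), Map.entry("d002", "Lee"));
        List<Map.Entry<String, String>> departments = List.of(
                Map.entry("d002", "Sales"), Map.entry("d003", "Research"));
        innerJoin(employees, departments).forEach(System.out::println);
    }
}
```

Runs of equal keys are cross-joined, which is what an inner join requires when a key repeats on both sides; the sketch is only correct if both inputs really are sorted the same way, which is exactly the precondition stated above.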
This gist covers a simple Pig eval UDF in Java that mimics Oracle's NVL2 function.
Included:
1. Input data
2. UDF code in Java
3. Pig script to demo the UDF
4. Expected result
5. Command to execute script
6. Output
This gist covers a simple Hive GenericUDF in Java that mimics Oracle's NVL2 function.
NVL2(expr1, expr2, expr3) handles nulls by conditional substitution: it returns expr2 when expr1 is not null, and expr3 when expr1 is null.
Included:
1. Input data
2. Expected results
3. UDF code in Java
4. Hive query to demo the UDF
5. Output
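The NVL2 rule itself is small enough to show in plain Java, independent of the Pig or Hive wrapper. In a real UDF it would sit inside Pig's `EvalFunc.exec(Tuple)` or Hive's `GenericUDF.evaluate(DeferredObject[])`; the sketch below leaves those APIs out and uses hypothetical names (`Nvl2`, `nvl2`) and sample rows.

```java
/** Sketch of Oracle-style NVL2 semantics in plain Java (no Pig or Hive dependencies). */
final class Nvl2 {

    /** NVL2(expr1, expr2, expr3): returns ifNotNull when expr1 is non-null, otherwise ifNull. */
    static <T> T nvl2(Object expr1, T ifNotNull, T ifNull) {
        return expr1 != null ? ifNotNull : ifNull;
    }

    public static void main(String[] args) {
        // Hypothetical rows: {name, phone}, where phone may be null.
        String[][] rows = { {"Alice", "555-0101"}, {"Bob", null} };
        for (String[] row : rows) {
            System.out.println(row[0] + ": " + nvl2(row[1], "has phone", "no phone"));
        }
    }
}
```

The first argument is only tested for null and is never returned, which is what distinguishes NVL2 from the two-argument NVL.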
Secondary sort in MapReduce
In the MapReduce framework, the keys are sorted but the values associated with each key
are not. For the values to be sorted as well, we need to write code to perform what is
referred to as a secondary sort. The sample code in this gist demonstrates such a sort.
The input to the program is a set of employee attribute records.
The required output is sorted by department number (deptNo) in ascending order, and then by employee last name,
first name and employee ID in descending order.
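The usual recipe is a composite key (natural key plus the fields to sort by), a sort comparator over the whole key, a grouping comparator over the natural key only, and a partitioner on the natural key so that each department reaches a single reducer; in Hadoop these are registered with `Job.setSortComparatorClass`, `Job.setGroupingComparatorClass`, and `Job.setPartitionerClass`. The sketch below simulates only the sort and grouping steps in plain Java for the ordering described above; `SecondarySortSketch`, `EmpKey`, `sortAndGroup`, and the sample employees are hypothetical and are not the gist's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java simulation of a secondary sort's sort and grouping steps (no Hadoop). */
final class SecondarySortSketch {

    /** Hypothetical composite key: natural key deptNo plus the secondary sort fields. */
    static final class EmpKey {
        final int deptNo;
        final String lastName;
        final String firstName;
        final int empId;

        EmpKey(int deptNo, String lastName, String firstName, int empId) {
            this.deptNo = deptNo;
            this.lastName = lastName;
            this.firstName = firstName;
            this.empId = empId;
        }
    }

    /** Sort comparator: deptNo ascending, then lastName, firstName, empId descending. */
    static final Comparator<EmpKey> SORT = Comparator
            .comparingInt((EmpKey k) -> k.deptNo)
            .thenComparing((EmpKey k) -> k.lastName, Comparator.reverseOrder())
            .thenComparing((EmpKey k) -> k.firstName, Comparator.reverseOrder())
            .thenComparing(Comparator.comparingInt((EmpKey k) -> k.empId).reversed());

    /** Grouping comparator: only deptNo decides which reduce() call a record belongs to. */
    static final Comparator<EmpKey> GROUP = Comparator.comparingInt((EmpKey k) -> k.deptNo);

    /** Sorts keys with SORT, then splits them with GROUP: one inner list per simulated reduce() call. */
    static List<List<String>> sortAndGroup(List<EmpKey> keys) {
        List<EmpKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        EmpKey current = null;
        for (EmpKey k : sorted) {
            if (current == null || GROUP.compare(current, k) != 0) {
                groups.add(new ArrayList<>()); // a new deptNo starts a new group
                current = k;
            }
            groups.get(groups.size() - 1)
                  .add(k.deptNo + " " + k.lastName + "," + k.firstName + "," + k.empId);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<EmpKey> input = List.of(
                new EmpKey(20, "Adams", "Ann", 3),
                new EmpKey(10, "Brown", "Bob", 1),
                new EmpKey(10, "Young", "Yan", 2),
                new EmpKey(20, "Adams", "Zoe", 4));
        sortAndGroup(input).forEach(System.out::println);
    }
}
```

The split of responsibilities is the point of the design: the sort comparator sees the full key and fixes the order of records within a department, while the grouping comparator sees only deptNo, so all of a department's records arrive together, already in descending name and ID order.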