git clone git://git.apache.org/incubator-datafu.git datafu
cd datafu/contrib/hourglass
Mapper<GenericRecord,GenericRecord,GenericRecord> mapper =
  new Mapper<GenericRecord,GenericRecord,GenericRecord>() {
    private transient Schema kSchema;
    private transient Schema vSchema;

    @Override
    public void map(
        GenericRecord input,
        KeyValueCollector<GenericRecord, GenericRecord> collector)
        throws IOException, InterruptedException
{
  "type" : "record", "name" : "ExampleEvent",
  "namespace" : "datafu.hourglass.test",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "doc" : "ID"
  } ]
}
DEFINE EmptyBagToNullFields datafu.pig.bags.EmptyBagToNullFields();
input1 = LOAD 'input1' using PigStorage(',') AS (val1:INT,val2:INT);
input2 = LOAD 'input2' using PigStorage(',') AS (val1:INT,val2:INT);
input3 = LOAD 'input3' using PigStorage(',') AS (val1:INT,val2:INT);
data1 = COGROUP input1 BY val1, input2 BY val1, input3 BY val1;
data2 = FOREACH data1 GENERATE
  FLATTEN(input1),
  FLATTEN(EmptyBagToNullFields(input2)),
input1 = LOAD 'input1' using PigStorage(',') AS (val1:INT,val2:INT);
input2 = LOAD 'input2' using PigStorage(',') AS (val1:INT,val2:INT);
input3 = LOAD 'input3' using PigStorage(',') AS (val1:INT,val2:INT);
data1 = COGROUP input1 BY val1, input2 BY val1, input3 BY val1;
data2 = FOREACH data1 GENERATE
  FLATTEN(input1), -- left join on this
  FLATTEN((IsEmpty(input2) ? TOBAG(TOTUPLE((int)null,(int)null)) : input2))
    as (input2::val1,input2::val2),
  FLATTEN((IsEmpty(input3) ? TOBAG(TOTUPLE((int)null,(int)null)) : input3))
input1 = LOAD 'input1' using PigStorage(',') AS (val1:INT,val2:INT);
input2 = LOAD 'input2' using PigStorage(',') AS (val1:INT,val2:INT);
input3 = LOAD 'input3' using PigStorage(',') AS (val1:INT,val2:INT);
data1 = JOIN input1 BY val1 LEFT, input2 BY val1;
data1 = FILTER data1 BY input1::val1 IS NOT NULL;
data2 = JOIN data1 BY input1::val1 LEFT, input3 BY val1;
data2 = FILTER data2 BY input1::val1 IS NOT NULL;
DEFINE In datafu.pig.util.In();
data = LOAD 'input' using PigStorage(',') AS (what:chararray, adj:chararray);
dump data;
-- (roses,red)
-- (violets,blue)
-- (sugar,sweet)
data2 = FILTER data BY In(adj, 'red','blue');
data = LOAD 'input' using PigStorage(',') AS (what:chararray, adj:chararray);
dump data;
-- (roses,red)
-- (violets,blue)
-- (sugar,sweet)
data2 = FILTER data BY adj == 'red' OR adj == 'blue';
dump data2;
define COALESCE datafu.pig.util.Coalesce();
data = LOAD 'input' using PigStorage(',') AS (val:INT);
dump data;
-- (1)
-- ()
data2 = FOREACH data GENERATE COALESCE(val,0) as result;
data = LOAD 'input' using PigStorage(',') AS (val:INT);
dump data;
-- (1)
-- ()
data2 = FOREACH data GENERATE (val IS NOT NULL ? val : 0) as result;
dump data2;
-- (1)