Skip to content

Instantly share code, notes, and snippets.

View anoopushet's full-sized avatar

Anoop anoopushet

  • Bangalore
View GitHub Profile
@anoopushet
anoopushet / 00-OozieWorkflowShellAction
Created February 24, 2016 06:04 — forked from airawat/00-OozieWorkflowShellAction
Oozie workflow with a shell action - with CaptureOutput Counts lines in a glob provided and writes the same to standard output. A subsequent email action emails the output of the shell action
This gist includes components of a oozie workflow - scripts/code, sample data
and commands; Oozie actions covered: shell action, email action
Action 1: The shell action executes a shell script that does a line count for files in a
glob provided, and writes the line count to standard output
Action 2: The email action emails the output of action 1
Pictorial overview of job:
--------------------------
@anoopushet
anoopushet / elephant bird demo.pig
Created January 31, 2016 08:19 — forked from neilkod/elephant bird demo.pig
elephant bird demo.pig
sample pig script, runs fine in local mode. the elephantbird magic is the JsonLoader() in the LOAD command and then
converting user to a java map so that i can extract screen_name. I haven't read the docs yet but there may be a better way to do this. I'm sure I can combine the two generate statements into one, this is just a first attempt.
REGISTER '/Users/nkodner/Downloads/cdh3/elephant-bird/build/elephant-bird-2.2.4-SNAPSHOT.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/contrib/piggybank/java/lib/json-simple-1.1.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/build/ivy/lib/Pig/guava-r06.jar';
raw = LOAD '/Users/nkodner/clean_tweets/with_deletedaa' using com.twitter.elephantbird.pig.load.JsonLoader();
bah = limit raw 100;
cc = foreach bah generate (chararray)$0#'text' as text,(long)$0#'id' as id,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'user') as user;
dd = foreach cc generat