@neilkod
Created June 8, 2012 22:30
elephant bird demo.pig
Sample Pig script; runs fine in local mode. The Elephant Bird magic is the JsonLoader() in the LOAD command, followed by converting user to a Java map so that I can extract screen_name.
I haven't read the docs yet, so there may be a better way to do this. I'm sure the two GENERATE statements can be combined into one; this is just a first attempt.
-- register Elephant Bird along with the json-simple and Guava jars (local paths)
REGISTER '/Users/nkodner/Downloads/cdh3/elephant-bird/build/elephant-bird-2.2.4-SNAPSHOT.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/contrib/piggybank/java/lib/json-simple-1.1.jar';
REGISTER '/Users/nkodner/Downloads/cdh3/pig-0.8.1-cdh3u4/build/ivy/lib/Pig/guava-r06.jar';
-- JsonLoader turns each JSON input record into a map, available as $0
raw = LOAD '/Users/nkodner/clean_tweets/with_deletedaa' using com.twitter.elephantbird.pig.load.JsonLoader();
bah = limit raw 100;
-- extract text and id; pass the nested user value through JsonStringToMap to get a map
cc = foreach bah generate (chararray)$0#'text' as text,(long)$0#'id' as id,com.twitter.elephantbird.pig.piggybank.JsonStringToMap($0#'user') as user;
-- look up screen_name in the user map
dd = foreach cc generate text,id,user#'screen_name' as name:chararray;
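The description notes that the two GENERATE statements can probably be combined. Below is a minimal, untested sketch of that. It assumes the same REGISTER lines and the `bah` relation from the script, and it assumes the Pig version in use allows a map lookup (`#`) directly on a UDF result; this is expected to work on newer Pig releases but has not been verified against 0.8.1. The short `JsonStringToMap` alias created with DEFINE is only there for brevity.

```pig
-- assumption: the REGISTER, LOAD and LIMIT statements from the script have already run
DEFINE JsonStringToMap com.twitter.elephantbird.pig.piggybank.JsonStringToMap();

-- single FOREACH: convert the user value to a map and look up screen_name in one expression
-- (assumption: '#' may be applied directly to the UDF result in this Pig version)
dd = FOREACH bah GENERATE
    (chararray)$0#'text' AS text,
    (long)$0#'id' AS id,
    JsonStringToMap($0#'user')#'screen_name' AS name;

-- print the resulting (text, id, name) tuples
DUMP dd;
```

If the direct lookup is rejected by the parser, the original two-step version with the intermediate `cc` relation remains the safe fallback.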
sample input records (note that user.screen_name is nested json):
=================================================================
{"in_reply_to_user_id_str":null,"coordinates":null,"text":"\u0627\u0627\u0627\u0627\u0627\u062d \u0627\u0644\u0627\u062c\u0648\u0627\u0621 \u0628\u062a\u0627\u0639\u062a \u0633\u0643\u0633 \u062d\u0627\u0627\u0627\u0627\u0627\u0631\u0631 \u0645\u0646\u0648 \u0627\u0644\u0641\u062d\u0644 \u0627\u0644\u0644\u064a \u064a\u0628\u064a \u0627\u0633\u0648\u064a \u0644\u0647 \u0641\u0648\u0644\u0648 \u064a\u0633\u0648\u064a \u0631\u062a\u0648\u064a\u062a","created_at":"Thu Apr 12 17:38:47 +0000 2012","favorited":false,"contributors":null,"in_reply_to_screen_name":null,"source":"\u003Ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003ETwitter for BlackBerry\u00ae\u003C\/a\u003E","retweet_count":0,"in_reply_to_user_id":null,"in_reply_to_status_id":null,"id_str":"190494185374220289","entities":{"hashtags":[],"user_mentions":[],"urls":[]},"geo":null,"retweeted":false,"place":null,"truncated":false,"in_reply_to_status_id_str":null,"user":{"created_at":"Tue Apr 10 11:43:10 +0000 2012","notifications":null,"profile_use_background_image":true,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png","url":null,"contributors_enabled":false,"geo_enabled":false,"profile_text_color":"333333","followers_count":5,"profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2084863527\/Screen-120409-224940_normal.jpg","profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2084863527\/Screen-120409-224940_normal.jpg","listed_count":0,"profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png","description":"\u0627\u0628\u064a \u0633\u0643\u0633 \u0631\u0631\u0648\u0639\u0647 \u0645\u0639 \u0641\u062d\u0644 ","screen_name":"Ga7bah_sex","profile_link_color":"0084B4","location":"\u0627\u0631\u0636 
\u0627\u0644\u0633\u0643\u0633","default_profile":true,"show_all_inline_media":false,"is_translator":false,"statuses_count":5,"profile_background_color":"C0DEED","id_str":"550121247","follow_request_sent":null,"lang":"ar","profile_background_tile":false,"protected":false,"profile_sidebar_fill_color":"DDEEF6","name":"\u0642\u062d\u0628\u0647 \u0648\u0627\u0628\u064a \u0632\u0628 ","default_profile_image":false,"time_zone":null,"friends_count":8,"id":550121247,"following":null,"verified":false,"utc_offset":null,"favourites_count":0,"profile_sidebar_border_color":"C0DEED"},"id":190494185374220289}
{"in_reply_to_user_id_str":null,"coordinates":null,"text":"\u5473\u3082\u7d20\u3063\u6c17\u3082\u306a\u3044\u4eba\u9593\u3068\u306f\u2026\u3002","created_at":"Thu Apr 12 17:38:47 +0000 2012","favorited":false,"contributors":null,"in_reply_to_screen_name":null,"source":"\u003Ca href=\"http:\/\/tapbots.com\/tweetbot\" rel=\"nofollow\"\u003ETweetbot for iOS\u003C\/a\u003E","retweet_count":0,"in_reply_to_user_id":null,"in_reply_to_status_id":null,"id_str":"190494185382608899","entities":{"hashtags":[],"user_mentions":[],"urls":[]},"geo":null,"retweeted":false,"place":null,"truncated":false,"in_reply_to_status_id_str":null,"user":{"created_at":"Sun Aug 15 22:48:55 +0000 2010","notifications":null,"profile_use_background_image":true,"profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme1\/bg.png","url":null,"contributors_enabled":false,"geo_enabled":false,"profile_text_color":"333333","followers_count":106,"profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/2028222123\/up_normal.JPG","profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2028222123\/up_normal.JPG","listed_count":13,"profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme1\/bg.png","description":"\u672a\u3060\u5b66\u751f\u306a\u30a2\u30ec\u3002\u3080\u3057\u3083\u3080\u3057\u3083\u3002","screen_name":"hiziri_6x","profile_link_color":"0084B4","location":"","default_profile":true,"show_all_inline_media":true,"is_translator":false,"statuses_count":4258,"profile_background_color":"C0DEED","id_str":"178872797","follow_request_sent":null,"lang":"ja","profile_background_tile":false,"protected":false,"profile_sidebar_fill_color":"DDEEF6","name":"\u3072\u3058\u308a","default_profile_image":false,"time_zone":"Hawaii","friends_count":101,"id":178872797,"following":null,"verified":false,"utc_offset":-36000,"favourites_count":4857,"profile_sidebar_border_color":"C0DEED"},"id":190494185382608899}
output data looks like (not matching the records above):
========================================================
(I need a drink ..,190494768722231296,Kaci_Nicolee)
(Naar buiten #gone,190495057789468672,ErwinHuizing)
(Visto en #Tuenti.,190495196293771264,kurnym)
(i want bacon soap,190495057948848129,Sane_Barely)
(14h41 que bom n?.,190495057831407616,rhairafernanda_)
(@Josephineeex3 LOL,190495347557154816,MariaSabella04)
(@SOU01_DOCINHO ook,190494621950947328,izaadoorac)
(@Tasha_Piglet14 :O,190495347372589056,lauraljacobson)
(MSN @duannemartins,190494911164981250,InformeRJO)
(Show 'Em No Love .,190495057772691458,tYGabrielle)
(@MaddyMak91 oh bene,190494621611208704,dav87)
(@machiruc ???,190494768491536384,_tabasco)
(BB gue kembaliii :D,190495057936269313,ciia_fLa)
(@Magnus_Reloaded hey,190495196184711170,FFDHSeph)
(@crnzrn izle izle :),190494768562843648,GoncaKestane)
(@iBANG_ThaGSPOT #NFB,190494185470689280,_RareBeauty69)
(He ride my bus too .,190494332288118784,TickTickBOOM_)
(Kermis in beilen....,190494768764166145,Nickdatbenikxd)
(boooooooooa tarde c:,190494185458114560,vi_ciada)
(@_luciiiana compra ;),190495196457345026,NetinFarias)
(@ladyeanneboleyn :) x,190494768634146818,Teyalistic)
(@toyookakun ???,190494768462168066,asagayasonata)
(Iki rcti film opo ya?,190494328202862592,candracenn)
(Mac Miller is a beast,190494621980311552,KaltzSk8rdan)
@sanjay-sabnis

I am using JsonStringToMap to process the JSON string. Once I do that, I get one of the elements in the map as

interestes#["surfing","Tennis","Voleyball","Soccer"]. How do I process this string further in a Pig script to get each element in it?

Thanks

@dheerajndh

Where can I download elephant-bird-2.2.4-SNAPSHOT.jar and all the other jars?

@tuphamphuong

Thank you so much!
