Gist by @VJ310, last active December 14, 2015 15:09
Load JSON data into HBase using Pig + Elephant-Bird
register '/usr/local/pig-0.11.0/lib/json-simple-1.1.jar';
register '/usr/local/pig-0.11.0/lib/elephant-bird-pig-3.0.7.jar';
register '/usr/local/hbase-0.94.5/lib/zookeeper-3.4.5.jar';
register '/usr/local/hbase-0.94.5/lib/protobuf-java-2.4.0a.jar';
-- Sample input: business.json holds one JSON object per line, for example:
--{"business_id": "businessid1", "full_address": "full address", "schools": ["school1","school2"], "open": true, "categories":["category1", "category2"], "photo_url": "http://photourl.com/photo.gif", "city": "city", "review_count": 2, "name": "name", "neighborhoods": ["neighborhood1","neighborhood2"], "url": "http://url.com/xyz", "longitude": -80.488823999999994, "state": "CA", "stars": 4.0, "latitude": 43.449645199999999, "type": "xyz"}
raw_data = load 'business.json' using com.twitter.elephantbird.pig.load.JsonLoader() as (json: map[]);
-- Field order matters: HBaseStorage uses the first field (business_id) as the row key
-- and maps the remaining fields, in order, to the columns listed in the STORE statement.
keys = FOREACH raw_data GENERATE
    json#'business_id' AS business_id:chararray, json#'full_address' AS full_address:chararray,
    json#'schools' AS schools:chararray, json#'open' AS open:chararray,
    json#'categories' AS categories:chararray, json#'photo_url' AS photo_url:chararray,
    json#'city' AS city:chararray, (int)json#'review_count' AS review_count,
    json#'name' AS name:chararray, json#'neighborhoods' AS neighborhoods:chararray,
    json#'url' AS url:chararray, json#'longitude' AS longitude:chararray,
    json#'state' AS state:chararray, (float)json#'stars' AS stars,
    json#'latitude' AS latitude:chararray, json#'type' AS type:chararray;
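-- HBaseStorage does not create the table: 'business' must already exist with column family 'info',
-- e.g. from the HBase shell: create 'business', 'info'
STORE keys INTO 'hbase://business' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('info:full_address info:schools info:open info:categories info:photo_url info:city info:review_count info:name info:neighborhoods info:url info:longitude info:state info:stars info:latitude info:type');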
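HBaseStorage matches fields to columns by position, not by name. If one field is missing from the GENERATE list, every later value silently shifts into the wrong column. The Python sketch below is a hypothetical helper and is not part of the original gist. It mirrors that positional mapping so you can check field order before running the Pig job. The names `HBASE_COLUMNS`, `GENERATED_FIELDS`, `alignment_problems` and `to_hbase_row` are illustrative only. The way values are rendered as strings is an approximation, not the exact bytes HBaseStorage writes.

```python
import json

# Column list copied from the HBaseStorage(...) call in the STORE statement.
HBASE_COLUMNS = (
    "info:full_address info:schools info:open info:categories info:photo_url "
    "info:city info:review_count info:name info:neighborhoods info:url "
    "info:longitude info:state info:stars info:latitude info:type"
).split()

# Intended FOREACH ... GENERATE field order; the first field becomes the row key.
GENERATED_FIELDS = [
    "business_id", "full_address", "schools", "open", "categories",
    "photo_url", "city", "review_count", "name", "neighborhoods", "url",
    "longitude", "state", "stars", "latitude", "type",
]


def alignment_problems(fields, columns):
    """Return mismatches between projected fields and HBase columns ([] if aligned)."""
    value_fields = fields[1:]  # fields[0] is used as the row key, not stored in a column
    problems = []
    if len(value_fields) != len(columns):
        problems.append(f"{len(value_fields)} value fields but {len(columns)} columns")
    for field, column in zip(value_fields, columns):
        if column.split(":", 1)[1] != field:
            problems.append(f"field {field!r} would be stored in {column!r}")
    return problems


def to_hbase_row(line, fields=GENERATED_FIELDS, columns=HBASE_COLUMNS):
    """Mimic the positional mapping for one JSON line: (row_key, {column: value})."""
    record = json.loads(line)
    row_key = str(record[fields[0]])
    cells = {}
    for field, column in zip(fields[1:], columns):
        value = record.get(field)
        if value is None:
            continue  # assumption: a missing or null JSON value yields no cell
        # Approximation: non-string values are rendered as JSON text (true, 2, [...]).
        cells[column] = value if isinstance(value, str) else json.dumps(value)
    return row_key, cells


if __name__ == "__main__":
    print(alignment_problems(GENERATED_FIELDS, HBASE_COLUMNS))  # []
    # Dropping 'name' shifts 'neighborhoods' and every later field by one column.
    print(alignment_problems([f for f in GENERATED_FIELDS if f != "name"], HBASE_COLUMNS))
```

If you leave `name` out of the field list, `alignment_problems` reports a length mismatch and flags `neighborhoods` as landing in `info:name`.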