@krishna209
Created April 30, 2015 04:40
Step 1: Download the Avro tools jar
wget http://mirrors.sonic.net/apache/avro/avro-1.7.7/java/avro-tools-1.7.7.jar
Step 2: Generate the schema from a data file (this reads the .avro files produced by the Sqoop import in Step 3, so run it after that step)
java -jar avro-tools-1.7.7.jar getschema /home/hdfs/genre1/part-m-00000.avro > genre.avsc
Step 3: Import the table from MySQL as Avro data files
sqoop import --connect jdbc:mysql://172.16.2.164/movielens --username hive -P --table genre --as-avrodatafile
This imports the genre table from MySQL into HDFS as .avro files and also generates a .avsc schema file in the local filesystem. The data files must be in HDFS, but the .avsc file can live in either the local filesystem or HDFS.
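Since Step 4 references the schema at hdfs:///user/hdfs/genre.avsc, the locally generated .avsc file has to be copied into HDFS first. A minimal sketch, assuming the paths used throughout these steps (adjust to your cluster layout):

```shell
# Copy the locally generated Avro schema into HDFS so Hive can resolve
# the avro.schema.url property in Step 4.
hdfs dfs -put genre.avsc /user/hdfs/genre.avsc

# Confirm the imported .avro data files are present in the table directory.
hdfs dfs -ls /user/hdfs/genre
```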
Step 4: Create an external Hive table over the .avro data files
create external table genre
row format serde 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
stored as
  inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
location '/user/hdfs/genre'
tblproperties('avro.schema.url'='hdfs:///user/hdfs/genre.avsc');
Step 5: Keep the table directory clean
Nothing other than the _SUCCESS marker and the part files should be kept in the genre directory; any other files there will break reads of the table.
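To verify the result of the steps above, a quick check of the directory contents followed by a test query (a sketch, assuming the hive CLI is on the PATH and the table name from Step 4):

```shell
# List the table directory: only _SUCCESS and part-m-* files should appear.
hdfs dfs -ls /user/hdfs/genre

# Query the new external table to confirm the Avro data is readable.
hive -e 'select * from genre limit 5;'
```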