@jorisbontje
Created February 28, 2013 12:57
Hive Avro.txt
0) Download avro-tools jar file from avro.apache.org
1) Extract Avro schema using avro-tools.jar
java -jar avro-tools*.jar getschema file.avro > file.avsc
2) Upload Avro schema to HDFS
hadoop fs -put file.avsc /user/training/file.avsc
3) Upload data to HDFS
hadoop fs -put file.avro /user/training/file_avro/file.avro
4) Create Hive table
DROP TABLE IF EXISTS file_avro;
CREATE EXTERNAL TABLE file_avro
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/training/file_avro'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/training/file.avsc');
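To reuse step 4 for other files, the DDL can be generated from a small shell function. This is only a sketch: make_avro_ddl and its three arguments (table name, HDFS data directory, HDFS schema path) are hypothetical names chosen for illustration and are not part of Hive or avro-tools.

```shell
#!/bin/sh
# Hypothetical helper: print the Hive DDL for an Avro-backed external table.
# $1 = table name, $2 = HDFS data directory, $3 = HDFS path of the .avsc file
make_avro_ddl() {
  table="$1"
  data_dir="$2"
  schema_path="$3"
  # Unquoted heredoc delimiter so the shell variables are expanded.
  cat <<EOF
DROP TABLE IF EXISTS ${table};
CREATE EXTERNAL TABLE ${table}
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '${data_dir}'
TBLPROPERTIES ('avro.schema.url'='hdfs://${schema_path}');
EOF
}

# Reproduces the statements from step 4 for the paths used above.
make_avro_ddl file_avro /user/training/file_avro /user/training/file.avsc
```

The output can be redirected to a file and run with hive -f, assuming the hive command-line client is installed.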