Skip to content

Instantly share code, notes, and snippets.

@markgrover
Created March 1, 2015 06:17
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save markgrover/86f54663ece0943bc8ed to your computer and use it in GitHub Desktop.
Save markgrover/86f54663ece0943bc8ed to your computer and use it in GitHub Desktop.
Code for sqooping into Hive parquet tables
#!/bin/bash
# This code is taken from github.com/hadooparchitecturebook/hadoop-arch-book.
SQOOP_METASTORE_HOST=localhost
sudo -u hdfs hadoop fs -mkdir -p /etl/movielens/user_rating_fact
sudo -u hdfs hadoop fs -chown -R $USER: /etl/movielens/user_rating_fact
sqoop job --delete user_rating_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop || :
# Need to explictly export HIVE_HOME before this command if Hive is not present under /usr/lib/hive
# No need to do this if you are using Apache Sqoop 1.4.6 or later
sqoop job --create user_rating_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop \
-- import --connect jdbc:mysql://mgrover-haa-2.vpc.cloudera.com:3306/oltp --username root \
--table user_rating -m 8 --incremental append --check-column timestamp \
--as-parquetfile --hive-import --warehouse-dir /etl/movielens/user_rating_fact --hive-table user_rating_fact
# No need to create a table explictly.
# Sqoop does that for ya.
#Create user_rating_fact table in hive
#hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS user_rating_fact(id INT, timestamp TIMESTAMP,
# user_id INT, movie_id INT, rating INT)
#ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
#STORED AS
#INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
#OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
#LOCATION '/etl/movielens/user_rating_fact'"
sqoop job -exec user_rating_import --meta-connect jdbc:hsqldb:hsql://${SQOOP_METASTORE_HOST}:16000/sqoop
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment