TO IMPORT & EXPORT DATA FROM RDBMS (MYSQL,ORACLE, etc) INTO HDFS / HIVE / HBASE
Pre-requisite
Apache Hadoop
Apache Sqoop (compatible with Hadoop version)
Apache Hive (optional)
Apache HBase (optional)
Apache HCatalog (optional)
JDBC/ODBC connector
For all RDBMS, Connection URL changes and remaining all command line arguments remains same. You need to download specific JDBC/ODBC connector JAR and copy it to $SQOOP_HOME/lib
MySQL
Download mysql-connector-java.jar
and place in $SQOOP_HOME/lib
folder
cp mysql-connector-java-5.1.18-bin.jar /usr/local/hadoop/sqoop-1.4.3-cdh4.4.0/lib/
Expecting you have data in MySQL tables.
Retrieving list of Databases available in MySQL from SQOOP
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root -P
- MySQL to HDFS Import
Have Primary key:
sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName
No Primary key:
sqoop import -connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --target-dir /path/to/directoryName -m 1
- MySQL to Hive Import
Have Primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home
No Primary key:
sqoop-import --connect jdbc:mysql://localhost:3306/db1 -username root -password password --table tableName --hive-table tableName --create-hive-table --hive-import --hive-home path/to/hive_home -m 1
- MySQL to HBase Import
Have Import All columns:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-create-table
HBase import few columns
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col1 --hbase-create-table
To HBase with Primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --column-family hbase_table_col1 --hbase-row-key column1 –hbase-create-table
To Hbase with no primary key:
sqoop import --connect jdbc:mysql://localhost:3306/db1 --username root --password root --table tableName --hbase-table hbase_tableName --columns column1,column2 --column-family hbase_table_col --hbase-row-key column1 --hbase-create-table
- Export from HDFS to MySQL:
Same for all Hive/HBase/HDFS: Because Hive tables are nothing but directories in HDFS. So you're just exporting a directory to MySQL
sqoop export --connect jdbc:mysql://localhost:3306/test_db --table tableName --export-dir /user/hive/warehouse/tableName --username root --password password -m 1 --input-fields-terminated-by '\001'
SQL Server
Connection URL:
sqoop import --connect 'jdbc:sqlserver://<IP(or)hostname>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 1
Download Connector from Microsoft website
http://www.microsoft.com/en-us/download/confirmation.aspx?id=11774
Place it in $SQOOP_HOME/lib
Oracle
Connection URL:
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=myhost)(port=1521))(connect_data=(service_name=myservice)))" \
--username USER --table SCHEMA.TABLE_NAME --hive-import --hive-table SCHEMA.TABLE_NAME \
--num-mappers 1 --verbose -P \
IBM DB2
Download the DB2Driver and place it in $SQOOP_HOME/lib
sqoop import --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/testdb --username db2user --db2pwd --table db2tbl --split-by tbl_primarykey --target-dir sqoopimports
sqoop export --driver com.ibm.db2.jcc.DB2Driver --connect jdbc:db2://db2.my.com:50000/myDB --username db2user --password db2pwd --table db2tbl --export-dir /sqoop/dataFile.csv
Different Connection Strings for Different RDBMS
Database version --direct support? connect string matches
HSQLDB 1.8.0+ No jdbc:hsqldb:*//
MySQL 5.0+ Yes jdbc:mysql://
Oracle 10.2.0+ No jdbc:oracle:*//
PostgreSQL 8.3+ Yes (import only) jdbc:postgresql://