## Teradata TDCH and Parallel Transporter with MapR
## Environment

Running on Peep.local: 192.168.1.26
### Tasks
- Load sample data into Teradata database
- Run queries to ensure it's in there
- Move data using TDCH
- Move data using TD Parallel Transporter
### log-synth schema

Here's the schema I'm using for log-synth (a JSON array of field specs — the outer braces in my first attempt weren't valid JSON):

```json
[
    {"name": "id", "class": "id"},
    {"name": "full_name", "class": "name", "type": "first_last"},
    {"name": "address", "class": "address"},
    {"name": "state", "class": "state"}
]
```
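A stray brace in this file fails in confusing ways at synth time, so it's cheap to validate it first. A small sketch using Python's stdlib `json.tool` (any JSON linter works; `check_schema` is just a local helper, not part of log-synth):

```shell
# check_schema FILE: fail fast if FILE isn't well-formed JSON
check_schema() {
    python3 -m json.tool "$1" > /dev/null && echo "schema OK: $1"
}
# e.g. check_schema users-schema.json
```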
To generate 1000 rows:

```shell
$ ./synth -count 1000 -schema users-schema.json -format CSV -quote OPTIMISTIC > users.csv
```
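Before loading, it's worth a quick sanity pass over the generated file; tdload turns out to dislike quoted string fields (see the notes below), so counting quoted rows here saves a round trip. A hypothetical helper, not part of log-synth:

```shell
# csv_check FILE: report row count and how many rows contain double quotes
# (quoted strings trip up tdload later, so catch them here)
csv_check() {
    printf 'rows: %s\n' "$(wc -l < "$1" | tr -d ' ')"
    printf 'quoted rows: %s\n' "$(grep -c '"' "$1" || true)"
}
# e.g. csv_check users.csv   # rows should be 1000
```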
Express user guide: http://developer.teradata.com/database/articles/teradata-express-14-0-for-vmware-user-guide

Web interface: http://192.168.1.26/ (u/p: admin/teradata)

Use bteq for interactive SQL:

```
.logon 127.0.0.1/dbc
Password: dbc
```
#### Create user table

```sql
CREATE TABLE vmtest.users,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT (
    id INTEGER NOT NULL,
    name VARCHAR(128) NOT NULL,
    address VARCHAR(256) NOT NULL,
    state VARCHAR(64) NOT NULL
) PRIMARY INDEX ( id );
```
Script:

```shell
bteq <<EOF
.LOGON 127.0.0.1/dbc,dbc;
DROP TABLE vmtest.users;
CREATE TABLE vmtest.users,
NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT (
    id INTEGER NOT NULL,
    name VARCHAR(128) NOT NULL,
    address VARCHAR(256) NOT NULL,
    state VARCHAR(64) NOT NULL
) PRIMARY INDEX ( id );
EOF
```
#### tdload

```shell
/opt/teradata/client/14.10/tbuild/bin/tdload -f ~/users.csv -u vmtest -p vmtest -h 127.0.0.1 -t vmtest.users
```
When a tdload job errors out, it leaves several auxiliary tables (_ET, _Log, _UV) behind alongside the target table; after an error, drop all of them before retrying:
```shell
#!/bin/sh
bteq <<EOF
.LOGON 127.0.0.1/dbc,dbc;
drop table vmtest.users;
drop table vmtest.users_ET;
drop table vmtest.users_Log;
drop table vmtest.users_UV;
EOF
```
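Since this cleanup recurs for every failed load, the DROP statements can be generated from the table name. A hypothetical helper (the `_ET`/`_Log`/`_UV` suffixes are the ones tdload left behind above):

```shell
# drop_load_tables TABLE: emit DROP statements for TABLE and the _ET/_Log/_UV
# error tables that a failed tdload job leaves behind
drop_load_tables() {
    local tbl=$1 suffix
    for suffix in "" "_ET" "_Log" "_UV"; do
        echo "drop table ${tbl}${suffix};"
    done
}
# Pipe into bteq after a .LOGON line, e.g.:
# { echo ".LOGON 127.0.0.1/dbc,dbc;"; drop_load_tables vmtest.users; } | bteq
```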
Troubleshooting notes:

- Show database information: `help database <dbname>;`
- Drop all tables associated with the erroring table.
- Ensure that there are no 'continue files' in `/opt/teradata/client/14.10/tbuild/checkpoint/root`.
- Make sure that string fields in the input data aren't quoted.
- Dates must be given in `MM/DD/YYYY` format.
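For the last two gotchas, a pre-processing pass can normalize the input before tdload sees it. A sketch under two assumptions: quotes never appear inside field values, and dates arrive as ISO `YYYY-MM-DD` in a known column (the users table here has no date column, so the column number is a stand-in):

```shell
# clean_csv FILE DATECOL: strip double quotes and rewrite YYYY-MM-DD dates in
# column DATECOL as MM/DD/YYYY (the format tdload expects)
clean_csv() {
    sed 's/"//g' "$1" | awk -F, -v col="$2" 'BEGIN{OFS=","} {
        if (split($col, d, "-") == 3) $col = d[2] "/" d[3] "/" d[1]
        print
    }'
}
# e.g. clean_csv raw.csv 3 > users.csv
```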
#### Connectivity from Teradata to MapR

Ensure the NFS exports can be seen from the Teradata host:

```shell
# showmount -e 192.168.1.32
Export list for 192.168.1.32:
/mapr *
/mapr/demo.mapr.com *
```

Mount the NFS share:

```shell
$ mount -o hard,rw,nolock,intr 192.168.1.32:/mapr /mapr
```
#### TDCH

TDCH Tutorial

Using Teradata connector 1.3.2 (downloaded the HW one): download here

Install the RPM on the MapR nodes, then use this script to move the files into the correct locations:
```shell
#!/bin/bash

TDCH_INSTALL=$(ls -ld /usr/lib/tdch/* | awk '{print $NF}')
if [[ ! -d $TDCH_INSTALL ]]
then
    echo "TDCH not installed!"
    exit 1
fi

HADOOP_INSTALL=$(ls -ld /opt/mapr/hadoop/hadoop-* | awk '{print $NF}')
if [[ $? -ne 0 ]]
then
    echo "Couldn't find hadoop install!"
    exit 1
fi

echo "Copying files into ${HADOOP_INSTALL}/lib..."
cp $TDCH_INSTALL/lib/*.jar $HADOOP_INSTALL/lib/
chmod 644 $HADOOP_INSTALL/lib/tdgssconfig.jar
chmod 644 $HADOOP_INSTALL/lib/teradata-connector-1.3.2.jar
chmod 644 $HADOOP_INSTALL/lib/terajdbc4.jar

echo "Copying config files into ${HADOOP_INSTALL}/conf..."
cp $TDCH_INSTALL/conf/teradata-export-properties.xml.template $HADOOP_INSTALL/conf/teradata-export-properties.xml
chown mapr:root $HADOOP_INSTALL/conf/teradata-export-properties.xml
chmod 644 $HADOOP_INSTALL/conf/teradata-export-properties.xml
cp $TDCH_INSTALL/conf/teradata-import-properties.xml.template $HADOOP_INSTALL/conf/teradata-import-properties.xml
chown mapr:root $HADOOP_INSTALL/conf/teradata-import-properties.xml
chmod 644 $HADOOP_INSTALL/conf/teradata-import-properties.xml
```
Edit /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml (this points the JVM at /dev/urandom so the Teradata JDBC driver's SecureRandom calls don't block waiting for entropy):

```xml
<property>
  <name>mapred.child.java.opts</name>
  <value>-Djava.security.egd=file:/dev/./urandom</value>
</property>
```
Copy the table to a local directory with TDCH:

```shell
$ hadoop com.teradata.hadoop.tool.TeradataImportTool -url jdbc:teradata://192.168.1.26/database=vmtest -username vmtest -password vmtest -classname com.teradata.jdbc.TeraDriver -fileformat textfile -jobtype hdfs -method split.by.hash -targetpaths /user/mapr/import_sample -nummappers 1 -sourcetable users
```
Move the resulting file to a Hive directory:

```shell
$ mv /mapr/demo.mapr.com/user/mapr/import_sample/part-m-00000 /mapr/demo.mapr.com/user/mapr/hive/users.tsv
```
Create an external Hive table:

```sql
$ hive
hive> create external table users (
    id INT,
    name STRING,
    address STRING,
    state STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/mapr/hive/users.tsv';
```
Query to ensure the data is there:

```
hive> select * from users limit 10;
OK
469	Marsha Sterling	2594 Lazy Goose Glade	pa
0	Robert Davis	45551 Windy Flower Swale	ny
938	Mae Graham	76831 Little Embers Journey	nj
265	Sandra Beaudoin	48189 Easy Forest Acres	tn
61	Dean Long	29964 Fallen Pony Falls	tx
734	Ronnie Freed	95785 Gentle Oak Woods	wa
326	James Reyes	2768 Pied Bird Lagoon	al
530	Kenneth Wolbert	48542 Square Rabbit Dale	oh
795	Kenneth Chamblee	32702 Turning Pony Gate	pa
999	Heather Gamez	88053 Pied Treasure Pass	fl
Time taken: 0.398 seconds, Fetched: 10 row(s)
hive>
```
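Because the whole cluster is NFS-mounted, plain shell tools can cross-check what Hive sees (the path below is assumed from the `mv` above; `tsv_summary` is just a local helper):

```shell
# tsv_summary FILE: row count plus per-state distribution (state = 4th tab field)
tsv_summary() {
    printf 'rows: %s\n' "$(wc -l < "$1" | tr -d ' ')"
    cut -f4 "$1" | sort | uniq -c | sort -rn
}
# On the cluster:
# tsv_summary /mapr/demo.mapr.com/user/mapr/hive/users.tsv
```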
TDCH needs to be told where a number of Hive jars live on the MapR side. This script sets up the classpath and runs the Hive import:
```shell
#!/bin/bash
export HIVE_HOME=/opt/mapr/hive/hive-0.13
export HCAT_HOME=$HIVE_HOME/hcatalog
export LIB_JARS=$HCAT_HOME/share/hcatalog/hive-hcatalog-core-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/hive-metastore-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/libthrift-0.9.0.jar,$HIVE_HOME/lib/hive-exec-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/libfb303-0.9.0.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar,$HIVE_HOME/lib/hive-cli-0.13.0-mapr-1405.jar
export HADOOP_CLASSPATH=$HCAT_HOME/share/hcatalog/hive-hcatalog-core-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/hive-metastore-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/hive-exec-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/jdo-api-3.0.1.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:$HADOOP_HOME/lib/slf4j-api-1.6.1.jar:$HIVE_HOME/lib/antlr-runtime-3.4.jar:$HIVE_HOME/lib/datanucleus-core-3.2.10.jar:$HIVE_HOME/lib/datanucleus-rdbms-3.2.9.jar:$HADOOP_HOME/lib/mysql-connector-java-5.1.25-bin.jar:$HIVE_HOME/lib/hive-cli-0.13.0-mapr-1405.jar

hadoop com.teradata.hadoop.tool.TeradataImportTool \
    -libjars $LIB_JARS \
    -url jdbc:teradata://192.168.1.26/database=vmtest \
    -username vmtest \
    -password vmtest \
    -jobtype hive \
    -fileformat rcfile \
    -sourcetable users \
    -sourcefieldnames "id,name,address,state" \
    -nummappers 1 \
    -targetdatabase default \
    -targettable users_tdch \
    -targettableschema "id int, name string, address string, state string" \
    -targetfieldnames "id,name,address,state"
```
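The hard-coded jar versions in LIB_JARS break whenever Hive is upgraded; one hedged alternative is to resolve them by glob at run time (a sketch assuming the same `$HIVE_HOME/lib` layout; `join_jars` is a local helper, not a TDCH feature):

```shell
# join_jars SEP DIR PATTERN...: print jars in DIR matching each PATTERN,
# joined by SEP (use "," for -libjars, ":" for HADOOP_CLASSPATH)
join_jars() {
    local sep=$1 dir=$2 out="" pat f
    shift 2
    for pat in "$@"; do
        for f in "$dir"/$pat; do
            [ -e "$f" ] && out="${out:+$out$sep}$f"
        done
    done
    printf '%s\n' "$out"
}
# e.g. LIB_JARS=$(join_jars , $HIVE_HOME/lib 'hive-metastore-*.jar' 'hive-exec-*.jar')
```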