Teradata Notes

##Teradata TDCH and Parallel Transporter with MapR

##Environment

Running on Peep.local: 192.168.1.26

###Tasks

  • Load sample data into Teradata database
  • Run queries to ensure it's in there
  • Move data using TDCH
  • Move data using TD Parallel transporter

###log-synth schema

Here's the schema I'm using for log-synth:

[
  {"name": "id", "class": "id"},
  {"name": "full_name", "class": "name", "type": "first_last"},
  {"name": "address", "class": "address"},
  {"name": "state", "class": "state"}
]

To generate 1000 rows:

$ ./synth -count 1000 -schema users-schema.json -format CSV -quote OPTIMISTIC > users.csv
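A quick sanity check on the generated file (plain head/wc, nothing log-synth specific):

$ wc -l users.csv    # expect 1000 data rows (plus a header line, if one is emitted)
$ head -3 users.csv  # eyeball the field order: id, full_name, address, state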

Teradata notes

express user guide: http://developer.teradata.com/database/articles/teradata-express-14-0-for-vmware-user-guide

Web interface: http://192.168.1.26/c (u/p: admin/teradata)

Use bteq for interactive SQL:

.logon 127.0.0.1/dbc
Password: dbc
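For example, a minimal interactive check (uses the dbc/dbc login above; the COUNT only works once the users table exists and is loaded):

$ bteq
.logon 127.0.0.1/dbc
Password: dbc
HELP DATABASE vmtest;
SELECT COUNT(*) FROM vmtest.users;
.quit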

####Create user table

CREATE TABLE vmtest.users,
NO FALLBACK, 
NO BEFORE JOURNAL, 
NO AFTER JOURNAL,
CHECKSUM = DEFAULT ( 
id INTEGER NOT NULL,
name VARCHAR(128) NOT NULL,
address VARCHAR(256) NOT NULL,
state VARCHAR(64) NOT NULL
) PRIMARY INDEX ( id );

script:

bteq<<EOF
.LOGON 127.0.0.1/dbc,dbc;
DROP TABLE vmtest.users;
CREATE TABLE vmtest.users,
NO FALLBACK, 
NO BEFORE JOURNAL, 
NO AFTER JOURNAL,
CHECKSUM = DEFAULT ( 
id INTEGER NOT NULL,
name VARCHAR(128) NOT NULL,
address VARCHAR(256) NOT NULL,
state VARCHAR(64) NOT NULL
) PRIMARY INDEX ( id );
EOF

####tdload

/opt/teradata/client/14.10/tbuild/bin/tdload -f ~/users.csv -u vmtest -p vmtest -h 127.0.0.1 -t vmtest.users

tdload creates a handful of auxiliary tables (error, log, and UV tables) for each job; when a job errors out, drop all of them before retrying:

#!/bin/sh
bteq<<EOF
.LOGON 127.0.0.1/dbc,dbc;
drop table vmtest.users;
drop table vmtest.users_ET;
drop table vmtest.users_Log;
drop table vmtest.users_UV;
EOF

Show database information: help database <dbname>;

Drop all tables associated with the erroring table.

Other things to check when a load fails (a combined reset sketch follows this list):

  • Ensure that there are no 'continue files' in /opt/teradata/client/14.10/tbuild/checkpoint/root
  • Make sure that string fields aren't quoted in the input data
  • Specify a format for date fields, e.g. MM/DD/YYYY
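A sketch that rolls the cleanup steps above into one reset script (table names and checkpoint path are the ones used in these notes):

#!/bin/sh
# Reset a failed tdload run: drop the work/error tables and clear any continue files
bteq<<EOF
.LOGON 127.0.0.1/dbc,dbc;
drop table vmtest.users_ET;
drop table vmtest.users_Log;
drop table vmtest.users_UV;
EOF
rm -f /opt/teradata/client/14.10/tbuild/checkpoint/root/*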

####Connectivity from Teradata to MapR

Ensure the NFS mount can be seen from Teradata:

# showmount -e 192.168.1.32
Export list for 192.168.1.32:
/mapr               *
/mapr/demo.mapr.com *

Mount the NFS share:

$ mount -o hard,rw,nolock,intr 192.168.1.32:/mapr /mapr
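To make the mount persistent across reboots, an /etc/fstab entry along these lines should work (same options as the manual mount above):

# /etc/fstab
192.168.1.32:/mapr  /mapr  nfs  hard,rw,nolock,intr  0 0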

####TDCH

TDCH Tutorial

Using teradata connector 1.3.2 (downloaded the HW one): download here

Install it on the MapR nodes; after the RPM install, use this script to move files to the correct locations:

#!/bin/bash
# Locate the TDCH install directory, then copy its jars and config templates into the MapR Hadoop install
TDCH_INSTALL=$(ls -ld /usr/lib/tdch/* | awk '{print $NF}');
if [[ ! -d $TDCH_INSTALL ]]
then
	echo "TDCH not installed!";
	exit 1;
fi

HADOOP_INSTALL=$(ls -ld /opt/mapr/hadoop/hadoop-* | awk '{print $NF}');

if [[ $? -ne 0 ]]
then
	echo "Couldn't find hadoop install!";
	exit 1;
fi

echo "Copying files into ${HADOOP_INSTALL}/lib..."
cp $TDCH_INSTALL/lib/*.jar $HADOOP_INSTALL/lib/
chmod 644 $HADOOP_INSTALL/lib/tdgssconfig.jar
chmod 644 $HADOOP_INSTALL/lib/teradata-connector-1.3.2.jar
chmod 644 $HADOOP_INSTALL/lib/terajdbc4.jar


echo "Copying config files into ${HADOOP_INSTALL}/conf..."
cp $TDCH_INSTALL/conf/teradata-export-properties.xml.template $HADOOP_INSTALL/conf/teradata-export-properties.xml
chown mapr:root $HADOOP_INSTALL/conf/teradata-export-properties.xml
chmod 644 $HADOOP_INSTALL/conf/teradata-export-properties.xml

cp $TDCH_INSTALL/conf/teradata-import-properties.xml.template $HADOOP_INSTALL/conf/teradata-import-properties.xml
chown mapr:root $HADOOP_INSTALL/conf/teradata-import-properties.xml
chmod 644 $HADOOP_INSTALL/conf/teradata-import-properties.xml
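To verify the install (with HADOOP_INSTALL set as in the script above), the jars should be in place and the import tool should resolve on the classpath; it prints its usage when run with no arguments:

ls -l $HADOOP_INSTALL/lib/teradata-connector-1.3.2.jar $HADOOP_INSTALL/lib/terajdbc4.jar $HADOOP_INSTALL/lib/tdgssconfig.jar
hadoop com.teradata.hadoop.tool.TeradataImportTool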

Edit /opt/mapr/hadoop/hadoop-0.20.2/conf/mapred-site.xml and add this property inside the <configuration> element:

<property>
 <name>mapred.child.java.opts</name>
 <value>-Djava.security.egd=file:/dev/./urandom</value>
</property>

Copy table to local directory with TDCH:

$ hadoop com.teradata.hadoop.tool.TeradataImportTool -url jdbc:teradata://192.168.1.26/database=vmtest -username vmtest -password vmtest -classname com.teradata.jdbc.TeraDriver -fileformat textfile -jobtype hdfs -method split.by.hash -targetpaths /user/mapr/import_sample -nummappers 1 -sourcetable users
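Check the output before moving it into the Hive directory (standard hadoop fs commands):

$ hadoop fs -ls /user/mapr/import_sample
$ hadoop fs -cat /user/mapr/import_sample/part-m-00000 | head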

Move resulting file to a hive dir:

$ mv /mapr/demo.mapr.com/user/mapr/import_sample/part-m-00000 /mapr/demo.mapr.com/user/mapr/hive/users.tsv

Create external Hive table

$ hive
hive> create external table users (
id INT,
name STRING,
address STRING,
state STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/mapr/hive/users.tsv';

Query to ensure the data is there:

hive> select * from users limit 10;
OK
469	Marsha Sterling	2594 Lazy Goose Glade	pa
0	Robert Davis	45551 Windy Flower Swale	ny
938	Mae Graham	76831 Little Embers Journey	nj
265	Sandra Beaudoin	48189 Easy Forest Acres	tn
61	Dean Long	29964 Fallen Pony Falls	tx
734	Ronnie Freed	95785 Gentle Oak Woods	wa
326	James Reyes	2768 Pied Bird Lagoon	al
530	Kenneth Wolbert	48542 Square Rabbit Dale	oh
795	Kenneth Chamblee	32702 Turning Pony Gate	pa
999	Heather Gamez	88053 Pied Treasure Pass	fl
Time taken: 0.398 seconds, Fetched: 10 row(s)
hive>

Transferring a table from Teradata to Hive directly

TDCH needs to be told where the Hive and HCatalog jars (and their dependencies) live on MapR:

Script:

#!/bin/bash

export HIVE_HOME=/opt/mapr/hive/hive-0.13
export HCAT_HOME=$HIVE_HOME/hcatalog

export LIB_JARS=$HCAT_HOME/share/hcatalog/hive-hcatalog-core-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/hive-metastore-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/libthrift-0.9.0.jar,$HIVE_HOME/lib/hive-exec-0.13.0-mapr-1405.jar,$HIVE_HOME/lib/libfb303-0.9.0.jar,$HIVE_HOME/lib/jdo-api-3.0.1.jar,$HIVE_HOME/lib/hive-cli-0.13.0-mapr-1405.jar

export HADOOP_CLASSPATH=$HCAT_HOME/share/hcatalog/hive-hcatalog-core-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/hive-metastore-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/libthrift-0.9.0.jar:$HIVE_HOME/lib/hive-exec-0.13.0-mapr-1405.jar:$HIVE_HOME/lib/libfb303-0.9.0.jar:$HIVE_HOME/lib/jdo-api-3.0.1.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:$HADOOP_HOME/lib/slf4j-api-1.6.1.jar:$HIVE_HOME/lib/antlr-runtime-3.4.jar:$HIVE_HOME/lib/datanucleus-core-3.2.10.jar:$HIVE_HOME/lib/datanucleus-rdbms-3.2.9.jar:$HADOOP_HOME/lib/mysql-connector-java-5.1.25-bin.jar:$HIVE_HOME/lib/hive-cli-0.13.0-mapr-1405.jar


hadoop com.teradata.hadoop.tool.TeradataImportTool \
-libjars $LIB_JARS \
-url jdbc:teradata://192.168.1.26/database=vmtest \
-username vmtest \
-password vmtest \
-jobtype hive \
-fileformat rcfile \
-sourcetable users \
-sourcefieldnames "id,name,address,state" \
-nummappers 1 \
-targetdatabase default \
-targettable users_tdch \
-targettableschema "id int, name string, address string, state string" \
-targetfieldnames "id,name,address,state"
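A quick check that the direct import created and populated the Hive table (users_tdch is the target table name from the script above):

$ hive -e "describe users_tdch; select count(*) from users_tdch;"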
Running the tool with no arguments prints its usage:

[root@maprdemo hadoop-0.20.2]# hadoop com.teradata.hadoop.tool.TeradataImportTool
14/09/30 06:16:20 INFO tool.ConnectorImportTool: ConnectorImportTool starts at 1412082980438
hadoop jar teradata-hadoop-connector.jar
com.teradata.connector.common.tool.ConnectorImportTool
[-conf <conf file>] (optional)
[-jobtype <job type>] (values: hdfs, hive, and hcat, default is hdfs)
[-fileformat <file format>] (values: sequencefile, textfile, avrofile, orcfile and rcfile, default is textfile)
[-classname <classname>] (optional)
[-url <url>] (optional)
[-username <username>] (optional)
[-password <password>] (optional)
[-batchsize <batchsize>] (optional, default value is 10000)
[-accesslock <true|false>] (optional, default value is false)
[-queryband <queryband>] (optional)
[-targetpaths <path>] (optional, applicable for hdfs and hive jobs)]
[-sourcetable <tablename>] (optional, use -sourcetable or -sourcequery but not both)
[-sourceconditions <conditions>] (optional, use with -sourcetable option)
[-sourcefieldnames <fieldnames>] (optional, comma delimited format)
[-sourcerecordschema <record schema>] (optional, comma delimited format)
[-targetrecordschema <record schema>] (optional, comma delimited format)
[-sourcequery <query>] (optional, use either -sourcetable or -sourcequery but not both)
[-sourcecountquery <countquery>] (optional, use with -sourcequery option)
[-targetdatabase <database>] (optional)
[-targettable <table>] (optional)
[-targetfieldnames <fields>] (optional, comma separated format
[-targettableschema <schema>] (optional, comma separated format
[-targetpartitionschema <schema>] (optional, comma separated format, used with -targettableschema only
[-separator <separator>] (optional,used to separate fields in text)
[-lineseparator <lineseparator>] (optional, used to separate different lines, only useful for hive and hcat job)
[-enclosedby <enclosed-by-character> (optional, used to enclose text, only useful for hdfs job)]
[-escapedby <escaped-by-character> (optional, used to escape special characters, only useful for hdfs job)]
[-nullstring <string>] (optional, a string to replace null value of string type)
[-nullnonstring <string>] (optional, a string to replace null value of non-string type, only useful for hdfs job)
[-method <method>] (optional import method, values: split.by.partition, split.by.hash, split.by.value and split.by.amp only for Teradata version 14.10.00.02, default is split.by.hash)
[-nummappers <num>] (optional, default is 2)
[-splitbycolumn <column name>] (optional for split.by.hash and split.by.value methods)
[-forcestage <true>] (optional force to use stage table, default is false)
[-stagetablename <table name>] (optional, stage table name should be 20 characters or less)
[-stagedatabase <database>] (optional)
[-numpartitionsinstaging <num>] (optional)
[-hiveconf <target path>] (optional, required for hive and hcat jobs launched on non-name nodes)]
[-usexview <true|false>] (optional, default is true)
[-avroschema <avro schema>] (optional, an inline Avro schema definition)
[-avroschemafile <path>] (optional, a file path for Avro schema definition)
[-debugoption <debug option value>] (optional, a debug option to close some stages of the whole hadoop job)
[-sourcedateformat <date format>] (optional, a default date format for all converters that convert string to date)
[-targetdateformat <date format>] (optional, a default date format for all converters that convert date to string )
[-sourcetimeformat <time format>] (optional, a default time format for all converters that convert string to time)
[-targettimeformat <time format>] (optional, a default time format for all converters that convert time to string )
[-sourcetimestampformat <timestamp format>] (optional, a default timestamp format for all converters that convert string to timestamp)
[-targettimestampformat <timestamp format>] (optional, a default timestamp format for all converters that convert timestamp to string )
[-sourcetimezoneid <timezone id>] (optional, a default input timezone id for all converters that convert timestamp to another timestamp
[-targettimezoneid <timezone id>] (optional, a default output timezone id for all converters that convert timestamp to another timestamp
[-h|help] (optional)