Submit a Pig job to a remote, secured Hadoop cluster (MIT Kerberos)

Generate Your Own Keytab

Log in to your remote cluster and run ktutil. In this example, the username is hoa and the realm is SAIKOCAT.COM:

[hoa@remote-clusters ~]$ ktutil
ktutil:  addent -password -p hoa@SAIKOCAT.COM -k 1 -e aes256-cts
Password for hoa@SAIKOCAT.COM: 
ktutil:  addent -password -p hoa@SAIKOCAT.COM -k 1 -e rc4-hmac
Password for hoa@SAIKOCAT.COM: 
ktutil:  wkt hoa.keytab
ktutil:  quit

[hoa@remote-clusters ~]$ chmod 0700 hoa.keytab
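
You can inspect the new keytab to confirm it contains the entries (principal, kvno, and encryption type) you just added:

[hoa@remote-clusters ~]$ klist -kt hoa.keytab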

Download the newly created keytab to your local machine

[hoa@local ~]$ scp hoa@remote-clusters:~/hoa.keytab ~/

Set up the Kerberos client utilities (krb5-user, krb5-workstation, etc., depending on your distribution)
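
On Debian/Ubuntu the client package is typically krb5-user, and on RHEL/CentOS it is krb5-workstation:

[hoa@local ~]$ sudo apt-get install krb5-user         # Debian/Ubuntu
[hoa@local ~]$ sudo yum install krb5-workstation      # RHEL/CentOS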

See [krb5.conf] file.

Obtain the Hadoop client configuration files from your cluster administrator and save them locally, say to '~/hadoop-conf/'.

See [Remote-Hadoop-cluster-conf-files.out] file.
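
If you have shell access to the cluster, the client configuration can often be copied straight over (the /etc/hadoop/conf path below is an assumption based on the usual layout; adjust to wherever your cluster keeps it):

[hoa@local ~]$ scp -r hoa@remote-clusters:/etc/hadoop/conf/ ~/hadoop-conf/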

Authenticate yourself with the cluster (also needed whenever your ticket_lifetime expires):

[hoa@local ~]$ kinit hoa@SAIKOCAT.COM -k -t ~/hoa.keytab
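
You can confirm the ticket was granted, and check when it expires, with klist:

[hoa@local ~]$ klist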

Then, after you have set up Pig locally, you can submit jobs remotely:

[hoa@local ~]$ export HADOOP_CONF_DIR="/home/hoa/hadoop-conf/"
[hoa@local ~]$ pig -P ./pig.properties remote.pig
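
If you only want to validate the script first, Pig's -check flag runs a syntax check without launching a job (assuming your Pig version supports it):

[hoa@local ~]$ pig -P ./pig.properties -check remote.pig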

Cheers

# Local "/etc/krb5.conf"
# s/SAIKOCAT.COM/YOUR.REALM/g
# s/saikocat.com/your.realm/g
[libdefaults]
    default_realm = SAIKOCAT.COM
    forwardable = true

[realms]
    SAIKOCAT.COM = {
        kdc = kerberos.saikocat.com:88
        admin_server = kerberos.saikocat.com:749
        default_domain = SAIKOCAT.COM
    }

[domain_realm]
    .saikocat.com = SAIKOCAT.COM
    saikocat.com = SAIKOCAT.COM
# "/home/hoa/pig/log4j.properties", referenced by the 'log4jconf' property in 'pig.properties' below
# ***** Set the Pig logger level to WARN and its only appender to A.
log4j.logger.org.apache.pig=WARN, A
# ***** Uncomment to silence Hadoop's INFO logs as well.
# log4j.logger.org.apache.hadoop=WARN, A
# ***** A is set to be a ConsoleAppender.
log4j.appender.A=org.apache.log4j.ConsoleAppender
# ***** A uses PatternLayout.
log4j.appender.A.layout=org.apache.log4j.PatternLayout
log4j.appender.A.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# "pig.properties", passed to Pig via '-P ./pig.properties'
# See more options @ https://svn.apache.org/repos/asf/pig/trunk/conf/pig.properties
# Keep Pig's error log out of the working directory
pig.logfile=/tmp/pig-err.log
# Don't clutter the screen with Pig INFO messages (see log4j.properties above)
log4jconf=/home/hoa/pig/log4j.properties
pig.pretty.print.schema=true
exectype=mapreduce
# "Remote-Hadoop-cluster-conf-files.out": output of 'ls -R ~/hadoop-conf'
hadoop-conf:
core-site.xml hbase hdfs-site.xml mapred-site.xml yarn-site.xml zookeeper
hadoop-conf/hbase:
configuration.xsl hbase-site.xml jaas.conf java.env log4j.properties zoo.cfg
hadoop-conf/zookeeper:
configuration.xsl jaas.conf java.env log4j.properties zoo.cfg
-- "remote.pig"
-- Note: this lib is local to your system. Great for local development of UDFs, isn't it?
-- You can even integrate with Gradle or another build tool to keep track of dependencies.
register '/home/hoa/pig/lib/parquet-pig-bundle-1.5.0.jar'
-- Note: these, however, are on HDFS.
register 'hdfs:///user/hoa/libs/datafu-1.2.0.jar'
register 'hdfs:///user/hoa/libs/elasticsearch-hadoop-2.0.2.jar'
-- Note: '/user/hoa/' is on the cluster's HDFS.
cat_mapping = LOAD '/user/hoa/LOOKUP/CAT_MAPPING_*' USING parquet.pig.ParquetLoader();
DESCRIBE cat_mapping;
DUMP cat_mapping;
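
As a quick sanity check before running the script, you can list the lookup directory with a local Hadoop client (this assumes the hadoop CLI is installed on your machine and HADOOP_CONF_DIR is exported as above):

[hoa@local ~]$ hdfs dfs -ls /user/hoa/LOOKUP/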