import pandas as pd
data = pd.read_csv('data path url')

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/nikki/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY_HERE</value>
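
Before starting Hadoop it can help to check which properties are still empty or still hold a placeholder. The following is a minimal sketch, not part of the original configuration. `SAMPLE_SITE_XML`, `read_properties`, and `find_unset` are hypothetical names, and the sample text is a closed-off copy of the fragments above so that it parses.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the fragments above, with closing tags added.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value></value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>ACCESS_KEY_HERE</value>
  </property>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def find_unset(props):
    """List names whose value is empty or still ends in the _HERE placeholder suffix."""
    return sorted(
        name for name, value in props.items()
        if not value or value.endswith("_HERE")
    )


props = read_properties(SAMPLE_SITE_XML)
print(find_unset(props))
```

The `_HERE` suffix check is only a convention taken from the placeholders used in these snippets; adjust it to whatever placeholder style your files use.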

## Add to ~/.profile:
export JAVA_HOME="$(/usr/libexec/java_home)"
export HIVE_AUX_JARS_PATH=/usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/tools/lib/
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/local/Cellar/hadoop/3.1.1/libexec/share/hadoop/tools/lib/*
## Note: that should work.
# If it doesn't work,

-- Creating schema:
CREATE SCHEMA IF NOT EXISTS <schema_name>;
-- Creating table:
CREATE EXTERNAL TABLE IF NOT EXISTS <schema_name>.<table_name>
  (<column_name> STRING)
LOCATION 's3a://<your-S3-bucket>/raw/access-log/2018-12-28/';
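
If the schema, table, and bucket change per run, the statements above can be generated instead of edited by hand. This is a rough sketch under assumptions: `build_external_table_ddl` is a hypothetical helper, the names `weblogs`, `access_log`, `line`, and `example-bucket` are invented for illustration, and only the `raw/access-log/2018-12-28/` prefix comes from the statement above.

```python
import re

# Conservative identifier check so values can be placed in the DDL unquoted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_external_table_ddl(schema, table, column, location):
    """Return the CREATE SCHEMA / CREATE EXTERNAL TABLE statements as text."""
    for ident in (schema, table, column):
        if not _IDENTIFIER.match(ident):
            raise ValueError("unsafe identifier: %r" % ident)
    if "'" in location:
        raise ValueError("location must not contain a single quote")
    return (
        "CREATE SCHEMA IF NOT EXISTS {schema};\n"
        "CREATE EXTERNAL TABLE IF NOT EXISTS {schema}.{table}\n"
        "  ({column} STRING)\n"
        "LOCATION '{location}';"
    ).format(schema=schema, table=table, column=column, location=location)


# Hypothetical names, used only to show the call.
print(build_external_table_ddl(
    "weblogs", "access_log", "line",
    "s3a://example-bucket/raw/access-log/2018-12-28/",
))
```

The identifier check is deliberately strict; it rejects anything that would need quoting rather than trying to escape it.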

<property>
  <name>fs.s3a.access.key</name>
  <value>ACCESS_KEY_HERE</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>ACCESS_SECRET_HERE</value>
</property>
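
To avoid typing real keys into the file by hand, the two properties can be rendered from environment variables. The sketch below is one possible approach, not the author's. `s3a_credential_properties` is a hypothetical helper, and `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` are the conventional AWS variable names, assumed here rather than taken from the source.

```python
import os
import xml.etree.ElementTree as ET


def s3a_credential_properties(access_key, secret_key):
    """Build the two <property> elements shown above as XML text."""
    chunks = []
    for name, value in (
        ("fs.s3a.access.key", access_key),
        ("fs.s3a.secret.key", secret_key),
    ):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        # ElementTree escapes characters such as & and < in the value.
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)


# Fall back to the placeholders from the fragment above when the variables are unset.
fragment = s3a_credential_properties(
    os.environ.get("AWS_ACCESS_KEY_ID", "ACCESS_KEY_HERE"),
    os.environ.get("AWS_SECRET_ACCESS_KEY", "ACCESS_SECRET_HERE"),
)
print(fragment)
```

The output is unindented, one `<property>` per line, and can be pasted inside the `<configuration>` element.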

import subprocess
import os
import uuid

def execute_local(args):
    print('running command : %s' % ' '.join(args))
    process = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # communicate() returns (stdout, stderr); stderr is None here because it is merged into stdout
    output, _ = process.communicate()
    print('STDOUT:{}'.format(output.decode()))
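
The same idea can be written with `subprocess.run`, which also exposes the exit code. This is a sketch of an alternative, not the rest of the snippet above; `run_local` is a hypothetical name, and the command it runs is just the current Python interpreter printing a word.

```python
import subprocess
import sys


def run_local(args):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    print("running command : %s" % " ".join(args))
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in the snippet above
        text=True,                 # decode bytes to str (Python 3.7+)
    )
    print("STDOUT:{}".format(completed.stdout))
    return completed.returncode, completed.stdout


# Use the running interpreter so the example does not depend on external tools.
code, out = run_local([sys.executable, "-c", "print('hello')"])
```

Returning the exit code lets the caller decide whether a non-zero status should stop the pipeline.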

import paramiko

def execute_remote(key_path, instance_ip, username, cmd_arr):
    key = paramiko.RSAKey.from_private_key_file(key_path)
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Connect (ssh) to the instance
    try:
        # Here 'ubuntu' is the user name and 'instance_ip' is the public IP of the EC2 instance
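
The snippet above is cut off inside the `try` block. One way the remaining steps could look is sketched below. This is an assumption about the flow, not the author's code: `run_remote` and `join_command` are hypothetical names, and `paramiko` is imported inside the function so the quoting helper can be used without it installed.

```python
import shlex


def join_command(cmd_arr):
    """Quote each argument so the remote shell sees the same argv."""
    return " ".join(shlex.quote(part) for part in cmd_arr)


def run_remote(key_path, instance_ip, username, cmd_arr):
    """Run cmd_arr on a host over SSH; returns (exit_code, stdout, stderr)."""
    import paramiko  # third-party dependency, imported lazily

    key = paramiko.RSAKey.from_private_key_file(key_path)
    client = paramiko.SSHClient()
    # AutoAddPolicy trusts unknown host keys, as in the snippet above.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        client.connect(hostname=instance_ip, username=username, pkey=key)
        _stdin, stdout, stderr = client.exec_command(join_command(cmd_arr))
        out = stdout.read().decode()
        err = stderr.read().decode()
        return stdout.channel.recv_exit_status(), out, err
    finally:
        # Always release the connection, even if connect or exec fails.
        client.close()


print(join_command(["ls", "-l", "/tmp/my dir"]))
```

Reading all of stdout before stderr is fine for short commands; a command that writes a lot to stderr may need the two streams read concurrently.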

from urllib.parse import urlparse
import boto3

def split_s3_path(s3_path):
    o = urlparse(s3_path.strip())
    print(o)
    return o.netloc, o.path[1:]
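
A slightly stricter variant can reject inputs that are not S3 URIs before anything is sent to AWS. The sketch below uses only the standard library; `split_s3_uri` is a hypothetical name, the accepted schemes are an assumption, and `example-bucket` is invented for the usage line.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_path):
    """Split 's3://bucket/key' (or s3a://, s3n://) into (bucket, key)."""
    parsed = urlparse(s3_path.strip())
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError("not an S3 URI: %r" % s3_path)
    if not parsed.netloc:
        raise ValueError("missing bucket name: %r" % s3_path)
    # The path keeps its leading slash, so drop it to get the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3a://example-bucket/raw/access-log/2018-12-28/")
print(bucket, key)
```

Because `urlparse` treats `?` and `#` as query and fragment delimiters, object keys containing those characters would be cut short by either version of the function.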