Rob Cowie robcowie

  • Recycleye
  • Leeds/London, United Kingdom
@robcowie
robcowie / delete_table_from_metastore.sql
Last active February 28, 2024 20:19
Drop a table from the Hive metastore
# DELETE A TABLE IN THE HIVE METASTORE
# BE CAREFUL! BACKUP THE DB BEFORE PROCEEDING!
set @table_name = '';
SELECT @tbl_id := TBL_ID FROM TBLS WHERE TBL_NAME = @table_name;
-- Delete partition key vals
DELETE pvk
FROM PARTITION_KEY_VALS pvk
robcowie / hdfs_usage_notes.md
Last active February 12, 2017 14:59
HDFS Usage Notes

HDFS Notes

Moving files to and from HDFS

hdfs dfs -copyFromLocal file hdfs://path/to/dir/file
hdfs dfs -copyToLocal hdfs://path/to/dir/file file
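Scripted from Python, the two commands above differ only in the `hdfs dfs` flag. A minimal sketch, assuming the `hdfs` CLI is on `PATH`; `hdfs_copy_cmd`, `hdfs_copy` and the direction names `to_hdfs`/`from_hdfs` are hypothetical.

```python
import subprocess

# Hypothetical direction names mapped to the real `hdfs dfs` flags used above
HDFS_COPY_FLAGS = {
    "to_hdfs": "-copyFromLocal",
    "from_hdfs": "-copyToLocal",
}


def hdfs_copy_cmd(direction, src, dest):
    """Build the argv list for an `hdfs dfs` copy in the given direction."""
    return ["hdfs", "dfs", HDFS_COPY_FLAGS[direction], src, dest]


def hdfs_copy(direction, src, dest):
    """Run the copy; raises CalledProcessError if `hdfs` exits non-zero."""
    subprocess.run(hdfs_copy_cmd(direction, src, dest), check=True)
```

Building the argv list separately keeps it testable without a cluster and avoids shell quoting issues.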

Moving files within the cluster

robcowie / ip_to_numeric.py
Last active February 28, 2017 22:13
IP to Numeric with obfuscation
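example1 = 1118363648

def numeric_to_ip(num):
    """Convert a 32-bit integer to a dotted-quad IPv4 string."""
    parts = []
    # Extract exactly four octets so values with leading zero octets
    # (e.g. 1 -> 0.0.0.1) still produce a full address
    for _ in range(4):
        parts.append(num & 255)
        num >>= 8
    return '.'.join(str(p) for p in reversed(parts))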
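The preview shows only the integer-to-string direction. A minimal sketch of the inverse, assuming plain dotted-quad IPv4 input; `ip_to_numeric` is a hypothetical name and this is not the gist's own code (the "obfuscation" part is not reproduced).

```python
def ip_to_numeric(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected four octets: {}".format(ip))
    num = 0
    for octet in octets:
        value = int(octet)
        if not 0 <= value <= 255:
            raise ValueError("octet out of range: {}".format(octet))
        # Shift the running total left one byte, then add this octet
        num = (num << 8) | value
    return num
```

For example, `ip_to_numeric("66.168.224.0")` gives `1118363648`, the `example1` value above. The standard library's `int(ipaddress.IPv4Address(ip))` performs the same conversion.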
robcowie / notes_oozie.md
Created January 15, 2017 12:29
Oozie notes & links
import logging
import time
logger = logging.getLogger(__name__)
class NoSuchActivityError(Exception):
    pass
robcowie / spark_s3_fs_notes.md
Created June 13, 2016 08:41
Configure S3 filesystem support for Spark on OSX

Homebrew-installed Spark 1.6.x

  1. Install Hadoop to get the required jars (brew install hadoop)
  2. Create a spark-env.sh (cp /usr/local/Cellar/apache-spark/1.6.1/libexec/conf/spark-env.sh.template /usr/local/Cellar/apache-spark/1.6.1/libexec/conf/spark-env.sh)
  3. Set HADOOP_CONF_DIR in spark-env.sh (export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/)
  4. Add the required jars to the SPARK_CLASSPATH in spark-env.sh
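Steps 3 and 4 amount to appending two `export` lines to `spark-env.sh`. A minimal Python sketch, assuming you supply the jar paths yourself (the notes don't say which jars are required); `spark_env_lines` and `append_to_spark_env` are hypothetical names.

```python
def spark_env_lines(hadoop_conf_dir, jars):
    """Build the export lines for spark-env.sh from a Hadoop config dir and jar paths."""
    return [
        "export HADOOP_CONF_DIR={}".format(hadoop_conf_dir),
        # Java classpath entries are colon-separated on OS X / Unix
        "export SPARK_CLASSPATH={}".format(":".join(jars)),
    ]


def append_to_spark_env(path, lines):
    """Append the lines to an existing spark-env.sh (created from the template)."""
    with open(path, "a") as fh:
        fh.write("\n".join(lines) + "\n")
```

Usage would be `append_to_spark_env("/usr/local/Cellar/apache-spark/1.6.1/libexec/conf/spark-env.sh", spark_env_lines("/usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/", jars))`. Spark 1.x logs a deprecation warning for `SPARK_CLASSPATH`; `spark.driver.extraClassPath` and `spark.executor.extraClassPath` are the documented alternatives.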
robcowie / emr_notes.md
Last active March 17, 2016 12:09
EMR & EC2 Notes

EMR

Submit an S3DistCp step using awscli

# AMI 3.x
aws emr add-steps \
  --cluster-id j-xxxxxxxxxxx \
  --steps 'Type=CUSTOM_JAR,Name="s3distcpstep",Jar=/home/hadoop/lib/emr-s3distcp-1.0.jar,Args=["--src=s3://","--dest=s3://"]'
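The same step can be submitted from boto3 by building one entry of the `Steps` list for `EMR.Client.add_job_flow_steps`. A minimal sketch, assuming the AMI 3.x jar path from the command above; `s3distcp_step` is a hypothetical helper and the source/destination URLs are left to the caller.

```python
def s3distcp_step(src, dest, jar="/home/hadoop/lib/emr-s3distcp-1.0.jar",
                  name="s3distcpstep", action_on_failure="CONTINUE"):
    """Build one `Steps` entry for boto3's EMR add_job_flow_steps."""
    return {
        "Name": name,
        # CONTINUE leaves the cluster running if this step fails
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": jar,
            "Args": ["--src={}".format(src), "--dest={}".format(dest)],
        },
    }
```

The dict is then passed as `boto3.client("emr").add_job_flow_steps(JobFlowId="j-xxxxxxxxxxx", Steps=[step])`.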
robcowie / spot_prices.py
Created January 14, 2016 15:53
Very rough script to show EC2 spot prices
# -*- coding: utf-8 -*-
"""
Requires aws and spark (the sparkline charting tool, not Apache Spark)
brew install awscli spark
"""
import argparse
import datetime as dt
import json
import operator as op
# g = nested_getter(1, 1, 1)
# g((0, (0, (0, 1)))) -> 1
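def nested_getter(*args):
    def getter(seq):
        # Apply each itemgetter in turn, descending one level per index
        for func in (op.itemgetter(a) for a in args):
            seq = func(seq)
        return seq
    return getter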
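A rough sketch of the data-handling side, assuming the script consumes the JSON printed by `aws ec2 describe-spot-price-history` (records with `InstanceType`, `AvailabilityZone`, `SpotPrice` and `Timestamp`). `sparkline` is a pure-Python stand-in for the `spark` tool, not part of the original; `prices_by_series` and `TICKS` are hypothetical names.

```python
from collections import defaultdict
from operator import itemgetter

TICKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render numbers as a row of Unicode block characters, like the `spark` CLI."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return "".join(TICKS[int((v - lo) / span * (len(TICKS) - 1))] for v in values)


def prices_by_series(history):
    """Group SpotPrice values by (InstanceType, AvailabilityZone), oldest first."""
    series = defaultdict(list)
    # Timestamps in one ISO-8601 format sort correctly as strings
    for entry in sorted(history["SpotPriceHistory"], key=itemgetter("Timestamp")):
        key = (entry["InstanceType"], entry["AvailabilityZone"])
        series[key].append(float(entry["SpotPrice"]))  # SpotPrice arrives as a string
    return dict(series)
```

Each series can then be printed as `print(key, sparkline(prices))`.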
robcowie / emr2.py
Created December 3, 2015 11:05
boto3_driver
import boto3
# http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-spark-submit-step.html#dynamic-configuration
# https://github.com/grafke/Drone-workflow-controller/blob/7f40968f4164aede4e67070f5a4c0894dcc6d776/drone/actions/emr_launcher.py
# https://boto3.readthedocs.org/en/latest/reference/services/emr.html#EMR.Client.run_job_flow
INSTANCE_CONFIG = {
    'InstanceGroups': [
        {