Miklos C mrchristine

@mrchristine
mrchristine / dbc_deploy_cluster.sh
Created July 13, 2016 15:12
Deploy a cluster on Databricks using the REST API
#!/bin/bash
IFS=$'\n' # make newlines the only separator
while getopts ":o" opt; do
  case $opt in
    o)
      ondemand=true
      echo -e "Deploying on-demand cluster for mwc\n"
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      ;;
  esac
done
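The excerpt only shows the option parsing; the deployment itself is a POST to the Databricks clusters API. A minimal sketch, assuming placeholder DOMAIN/TOKEN values and illustrative cluster settings (none of these come from the gist):

```shell
# Build a cluster spec for POST /api/2.0/clusters/create.
# DOMAIN and TOKEN are hypothetical placeholders.
payload='{
  "cluster_name": "mwc-ondemand",
  "spark_version": "3.2.x-scala2.11",
  "node_type_id": "r3.xlarge",
  "num_workers": 2
}'
echo "$payload"
# Uncomment to actually deploy:
# curl -s -X POST "https://${DOMAIN}/api/2.0/clusters/create" \
#   -H "Authorization: Bearer ${TOKEN}" -d "$payload"
```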
@mrchristine
mrchristine / dbc_reset_scheduled_jobs.sh
Created June 29, 2016 19:55
Databricks REST API to delete job schedules.
#!/bin/bash
# catch ctrl+c and clean up the temporary job file
function ctrl_c_cleanup() {
  echo "** Interrupt handler caught"
  rm -f "$job_file"
}
trap ctrl_c_cleanup INT
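The handler above guards a temp file holding the job list; the delete loop itself would go through the jobs API. A hedged sketch of that flow, with a stubbed response standing in for a real /api/2.0/jobs/list call:

```shell
# List jobs, then delete each job_id via POST /api/2.0/jobs/delete.
job_file=jobs.json
# curl -s -n "https://${DOMAIN}/api/2.0/jobs/list" > "$job_file"
echo '{"jobs":[{"job_id":101},{"job_id":102}]}' > "$job_file"  # stub response for illustration
for jid in $(grep -o '"job_id":[0-9]*' "$job_file" | cut -d: -f2); do
  echo "would delete job $jid"
  # curl -s -n -X POST "https://${DOMAIN}/api/2.0/jobs/delete" -d "{\"job_id\": $jid}"
done
rm -f "$job_file"
```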
@mrchristine
mrchristine / aws_spot_pricing.sh
Last active August 12, 2016 20:41
AWS Spot Pricing History
#!/bin/bash
# catch ctrl+c and clean up partial price dumps
function ctrl_c_cleanup() {
  echo "** Interrupt handler caught"
  rm -f spot_prices_*.json
}
trap ctrl_c_cleanup INT
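For reference, the history this script collects is exposed directly by the AWS CLI. A sketch, assuming the CLI is installed and configured (the instance type is borrowed from the other gists, and the real query is left commented out):

```shell
# Name the output file per instance type, matching the spot_prices_*.json
# pattern the cleanup handler removes.
instance_type="r3.xlarge"
out_file="spot_prices_${instance_type}.json"
echo "$out_file"
# Uncomment to query real pricing history:
# aws ec2 describe-spot-price-history --instance-types "$instance_type" \
#   --product-descriptions "Linux/UNIX" --output json > "$out_file"
```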
@mrchristine
mrchristine / spark-submit-run-once.sh
Created September 6, 2017 17:03
spark-submit transient run example
#!/bin/bash
usage="Pass the job jars as input arguments to specify the Spark job; -h lists the supported Spark versions"
RUNTIME_VERSION="3.2.x-scala2.11"
NODE_TYPE="r3.xlarge"
while getopts ':hs:' option; do
  case "$option" in
    h) echo "$usage"
       exit 0
       ;;
    s) RUNTIME_VERSION="$OPTARG"  # select a Spark runtime version
       ;;
  esac
done
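A transient (run-once) submission goes through /api/2.0/jobs/runs/submit with a spark_submit_task. A sketch using the defaults above; the class name and jar path are hypothetical stand-ins, not values from the gist:

```shell
RUNTIME_VERSION="3.2.x-scala2.11"
NODE_TYPE="r3.xlarge"
# Assemble a runs/submit payload; no job definition is left behind afterwards.
payload=$(cat <<EOF
{
  "run_name": "transient-spark-submit",
  "new_cluster": {
    "spark_version": "${RUNTIME_VERSION}",
    "node_type_id": "${NODE_TYPE}",
    "num_workers": 2
  },
  "spark_submit_task": {
    "parameters": ["--class", "com.example.Main", "dbfs:/path/to/app.jar"]
  }
}
EOF
)
echo "$payload"
# curl -s -n -X POST "https://${DOMAIN}/api/2.0/jobs/runs/submit" -d "$payload"
```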
@mrchristine
mrchristine / spark-submit-example-with-history.sh
Last active September 12, 2017 15:38
Databricks REST API spark-submit w/ run-now
#!/bin/bash
usage="Pass the job jars as input arguments to specify the Spark job; -h lists the supported Spark versions"
RUNTIME_VERSION="3.2.x-scala2.11"
NODE_TYPE="r3.xlarge"
while getopts ':hs:' option; do
  case "$option" in
    h) echo "$usage"
       exit 0
       ;;
    s) RUNTIME_VERSION="$OPTARG"  # select a Spark runtime version
       ;;
  esac
done
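Unlike runs/submit, the run-now variant needs a persistent job so each execution shows up in the job's run history. A sketch of that two-step flow; the job id and class name are stand-ins:

```shell
# Step 1: create a job definition (kept, so runs accumulate history).
create_payload='{"name": "spark-submit-with-history", "new_cluster": {"spark_version": "3.2.x-scala2.11", "node_type_id": "r3.xlarge", "num_workers": 2}, "spark_jar_task": {"main_class_name": "com.example.Main"}}'
# job_id would come from: curl -s -n -X POST "https://${DOMAIN}/api/2.0/jobs/create" -d "$create_payload"
job_id=42  # stub; a real call returns {"job_id": ...}
# Step 2: trigger it.
run_payload="{\"job_id\": $job_id}"
echo "$run_payload"
# curl -s -n -X POST "https://${DOMAIN}/api/2.0/jobs/run-now" -d "$run_payload"
```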
@mrchristine
mrchristine / update_legacy_job_templates.py
Last active November 6, 2018 16:59
Job to update legacy instance types on Databricks
import json, pprint, requests, datetime

################################################################
## Replace the token variable and environment url below
################################################################

# Helper to pretty print json
def pprint_j(i):
    print(json.dumps(i, indent=4, sort_keys=True))
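The update itself boils down to rewriting each job's node_type_id and pushing the settings back via /api/2.0/jobs/reset. A shell sketch of that flow; the r3-to-r5 mapping and the sample job settings are illustrative assumptions, not from the gist:

```shell
# Swap a legacy node type for its replacement in a job's settings.
legacy="r3.xlarge"; replacement="r5.xlarge"  # assumed mapping, not from the gist
job_settings='{"job_id": 7, "new_settings": {"new_cluster": {"node_type_id": "r3.xlarge"}}}'
updated=$(echo "$job_settings" | sed "s/$legacy/$replacement/")
echo "$updated"
# curl -s -n -X POST "https://${DOMAIN}/api/2.0/jobs/reset" -d "$updated"
```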
@mrchristine
mrchristine / spark_stuff.scala
Created June 7, 2019 15:53
Spark Notes / Tips to Remember
spark.conf.isModifiable("spark.sql.shuffle.partitions")
@mrchristine
mrchristine / get_s3_storage_costs.sh
Created December 11, 2019 15:33
Calculate S3 costs for Storage
#!/bin/bash
# get the last usage date in the report
last_date=$(awk -F',' '{print $5}' "$@" | awk '{print $1}' | grep -v "Start" | sort -u | tail -n1)
# pass in the report.csv and calculate total storage costs for the StandardStorage tier
grep "$last_date" "$@" | awk -F, '{printf "%.2f GB %s %s\n", $7/(1024**3)/24, $4, $2}' | grep "StandardStorage" | sort -n | uniq
echo "Processed for $last_date"
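A worked example on a stub report; the column layout is an assumption matching the fields the awk expressions read ($2 region, $4 storage tier, $5 start date, $7 bytes summed over 24 hourly samples):

```shell
# Two rows: 24 GiB and 48 GiB of StandardStorage, averaged over 24 samples.
cat > report.csv <<'EOF'
owner,us-east-1,x,StandardStorage,2019-12-01 00:00,x,25769803776
owner,us-west-2,x,StandardStorage,2019-12-01 00:00,x,51539607552
EOF
last_date=$(awk -F',' '{print $5}' report.csv | awk '{print $1}' | sort -u | tail -n1)
result=$(grep "$last_date" report.csv | awk -F, '{printf "%.2f GB %s %s\n", $7/(1024^3)/24, $4, $2}')
echo "$result"
rm -f report.csv
```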
@mrchristine
mrchristine / iam.py
Created January 15, 2020 16:47
Bypass IAM Check
import requests
token = 'MYTOKEN'
url = 'https://EXAMPLE.cloud.databricks.com'
ip = 'arn:aws:iam::123456789:instance-profile/databricks_special_role'
class DatabricksRestClient:
    """A class to define wrappers for the REST API"""
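A hedged guess at what the bypass amounts to, given the variables above: /api/2.0/instance-profiles/add accepts a skip_validation flag, which registers the profile without Databricks verifying the role. Sketch with the real call left commented (url/token mirror the gist's placeholders):

```shell
ip='arn:aws:iam::123456789:instance-profile/databricks_special_role'
# skip_validation tells Databricks not to validate the role before registering it
payload="{\"instance_profile_arn\": \"$ip\", \"skip_validation\": true}"
echo "$payload"
# curl -s -X POST "https://EXAMPLE.cloud.databricks.com/api/2.0/instance-profiles/add" \
#   -H "Authorization: Bearer MYTOKEN" -d "$payload"
```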
@mrchristine
mrchristine / dbc_deploy_cluster_and_execute.sh
Created November 10, 2016 15:33
Databricks REST API to deploy an Apache Spark cluster and run a remote context to execute commands on the cluster.
#!/bin/bash
IFS=$'\n' # make newlines the only separator
while getopts ":p" opt; do
  case $opt in
    p)
      print_versions=true
      echo -e "Printing the spark versions and node types supported\n"
      ;;
    \?)
      echo "Invalid option: -$OPTARG" >&2
      ;;
  esac
done
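The "remote context" half of this gist maps to the Databricks 1.2 command execution API: create an execution context on the running cluster, then post commands to it. A sketch with placeholder ids; the real calls are left commented:

```shell
cluster_id="0123-456789-abcde"  # placeholder cluster id
ctx_payload="{\"language\": \"scala\", \"clusterId\": \"$cluster_id\"}"
echo "$ctx_payload"
# context_id=$(curl -s -n -X POST "https://${DOMAIN}/api/1.2/contexts/create" -d "$ctx_payload" | jq -r .id)
# curl -s -n -X POST "https://${DOMAIN}/api/1.2/commands/execute" \
#   -d "{\"clusterId\": \"$cluster_id\", \"contextId\": \"$context_id\", \"language\": \"scala\", \"command\": \"1+1\"}"
```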