Miklos C (mrchristine)

@mrchristine
mrchristine / aws_spot_pricing.sh
Last active August 12, 2016 20:41
AWS Spot Pricing History
#!/bin/bash
# catch ctrl+c handler
trap ctrl_c_cleanup INT
function ctrl_c_cleanup() {
  echo "** Interrupt handler caught"
  rm -rf spot_prices_*.json
}
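
For context, the price history this script collects is also available through boto3's describe_spot_price_history call. Below is a minimal sketch; the region, instance type, and one-day window are assumptions for illustration, not values taken from the gist.

import boto3
from datetime import datetime, timedelta

# Assumed region and instance type for illustration
ec2 = boto3.client("ec2", region_name="us-west-2")
resp = ec2.describe_spot_price_history(
    InstanceTypes=["r3.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
)
# Note: results are paginated; follow resp["NextToken"] for the full history
for price in resp["SpotPriceHistory"]:
    print(price["AvailabilityZone"], price["SpotPrice"], price["Timestamp"])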
@mrchristine
mrchristine / dbc_reset_scheduled_jobs.sh
Created June 29, 2016 19:55
Databricks REST API to delete job schedules.
#!/bin/bash
# catch ctrl+c handler
trap ctrl_c_cleanup INT
function ctrl_c_cleanup() {
  echo "** Interrupt handler caught"
  rm -f "$job_file"
}
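
A rough Python sketch of what this script drives, using the Jobs 2.0 API's jobs/list and jobs/reset endpoints. The workspace URL and token are placeholders, not values from the gist.

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

jobs = requests.get(HOST + "/api/2.0/jobs/list", headers=HEADERS).json().get("jobs", [])
for job in jobs:
    settings = job["settings"]
    if "schedule" in settings:
        settings.pop("schedule")  # drop the cron schedule, keep all other settings
        requests.post(HOST + "/api/2.0/jobs/reset", headers=HEADERS,
                      json={"job_id": job["job_id"], "new_settings": settings})

jobs/reset overwrites the full settings object, which is why the sketch re-sends everything minus the schedule rather than patching a single field.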
@mrchristine
mrchristine / dbc_deploy_cluster.sh
Created July 13, 2016 15:12
Deploy a cluster on Databricks using the REST API
#!/bin/bash
IFS=$'\n' # make newlines the only separator
while getopts ":o" opt; do
  case $opt in
    o)
      ondemand=true
      echo -e "Deploying on-demand cluster for mwc\n"
      ;;
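
The deployment itself goes through the clusters/create endpoint. A minimal sketch in Python follows; the cluster name and worker count are assumptions, and the Spark version and node type are borrowed from the spark-submit gists further down.

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

payload = {
    "cluster_name": "mwc-demo",           # assumed name
    "spark_version": "3.2.x-scala2.11",   # version string used in the gists below
    "node_type_id": "r3.xlarge",
    "num_workers": 2,                     # assumed size
}
resp = requests.post(HOST + "/api/2.0/clusters/create", headers=HEADERS, json=payload)
print(resp.json().get("cluster_id"))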
@mrchristine
mrchristine / aws_assign_eip.sh
Created September 6, 2016 18:19
Assign an elastic ip with a bootstrap script.
#!/bin/bash
# Set params
k=YOUR_AWS_KEY
s=YOUR_AWS_SECRET
r=YOUR_REGION
# Assign EIP allocation ID
eip_id=eipalloc-XXXXXXX
# Install awscli
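
Where the gist goes on to install the AWS CLI, a boto3 sketch of the same association step looks like this. It assumes the IMDSv1 metadata endpoint is reachable from the instance; the region and allocation ID are the placeholders above.

import boto3
import requests

# The instance discovers its own ID via the EC2 metadata service (IMDSv1 shown)
instance_id = requests.get(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2).text

ec2 = boto3.client("ec2", region_name="YOUR_REGION")  # placeholder region
ec2.associate_address(AllocationId="eipalloc-XXXXXXX", InstanceId=instance_id)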
@mrchristine
mrchristine / dbc_deploy_cluster_and_execute.sh
Created November 10, 2016 15:33
Databricks REST API to deploy an Apache Spark cluster and run a remote context to execute commands on the cluster.
#!/bin/bash
IFS=$'\n' # make newlines the only separator
while getopts ":p" opt; do
  case $opt in
    p)
      print_versions=true
      echo -e "Printing the supported Spark versions and node types\n"
      ;;
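
The -p listing maps to two read-only endpoints, clusters/spark-versions and clusters/list-node-types. A sketch with placeholder credentials:

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

versions = requests.get(HOST + "/api/2.0/clusters/spark-versions", headers=HEADERS).json()
for v in versions.get("versions", []):
    print(v["key"], "->", v["name"])

node_types = requests.get(HOST + "/api/2.0/clusters/list-node-types", headers=HEADERS).json()
for nt in node_types.get("node_types", []):
    print(nt["node_type_id"], nt["memory_mb"], nt["num_cores"])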
@mrchristine
mrchristine / spark-submit-example-with-history.sh
Last active September 12, 2017 15:38
Databricks REST API spark-submit w/ run-now
#!/bin/bash
usage="Add jars to the input arguments to specify the spark job. -h list the supported spark versions"
RUNTIME_VERSION="3.2.x-scala2.11"
NODE_TYPE="r3.xlarge"
while getopts ':hs:' option; do
  case "$option" in
    h) echo "$usage"
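
The "with history" pattern the title refers to is: create a persistent job with a spark_submit_task, then trigger it with run-now so every invocation is recorded in the job's run history. A sketch; the class name, jar path, and cluster size are hypothetical.

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

job_spec = {
    "name": "spark-submit-example",
    "new_cluster": {
        "spark_version": "3.2.x-scala2.11",
        "node_type_id": "r3.xlarge",
        "num_workers": 2,              # assumed size
    },
    "spark_submit_task": {
        # Hypothetical class and jar path for illustration
        "parameters": ["--class", "com.example.Main", "dbfs:/jars/my_job.jar"]
    },
}
job_id = requests.post(HOST + "/api/2.0/jobs/create",
                       headers=HEADERS, json=job_spec).json()["job_id"]
run = requests.post(HOST + "/api/2.0/jobs/run-now",
                    headers=HEADERS, json={"job_id": job_id}).json()
print("run_id:", run["run_id"])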
@mrchristine
mrchristine / spark-submit-run-once.sh
Created September 6, 2017 17:03
spark-submit transient run example
#!/bin/bash
usage="Add jars to the input arguments to specify the spark job. -h list the supported spark versions"
RUNTIME_VERSION="3.2.x-scala2.11"
NODE_TYPE="r3.xlarge"
while getopts ':hs:' option; do
  case "$option" in
    h) echo "$usage"
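
In contrast to the previous gist, a transient (run-once) submission goes through jobs/runs/submit, which spins up a cluster for a single run without creating a persistent job definition. A sketch with the same hypothetical class and jar path:

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder workspace URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

run_spec = {
    "run_name": "spark-submit-run-once",
    "new_cluster": {
        "spark_version": "3.2.x-scala2.11",
        "node_type_id": "r3.xlarge",
        "num_workers": 2,              # assumed size
    },
    "spark_submit_task": {
        # Hypothetical class and jar path for illustration
        "parameters": ["--class", "com.example.Main", "dbfs:/jars/my_job.jar"]
    },
}
resp = requests.post(HOST + "/api/2.0/jobs/runs/submit", headers=HEADERS, json=run_spec)
print("run_id:", resp.json()["run_id"])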
@mrchristine
mrchristine / vector_sum_udaf.scala
Created November 29, 2017 21:39
Spark UDAF to sum vectors for common keys
package com.databricks.example.pivot
/**
 * This code allows a user to add vectors together for common keys.
 * The comments below show how to register the Scala UDAF so it can be
 * called from PySpark. The UDAF can only be invoked from a SQL
 * expression (i.e. spark.sql() or expr()).
 */
/**
# Python code to register a Scala UDAF
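
The preview cuts off before the registration snippet. On Spark 2.3+ the PySpark-side registration can look like the sketch below; the class name VectorSumUDAF is a guess from the package above, and spark/df are assumed to be an active session and a DataFrame with key and vec columns. The jar containing the Scala class must already be attached to the cluster.

# Register the compiled Scala UDAF under a SQL-callable name
spark.udf.registerJavaUDAF("vector_sum", "com.databricks.example.pivot.VectorSumUDAF")

df.createOrReplaceTempView("vectors")
result = spark.sql("SELECT key, vector_sum(vec) AS vec_sum FROM vectors GROUP BY key")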
@mrchristine
mrchristine / update_legacy_job_templates.py
Last active November 6, 2018 16:59
Job to update legacy instance types on Databricks
import json, pprint, requests, datetime
################################################################
## Replace the token variable and environment url below
################################################################
# Helper to pretty print json
def pprint_j(i):
    print(json.dumps(i, indent=4, sort_keys=True))
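
The core loop such a job performs is: list every job, swap the legacy node type in its cluster spec, and write the settings back with jobs/reset. A sketch; the URL, token, and old-to-new mapping are assumptions, not values from the gist.

import requests

HOST = "https://myshard.cloud.databricks.com"  # placeholder environment URL
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}
LEGACY_MAP = {"r3.xlarge": "r4.xlarge"}        # assumed old-to-new mapping

for job in requests.get(HOST + "/api/2.0/jobs/list", headers=HEADERS).json().get("jobs", []):
    settings = job["settings"]
    cluster = settings.get("new_cluster", {})
    if cluster.get("node_type_id") in LEGACY_MAP:
        cluster["node_type_id"] = LEGACY_MAP[cluster["node_type_id"]]
        requests.post(HOST + "/api/2.0/jobs/reset", headers=HEADERS,
                      json={"job_id": job["job_id"], "new_settings": settings})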
@mrchristine
mrchristine / spark_schema_save_n_load.py
Created May 28, 2019 21:12
Read / Write Spark Schema to JSON
##### READ SPARK DATAFRAME
df = spark.read.option("header", "true").option("inferSchema", "true").csv(fname)
# capture the schema inferred from the CSV (column names from the header, types inferred)
df_schema = df.schema
##### SAVE JSON SCHEMA INTO S3 / BLOB STORAGE
# save the schema to load from the streaming job, which we will load during the next job
dbutils.fs.rm("/home/mwc/airline_schema.json", True)
with open("/dbfs/home/mwc/airline_schema.json", "w") as f:
    f.write(df_schema.json())  # StructType.json() serializes the schema as a JSON string
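
To complete the round trip, the streaming job can rebuild the schema with StructType.fromJson and pass it to the read, avoiding a second inference pass. A sketch; the source path is a placeholder.

import json
from pyspark.sql.types import StructType

# Load the schema saved above and reuse it for the streaming read
with open("/dbfs/home/mwc/airline_schema.json") as f:
    schema = StructType.fromJson(json.load(f))

stream_df = (spark.readStream
             .schema(schema)
             .option("header", "true")
             .csv("/path/to/landing/dir"))  # placeholder source path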