Brendan Schell BrendanSchell

  • Toronto, Canada
BrendanSchell / run_and_distribute_spark_env.sh
Last active April 6, 2020 14:36
Runs an activated Python environment and distributes it to the Spark cluster
# set spark home to EMR default
export SPARK_HOME='/usr/lib/spark/'
# to use Jupyter as the driver instead, pass "jupyter" as the first argument
if [ -z "$1" ]; then
    driver_python=$(which python)
else
    driver_python=$1
fi
# ex: if $1 is jupyter, then options in {"notebook","lab"}
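The preview above is cut off, so what follows is only a sketch of how the selected driver might be passed to PySpark, not the rest of the gist. It assumes the value ends up in `PYSPARK_DRIVER_PYTHON` and the second option in `PYSPARK_DRIVER_PYTHON_OPTS` (both are standard PySpark environment variables); the helper name `choose_driver_python` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): print the driver
# Python to use. Quoting "$1" keeps the test valid when no argument is given.
choose_driver_python() {
    if [ -z "$1" ]; then
        # No argument: fall back to the python of the activated environment.
        command -v python
    else
        printf '%s\n' "$1"
    fi
}

# Assumption: the choice is passed to PySpark through its standard
# environment variables; "notebook" is one of the options the comment names.
PYSPARK_DRIVER_PYTHON=$(choose_driver_python "jupyter")
PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS

echo "$PYSPARK_DRIVER_PYTHON $PYSPARK_DRIVER_PYTHON_OPTS"
```

Called without an argument, the helper prints the path of whichever `python` is first on `PATH`, which is the interpreter of the currently activated environment.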
BrendanSchell / run_custom_python_env_spark.py
Last active April 27, 2020 11:38
Run an activated Python environment on a Spark cluster
import os
import argparse
import sys
import pathlib
import shutil
import logging
import subprocess
"""This is a program to start a spark application using the activated
python environment on the cluster
BrendanSchell / git_ssh_config.sh
Created April 27, 2020 12:07
Set up SSH for use with git
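# route git-over-SSH to GitHub through ssh.github.com on port 443
# note: this overwrites any existing ~/.ssh/config
# "Host github.com" limits the redirect to GitHub; "Host *" would send
# every SSH connection to ssh.github.com
echo "Host github.com
    Hostname ssh.github.com
    Port 443
    AddKeysToAgent yes
    IgnoreUnknown UseKeychain
    UseKeychain yes
    IdentityFile ~/.ssh/id_rsa" >~/.ssh/config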
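Because the redirect `>` replaces the whole file, an append-only variant may be safer on a machine that already has an SSH config. The following is a minimal sketch under two assumptions: only `github.com` needs the port 443 route, and the function name `add_github_host` is hypothetical. It writes to a scratch file so it can be run without touching `~/.ssh/config`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): append the GitHub
# stanza to the given config file only if it is not already there.
add_github_host() {
    config_file=$1
    if ! grep -q 'Hostname ssh.github.com' "$config_file" 2>/dev/null; then
        cat >>"$config_file" <<'EOF'
Host github.com
    Hostname ssh.github.com
    Port 443
    IdentityFile ~/.ssh/id_rsa
EOF
    fi
}

# Demonstration on a scratch file; pass ~/.ssh/config to apply it for real.
CONFIG_FILE=$(mktemp)
add_github_host "$CONFIG_FILE"
add_github_host "$CONFIG_FILE"   # second call is a no-op
cat "$CONFIG_FILE"
```

Once the stanza is in the real config, `ssh -T -p 443 git@ssh.github.com` is a quick way to check that GitHub accepts the key over port 443.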
BrendanSchell / download_gists.sh
Last active April 27, 2020 12:23
Download gists for the Jupyter notebook setup tutorial
# anaconda setup script
wget https://gist.githubusercontent.com/BrendanSchell/b2679700e90aa6bd34baf40604ee0c35/raw -O conda_environment_setup_script.sh
# jupyter on spark
wget https://gist.githubusercontent.com/BrendanSchell/99016af3ad34adaadc534a0716c72fcb/raw -O run_custom_spark_env.py