Peter Bull pjbull

pjbull / World Bank - Getting Started.ipynb
Last active April 18, 2017 21:47
World Bank - Getting Started
pjbull / function
Created June 18, 2015 22:44
Reload Package in Active Development (IPython Notebook)
import sys
import types

import dev_package


def reload_package(root_module):
    package_name = root_module.__name__

    # get a reference to each loaded module
    loaded_package_modules = dict([
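The preview above cuts off partway through `reload_package`, so the gist's actual body is not shown here. As a rough Python 3 sketch of the same idea, one could reload every already-imported module whose name falls under the root package. `importlib.reload` and `sys.modules` are standard library; reloading the deepest modules first is an assumption of this sketch, not something taken from the gist, and it does not resolve dependencies between sibling modules.

```python
import importlib
import sys
import types


def reload_package(root_module):
    """Reload root_module and every loaded module beneath it; return their names."""
    package_name = root_module.__name__
    prefix = package_name + "."

    # Snapshot first: reloading can add entries to sys.modules while we iterate.
    loaded = {
        name: module
        for name, module in list(sys.modules.items())
        if isinstance(module, types.ModuleType)
        and (name == package_name or name.startswith(prefix))
    }

    # Deepest names first, so a package's __init__ sees refreshed submodules.
    ordered = sorted(loaded, key=lambda name: name.count("."), reverse=True)
    for name in ordered:
        importlib.reload(loaded[name])
    return ordered
```

In a notebook this would be called as `reload_package(dev_package)` after editing the package's source. Objects created before the reload keep their old classes, so they may need to be recreated.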
pjbull / gist:e971e63807c67fa68263
Last active September 11, 2015 13:34
notebook header
from __future__ import division
# graphics
import matplotlib.pyplot as plt
from matplotlib import rc_params
mpl_default = rc_params()
import seaborn as sns
sns.set(rc=mpl_default)
pjbull / rodeo-ec2.md
Last active September 13, 2015 21:43

Running RODEO on EC2

  1. Make sure the instance allows HTTP traffic through its security group (Type: HTTP, Protocol: TCP, Port Range: 80, Source: 0.0.0.0/0) on EC2. Note: THIS CAN ONLY BE DONE WHEN YOU ARE SETTING UP THE IMAGE.
  2. SSH into the machine: ssh -i [MYCREDS].pem ubuntu@[public ip]
  3. Install whatever packages/Python you need
  4. Redirect HTTP traffic from port 80 to another port greater than 1024. 8080 is a common choice:
     sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080
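To check that the redirect in step 4 works, something has to be listening on the high port. The following is a minimal sketch, not part of the original instructions, using Python's standard `http.server`. The `HelloHandler` class and `make_server` helper are illustrative names, and port 8080 simply matches the `--to-port` value above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a short plain-text body."""

    def do_GET(self):
        body = b"redirect works\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=8080, host="0.0.0.0"):
    """Bind an unprivileged port; ports above 1024 do not require root."""
    return HTTPServer((host, port), HelloHandler)


# On the instance, start it with:
# make_server().serve_forever()
```

With the server running and the iptables rule in place, a request to the instance's public IP on port 80 should come back with the test message.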
pjbull / canonical_restaurant_names.py
Created October 9, 2015 17:04
Boston restaurants to canonical name and address
import re
import sys
import unicodedata

import pandas as pd


def clean_string(s):
    if isinstance(s, unicode):
        s = unicodedata.normalize('NFKD', s).encode('ascii', 'ignore')
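The snippet above targets Python 2 (the `unicode` type does not exist in Python 3) and is cut off after the normalization step. Here is a minimal Python 3 sketch of just that step, assuming the goal is to reduce accented characters to plain ASCII. The name `to_ascii` is illustrative and does not come from the gist.

```python
import unicodedata


def to_ascii(s):
    """Strip accents by decomposing characters and dropping non-ASCII code points."""
    # NFKD splits e.g. 'é' into 'e' plus a separate combining accent mark.
    decomposed = unicodedata.normalize("NFKD", s)
    # Encoding with errors='ignore' drops whatever has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Café Olé"))  # Cafe Ole
```

Characters with no decomposition into ASCII (for example most non-Latin scripts) are removed entirely, which may or may not be what a name-matching task wants.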
pjbull / ec2me.sh
Created October 27, 2015 16:08
SSH to Running EC2 Ubuntu Instance
# I keep this in my .bashrc to get to a running EC2 instance if there is only one running with my current creds
export AWS_EC2_PEM=<PATH_TO_PEM>
function ec2me () {
    running=$(ec2-describe-instances | grep running)
    instances=$(wc -l <<< "$running")
    if [ $instances = 1 ]; then
        server=$(cut -f4 <<< "$running" )
pjbull / gist:9ddb9d5ee403d9730724
Last active January 1, 2016 22:52
Traceback Library
================================================
Failed to save <type 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 275, in save
    obj, filename = self._write_array(obj, filename)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 236, in _write_array
    self.np.save(filename, array)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/numpy/lib/npyio.py", line 491, in save
    pickle_kwargs=pickle_kwargs)
  File "/home/ubuntu/anaconda/lib/python2.7/site-packages/numpy/lib/format.py", line 585, in write_array
pjbull / jupyter_notebook_config.py
Last active November 13, 2023 20:17
Create .py and .html on save of Jupyter notebook
import os
import re

from nbconvert.nbconvertapp import NbConvertApp
from nbconvert.postprocessors.base import PostProcessorBase


class CopyToSubfolderPostProcessor(PostProcessorBase):
    def __init__(self, subfolder=None):
        self.subfolder = subfolder
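The preview stops before showing how the post-processor is wired in. A different, commonly documented route to a similar result (a `.py` and an `.html` copy on every save) is a post-save hook that shells out to `jupyter nbconvert`. The sketch below uses that technique rather than the gist's `PostProcessorBase` subclass, and it writes the exports next to the notebook instead of into a subfolder. The hook name `post_save` is arbitrary, and the registration line is left as a comment because `c` only exists while Jupyter is loading the config file.

```python
import os
from subprocess import check_call


def post_save(model, os_path, contents_manager):
    """Export a notebook to .py and .html next to it after each save."""
    if model["type"] != "notebook":
        return  # ignore plain files and directories
    directory, filename = os.path.split(os_path)
    for target in ("script", "html"):
        # Runs e.g. `jupyter nbconvert --to script notebook.ipynb`
        check_call(
            ["jupyter", "nbconvert", "--to", target, filename],
            cwd=directory or None,
        )


# In jupyter_notebook_config.py, where Jupyter provides `c`:
# c.FileContentsManager.post_save_hook = post_save
```

Calling out to the command line keeps the hook short, at the cost of starting two extra processes on every save.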
pjbull / SparseInteractions.py
Last active April 18, 2021 12:09
Sparse Interaction Terms for scikit-learn
from sklearn.base import BaseEstimator, TransformerMixin
from scipy import sparse
from itertools import combinations


class SparseInteractions(BaseEstimator, TransformerMixin):
    def __init__(self, degree=2, feature_name_separator="_"):
        self.degree = degree
        self.feature_name_separator = feature_name_separator
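The class is cut off after its constructor, so the transform itself is not visible here. To illustrate what interaction terms on sparse data look like, the sketch below works on a single row stored as a dict of nonzero features instead of a `scipy.sparse` matrix. That representation, and the function name `sparse_interactions`, are assumptions made for the example; only the `degree` and separator parameters mirror the constructor above.

```python
from itertools import combinations


def sparse_interactions(row, degree=2, separator="_"):
    """Add products of feature combinations to a row of nonzero features.

    row maps feature name -> nonzero value. Features absent from the dict
    are treated as zero, so any interaction involving them is zero too
    and is simply never created.
    """
    out = dict(row)
    names = sorted(row)
    for size in range(2, degree + 1):
        for combo in combinations(names, size):
            value = 1
            for name in combo:
                value *= row[name]
            out[separator.join(combo)] = value
    return out


print(sparse_interactions({"a": 2, "b": 3}))  # {'a': 2, 'b': 3, 'a_b': 6}
```

Only combinations of features that are present get computed, which is the property that makes interaction terms affordable on wide, mostly-zero data.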
pjbull / tree_as_md_table.py
Last active March 12, 2017 18:23
Prints the output of the tree command as a markdown table for documentation
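import subprocess

# ROOT_DIR is a placeholder: set it to the directory you want to document
path = ROOT_DIR
# run `tree` with directories listed before files and decode its output
result = (subprocess.check_output(['tree', '--dirsfirst', path])
          .decode("utf-8", "strict"))
file_list = result.split('\n')
# the first line is the root; the last three entries are a blank line, the
# "N directories, M files" summary, and the empty string after the final newline
root = file_list[0]
file_list = file_list[1:-3]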
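The preview ends before any table is printed, so the layout the gist produces is not known here. One possible way to finish the job is sketched below: it takes the text that `tree` prints and emits a two-column markdown table with an empty description column to fill in by hand. The function name, the column headers, and the reliance on the `── ` connector (which `tree` draws when it outputs UTF-8 line characters) are all assumptions of this sketch.

```python
def tree_to_md_table(tree_output, header=("Path", "Description")):
    """Turn the text printed by `tree` into a two-column markdown table."""
    lines = tree_output.split("\n")
    root = lines[0]
    # Entry lines carry a "── " connector; the blank line and the summary do not.
    entries = [line for line in lines[1:] if "── " in line]

    table = [
        "| {} | {} |".format(*header),
        "| --- | --- |",
        "| `{}` | |".format(root),
    ]
    table.extend("| `{}` | |".format(entry) for entry in entries)
    return "\n".join(table)


sample = ".\n├── docs\n│   └── index.md\n└── README.md\n\n1 directory, 2 files\n"
print(tree_to_md_table(sample))
```

Filtering on the connector avoids counting lines from the end of the output, so the function does not depend on exactly how many trailing lines `tree` prints.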