Emily Gill ecgill

  • University of Colorado
  • Boulder, CO
@ecgill
ecgill / Pipeline-guide.md
Created February 16, 2018 20:18 — forked from amberjrivera/Pipeline-guide.md
Quick tutorial on Sklearn's Pipeline constructor for machine learning

If You've Never Used Sklearn's Pipeline Constructor...You're Doing It Wrong

How To Use sklearn Pipelines, FeatureUnions, and GridSearchCV With Your Own Transformers

By Emily Gill and Amber Rivera

What's a Pipeline and Why Use One?

The Pipeline constructor from sklearn allows you to chain transformers and estimators together into a sequence that functions as one cohesive unit. For example, if your model involves feature selection, standardization, and then regression, those three steps, each as its own class, could be encapsulated together via Pipeline.

Benefits: readability, reusability and easier experimentation.
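
The three-step example described above (feature selection, standardization, then regression) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic data from `make_regression`, the step names, the choice of `SelectKBest` with `k=3`, and `LinearRegression` as the final estimator are all assumptions made for the sake of a runnable example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration (assumption, not from the guide).
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=0.1, random_state=0
)

# Each step is a (name, object) pair; the names here are hypothetical.
# All steps but the last must be transformers; the last is the estimator.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=3)),  # feature selection
    ("scale", StandardScaler()),                            # standardization
    ("regress", LinearRegression()),                        # regression
])

# fit() runs fit_transform on each transformer in order, then fits the estimator.
pipe.fit(X, y)

# predict() pushes X through the fitted transformers before predicting.
predictions = pipe.predict(X)
r2 = pipe.score(X, y)
```

Because the whole chain behaves like a single estimator, it can be passed to tools such as `GridSearchCV`, with step parameters addressed as `<step name>__<parameter>` (for example `select__k`).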
ecgill / tmux-cheatsheet.markdown
Created February 16, 2018 21:29 — forked from MohamedAlaa/tmux-cheatsheet.markdown
tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname
ecgill / flask_ec2_tmux.md
Last active February 16, 2018 21:32
Updating a Flask App (via GitHub repo) on EC2 instance

Running a Flask App on EC2 using tmux

A quick guide to running a Flask app by cloning your GitHub repository onto an EC2 instance, and to updating the repository when you make changes to your app.

By Emily Gill

Note: These instructions assume that you have already created a Flask app inside a GitHub repository and that you know how to launch an EC2 instance.

Part 1: Create web app and EC2 instance.

  1. Make sure that your web app is running locally. Assuming your Flask app script is called app.py, run the following from the command line to check:
ecgill / useful_pandas_snippets.py
Created February 17, 2018 17:47 — forked from bsweger/useful_pandas_snippets.md
Useful Pandas Snippets
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()
# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])
# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
ecgill / useful_spark_snippets.py
Created February 17, 2018 19:18
Useful Spark snippets
# Starting spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Name').getOrCreate()
# read in data from various formats:
# inferSchema and header are CSV reader options; the JSON reader does not accept them
df = spark.read.csv('filename', inferSchema=True, header=False)
df = spark.read.json('filename')
# change schema and read in data:
from pyspark.sql.types import StructField, StringType, IntegerType, StructType