Emily Gill ecgill

## Pipeline-guide.md

      
Created February 16, 2018 20:18 (forked from amberjrivera/Pipeline-guide.md)

Quick tutorial on Sklearn's Pipeline constructor for machine learning

If You've Never Used Sklearn's Pipeline Constructor...You're Doing It Wrong

How To Use sklearn Pipelines, FeatureUnions, and GridSearchCV With Your Own Transformers

By Emily Gill and Amber Rivera

What's a Pipeline and Why Use One?

The Pipeline constructor from sklearn lets you chain transformers and estimators into a sequence that functions as one cohesive unit. For example, if your model involves feature selection, standardization, and then regression, those three steps, each its own class, can be encapsulated together in a single Pipeline.

Benefits: readability, reusability, and easier experimentation.
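
The three-step example above (feature selection, standardization, regression) could be sketched roughly as follows. The synthetic dataset, the step names, and the choice of `SelectKBest`, `StandardScaler`, and `LinearRegression` are assumptions made for illustration, not the authors' own code.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 200 rows, 10 features, 4 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# Each step is a (name, object) pair; the last step is the estimator
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression, k=4)),
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])

# fit() runs fit_transform on each transformer in order, then fits the regressor
pipe.fit(X, y)

# predict() pushes new data through the same fitted transformers first
predictions = pipe.predict(X)
step_names = [name for name, _ in pipe.steps]
```

Because the whole chain behaves like a single estimator, it can be passed to tools such as `GridSearchCV` without refitting the preprocessing steps by hand.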


## tmux-cheatsheet.markdown

      
Created February 16, 2018 21:29 (forked from MohamedAlaa/tmux-cheatsheet.markdown)

tmux shortcuts & cheatsheet

start new:
tmux

start new with session name:
tmux new -s myname


## flask_ec2_tmux.md

      
Last active February 16, 2018 21:32

Updating a Flask App (via GitHub repo) on EC2 instance

Running a Flask App on EC2 using tmux

A quick guide to running a Flask app on an EC2 instance by cloning your GitHub repository, and to updating that repository when you change your app.

By Emily Gill

Note: These instructions start after you have already created a Flask app inside a GitHub repository. They also assume that you know how to launch an EC2 instance.

Part 1: Create web app and EC2 instance.


Make sure that your web app is running locally. Assuming your Flask app script is called app.py, run the following from the command line to check:


## useful_pandas_snippets.py
# List unique values in a DataFrame column
# h/t @makmanalp for the updated syntax!
df['Column Name'].unique()

# Convert Series datatype to numeric (will error if column has non-numeric values)
# h/t @makmanalp
pd.to_numeric(df['Column Name'])

# Convert Series datatype to numeric, changing non-numeric values to NaN
# h/t @makmanalp for the updated syntax!
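
One way to get this behavior is the `errors='coerce'` argument of `pd.to_numeric`. The following is a minimal sketch; the sample DataFrame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: a column mixing numeric strings, junk, and a missing value
df = pd.DataFrame({'Column Name': ['1', '2.5', 'abc', None]})

# errors='coerce' replaces anything that cannot be parsed as a number with NaN
numeric = pd.to_numeric(df['Column Name'], errors='coerce')
```

The result is a new Series; assign it back to `df['Column Name']` to change the column in place.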

## useful_spark_snippets.py
# Starting spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Name').getOrCreate()

# read in data from various formats:
# inferSchema and header are CSV reader options; spark.read.json does not accept them
df = spark.read.csv('filename', inferSchema=True, header=False)
df = spark.read.json('filename')

# change schema and read in data:
from pyspark.sql.types import StructField, StringType, IntegerType, StructType