
Michael Gao michaelgao8

  • Duke University
  • Durham, NC
import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)-8s %(message)s',
    level=logging.INFO,
    datefmt='%Y-%m-%d %H:%M:%S')
# Also set the root logger level, in case logging was configured earlier
logging.getLogger().setLevel(logging.INFO)
@michaelgao8
michaelgao8 / custom_scaler.py
Last active October 19, 2019 21:48
Custom Scaling
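from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler

class CustomScaler(BaseEstimator, TransformerMixin):
    """Inspired by https://stackoverflow.com/a/41461843/6248179
    """
    def __init__(self, columns, copy=True, with_mean=True, with_std=True):
        # StandardScaler accepts these options as keyword arguments only
        self.scaler = StandardScaler(copy=copy, with_mean=with_mean, with_std=with_std)
        self.columns = columns

    def fit(self, X, y=None):
        # Fit on the selected DataFrame columns only
        self.scaler.fit(X.loc[:, self.columns].values, y)
        return self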
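The preview above stops after fit. A minimal sketch of how such a column-subset scaler could be completed follows; the class name ColumnSubsetScaler, the transform method, and the sample frame are assumptions for illustration, not the gist author's code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class ColumnSubsetScaler(BaseEstimator, TransformerMixin):
    """Hypothetical completion: standardize only the listed DataFrame columns."""

    def __init__(self, columns, copy=True, with_mean=True, with_std=True):
        # Store constructor arguments unchanged so get_params/clone keep working
        self.columns = columns
        self.copy = copy
        self.with_mean = with_mean
        self.with_std = with_std

    def fit(self, X, y=None):
        # Build the inner scaler at fit time and fit it on the selected columns
        self.scaler_ = StandardScaler(
            copy=self.copy, with_mean=self.with_mean, with_std=self.with_std
        )
        self.scaler_.fit(X.loc[:, self.columns].to_numpy())
        return self

    def transform(self, X):
        # Return a new frame; columns outside self.columns pass through untouched
        out = X.copy()
        out[self.columns] = self.scaler_.transform(X.loc[:, self.columns].to_numpy())
        return out


frame = pd.DataFrame({"age": [20.0, 30.0, 40.0], "flag": [0, 1, 0]})
scaled = ColumnSubsetScaler(columns=["age"]).fit_transform(frame)
```

Creating the inner StandardScaler inside fit, rather than in the constructor, is a design choice that follows the scikit-learn convention of keeping __init__ free of logic.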
# SOURCE: https://news.ycombinator.com/item?id=21260001
replace nvl with coalesce
replace rownum <= 1 with LIMIT 1
replace listagg with string_agg
replace recursive hierarchy (start with/connect by/prior) with recursive
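The substitutions listed above map Oracle-style constructs to their standard SQL counterparts. As a rough illustration, the sketch below runs three of the rewritten forms against an in-memory SQLite database, which is only a stand-in for the target database; the staff table and its rows are made up. The string_agg replacement is left out because SQLite's support for that aggregate name depends on its version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Ada', NULL, NULL),
        (2, 'Ben', 'B', 1),
        (3, 'Cy', NULL, 2);
""")

# NVL(nickname, name) becomes COALESCE(nickname, name)
display_names = [row[0] for row in conn.execute(
    "SELECT COALESCE(nickname, name) FROM staff ORDER BY id")]

# WHERE ROWNUM <= 1 becomes LIMIT 1
first_row = conn.execute("SELECT name FROM staff ORDER BY id LIMIT 1").fetchone()

# START WITH / CONNECT BY PRIOR becomes a recursive common table expression
chain = [row[0] for row in conn.execute("""
    WITH RECURSIVE reports(id, name, depth) AS (
        SELECT id, name, 0 FROM staff WHERE manager_id IS NULL
        UNION ALL
        SELECT s.id, s.name, r.depth + 1
        FROM staff AS s JOIN reports AS r ON s.manager_id = r.id
    )
    SELECT name FROM reports ORDER BY depth
""")]
```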
michaelgao8 / multi_index_aggregation.py
Created September 17, 2019 01:27
Comparison of aggregation using multi-index vs not
import pandas as pd

def featurize_num_prior_encounters_multi_index(id_col, time_col, period_in_days, df):
    start_col = 'start_col'
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df[start_col] = df[time_col] - pd.Timedelta(days=period_in_days)
    # set multi_index
    df = df.set_index([id_col, time_col])
    num_adm = []
    id_list = []
    for idx, data in df.groupby(level=id_col):
        # Count rows whose encounter time (index level 1) is later than the window start
        num_adm.append(data.loc[data.index.get_level_values(1) > data[start_col]].shape[0])
        id_list.append(idx)
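The preview is cut off before the function returns, so the intended output is not visible. As one possible reading of "number of prior encounters", the sketch below counts, for each row, the earlier encounters of the same id that fall inside the look-back window. The function name count_prior_encounters, the half-open window [t - period, t), and the sample data are assumptions, not the author's implementation.

```python
import numpy as np
import pandas as pd


def count_prior_encounters(df, id_col, time_col, period_in_days):
    """For each row, count same-id rows with time in [t - period, t)."""
    window = pd.Timedelta(days=period_in_days)
    counts = pd.Series(0, index=df.index, dtype="int64")
    for _, group in df.groupby(id_col):
        times = group[time_col].sort_values()
        values = times.to_numpy()
        # Position of the first encounter at or after each window start
        lo = np.searchsorted(values, (times - window).to_numpy(), side="left")
        # Position of the first encounter at or after the current time
        hi = np.searchsorted(values, values, side="left")
        counts.loc[times.index] = hi - lo
    return counts


encounters = pd.DataFrame({
    "patient": [1, 1, 1, 2],
    "admit_time": pd.to_datetime(
        ["2019-01-01", "2019-01-10", "2019-03-01", "2019-01-05"]),
})
prior_counts = count_prior_encounters(encounters, "patient", "admit_time", 30)
```

This version assumes the DataFrame index is unique, since the counts are written back by index label.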

Keybase proof

I hereby claim:

  • I am michaelgao8 on github.
  • I am michaelgao8 (https://keybase.io/michaelgao8) on keybase.
  • I have a public key ASD0gvLDXyHs3rX3JLhkC09CLKU7q2HrdvMna8mHPC8qMwo

To claim this, I am signing this object:

michaelgao8 / inspect_df.py
Created June 24, 2019 19:55
inspect a Pandas DataFrame helper
def inspect_df(DataFrame):
    """
    Drop-in code for easier grading
    input: pd.DataFrame of interest
    """
    print("Head: ")
    print(DataFrame.head())
    print(" ======================== ")
    print("Shape: ")
michaelgao8 / start_notebook.sh
Created June 4, 2019 20:13
Start a jupyter notebook docker container in the background and print out the link with the associated token
hash=$(docker run -d -p 8888:8888 -v /Users/michael/Projects:/home/jovyan/work jupyter/datascience-notebook jupyter notebook) && sleep 5 && docker exec "$hash" jupyter notebook list
import sklearn.metrics

def cross_validate_xgboost(train_data, train_output,
                           n_folds, param_grid,
                           type_dict,
                           fixed_param_dict = {'objective': 'binary:logistic', 'eval_metric': ['auc']},
                           metric_func_dict = {'auc': sklearn.metrics.roc_auc_score},
                           other_metrics_dict = None, keep_data = True, **kwargs):
    """
    Perform k-fold cross-validation with xgboost hyperparameters
    Get the average performance across folds and save all of the results
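Only the signature and the start of the docstring are visible, so the body is unknown. The general idea the docstring describes, scoring every hyperparameter combination on k folds and averaging, can be sketched as below. This is not the author's function: the name cross_validate_grid is hypothetical, scikit-learn's GradientBoostingClassifier is swapped in for an xgboost model so the example needs only scikit-learn, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid, StratifiedKFold


def cross_validate_grid(X, y, n_folds, param_grid,
                        make_model=GradientBoostingClassifier, seed=0):
    """Return one record per parameter combination with per-fold and mean AUC."""
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    results = []
    for params in ParameterGrid(param_grid):
        fold_aucs = []
        for train_idx, test_idx in folds.split(X, y):
            # Train a fresh model per fold so no state leaks between folds
            model = make_model(**params)
            model.fit(X[train_idx], y[train_idx])
            scores = model.predict_proba(X[test_idx])[:, 1]
            fold_aucs.append(roc_auc_score(y[test_idx], scores))
        results.append({
            "params": params,
            "fold_auc": fold_aucs,
            "mean_auc": float(np.mean(fold_aucs)),
        })
    return results


X, y = make_classification(n_samples=200, n_features=8, random_state=0)
results = cross_validate_grid(
    X, y, n_folds=3, param_grid={"max_depth": [2, 3], "n_estimators": [20]})
best = max(results, key=lambda record: record["mean_auc"])
```

Any estimator class that accepts the grid's keys as constructor arguments and exposes fit and predict_proba could be passed as make_model.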
michaelgao8 / clear_notebook.sh
Created May 19, 2019 23:06
Clear output from notebook CLI
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace Notebook.ipynb
michaelgao8 / expanded_grid.py
Created May 9, 2019 02:08
Create an expanded grid from lists in numpy
import numpy as np
a = [1,2,3]
b = [3,4,5]
c = [6,7]
d = [8,9,0]
# Desired:
# All possible combinations of these 4 values.
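The preview ends before showing how the combinations are built. One common way to get every combination in NumPy, sketched here as an assumption about what the gist goes on to do, is np.meshgrid followed by a reshape; itertools.product is used only as a cross-check.

```python
import itertools

import numpy as np

a = [1, 2, 3]
b = [3, 4, 5]
c = [6, 7]
d = [8, 9, 0]

# indexing="ij" keeps the axes in input order, so the last list varies fastest
grids = np.meshgrid(a, b, c, d, indexing="ij")

# Stack the four coordinate arrays and flatten to one row per combination
expanded = np.stack(grids, axis=-1).reshape(-1, 4)

# Reference result from the standard library, in the same row order
reference = np.array(list(itertools.product(a, b, c, d)))
```

The result has 3 * 3 * 2 * 3 = 54 rows and 4 columns, one column per input list.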