Sam Shleifer sshleifer

## pandorable_notes.md

      
            
            
Last active May 14, 2016 22:49

Notes on http://tomaugspurger.github.io/ Modern Pandas blogposts
              
          
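### Will immediately incorporate

- `df.assign(px2=lambda x: x.px * 2)`: `x` is the DataFrame; magically this will save us mucho code. `assign` only accepts keyword arguments, so the new column needs a name (`px2` is just a placeholder).
- `df.loc[df.index.get_level_values(1) == 'donger']` can be `df.loc[pd.IndexSlice[:, 'donger'], :]`.
- `ser.sort_values(ascending=False).head()` can be `ser.nlargest(5)`. `nsmallest` also exists.
- `df.add_suffix` is built into pandas.
- `df.dropna(thresh=4)`: `thresh` is the minimum number of non-missing values a row needs to be kept; rows with fewer than `thresh` non-NA values are dropped.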
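
The idioms in this list can be tried on a toy frame. This is a minimal sketch: the column `px` and the label `'donger'` come from the notes, while the index names, the `qty` column, the `px2` name and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index and data, invented for this sketch.
index = pd.MultiIndex.from_product(
    [["a", "b"], ["donger", "other"]], names=["outer", "inner"]
)
df = pd.DataFrame(
    {"px": [1.0, 2.0, 3.0, 4.0], "qty": [10.0, np.nan, np.nan, 40.0]},
    index=index,
)

# assign takes keyword arguments; a callable value receives the DataFrame.
doubled = df.assign(px2=lambda x: x.px * 2)

# Two ways to select rows whose second index level equals 'donger'.
by_mask = df.loc[df.index.get_level_values(1) == "donger"]
by_slice = df.loc[pd.IndexSlice[:, "donger"], :]

# nlargest(n) gives the same rows as sort_values(ascending=False).head(n).
top2 = df["px"].nlargest(2)

# thresh is the minimum number of non-missing values needed to keep a row.
kept = df.dropna(thresh=2)
```

Here only the first and last rows survive `dropna(thresh=2)`, because the middle two rows have a single non-missing value each.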

### Could be useful

- `pd.TimeGrouper('H')` (deprecated and later removed from pandas; `pd.Grouper(freq='H')` is the replacement)


## zimmerman_chap2.md

      
            
            
Last active July 4, 2016 01:11

Notes on Chapter 2 of Tom Zimmermann's Dissertation

[Paper](https://dash.harvard.edu/bitstream/handle/1/17467320/ZIMMERMANN-DISSERTATION-2015.pdf?sequence=1)
### Intro: Econom(etr)ics vs. ML

- Economics focuses on empirical relationships between features and outcomes; ML focuses on predicting outcomes.
- Beta vs. yhat: cv.coeffs vs. cv.metrics.fscore.
- TZ: You can test a relationship by seeing whether including the variable in a big model improves predictions, thereby avoiding omitted-control issues.
- This requires an ML approach (feature engineering) on investor behavior datasets!
- The implementation details and robustness checks are more valuable than the actual results on the disposition effect.
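
The "does including the variable improve predictions" test can be sketched on synthetic data. This is only an illustration under loud assumptions: the data are randomly generated, the "big model" is replaced by ordinary least squares, and every name (`controls`, `candidate`, `holdout_mse`) is hypothetical rather than taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins: three control features and one candidate variable.
controls = rng.normal(size=(n, 3))
candidate = rng.normal(size=n)
y = (controls @ np.array([1.0, -0.5, 0.25])
     + 0.8 * candidate
     + rng.normal(scale=0.5, size=n))


def holdout_mse(X, y, n_train):
    """Fit least squares on the first n_train rows, score MSE on the rest."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - X[n_train:] @ beta
    return float(np.mean(resid ** 2))


mse_without = holdout_mse(controls, y, n_train=300)
mse_with = holdout_mse(np.column_stack([controls, candidate]), y, n_train=300)

# A lower held-out error with the candidate included is the evidence of interest.
print(mse_without, mse_with)
```

The comparison is on held-out rows, so the question being asked is about yhat rather than about the size or significance of a single beta.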


## apps.md

      
            
            
Last active September 1, 2023 15:12

My Favorite apps and workflow stuff (for mac/iOS/python)
              
          
### Mac

- Spectacle
- RescueTime
- SelfControl
- iTerm2
- Fluid: made standalone Gmail and Trello apps for cmd-tab
- iStat
- Alfred


## kernel_trick.md

      
            
            
Last active October 19, 2016 19:19

Attempt at explaining the kernel trick in preparation for 6.867 Midterm
              
          
Problem: Transforming X into φ(X) space can be expensive, and φ(X) is usually needed only as an intermediate result inside a dot product like `<φ(x[i]), φ(x[j])>`.

Trick to save computation time: if we have a φ for which we know how to compute `<φ(x[i]), φ(x[j])>` through a shortcut, we can use the shortcut instead of explicitly calling φ and storing the long intermediate result. The savings stem from (a) saving calls to φ, and (b) making the dot product operate on shorter vectors.

### Example

`φ(x) = (x[1]**2, sqrt(2)*x[1]*x[2], x[2]**2)`

`<φ(x), φ(z)> = x[1]**2 * z[1]**2 + 2*x[1]*x[2]*z[1]*z[2] + x[2]**2 * z[2]**2 = (x[1]*z[1] + x[2]*z[2])**2 = <x, z>**2`

So the shortcut here is to take the ordinary 2-D dot product `<x, z>` and square it, without ever building the 3-D vectors.
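
The example can be checked numerically. The sketch below compares the explicit route (map both points with φ, then take a 3-D dot product) against the shortcut (square the 2-D dot product); the helper names `phi`, `dot` and `poly_kernel` and the sample points are chosen here for illustration.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Shortcut: square the 2-D dot product instead of mapping to 3-D first."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # two calls to phi, 3-D dot product
shortcut = poly_kernel(x, z)     # no calls to phi, 2-D dot product

print(explicit, shortcut)        # equal up to floating-point rounding
```

With higher-degree or higher-dimensional feature maps the gap widens, since φ(x) grows quickly while the shortcut still works on the original vectors.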


## imagerive.md

      
            
            
Created June 7, 2018 17:16

Imagerive Notes
              
          
### WHERE IS THE DATA?

- SSH into {FIXME} while connected to the ImageRive VPN (must be from a Windows machine).
- All data is in /merantix_core/data/hospitals/imagerive/export
- Anonymized reports are in reports
- anonymized_dicoms/
- export/cases_new.json
- export/patients_new.json
- Normal Windows VPN connection.

  
## generate_boxes_from_masks.py
import numpy as np
import pandas as pd
import pickle as pkl
import nrrd
import glob
import os
import sys

def find_bounding_box(mask, point, label):
    visited = set()
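
The body of `find_bounding_box` is not shown in this preview. Given the `visited` set and the starting `point`, one plausible reading is a flood fill that walks the connected region carrying `label` and tracks its extent. The sketch below is an assumption along those lines, not the author's implementation: it uses a differently named function, works on a 2-D array only, and uses 4-connectivity.

```python
from collections import deque

import numpy as np


def flood_fill_bounding_box(mask, point, label):
    """Return (row_min, col_min, row_max, col_max) of the 4-connected
    region with value `label` that contains `point` (2-D masks only)."""
    if mask[point] != label:
        raise ValueError("start point does not carry the requested label")
    visited = {point}
    queue = deque([point])
    r_min = r_max = point[0]
    c_min = c_max = point[1]
    while queue:
        r, c = queue.popleft()
        r_min, r_max = min(r_min, r), max(r_max, r)
        c_min, c_max = min(c_min, c), max(c_max, c)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < mask.shape[0] and 0 <= nxt[1] < mask.shape[1]
            if inside and nxt not in visited and mask[nxt] == label:
                visited.add(nxt)
                queue.append(nxt)
    return r_min, c_min, r_max, c_max


# Toy mask with two disconnected blobs that share the same label.
mask = np.zeros((6, 6), dtype=int)
mask[1:3, 1:4] = 1
mask[4:6, 4:6] = 1
box = flood_fill_bounding_box(mask, (1, 1), label=1)
```

Starting from `(1, 1)` the fill never reaches the second blob, so the box covers only the first one.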

## sitk_attempt.py
import SimpleITK as sitk
import numpy as np
mask_file = '/data/ct-cspine/test_set_w_masks_2019_05_01/cspine_fx_seg/Cspine_fx_seg/5616571.nrrd'
array_file = '/data/ct-cspine/processed-studies/data_20180524_161757/anonymized_data/images/test/5616571.npy'

def projectImage(reference, moving, interpolate='linear'):
    # Projects the moving image onto the reference image space.
    # Use interpolate='NN' for segmentation masks.
    resample = sitk.ResampleImageFilter()
    resample.SetReferenceImage(reference)

## mix_match.py
"""Modified from https://github.com/gan3sh500/mixmatch-pytorch/blob/master/layer.py
Implementation of """


def mixmatch(X_labeled, y, X_unlabeled, model, augment_fn, T=0.5, K=2, alpha=0.75):
    """Generate labeled and unlabeled batches for mixmatch. Helpers are below. Use in dataloader."""
    xb = augment_fn(X_labeled)
    n_labeled = len(xb)
    ub = [augment_fn(X_unlabeled) for _ in range(K)]  # unlabeled
    qb = sharpen(sum(map(model, ub)) / K, T)
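
The `sharpen` helper is not shown in this preview. In MixMatch, sharpening raises each class probability to the power `1/T` and renormalises, so a temperature below 1 makes the averaged guess more peaked. The NumPy version below is a stand-in sketch under that definition; the original presumably operates on PyTorch tensors, and the sample probabilities are invented.

```python
import numpy as np


def sharpen(probs, T):
    """Raise each row of class probabilities to 1/T and renormalise.

    Lower T pushes each row toward a one-hot distribution.
    """
    powered = np.asarray(probs, dtype=float) ** (1.0 / T)
    return powered / powered.sum(axis=-1, keepdims=True)


# Hypothetical averaged prediction over K augmentations for one example.
avg_guess = np.array([[0.6, 0.3, 0.1]])
sharpened = sharpen(avg_guess, T=0.5)
print(sharpened)
```

With `T=0.5` the probabilities are squared before renormalising, so the leading class gains mass at the expense of the others.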

## hardness_grid.py
pg1 = update_batch_size(ParameterGrid({
    'lr': [1e-4, 1e-3, 3e-3, 1e-2, .05, 1e-1],
    'label_smoothing': [True, False],
    'size': [128],
    'bs': [256],
    'hardness_percentile': [.75, .5, .25, .1],  # top 50%, top25%
}))
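
To see how many runs this grid implies, the expansion can be reproduced with the standard library. This sketch assumes `ParameterGrid` is scikit-learn's class of that name, which yields one dict per combination of values; `update_batch_size` is the author's helper and is not shown, so it is left out. The `expand_grid` function is a hypothetical stand-in written for this note.

```python
from itertools import product

grid = {
    'lr': [1e-4, 1e-3, 3e-3, 1e-2, .05, 1e-1],
    'label_smoothing': [True, False],
    'size': [128],
    'bs': [256],
    'hardness_percentile': [.75, .5, .25, .1],
}


def expand_grid(grid):
    """Return one dict per combination of the listed values."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]


configs = expand_grid(grid)
print(len(configs))  # 6 * 2 * 1 * 1 * 4 combinations
```

Only `lr`, `label_smoothing` and `hardness_percentile` vary, so the grid size is the product of their lengths.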


## gcp_setup_help.sh
#!/usr/bin/env bash
# Make an instance here:
# https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning?_ga=2.50258406.1502354465.1584473811-759161763.1583556304
# Don't enable JupyterLab.

# Note that if you work at Curai, this has moved to https://github.com/curai/experiments/blob/master/shleifer/gcp_setup.md

# Follow these instructions until the start of the "First-time setup script" section:
# https://github.com/cs231n/gcloud/