
@rikturr
rikturr / parallel-post-fit-scheduler-issues.ipynb
Created July 1, 2021 19:36
parallel-post-fit-scheduler-issues
rikturr / sklearn-n-jobs-estimators.py
Created February 9, 2021 19:47
sklearn-n-jobs-estimators
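import inspect

from sklearn.utils import all_estimators

# List every scikit-learn estimator whose constructor accepts an `n_jobs` argument
has_n_jobs = []
for name, est_class in all_estimators():
    # For a class, inspect.signature reports the parameters of its __init__
    s = inspect.signature(est_class)
    if 'n_jobs' in s.parameters:
        has_n_jobs.append((name, est_class))
print(has_n_jobs)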
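The loop relies on `inspect.signature`, which, given a class, reports the parameters of its `__init__` without `self`. A minimal stdlib-only sketch of the same check, using two hypothetical stand-in classes (`ParallelModel`, `SerialModel`) rather than real scikit-learn estimators, so it runs without scikit-learn installed:

```python
import inspect


class ParallelModel:
    """Hypothetical estimator that accepts n_jobs (not a scikit-learn class)."""

    def __init__(self, n_estimators=100, n_jobs=None):
        self.n_estimators = n_estimators
        self.n_jobs = n_jobs


class SerialModel:
    """Hypothetical estimator without an n_jobs parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


def accepts_param(cls, param):
    """Return True if the class constructor declares `param`."""
    # inspect.signature(cls) describes __init__ with `self` already dropped
    return param in inspect.signature(cls).parameters


estimators = [("ParallelModel", ParallelModel), ("SerialModel", SerialModel)]
has_n_jobs = [name for name, cls in estimators if accepts_param(cls, "n_jobs")]
print(has_n_jobs)  # ['ParallelModel']
```

The same `(name, class)` pairs are what `all_estimators()` yields, so the helper drops into the loop above unchanged.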
rikturr / dask-joblib-issues.ipynb
Created February 5, 2021 21:43
dask-scikit-learn-joblib-issues
rikturr / aaron-job-search.ipynb
Last active May 20, 2021 12:43
aaron-job-search-analysis
rikturr / aaron-job-search.csv
Last active December 7, 2020 21:50
aaron-job-search
organization,org_type,position,source,referral,end_stage,num_interviews,date_applied,date_interview1,date_interview2,date_interview3,date_rejected,date_declined,date_accepted,notes
Twitter,Public,Staff machine learning engineer,Referral,1,Reject-resume,0,4/8/2020,,,,4/29/2020,,,
Wikimedia foundation,Non-profit,Machine learning engineer,Search,0,Reject-ghosted,0,4/19/2020,,,,,,,
UNOPS,Government,Predictive Analytics Technical Specialist,Search,0,Reject-ghosted,0,4/19/2020,,,,,,,
Noom,Startup,Senior data scientist,Search,0,Reject-resume,0,4/20/2020,,,,4/23/2020,,,
Memorial Sloan Kettering Cancer Center,Private-L,Lead data scientist,Search,0,Reject-ghosted,0,4/20/2020,,,,,,,
Memorial Sloan Kettering Cancer Center,Private-L,Senior data engineer,Search,0,Reject-resume,0,4/20/2020,,,,5/13/2020,,,
Memorial Sloan Kettering Cancer Center,Private-L,Data engineer,Search,0,Decline-resume,0,4/20/2020,,,,,6/17/2020,,Took other job
Prominent Edge,Consulting,Lead data scientist,Search,0,Reject-ghosted,0,4/21/2020,,,,,,,
Open
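The CSV records one row per application, with the outcome in `end_stage` and dates as `M/D/YYYY`. A small sketch, using only the standard library and only the header plus the eight data rows displayed above (the full file is longer), that tallies outcomes and computes days from application to rejection. The `parse_date` helper and the `days_to_reject` key format are illustrative choices, not part of the original analysis notebook:

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Header and the eight data rows displayed above; the full file has more rows
CSV_TEXT = """\
organization,org_type,position,source,referral,end_stage,num_interviews,date_applied,date_interview1,date_interview2,date_interview3,date_rejected,date_declined,date_accepted,notes
Twitter,Public,Staff machine learning engineer,Referral,1,Reject-resume,0,4/8/2020,,,,4/29/2020,,,
Wikimedia foundation,Non-profit,Machine learning engineer,Search,0,Reject-ghosted,0,4/19/2020,,,,,,,
UNOPS,Government,Predictive Analytics Technical Specialist,Search,0,Reject-ghosted,0,4/19/2020,,,,,,,
Noom,Startup,Senior data scientist,Search,0,Reject-resume,0,4/20/2020,,,,4/23/2020,,,
Memorial Sloan Kettering Cancer Center,Private-L,Lead data scientist,Search,0,Reject-ghosted,0,4/20/2020,,,,,,,
Memorial Sloan Kettering Cancer Center,Private-L,Senior data engineer,Search,0,Reject-resume,0,4/20/2020,,,,5/13/2020,,,
Memorial Sloan Kettering Cancer Center,Private-L,Data engineer,Search,0,Decline-resume,0,4/20/2020,,,,,6/17/2020,,Took other job
Prominent Edge,Consulting,Lead data scientist,Search,0,Reject-ghosted,0,4/21/2020,,,,,,,
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# How each application ended, e.g. "Reject-ghosted" or "Decline-resume"
stage_counts = Counter(row["end_stage"] for row in rows)


def parse_date(text):
    """Parse dates written like 4/8/2020 (month/day/year)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Days from application to rejection, only for rows with a rejection date
days_to_reject = {
    f'{row["organization"]}: {row["position"]}': (
        parse_date(row["date_rejected"]) - parse_date(row["date_applied"])
    ).days
    for row in rows
    if row["date_rejected"]
}

print(stage_counts.most_common())
print(days_to_reject)
```

Empty trailing fields (`,,,`) parse as empty strings, which is why the rejection filter can simply test `row["date_rejected"]` for truthiness.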
rikturr / dask-rapids.py
Created October 13, 2020 19:55
dask-rapids
# notice "dask" in these imports
import dask_cudf
from cuml.dask.ensemble import RandomForestClassifier
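taxi = dask_cudf.read_csv(
    's3://nyc-tlc/trip data/yellow_tripdata_2019-01.csv',
    parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
    # read the public bucket without AWS credentials
    storage_options={'anon': True},
    # treat integer columns as float, in case later partitions contain nulls
    assume_missing=True,
)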
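The snippet imports the Dask-aware `RandomForestClassifier` but stops after loading the data. cuML's documentation describes its distributed random forest as embarrassingly parallel: with N trees on w workers, each worker builds roughly N/w trees on the data it holds locally, and the sub-forests are combined into one model. A plain-Python sketch of splitting a tree budget that way; the function `estimators_per_worker` is hypothetical and not cuML's API, and the exact remainder handling is an assumption:

```python
def estimators_per_worker(n_estimators, n_workers):
    """Split a total tree budget as evenly as possible across workers.

    Assumption: illustrates the idea of a distributed random forest in which
    each worker trains its own sub-forest; it is not cuML's implementation.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    if n_estimators < n_workers:
        raise ValueError("need at least one tree per worker")
    base, extra = divmod(n_estimators, n_workers)
    # The first `extra` workers each train one additional tree
    return [base + 1 if i < extra else base for i in range(n_workers)]


print(estimators_per_worker(100, 3))  # [34, 33, 33]
```

Because each sub-forest only sees one worker's partitions, how the data is partitioned across workers affects what each group of trees learns.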
rikturr / saturn-gpu-cluster.py
Created October 13, 2020 19:53
saturn-gpu-cluster
from dask.distributed import Client
from dask_saturn import SaturnCluster
cluster = SaturnCluster(
    n_workers=3,
    scheduler_size='medium',
    worker_size='g4dnxlarge'
)
client = Client(cluster)
rikturr / rapids-random-forest.py
Created October 13, 2020 19:49
rapids-random-forest
from cuml.ensemble import RandomForestClassifier
# prep_df, features and y_col are defined in the accompanying notebook (not shown)
taxi_train = prep_df(taxi)
rfc = RandomForestClassifier(n_estimators=100, max_depth=10, seed=42)
rfc.fit(taxi_train[features], taxi_train[y_col])
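`prep_df`, `features` and `y_col` live in the notebook and are not reproduced here. As a purely hypothetical illustration of the kind of per-trip feature engineering such a function might do, the sketch below derives calendar and duration features plus an assumed binary label (tip above 20% of the fare) from one raw record, using the standard library instead of cuDF. The `FEATURES` list, the `high_tip` label and the `prep_row` helper are assumptions, not the notebook's code; the column names follow the NYC yellow-taxi schema, and the sample record is illustrative.

```python
from datetime import datetime

# Hypothetical feature set and label; the notebook's real choices are not shown
FEATURES = ["pickup_weekday", "pickup_hour", "trip_minutes", "trip_distance", "passenger_count"]
Y_COL = "high_tip"

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def prep_row(row):
    """Derive model features and a binary label from one raw taxi record (dict of strings)."""
    pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
    fare = float(row["fare_amount"])
    tip = float(row["tip_amount"])
    return {
        "pickup_weekday": pickup.weekday(),  # Monday == 0
        "pickup_hour": pickup.hour,
        "trip_minutes": (dropoff - pickup).total_seconds() / 60,
        "trip_distance": float(row["trip_distance"]),
        "passenger_count": int(row["passenger_count"]),
        # Assumed label: tip greater than 20% of the fare
        Y_COL: int(fare > 0 and tip / fare > 0.2),
    }


raw = {
    "tpep_pickup_datetime": "2019-01-01 00:46:40",
    "tpep_dropoff_datetime": "2019-01-01 00:53:20",
    "passenger_count": "1",
    "trip_distance": "1.5",
    "fare_amount": "7.0",
    "tip_amount": "1.65",
}
prepped = prep_row(raw)
x = [prepped[name] for name in FEATURES]
y = prepped[Y_COL]
print(x, y)
```

In cuDF the same derivations would be column-wise operations (for example through the `.dt` datetime accessor) applied to the whole frame rather than a per-row function.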
rikturr / rapids-load-data.py
Last active October 13, 2020 19:46
rapids-load-data
import cudf
import s3fs
s3 = s3fs.S3FileSystem(anon=True)
taxi = cudf.read_csv(
    s3.open('s3://nyc-tlc/trip data/yellow_tripdata_2019-01.csv', mode='rb'),
    parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime']
)
rikturr / rf-rapids-issues.ipynb
Last active August 6, 2020 15:00
cuml rf predict inf