Notes on fast.ai machine learning course by Jeremy Howard
- video: https://youtu.be/CzdWqFTmn0Y
- github: https://github.com/fastai/fastai/tree/master/courses/ml1
- `pd.read_csv(low_memory=False, parse_dates=...)`: https://youtu.be/CzdWqFTmn0Y?t=1717
- `display_all` function for df: https://youtu.be/CzdWqFTmn0Y?t=1929
- log error cares about the ratio rather than the absolute value: https://youtu.be/CzdWqFTmn0Y?t=2107
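The ratio point can be seen directly in numpy; a minimal sketch (the helper name `log_error` is mine) showing that being off by the same factor costs the same at any scale:

```python
import numpy as np

# Log-based errors (e.g. RMSLE) compare log predictions, so the penalty
# depends on the ratio between prediction and truth, not their difference.
def log_error(pred, actual):
    return np.log(pred) - np.log(actual)

# Being off by 2x costs the same whether the scale is tens or thousands:
small = log_error(20, 10)      # predicted 20, actual 10 -> log(2)
large = log_error(2000, 1000)  # predicted 2000, actual 1000 -> log(2)
```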
- `add_datepart`: https://youtu.be/CzdWqFTmn0Y?t=3072
- `train_cats`/`apply_cats`: https://youtu.be/CzdWqFTmn0Y?t=3467
- `.dt` and `.cat` accessors, `.cat.set_categories(ordered=True)`: https://youtu.be/CzdWqFTmn0Y?t=3595
- make a tmp dir: https://youtu.be/CzdWqFTmn0Y?t=3988
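A small hypothetical pandas sketch of the `.cat` accessor with ordered categories (column name and category labels are made up), showing why ordering matters for tree splits like "size < High":

```python
import pandas as pd

# Turn a string column into an ordered categorical.
df = pd.DataFrame({'size': ['Low', 'High', 'Medium', 'Low']})
df['size'] = df['size'].astype('category')
df['size'] = df['size'].cat.set_categories(['Low', 'Medium', 'High'], ordered=True)

# Categorical columns expose integer codes (-1 is reserved for missing values),
# which is what gets fed to the random forest.
codes = df['size'].cat.codes  # Low -> 0, Medium -> 1, High -> 2
```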
- `to_feather`: https://youtu.be/CzdWqFTmn0Y?t=3931
- `proc_df`: https://youtu.be/CzdWqFTmn0Y?t=4087
- `fix_missing`: https://youtu.be/CzdWqFTmn0Y?t=4167
- `df.items()`: https://youtu.be/CzdWqFTmn0Y?t=4246
- including IDs works fine: https://youtu.be/CzdWqFTmn0Y?t=4330
- RandomForest `n_jobs=-1`: https://youtu.be/CzdWqFTmn0Y?t=4345
- `print_score`: https://youtu.be/CzdWqFTmn0Y?t=4483
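A minimal numpy sketch of the RMSE metric at the heart of `print_score` (the fastai helper also prints R² and the OOB score; this is just the core computation):

```python
import numpy as np

# Root mean squared error between predictions and targets.
def rmse(pred, actual):
    return np.sqrt(((pred - actual) ** 2).mean())

# Typical usage with a fitted model m (hypothetical names):
#   rmse(m.predict(X_train), y_train), rmse(m.predict(X_valid), y_valid)
perfect = rmse(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
off = rmse(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
```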
- link source code with `ln -s`: https://youtu.be/blyXCk4sgEg?t=149
- pandas categories: https://youtu.be/blyXCk4sgEg?t=529
- r-squared explained: https://youtu.be/blyXCk4sgEg?t=758
- `proc_df(subset=)`: https://youtu.be/blyXCk4sgEg?t=1844
- RandomForest `bootstrap=False` and `draw_tree`: https://youtu.be/blyXCk4sgEg?t=1935
- Bagging of little bootstraps: https://youtu.be/blyXCk4sgEg?t=2858
- Less predictive but less correlated trees: https://youtu.be/blyXCk4sgEg?t=3442
- each tree can predict in RF: https://youtu.be/blyXCk4sgEg?t=3879
- 20-30 trees, 1000 trees over night: https://youtu.be/blyXCk4sgEg?t=4143
- RF out-of-bag validation, `oob_score`: https://youtu.be/blyXCk4sgEg?t=4225
- `set_rf_samples`: https://youtu.be/blyXCk4sgEg?t=4600
- `min_samples_leaf`: https://youtu.be/blyXCk4sgEg?t=4953
- `max_features`: https://youtu.be/blyXCk4sgEg?t=5054
- RF never removes variables after each split: https://youtu.be/blyXCk4sgEg?t=5238
- Order of categorical variables (not one-hot) doesn't matter: https://youtu.be/blyXCk4sgEg?t=5577
- set `dtype` in `read_csv` for large files: https://youtu.be/YSFG_W8JxBo?t=964
- SIMD: https://youtu.be/YSFG_W8JxBo?t=1192
- `shuf` to read in a random sample of a large csv: https://youtu.be/YSFG_W8JxBo?t=1243
- `describe(include='all')`: https://youtu.be/YSFG_W8JxBo?t=1371
- what RF run time depends on: https://youtu.be/YSFG_W8JxBo?t=1784
- `%prun`, profiler: https://youtu.be/YSFG_W8JxBo?t=1862
- validation score vs kaggle score correlation, how to check the val set is good: https://youtu.be/YSFG_W8JxBo?t=2694
- confidence of prediction - standard deviation of the predictions of trees: https://youtu.be/YSFG_W8JxBo?t=3402
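The confidence-from-tree-variance idea can be sketched in numpy: stack each tree's predictions and use the standard deviation across trees as uncertainty. With sklearn this would be `[t.predict(X) for t in model.estimators_]`; here the per-tree predictions are simulated arrays.

```python
import numpy as np

# Rows of tree_preds: one tree's predictions for two samples (simulated).
tree_preds = np.stack([
    np.array([10.0, 5.0]),
    np.array([11.0, 1.0]),
    np.array([ 9.0, 9.0]),
])
mean_pred = tree_preds.mean(axis=0)  # the ensemble's prediction
std_pred = tree_preds.std(axis=0)    # high std => trees disagree => low confidence
```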
- `parallel_trees`: https://youtu.be/YSFG_W8JxBo?t=3633
- pandas plotting:
  - pandas `groupby(as_index=False)`: https://youtu.be/YSFG_W8JxBo?t=3855
- feature importance: https://youtu.be/YSFG_W8JxBo?t=4051
- why to trust this feature importance more: https://youtu.be/YSFG_W8JxBo?t=4630
- how to get feature importance: https://youtu.be/YSFG_W8JxBo?t=4679
- RF hyperparameters:
- https://youtu.be/0v93qHDqq_g?t=144
- Values for parameters to try: https://youtu.be/0v93qHDqq_g?t=1096
- Use oob to check overfitting: https://youtu.be/0v93qHDqq_g?t=2029
- issues with traditional feature importance: https://youtu.be/0v93qHDqq_g?t=2249
- one-hot
- https://youtu.be/0v93qHDqq_g?t=2467
- for RF, not mandatory
- only one-hot low-cardinality columns: https://youtu.be/0v93qHDqq_g?t=2812
- use dendrogram to further remove redundant variables: https://youtu.be/0v93qHDqq_g?t=3328
- rank correlation/spearmanr: https://youtu.be/0v93qHDqq_g?t=3600
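Rank correlation can be sketched in plain numpy (assuming no tied values, where the double-argsort rank trick is valid): Spearman is just Pearson correlation computed on the ranks, so any monotonic relationship, even a non-linear one, scores 1.0.

```python
import numpy as np

# Spearman rank correlation for tie-free data: rank both series, then
# compute ordinary (Pearson) correlation on the ranks.
def spearman(a, b):
    ra = np.argsort(np.argsort(a)).astype(float)  # rank of each element
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0])
r = spearman(x, x ** 3)  # monotonic transform keeps the ranks identical
```

scipy's `scipy.stats.spearmanr` handles ties properly; this sketch just shows why redundant (monotonically related) features cluster together in the dendrogram.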
- partial dependence: https://youtu.be/0v93qHDqq_g?t=4061
- ggplot: https://youtu.be/0v93qHDqq_g?t=4201
- `get_sample`: https://youtu.be/0v93qHDqq_g?t=4302
- partial dependence plot (pdp): https://youtu.be/0v93qHDqq_g?t=4512
- pdp interaction: https://youtu.be/0v93qHDqq_g?t=5270
- tree interpreter: https://youtu.be/0v93qHDqq_g?t=5560
- weigh rows according to how recent they are: https://youtu.be/3jl2h9hSRvc?t=1263
- what JH does with temporal data and val and test: https://youtu.be/3jl2h9hSRvc?t=1284
- how to know if your validation set is good:
- cross validation and why not: https://youtu.be/3jl2h9hSRvc?t=1867
- waterfall plot: https://youtu.be/3jl2h9hSRvc?t=2506
- extrapolation!:
- https://youtu.be/3jl2h9hSRvc?t=2972
- predict if a sample is in validation set: https://youtu.be/3jl2h9hSRvc?t=3217
- about reset random sample v.s. r square: https://youtu.be/3jl2h9hSRvc?t=3690
- https://youtu.be/3jl2h9hSRvc?t=3760
- `np.random.permutation`: https://youtu.be/3jl2h9hSRvc?t=4610
- idx in tree: https://youtu.be/3jl2h9hSRvc?t=4919
- ML, business, application, and optimization: https://youtu.be/BFIYUvBRTpE?t=821
- https://youtu.be/BFIYUvBRTpE?t=2240
- feature importance again: https://youtu.be/BFIYUvBRTpE?t=2770
- partial dependence again:
- https://youtu.be/BFIYUvBRTpE?t=3064
- a simplified version (using avg of all other features) for further explanation: https://youtu.be/BFIYUvBRTpE?t=3362
- hub: https://youtu.be/BFIYUvBRTpE?t=4227
- nan is -1 in pandas, fastai adds 1: https://youtu.be/BFIYUvBRTpE?t=4447
- interaction importance discussion: https://youtu.be/BFIYUvBRTpE?t=4649
- extrapolating live: https://youtu.be/BFIYUvBRTpE?t=5088
- `x[..., None]`: add a last dimension
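The trailing-axis trick (`x[..., None]`) in a minimal numpy sketch: appending an axis of size 1 is what broadcasting needs to line a vector up against the columns of a matrix.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
col = x[..., None]   # shape (3,) -> (3, 1): a column vector

# Now col broadcasts against a (3, 4) matrix column-wise:
m = np.ones((3, 4))
scaled = m * col     # row i of m is multiplied by x[i]
```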
- how big val set needs to be: https://youtu.be/O5F9vR2CNYI?t=364
- binomial: https://youtu.be/O5F9vR2CNYI?t=798
- oversample less common class always better: https://youtu.be/O5F9vR2CNYI?t=1076
- https://youtu.be/O5F9vR2CNYI?t=1127
- `__repr__`: https://youtu.be/O5F9vR2CNYI?t=1682
- `@property`: https://youtu.be/O5F9vR2CNYI?t=1770
- improve computational performance (O(n^2) to O(n)): https://youtu.be/O5F9vR2CNYI?t=2746
- `%prun`: https://youtu.be/O5F9vR2CNYI?t=3036
- developing an OOP method as `func(self, ...)`, e.g. `find_better_split(self, var_idx)`, and passing an object as self, e.g. `find_better_split(tree, 1)`, in the nb https://github.com/fastai/fastai/blob/master/courses/ml1/lesson3-rf_foundations.ipynb
- `plot.scatter(..., s=6)`: https://youtu.be/O5F9vR2CNYI?t=4192
- `%load_ext Cython`: https://youtu.be/O5F9vR2CNYI?t=4448
- insert image in forum: https://youtu.be/O5F9vR2CNYI?t=4900
- jupyter gist-it extension: https://youtu.be/O5F9vR2CNYI?t=4912
- pickle vs feather, `gzip.open`: https://youtu.be/DzE0eSdy5Hk?t=539
- normalization and random forests: https://youtu.be/DzE0eSdy5Hk?t=1079
- how to do scaling: https://youtu.be/DzE0eSdy5Hk?t=1731
- tensor vs jagged list: https://youtu.be/DzE0eSdy5Hk?t=2238
- `.cuda()`: https://youtu.be/DzE0eSdy5Hk?t=3502
- `ImageClassifierData.from_arrays`: https://youtu.be/DzE0eSdy5Hk?t=3664
- binary cross entropy: https://youtu.be/DzE0eSdy5Hk?t=3913
- pytorch module from scratch: https://youtu.be/DzE0eSdy5Hk?t=4730
- `nn.Parameter`: https://youtu.be/DzE0eSdy5Hk?t=5203
- initialize weights, divide by dim[0] (Kaiming He init): https://youtu.be/DzE0eSdy5Hk?t=5085
- `forward`: https://youtu.be/DzE0eSdy5Hk?t=5258
- using the lesson4-mnist_sgd nb, `fit`: https://youtu.be/PGC0UxakTvM?t=833
- make a tensor into a Variable and put it on the GPU: https://youtu.be/PGC0UxakTvM?t=1995
- predict on a single tensor: https://youtu.be/PGC0UxakTvM?t=2119
- pytorch `max` also returns the argmax: https://youtu.be/PGC0UxakTvM?t=2757
- broadcasting: https://youtu.be/PGC0UxakTvM?t=2830
- `np.broadcast_to`: https://youtu.be/PGC0UxakTvM?t=3657
- rules of broadcasting: https://youtu.be/PGC0UxakTvM?t=3766
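The broadcasting rules in a minimal numpy sketch: align shapes from the right; each pair of dimensions must be equal, or one of them must be 1 (size-1 dims are virtually repeated without copying). `np.broadcast_to` makes the virtual expansion visible.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # shape (2, 3)
b = np.array([10, 20, 30])        # shape (3,) -> treated as (1, 3)
c = a + b                         # broadcasts to result shape (2, 3)

# View (no copy) showing how b is virtually expanded to match a:
rep = np.broadcast_to(b, (2, 3))
```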
- notation as a tool of thought, examples: outer product, outer greater-than, outer sum, grid: https://youtu.be/PGC0UxakTvM?t=4067
- tensor decomposition, tensor regression, tensorly: https://youtu.be/PGC0UxakTvM?t=4441
- `T` is `torch.from_numpy`: https://youtu.be/PGC0UxakTvM?t=4524
- matrixproduct.xyz
- structured data/semisupervised kaggle safe driver comp: https://youtu.be/37sFIak42Sc?t=321
- pytorch variable tutorial: https://youtu.be/37sFIak42Sc?t=721
- why need to set grad back to zero: https://youtu.be/37sFIak42Sc?t=1488
- `fit()`: https://youtu.be/37sFIak42Sc?t=1863
- `numel` and `net.parameters()`: https://youtu.be/37sFIak42Sc?t=2459
- weight decay: https://youtu.be/37sFIak42Sc?t=2812
- weight decay should penalize training (bigger loss), but it can appear to do the opposite in early epochs: https://youtu.be/37sFIak42Sc?t=3109
- secret of over-parameterization then regularization: https://youtu.be/37sFIak42Sc?t=3496
- ideas and opportunities for interpreting nn: https://youtu.be/37sFIak42Sc?t=3659
- https://youtu.be/37sFIak42Sc?t=3746
- `text_from_folders`: https://youtu.be/37sFIak42Sc?t=3892
- `.sign()`: https://youtu.be/37sFIak42Sc?t=5376
- `dual=True` in logistic regr: https://youtu.be/37sFIak42Sc?t=5506
- `C` in logistic regr: https://youtu.be/37sFIak42Sc?t=5609
- `ngram_range`: https://youtu.be/37sFIak42Sc?t=5776
- math form similarity btw naive bayes and logistic regression: https://youtu.be/XJ_waZlJU8g?t=789
- use log ratios as feature; i.e., use prior as feature (paper: Baselines and Bigrams: Simple, Good Sentiment and Topic Classification): https://youtu.be/XJ_waZlJU8g?t=1532
- nbsvm/nb based logistic regr: https://youtu.be/XJ_waZlJU8g?t=2639
- equivalency btw embedding look-up and matrix multiplication - no need to use the matrix: https://youtu.be/XJ_waZlJU8g?t=4171
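The embedding-equals-matmul point in a minimal numpy sketch: multiplying a one-hot vector by the embedding matrix just selects a row, so frameworks skip the multiplication and index directly.

```python
import numpy as np

E = np.arange(12.0).reshape(4, 3)   # embedding matrix: 4 categories, dim 3
idx = np.array([2, 0])              # category ids for two samples

lookup = E[idx]                     # direct row indexing (what embeddings do)
one_hot = np.eye(4)[idx]            # shape (2, 4): one-hot encode the ids
matmul = one_hot @ E                # same result, far more arithmetic
```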
- kaggle grocery: https://youtu.be/XJ_waZlJU8g?t=4766
- pandas `.str` functions (vectorized string operations): https://youtu.be/XJ_waZlJU8g?t=5263
- when possible, treat a variable as categorical, then embedding: https://youtu.be/XJ_waZlJU8g?t=5612
- slowness of `iterrows`; use `zip` over arrays instead: https://youtu.be/5_xFdhfUnvQ?t=608
- `rolling` and the pandas time series api: https://youtu.be/5_xFdhfUnvQ?t=984
- at most 600 embeddings; half the cardinality and no more than 50: https://youtu.be/5_xFdhfUnvQ?t=2039
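The embedding-size rule of thumb can be sketched as a one-liner (the exact fastai formula may round slightly differently; this function is my approximation of "half the cardinality, capped at 50"):

```python
# Pick an embedding width for a categorical variable: roughly half its
# cardinality, but never more than 50 dimensions.
def emb_size(cardinality):
    return min(50, (cardinality + 1) // 2)

# e.g. a 7-category variable gets a 4-dim embedding; a 1000-category one gets 50.
```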
- public leaderboard issue: https://youtu.be/5_xFdhfUnvQ?t=2462
- feature importance for nn: https://youtu.be/5_xFdhfUnvQ?t=3366
- bootstrapping for significance and confidence interval: https://youtu.be/5_xFdhfUnvQ?t=3651