coursera_deep_learning_3.md

orthogonalization: know what to tune to achieve what effect; for this it helps to have orthogonal controls (steering wheel, accelerator, brakes: each with a well-defined impact); however, that's not usually the case in machine learning

chain of assumptions we rely on in ML (a small diagnostic sketch follows the list):

  • fit the training set well on the cost function (roughly human-level): knobs would be: bigger network, better optimization algorithm (e.g. Adam)
  • hope it does well on the dev set: knobs would be: bigger training set, regularization
  • hope it does well on the test set: knob would be: bigger dev set
  • performs well in the real world: knobs would be: change the dev set or the cost function
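
Below is a minimal sketch of that chain as a diagnostic routine; the threshold and all error numbers are hypothetical illustrations, not values from the course.

```python
# Walk the chain of assumptions in order and report the first broken stage,
# together with the orthogonal knobs for that stage. Errors are in percent.
def suggest_knob(human_err, train_err, dev_err, test_err, tol=0.5):
    if train_err - human_err > tol:
        return "training fit", ["bigger network", "better optimizer (e.g. Adam)"]
    if dev_err - train_err > tol:
        return "dev generalization", ["bigger training set", "regularization"]
    if test_err - dev_err > tol:
        return "test generalization", ["bigger dev set"]
    return "real world", ["change dev set or cost function"]

# The human-vs-training gap (7%) dominates, so tune the training-fit knobs first.
print(suggest_knob(human_err=1.0, train_err=8.0, dev_err=10.0, test_err=10.2))
```
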
2 different problems (orthogonalization):

  1. set the target (the metric)
  2. work out how to succeed on that target (e.g. the metric may need added weights; see the sketch below)
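
As a hedged illustration of step 1, here is one way a metric can add weights: a weighted classification error in which certain mistakes count more. The weight values and example labels are hypothetical.

```python
import numpy as np

def weighted_error(y_true, y_pred, weights):
    """Fraction of weighted mistakes; up-weighted examples hurt the metric more."""
    y_true, y_pred, weights = map(np.asarray, (y_true, y_pred, weights))
    mistakes = (y_true != y_pred).astype(float)
    return float((weights * mistakes).sum() / weights.sum())

y_true  = [1, 0, 1, 1]
y_pred  = [1, 1, 0, 1]
weights = [1, 10, 1, 1]   # the mistake on example 2 is deemed 10x worse
print(weighted_error(y_true, y_pred, weights))  # ~0.846 instead of plain 0.5
```

Changing the target this way stays orthogonal to step 2 (how the model is then trained to do well on it).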

Bayes (optimal) error: the best possible error achievable

2 examples of how, for the same training/dev error %, we take different paths depending on the human-level (proxy for Bayes) error %:

  • human-level: 1% / training: 8% / dev: 10% >>> focus on reducing bias (human vs training gap)
  • human-level: 7.5% / training: 8% / dev: 10% >>> focus on reducing variance (training vs dev gap)

avoidable bias in the 1st case is 7%, while in the 2nd it is only 0.5%; a sketch of this decision rule follows
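
A small sketch of the decision rule in code, using the two examples above (errors in percent, human-level as a proxy for Bayes error):

```python
def focus(human, train, dev):
    avoidable_bias = train - human   # human-level vs training gap
    variance = dev - train           # training vs dev gap
    return ("bias" if avoidable_bias > variance else "variance",
            avoidable_bias, variance)

print(focus(human=1.0, train=8.0, dev=10.0))   # ('bias', 7.0, 2.0)
print(focus(human=7.5, train=8.0, dev=10.0))   # ('variance', 0.5, 2.0)
```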

typically we expect the error percentage to increase as we progress through the analysis: from Bayes / human-level to training, then dev, and finally test; what if the percentages do not always increase? e.g. training vs dev: perhaps the dev set distribution is easier; to confirm, evaluate the dev set the same way as the training set so that we have a fair comparison

  • avoidable bias: train a bigger model; train longer / use better optimization algorithms (momentum, RMSprop, Adam); NN architecture / hyperparameter search (CNN? RNN?)
  • variance: get more data; regularization (L2, dropout, data augmentation); NN architecture / hyperparameter search; high variance == not generalizing well; a Keras sketch of these knobs follows
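
A minimal sketch of the variance-reduction knobs in Keras; the layer sizes and coefficients are hypothetical choices, not values from the course.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight decay
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Data augmentation (another knob above) could be added as preprocessing
# layers, e.g. layers.RandomFlip("horizontal") or layers.RandomRotation(0.1).
```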

4 potential types of error, each measured as a gap (see the sketch after the list):

  1. avoidable bias (human vs training)
  2. variance (training vs training-dev)
  3. data mismatch (training-dev vs dev)
  4. degree of overfitting to the dev set (dev vs test)
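
A sketch that maps five error measurements onto the four gaps above; all numbers are hypothetical.

```python
errors = {"human": 1.0, "train": 5.0, "train_dev": 6.0, "dev": 9.0, "test": 9.5}

gaps = {
    "avoidable bias":         errors["train"]     - errors["human"],
    "variance":               errors["train_dev"] - errors["train"],
    "data mismatch":          errors["dev"]       - errors["train_dev"],
    "overfitting to dev set": errors["test"]      - errors["dev"],
}
for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}%")   # here data mismatch (3.0%) is the biggest gap
```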

Transfer learning: pre-training and fine-tuning; a lot of low-level features (detecting edges, curves, positive objects, ...) can be reused
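
A minimal sketch of pre-training plus fine-tuning in Keras: reuse the low-level features of a pre-trained base and retrain only a new head. The base network (MobileNetV2) and the head size are hypothetical choices, not from the notes.

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                         input_shape=(224, 224, 3))
base.trainable = False  # freeze the reusable low-level feature detectors

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # new task-specific head
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Fine-tuning: once the head has converged, optionally unfreeze the top of
# the base and keep training with a small learning rate.
```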
