orthogonalization: know which knob to tune to achieve which effect; for this it helps to have orthogonal controls (as in a car: steering wheel, accelerator, brake, each with a well-defined impact); however, that's usually not the case in machine learning
chain of assumptions we make in ML, each with its own knobs:
- fit the training set well on the cost function (roughly human-level): knobs would be: bigger network, better optimization algorithm (e.g. Adam)
- hope it does well on the dev set: knobs would be: bigger training set, regularization
- hope it does well on the test set: knob would be: bigger dev set
- hope it performs well in the real world: knobs would be: change the dev set or the cost function
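The chain above can be sketched as a simple diagnosis helper: check each stage in order, and each gap points to its own (orthogonal) set of knobs. This is a toy illustration, not a real tool; the function name, threshold, and knob strings are all made up for the sketch.

```python
def suggest_knobs(train_err, dev_err, test_err, real_world_ok, threshold=0.02):
    """Walk the chain of assumptions in order; return the first knob to turn.
    `threshold` is an arbitrary gap size chosen for illustration."""
    if train_err > threshold:
        # not fitting the training set -> bias knobs
        return "fit training set: bigger network / better optimizer (e.g. Adam)"
    if dev_err - train_err > threshold:
        # training-to-dev gap -> variance knobs
        return "generalize to dev: bigger training set / regularization"
    if test_err - dev_err > threshold:
        # dev-to-test gap -> the dev set was too small
        return "generalize to test: bigger dev set"
    if not real_world_ok:
        # everything measures well but the product fails
        return "real world: change dev set or cost function"
    return "all good"

print(suggest_knobs(0.10, 0.11, 0.11, True))
```

The point of the orthogonal structure: fixing one stage (say, training fit) should not require re-deciding the knobs of another stage (say, dev-set size).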
Bayes optimal error: the best possible error; no function can do better. Human-level error is often used as a proxy for it
2 examples: for the same training/dev error percentages, we take different paths depending on the human (Bayes proxy) error %:
avoidable bias (training error minus human error) in the 1st is 7%, so focus on reducing bias; in the 2nd it is 0.5%, smaller than the training-to-dev gap, so focus on reducing variance
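A minimal sketch of that diagnosis. The concrete error values in the examples below are illustrative (chosen to be consistent with the 7% and 0.5% avoidable-bias gaps mentioned); only the gaps matter.

```python
def diagnose(human_err, train_err, dev_err):
    """Compare avoidable bias (human -> training gap) against
    variance (training -> dev gap) and report which to attack first."""
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    focus = "reduce bias" if avoidable_bias > variance else "reduce variance"
    return focus, avoidable_bias, variance

# 1st example: human 1%, training 8%, dev 10% -> avoidable bias 7%
print(diagnose(0.01, 0.08, 0.10)[0])
# 2nd example: human 7.5%, training 8%, dev 10% -> avoidable bias 0.5%
print(diagnose(0.075, 0.08, 0.10)[0])
```

Same training and dev numbers, opposite conclusions: only the human-level baseline changed.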
typically we expect error percentages to increase as we progress through the analysis: from Bayes/human error to training, then dev, and finally test error; what if the percentages do not always increase? e.g. training vs dev: perhaps the dev set's distribution is simply easier; to confirm, evaluate on held-out data drawn from the training distribution (data the model was not trained on) so that we have a fair comparison
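One way to set up that fair comparison is to carve a "training-dev" slice out of the training data before training: same distribution as training, but never trained on. This is a hedged sketch (the function name and 10% split are arbitrary choices for illustration):

```python
import random

def split_train_dev(train_data, frac=0.1, seed=0):
    """Hold out a 'training-dev' slice drawn from the training distribution.
    Comparing error on this slice vs the real dev set separates variance
    (train -> training-dev gap) from distribution difference
    (training-dev -> dev gap)."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    data = list(train_data)
    rng.shuffle(data)
    cut = int(len(data) * frac)
    return data[cut:], data[:cut]    # (train, training_dev)

train, training_dev = split_train_dev(range(100))
```

If the model's error on `training_dev` matches its dev-set error, the gap was never about distribution; if `training_dev` error is much higher than dev error, the dev distribution really is easier.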