pandoc markdown with latex
  • Not-satisfying result? -> What to try next:
    • Get more training examples
    • Try smaller sets of features
    • Try additional features
    • Try polynomial features (see the sketch after this list)
    • Increase or decrease $\lambda$
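As an illustration of the polynomial-features item, a minimal numpy sketch; the function name and shapes are illustrative assumptions, not from the source:

```python
import numpy as np

def poly_features(x, degree):
    """Expand a single feature column x (shape (m,)) into [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

# Example: degree-3 expansion of one feature
print(poly_features(np.array([1.0, 2.0, 3.0]), 3))
# [[ 1.  1.  1.]
#  [ 2.  4.  8.]
#  [ 3.  9. 27.]]
```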

Training set / Cross Validation set / Test set

  • Evaluating a hypothesis with a separate test set
    • Check for overfitting and estimate generalization error
    • train:test = 70%:30%
  • Model selection with another separate cross validation set
    • Compare different models (# of features, degree of polynomial, and $\lambda$)
    • train:cv:test = 60%:20%:20% (a split sketch follows this list)
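A minimal sketch of the 60%:20%:20% split, assuming X (m x n) and y (m,) are numpy arrays; the function name and seed are illustrative:

```python
import numpy as np

def split_60_20_20(X, y, seed=0):
    """Shuffle the m examples and split into train / cross-validation / test sets."""
    m = X.shape[0]
    idx = np.random.default_rng(seed).permutation(m)
    n_train, n_cv = int(0.6 * m), int(0.2 * m)
    tr, cv, te = np.split(idx, [n_train, n_train + n_cv])
    return (X[tr], y[tr]), (X[cv], y[cv]), (X[te], y[te])
```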

Model selection details: $J(\Theta;\lambda)$

  • Number of parameters ($|\Theta|$) and Bias/Variance
    • bias (underfit): $J_{CV}(\Theta) \approx J_{train}(\Theta) \gg 0$; not enough parameters for the task
    • variance (overfit): $J_{CV}(\Theta) \gg J_{train}(\Theta)$; too many parameters for the task
  • Regularization and Bias/Variance
    • Recall the contribution of $\lambda$ to $J$: $J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \big(h_\Theta(x^{(i)}) - y^{(i)}\big)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \Theta_j^2$
    • $\lambda \approx 0 \implies$ little regularization, maybe overfit (high variance)
    • $\lambda \gg 0 \implies$ heavy regularization, maybe underfit (high bias)
  • Learning curves: error vs. training set size (a learning-curve sketch follows this list)
    • High bias, small training size: $J_{train}(\Theta)$ low and $J_{CV}(\Theta)$ high
    • High bias, large training size: both $J_{train}(\Theta)$ and $J_{CV}(\Theta)$ high and close to each other; more data will not help much
    • High variance, small training size: $J_{train}(\Theta)$ low and $J_{CV}(\Theta)$ high
    • High variance, large training size: $J_{train}(\Theta)$ OK and $J_{CV}(\Theta)$ keeps decreasing; more data is likely to help
  • Summary
    • Select the best combination of $\Theta$, $\lambda$, and amount of training data by checking $J_{CV}$ (see the selection sketch after this list)
    • Then check $J_{test}$ to confirm the model generalizes well
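A minimal sketch of the selection procedure above, using regularized linear regression fit with the normal equation; the function names and the candidate $\lambda$ grid are illustrative assumptions, not from the source. Note that $J_{train}$ and $J_{CV}$ are evaluated without the regularization term.

```python
import numpy as np

def fit_ridge(X, y, lmbda):
    """Regularized normal equation: theta = (X'X + lambda*L)^-1 X'y (bias term not regularized)."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend intercept column
    L = np.eye(n + 1)
    L[0, 0] = 0.0                          # do not regularize theta_0
    return np.linalg.solve(Xb.T @ Xb + lmbda * L, Xb.T @ y)

def cost(X, y, theta):
    """Unregularized squared-error cost: J = 1/(2m) * sum((h_theta(x) - y)^2)."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])
    err = Xb @ theta - y
    return float(err @ err) / (2 * m)

def select_lambda(X_train, y_train, X_cv, y_cv,
                  lambdas=(0.01, 0.03, 0.1, 0.3, 1, 3, 10)):
    """Fit one model per candidate lambda on the training set; pick the one with lowest J_CV."""
    best = min(lambdas, key=lambda l: cost(X_cv, y_cv, fit_ridge(X_train, y_train, l)))
    return best, fit_ridge(X_train, y_train, best)
```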
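And a sketch of the learning-curve computation: train on the first i examples for increasing i and record $J_{train}$ and $J_{CV}$ at each size. It reuses the hypothetical fit_ridge and cost helpers from the previous sketch; with $\lambda > 0$ the normal equation stays solvable even for very small i.

```python
def learning_curve(X_train, y_train, X_cv, y_cv, lmbda):
    """J_train and J_CV as functions of training set size, for diagnosing bias vs. variance."""
    j_train, j_cv = [], []
    for i in range(1, X_train.shape[0] + 1):
        theta = fit_ridge(X_train[:i], y_train[:i], lmbda)   # fit on the first i examples only
        j_train.append(cost(X_train[:i], y_train[:i], theta))
        j_cv.append(cost(X_cv, y_cv, theta))                 # CV error always on the full CV set
    return j_train, j_cv
```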