@shagunsodhani
Created March 13, 2016 17:51
Notes for "Regularization and variable selection via the elastic net" paper.

Regularization and variable selection via the elastic net

Introduction to elastic net

  • Regularization and variable selection method.
  • Sparse Representation
  • Exhibits a grouping effect: strongly correlated predictors tend to enter or leave the model together.
  • Particularly useful when the number of predictors (p) >> the number of observations (n).
  • The LARS-EN algorithm computes the elastic net regularization path.
  • Link to paper.

Lasso

  • Least squares method with an L1 penalty on the regression coefficients.
  • Performs continuous shrinkage and automatic variable selection.
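A minimal sketch of why the L1 penalty gives both shrinkage and selection, assuming an orthonormal design (X'X = I) and the criterion ‖y − Xβ‖² + λ1·‖β‖₁. Under those assumptions the lasso solution is a soft-thresholded copy of the least squares estimate. The names `soft_threshold`, `lasso_objective`, `lam1` and the simulated data are illustrative, not taken from the notes.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink every entry of z toward zero by t; entries with |z| <= t become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_objective(X, y, beta, lam1):
    """Residual sum of squares plus the L1 penalty (no 1/2 factor)."""
    resid = y - X @ beta
    return resid @ resid + lam1 * np.abs(beta).sum()

rng = np.random.default_rng(0)
# Assumption: orthonormal columns, obtained here from a QR factorisation.
X, _ = np.linalg.qr(rng.normal(size=(20, 4)))
true_beta = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=20)

lam1 = 1.0
beta_ols = X.T @ y                                  # least squares fit when X'X = I
beta_lasso = soft_threshold(beta_ols, lam1 / 2.0)   # lasso fit for this criterion

print("OLS:  ", np.round(beta_ols, 3))
print("lasso:", np.round(beta_lasso, 3))
```

The large coefficients are pulled toward zero by λ1/2 (shrinkage), while the coefficients whose least squares estimates are below λ1/2 in absolute value are set exactly to zero (selection).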

Limitations

  • If p >> n, lasso can select at most n variables before it saturates.
  • Given a group of variables with high pairwise correlation, lasso tends to select only one variable from the group and does not care which one.
  • If n > p and the predictors are highly correlated, ridge regression tends to outperform lasso in prediction.

Naive elastic net

  • Least squares method.
  • Penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
  • penalty = (1−α)·‖β‖₁ + α·‖β‖₂², where β is the coefficient vector, ‖β‖₁ = Σ|β_j| and ‖β‖₂² = Σβ_j².
  • α = 0 => lasso penalty
  • α = 1 => ridge penalty
  • Naive elastic net can be solved by transforming it into a lasso problem on augmented data.
  • Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.
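The augmented-data transformation can be checked numerically. The sketch below assumes the penalized form of the criterion, ‖y − Xβ‖² + λ2·‖β‖₂² + λ1·‖β‖₁ (so that α = λ2 / (λ1 + λ2)), and the augmentation that stacks √λ2·I under X and p zeros under y, rescaled by 1/√(1 + λ2). The function names and the random data are illustrative assumptions.

```python
import numpy as np

def naive_enet_objective(X, y, beta, lam1, lam2):
    """||y - X b||^2 + lam2 * ||b||_2^2 + lam1 * ||b||_1."""
    resid = y - X @ beta
    return resid @ resid + lam2 * (beta @ beta) + lam1 * np.abs(beta).sum()

def lasso_objective(X, y, beta, gamma):
    """||y - X b||^2 + gamma * ||b||_1."""
    resid = y - X @ beta
    return resid @ resid + gamma * np.abs(beta).sum()

def augment(X, y, lam2):
    """Return the (n + p) x p augmented design and the length n + p response."""
    p = X.shape[1]
    scale = np.sqrt(1.0 + lam2)
    X_star = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / scale
    y_star = np.concatenate([y, np.zeros(p)])
    return X_star, y_star

rng = np.random.default_rng(0)
n, p = 5, 8                      # p > n on purpose
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)        # any coefficient vector works for the identity
lam1, lam2 = 0.7, 1.3

X_star, y_star = augment(X, y, lam2)
scale = np.sqrt(1.0 + lam2)
lhs = naive_enet_objective(X, y, beta, lam1, lam2)
rhs = lasso_objective(X_star, y_star, scale * beta, lam1 / scale)
print(lhs, rhs)                  # the two criteria agree
print(X_star.shape, np.linalg.matrix_rank(X_star))
```

Because the augmented design has n + p rows and rank p, a lasso fit on it is no longer limited to n selected variables, which is how the naive elastic net sidesteps the first lasso limitation.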

Limitations

  • The two-stage procedure incurs a double amount of shrinkage, which introduces extra bias without reducing the variance much.

Bridge Regression

  • Generalization of lasso and ridge regression.
  • Cannot produce sparse solutions when the penalty exponent is strictly greater than 1: all predictors stay in the model.
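For reference, the bridge criterion can be written as below. The symbols λ (penalty weight) and q (penalty exponent) are notation assumed here, not used elsewhere in these notes.

```latex
\hat{\beta}_{\text{bridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert^{q},
  \qquad q > 0
```

Setting q = 1 gives the lasso penalty and q = 2 gives the ridge penalty.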

Elastic net

  • Rescales the naive elastic net coefficients by (1 + λ2), where λ2 is the weight of the ridge penalty, to undo the extra shrinkage.
  • Retains good properties of the naive elastic net.

Justification for scaling

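  • In the orthogonal design case, the rescaled elastic net becomes minimax optimal, as the lasso is.
  • Scaling reverses the extra shrinkage introduced by the ridge penalty while keeping the variable selection property.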
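The effect of the rescaling is easiest to see in the orthogonal design case (X'X = I), where each estimator has a closed form in terms of the least squares estimate. This is a small sketch under that assumption, using the criterion ‖y − Xβ‖² + λ2·‖β‖₂² + λ1·‖β‖₁; the numbers in `beta_ols` are made up for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink every entry of z toward zero by t; entries with |z| <= t become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Hypothetical least squares estimates under an orthogonal design.
beta_ols = np.array([3.0, -2.0, 0.8, -0.3])
lam1, lam2 = 1.0, 1.0

ridge = beta_ols / (1.0 + lam2)                                 # proportional shrinkage only
lasso = soft_threshold(beta_ols, lam1 / 2.0)                    # thresholding only
naive_enet = soft_threshold(beta_ols, lam1 / 2.0) / (1.0 + lam2)  # both: double shrinkage
enet = (1.0 + lam2) * naive_enet                                # rescaling undoes the ridge factor

for name, est in [("ridge", ridge), ("lasso", lasso),
                  ("naive elastic net", naive_enet), ("elastic net", enet)]:
    print(f"{name:>18}: {est}")
```

In this special case the rescaled elastic net coincides with the lasso, while the naive version is shrunk a second time by 1 / (1 + λ2). Both versions zero out the same coefficients, so rescaling does not change which variables are selected.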

LARS-EN

  • Based on LARS (used to solve lasso).
  • Elastic net can be transformed into a lasso problem on augmented data, so pieces of the LARS algorithm can be reused.
  • Exploits the sparse structure of the augmented data to save computation.
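The "lasso on augmented data, then rescale" recipe can be sketched end to end. Note that this is not LARS-EN: the sketch swaps in plain cyclic coordinate descent as the lasso solver, fits a single (λ1, λ2) pair instead of a whole path, and does not exploit sparsity. The criterion ‖y − Xβ‖² + λ2·‖β‖₂² + λ1·‖β‖₁, all function names, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, gamma, max_iter=10000, tol=1e-12):
    """Minimise ||y - X b||^2 + gamma * ||b||_1 by cyclic coordinate descent.

    Stand-in solver: coordinate descent, NOT the LARS algorithm.
    """
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)       # squared column norms
    r = y.astype(float).copy()          # running residual y - X b
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            r += X[:, j] * old          # drop coordinate j from the fit
            b[j] = soft_threshold(X[:, j] @ r, gamma / 2.0) / col_sq[j]
            r -= X[:, j] * b[j]         # put the updated coordinate back
            max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

def elastic_net(X, y, lam1, lam2):
    """Return (naive elastic net, rescaled elastic net) coefficients."""
    p = X.shape[1]
    scale = np.sqrt(1.0 + lam2)
    X_star = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / scale
    y_star = np.concatenate([y, np.zeros(p)])
    b_star = lasso_cd(X_star, y_star, lam1 / scale)
    naive = b_star / scale              # map back to the original scale
    return naive, (1.0 + lam2) * naive  # undo the ridge shrinkage

rng = np.random.default_rng(0)
n, p = 8, 20                            # more predictors than observations
X = rng.normal(size=(n, p))
X /= np.linalg.norm(X, axis=0)          # unit-norm columns
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.normal(size=n)
lam1, lam2 = 0.5, 1.0

naive, enet = elastic_net(X, y, lam1, lam2)
print("non-zero coefficients:", np.count_nonzero(enet))
```

Coordinate descent is used here only because it fits in a few lines; a path algorithm such as LARS-EN would instead trace the solution for all values of λ1 at a fixed λ2.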

Conclusion

Elastic net often outperforms lasso in prediction accuracy while still producing a sparse model.

@MohammadMahdiMohammadi

Dear Shagun,
I'm looking for a closed-form formula for the elastic net method for variable selection.
Do you have code that shows how to implement elastic net in MATLAB?
