shagunsodhani/ElasticNet.md

Regularization and variable selection via the elastic net

Introduction to elastic net


Regularization and variable selection method.
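Produces a sparse representation.
Exhibits a grouping effect: strongly correlated predictors tend to be in or out of the model together.
Particularly useful when the number of predictors (p) >> the number of observations (n).
The LARS-EN algorithm computes the elastic net regularization path.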
Link to paper.

Lasso


Least squares method with an L1 penalty on the regression coefficients.
Performs continuous shrinkage and automatic variable selection.

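
For reference, the lasso criterion described above can be written as follows. The notation (response y, design matrix X, penalty weight λ) is an assumption made here for illustration; the notes themselves do not fix symbols.

```latex
\hat{\beta}(\text{lasso}) = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad \lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert
```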
Limitations


If p >> n, lasso can select at most n variables.
Given a group of variables with high pairwise correlation, lasso tends to select only one variable from the group and does not care which one.
If n > p and the predictors are highly correlated, ridge regression has been empirically observed to outperform lasso in prediction.

Naive elastic net


Least squares method.
Penalty on the regression coefficients is a convex combination of the lasso and ridge penalties.
penalty = (1−α)*|β|_1 + α*|β|², where β is the coefficient vector, |β|_1 is its L1 norm and |β|² is its squared L2 norm.
α = 0 => lasso penalty
α = 1 => ridge penalty
Naive elastic net can be solved by transforming it into a lasso problem on augmented data.
Can be viewed as ridge-type shrinkage followed by lasso-type thresholding.
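
The augmented-data transformation can be checked numerically. The sketch below assumes the two-parameter form of the criterion, |y − Xβ|² + λ2|β|² + λ1|β|_1 (so that α = λ2/(λ1 + λ2)), and the function names are hypothetical helpers written for this note. It builds the augmented data set and confirms that the lasso objective on it equals the naive elastic net objective on the original data.

```python
import math
import random


def residual_sum_of_squares(X, y, beta):
    """|y - X beta|^2 for X given as a list of rows."""
    return sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )


def naive_enet_objective(X, y, beta, lam1, lam2):
    """|y - X beta|^2 + lam2 * |beta|^2 + lam1 * |beta|_1."""
    l2 = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return residual_sum_of_squares(X, y, beta) + lam2 * l2 + lam1 * l1


def lasso_objective(X, y, beta, gamma):
    """|y - X beta|^2 + gamma * |beta|_1."""
    return residual_sum_of_squares(X, y, beta) + gamma * sum(abs(b) for b in beta)


def augment(X, y, lam2):
    """Stack sqrt(lam2) * I under X, scale by 1/sqrt(1 + lam2), pad y with zeros."""
    p = len(X[0])
    scale = 1.0 / math.sqrt(1.0 + lam2)
    top = [[scale * v for v in row] for row in X]
    bottom = [
        [scale * math.sqrt(lam2) if i == j else 0.0 for j in range(p)]
        for i in range(p)
    ]
    return top + bottom, list(y) + [0.0] * p


# Arbitrary illustrative data; any X, y and beta should work.
random.seed(0)
n, p = 5, 3
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]
beta = [0.5, -1.2, 0.0]
lam1, lam2 = 0.7, 2.0

X_aug, y_aug = augment(X, y, lam2)
gamma = lam1 / math.sqrt(1.0 + lam2)                   # lasso penalty on augmented data
beta_star = [math.sqrt(1.0 + lam2) * b for b in beta]  # rescaled coefficients

lhs = naive_enet_objective(X, y, beta, lam1, lam2)
rhs = lasso_objective(X_aug, y_aug, beta_star, gamma)
print(abs(lhs - rhs) < 1e-9)
```

The augmented design has n + p rows and rank p, which is why the lasso run on it is no longer limited to selecting at most n variables.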

Limitations


The two-stage procedure incurs a double amount of shrinkage, which introduces extra bias without reducing the variance much.
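
The double shrinkage is easiest to see in the special case of an orthogonal design (XᵀX = I), where each estimator has a closed form in terms of the ordinary least squares coefficient. The sketch below assumes that special case and the criterion |y − Xβ|² + λ2|β|² + λ1|β|_1; the function names are hypothetical.

```python
def soft_threshold(z, t):
    """Lasso-type thresholding: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orth(b_ols, lam1):
    """Lasso coefficient under an orthogonal design."""
    return soft_threshold(b_ols, lam1 / 2.0)


def ridge_orth(b_ols, lam2):
    """Ridge coefficient under an orthogonal design."""
    return b_ols / (1.0 + lam2)


def naive_enet_orth(b_ols, lam1, lam2):
    """Naive elastic net: thresholded like lasso AND shrunk like ridge."""
    return soft_threshold(b_ols, lam1 / 2.0) / (1.0 + lam2)


b_ols, lam1, lam2 = 3.0, 2.0, 1.0
lasso_coef = lasso_orth(b_ols, lam1)             # 2.0
ridge_coef = ridge_orth(b_ols, lam2)             # 1.5
naive_coef = naive_enet_orth(b_ols, lam1, lam2)  # 1.0
print(lasso_coef, ridge_coef, naive_coef)
```

With these illustrative values, the naive elastic net coefficient is smaller than both the lasso and the ridge coefficient, because it pays both penalties at once.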

Bridge Regression


Generalization of lasso and ridge regression.
With a penalty exponent strictly between 1 and 2, it cannot produce sparse solutions, unlike the elastic net.
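
For comparison, the bridge criterion penalizes a power of each coefficient's absolute value. The exponent symbol q below is notation chosen here for illustration; q = 1 gives the lasso penalty and q = 2 gives the ridge penalty.

```latex
\hat{\beta}(\text{bridge}) = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert^{q}
```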

Elastic net


Rescales the naive elastic net coefficients by a factor of (1 + λ2), where λ2 is the weight on the ridge (squared L2) penalty, to undo the extra shrinkage.
Retains good properties of the naive elastic net.
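
Written out, with λ2 denoting the weight on the ridge (squared L2) penalty, the rescaling is:

```latex
\hat{\beta}(\text{elastic net}) = (1 + \lambda_2)\, \hat{\beta}(\text{naive elastic net})
```

Multiplying by a positive constant does not change which coefficients are zero, so the set of selected variables is the same as for the naive elastic net.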

Justification for scaling


With this scaling, the elastic net becomes minimax optimal under an orthogonal design, where it coincides with the lasso.
Scaling reverses the shrinkage introduced by the ridge penalty.

LARS-EN


Based on LARS (used to solve lasso).
Elastic net can be transformed into a lasso problem on augmented data, so pieces of the LARS algorithm can be reused.
Exploits the sparse structure of the augmented data to save on computation.
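
LARS-EN itself is too involved for a short example. As a stand-in, the sketch below uses plain cyclic coordinate descent, which is a different technique from the paper's LARS-EN, to minimize |y − Xβ|² + λ2|β|² + λ1|β|_1 on a toy data set. The function names and the data are hypothetical, and the code assumes every column has a nonzero norm or that λ2 > 0. It illustrates the grouping effect on two identical predictors.

```python
def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def naive_enet_cd(X, y, lam1, lam2, sweeps=200):
    """Cyclic coordinate descent for |y - X b|^2 + lam2 |b|^2 + lam1 |b|_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam1 / 2.0) / (col_sq[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:  # skip the residual update when nothing changed
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Columns 0 and 1 are identical; column 2 is orthogonal to them.
X = [
    [0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5],
    [0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5],
]
y = [2.0, 1.0, 2.0, 1.0]

lasso_beta = naive_enet_cd(X, y, lam1=1.0, lam2=0.0)  # lam2 = 0 is the lasso
naive_beta = naive_enet_cd(X, y, lam1=1.0, lam2=1.0)
enet_beta = [(1.0 + 1.0) * b for b in naive_beta]     # rescale by (1 + lam2)
print(lasso_beta)
print(naive_beta)
print(enet_beta)
```

On this toy data the lasso run puts all the weight of the duplicated predictor on the first copy and leaves the second at zero (the lasso solution is not unique here, and which copy wins depends on update order), while the runs with λ2 > 0 assign both copies the same coefficient.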

Conclusion

In the paper's simulations and real-data examples, the elastic net often outperforms the lasso in prediction accuracy.