Journal ArticleDOI

Penalized Regressions: The Bridge versus the Lasso

01 Sep 1998 · Journal of Computational and Graphical Statistics (Taylor & Francis Group) · Vol. 7, Iss. 3, pp. 397-416
TL;DR: It is shown that bridge regression performs well compared with the lasso and ridge regression, and the methods are demonstrated through an analysis of prostate cancer data.
Abstract: Bridge regression, a special family of penalized regressions with penalty function Σ|βj|^γ, γ ≥ 1, is considered. A general approach to solving for the bridge estimator is developed. A new algorithm for the lasso (γ = 1) is obtained by studying the structure of the bridge estimators. The shrinkage parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). A comparison between the bridge model (γ ≥ 1) and several other shrinkage models, namely ordinary least squares regression (λ = 0), the lasso (γ = 1), and ridge regression (γ = 2), is made through a simulation study. It is shown that bridge regression performs well compared with the lasso and ridge regression. These methods are demonstrated through an analysis of prostate cancer data. Some computational advantages and limitations are discussed.
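The "new algorithm for the lasso" mentioned above is a coordinate-wise ("shooting") procedure. A minimal sketch of such an update, assuming plain numpy and the penalized form (1/2)‖y − Xβ‖² + λ Σ|βj|; the function name, least-squares starting values, and fixed sweep count are illustrative choices, not the paper's published algorithm.

```python
import numpy as np

def lasso_shooting(X, y, lam, n_sweeps=100):
    """Cyclic coordinate-wise minimization of 0.5*||y - X b||^2 + lam*sum(|b_j|)."""
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares start
    col_ss = (X ** 2).sum(axis=0)                 # x_j' x_j for each column
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]       # partial residual without x_j
            rho = X[:, j] @ r_j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-threshold
    return beta
```

Each coordinate update has a closed form, which is what makes the cyclic scheme cheap; in practice the sweeps would be stopped once the coefficients stabilize rather than after a fixed count.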
Citations
Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
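For concreteness, a small usage sketch contrasting the elastic net and the lasso on strongly correlated predictors, assuming scikit-learn is available; the penalty settings below are illustrative, not tuned values.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))                    # p much larger than n
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)    # two nearly identical predictors
y = 3.0 * X[:, 0] + rng.normal(size=50)

enet = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10000).fit(X, y)
lasso = Lasso(alpha=0.5, max_iter=10000).fit(X, y)

# The elastic net tends to keep both correlated columns with similar weights
# (the grouping effect), while the lasso tends to select only one of them.
print("elastic net:", enet.coef_[:2])
print("lasso      :", lasso.coef_[:2])
```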

16,538 citations


Cites background from "Penalized Regressions: The Bridge v..."

  • ...Tibshirani (1996) and Fu (1998) compared the prediction performance of the lasso, ridge and bridge regression (Frank and Friedman, 1993) and found that none of them uniformly dominates the other two....

  • ...Bayesian connections and the Lq-penalty Bridge regression (Frank and Friedman, 1993; Fu, 1998) has J(β) = |β|_q^q = Σ_{j=1}^p |βj|^q in equation (7), which is a generalization of both the lasso (q = 1) and ridge regression (q = 2)....

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
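For standardized predictors and the naive elastic net penalty λ[(1 − α)‖β‖²₂/2 + α‖β‖₁], the single-coordinate update behind such cyclical coordinate descent is commonly written as follows (a sketch in assumed notation, not the paper's exact equation):

\[
% assumes standardized predictors; S is the soft-thresholding operator
\tilde\beta_j \leftarrow
\frac{S\!\left(\frac{1}{N}\sum_{i=1}^{N} x_{ij}\, r_i^{(j)},\; \lambda\alpha\right)}
     {1+\lambda(1-\alpha)},
\qquad
S(z,\gamma)=\operatorname{sign}(z)\,(|z|-\gamma)_{+},
\]

where r_i^{(j)} is the partial residual with predictor j removed from the fit. The whole path over λ is then traced by warm-starting each solution from the previous one.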

13,656 citations


Cites background from "Penalized Regressions: The Bridge v..."

  • ...Early references include Fu (1998), Shevade and Keerthi (2003) and Daubechies et al. (2004)....

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...
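The penalty proposed in that article is the SCAD penalty. A small sketch of its derivative, in the form commonly quoted from Fan and Li (2001), with a = 3.7 as their suggested default; the function name is illustrative.

```python
def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lam(theta) of the SCAD penalty for theta >= 0
    (form commonly quoted from Fan and Li, 2001); a = 3.7 is their default."""
    theta = abs(theta)
    if theta <= lam:
        return lam                                   # L1-like near the origin (sparsity)
    return max(a * lam - theta, 0.0) / (a - 1.0)     # tapers to zero for large theta (less bias)
```

The flat tail (zero derivative beyond aλ) is what keeps large coefficients nearly unbiased, while the singularity at the origin still produces exact zeros.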

8,314 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...Tibshirani (1996) proposed an algorithm for solving constrained least squares problems of LASSO, whereas Fu (1998) provided a “shooting algorithm” for LASSO....

  • ...The Lq penalty pλ(|θ|) = λ|θ|^q leads to a bridge regression (Frank and Friedman 1993 and Fu 1998)....

  • ...In all examples in this section, we computed the penalized likelihood estimate with the L1 penalty, referred to as LASSO, by our algorithm rather than those of Tibshirani (1996) and Fu (1998)....

  • ...Here we discuss two methods of estimating λ: fivefold cross-validation and generalized cross-validation, as suggested by Breiman (1995), Tibshirani (1996), and Fu (1998)....

Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...The coordinate descent method is particularly appealing if each one-dimensional optimization problem can be solved analytically. For example, the shooting algorithm (Fu 1998; Wu and Lange 2008) for lasso uses Equation 13....

Journal ArticleDOI
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Abstract: Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
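The group-lasso extension described here updates whole blocks of coefficients at a time. A minimal sketch of one such block update, assuming plain numpy and orthonormal within-group columns (Xj′Xj = I) so that the group-wise soft-thresholding solution quoted in the excerpt below applies; the function name and data layout are illustrative.

```python
import numpy as np

def group_lasso_update(X_groups, beta_groups, y, j, lam):
    """One block-coordinate update for the group lasso (group-wise soft-thresholding).
    Assumes each X_groups[k] has orthonormal columns."""
    resid = y - sum(X_groups[k] @ beta_groups[k]
                    for k in range(len(X_groups)) if k != j)
    S_j = X_groups[j].T @ resid                    # score vector for group j
    p_j = X_groups[j].shape[1]                     # group size
    norm_S = np.linalg.norm(S_j)
    shrink = 0.0 if norm_S == 0 else max(0.0, 1.0 - lam * np.sqrt(p_j) / norm_S)
    beta_groups[j] = shrink * S_j                  # the whole block is zeroed or shrunk together
    return beta_groups[j]
```

Cycling this update over the groups until the coefficients stabilize gives a shooting-style algorithm in which variables enter or leave the model a factor at a time.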

7,400 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...Our implementation of the group lasso is an extension of the shooting algorithm (Fu, 1999) for the lasso....

  • ...It can be easily verified that the solution to expressions (2.2) and (2.3) is βj = (1 − λ√pj/‖Sj‖)+ Sj, (2.4) where Sj = Xj′(Y − Xβ−j), with β−j = (β1′, . . . , βj−1′, 0′, βj+1′, . . . , βJ′)′....

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
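In the notation of this summary (a sketch, not quoted from the paper), the lasso estimate solves

\[
% t >= 0 is the constraint bound; lambda below is its Lagrangian counterpart
\hat\beta = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^{2}
\quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j|\le t,
\]

or equivalently minimizes the residual sum of squares plus λ Σj |βj| for the value of λ corresponding to t; the kink of the absolute-value constraint at zero is what sets some coefficients exactly to zero.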

40,785 citations


"Penalized Regressions: The Bridge v..." refers background or methods or result in this paper

  • ...The effective number of parameters defined here has an extra compensation term n0 for the lasso (γ = 1) compared to the one in Tibshirani (1996). It also generalizes to accommodate for bridge regression with any γ > 1....

  • ...It also agrees with the results obtained by Tibshirani (1996) through intensive simulations....

  • ...In contrast, the combined quadratic programming method by Tibshirani (1996) has finite-step (2^p) convergence, and potentially has an even better convergence rate....

  • ...Tibshirani (1996) introduced the lasso, which minimizes RSS subject to a constraint Σ|βj| ≤ t, as a special case of the bridge with γ = 1....

  • ...This technique is borrowed here to select the shrinkage parameters λ and γ, as suggested by Tibshirani (1996) for the lasso....

Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
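This is the bootstrap reference behind the standard errors quoted at the end of this page (10,000 bootstrap samples for the bridge estimates). A minimal nonparametric bootstrap standard-error sketch in plain numpy; the function name, resample count, and example statistic are illustrative.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error of statistic(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, size=n)])   # resample with replacement
                     for _ in range(n_boot)])
    return reps.std(ddof=1)

# Example: bootstrap standard error of a sample median.
x = np.random.default_rng(1).exponential(size=100)
print(bootstrap_se(x, np.median))
```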

37,183 citations

Journal ArticleDOI
TL;DR: In this paper, an estimation procedure based on adding small positive quantities to the diagonal of X′X is proposed, together with the ridge trace, a method for showing in two dimensions the effects of nonorthogonality.
Abstract: In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
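A minimal numpy sketch of the estimator this describes, i.e. adding a positive constant λ to the diagonal of X′X before solving the normal equations; the function name and interface are illustrative.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y; lam > 0 is the quantity
    added to the diagonal of X'X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger λ shrinks the coefficients more, trading a little bias for a smaller mean square error when the predictors are far from orthogonal.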

8,091 citations


"Penalized Regressions: The Bridge v..." refers background in this paper

  • ...Detailed discussions can be found in Seber (1977), Sen and Srivastava (1990), Lawson and Hansen (1974), Hoerl and Kennard (1970a, 1970b) and Frank and Friedman (1993). To achieve better prediction, Hoerl and Kennard (1970a, 1970b) introduced ridge regression, which minimizes RSS subject to a constraint Σ|βj|² ≤ t....

Book
01 Jun 1974
TL;DR: Since the lm function provides a lot of features, it is rather complicated, so the function lsfit, which computes only the coefficient estimates and the residuals, is used as a model instead.
Abstract: Since the lm function provides a lot of features it is rather complicated. So we are going to instead use the function lsfit as a model. It computes only the coefficient estimates and the residuals. Now would be a good time to read the help file for lsfit. Note that lsfit supports the fitting of multiple least squares models and weighted least squares. Our function will not, hence we can omit the arguments wt, weights and yname. Also, changing tolerances is a little advanced so we will trust the default values and omit the argument tolerance as well.

6,956 citations

Journal ArticleDOI
TL;DR: Statistical theory attacks the problem from both ends, as discussed by the authors: it provides optimal methods for finding a real signal in a noisy background, and it provides strict checks against the overinterpretation of random patterns.
Abstract: Statistics is the science of learning from experience, especially experience that arrives a little bit at a time. The earliest information science was statistics, originating in about 1650. This century has seen statistical techniques become the analytic methods of choice in biomedical science, psychology, education, economics, communications theory, sociology, genetic studies, epidemiology, and other areas. Recently, traditional sciences like geology, physics, and astronomy have begun to make increasing use of statistical methods as they focus on areas that demand informational efficiency, such as the study of rare and exotic particles or extremely distant galaxies. Most people are not natural-born statisticians. Left to our own devices we are not very good at picking out patterns from a sea of noisy data. To put it another way, we are all too good at picking out non-existent patterns that happen to suit our purposes. Statistical theory attacks the problem from both ends. It provides optimal methods for finding a real signal in a noisy background, and also provides strict checks against the overinterpretation of random patterns.

6,361 citations


"Penalized Regressions: The Bridge v..." refers methods in this paper

  • ...The standard errors for the bridge estimates were computed by 10,000 bootstrap samples (Efron and Tibshirani 1993)....
