Journal ArticleDOI

Penalized Regressions: The Bridge versus the Lasso

01 Sep 1998 · Journal of Computational and Graphical Statistics (Taylor & Francis Group) · Vol. 7, Iss. 3, pp. 397-416
TL;DR: It is shown that bridge regression performs well compared with the lasso and ridge regression, and the methods are demonstrated through an analysis of prostate cancer data.
Abstract: Bridge regression, a special family of penalized regressions with penalty function Σ|βj|^γ, γ ≥ 1, is considered. A general approach to solving for the bridge estimator is developed. A new algorithm for the lasso (γ = 1) is obtained by studying the structure of the bridge estimators. The shrinkage parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). A comparison between the bridge model (γ ≥ 1) and several other shrinkage models, namely ordinary least squares regression (λ = 0), the lasso (γ = 1), and ridge regression (γ = 2), is made through a simulation study. It is shown that bridge regression performs well compared with the lasso and ridge regression. These methods are demonstrated through an analysis of prostate cancer data. Some computational advantages and limitations are discussed.
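The "new algorithm for the lasso" mentioned above is a coordinate-wise ("shooting") procedure. A minimal sketch of such an update, assuming plain numpy and the penalized form (1/2)‖y − Xβ‖² + λ Σ|βj|; the function name, least-squares starting values, and fixed sweep count are illustrative choices, not the paper's published algorithm.

```python
import numpy as np

def lasso_shooting(X, y, lam, n_sweeps=100):
    """Cyclic coordinate-wise minimization of 0.5*||y - X b||^2 + lam*sum(|b_j|)."""
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares start
    col_ss = (X ** 2).sum(axis=0)                 # x_j' x_j for each column
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]       # partial residual without x_j
            rho = X[:, j] @ r_j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]  # soft-threshold
    return beta
```

Each coordinate update has a closed form, which is what makes the cyclic scheme cheap; in practice the sweeps would be stopped once the coefficients stabilize rather than after a fixed count.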
Citations
Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
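For concreteness, a small usage sketch contrasting the elastic net and the lasso on strongly correlated predictors, assuming scikit-learn is available; the penalty settings below are illustrative, not tuned values.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))                    # p much larger than n
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=50)    # two nearly identical predictors
y = 3.0 * X[:, 0] + rng.normal(size=50)

enet = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10000).fit(X, y)
lasso = Lasso(alpha=0.5, max_iter=10000).fit(X, y)

# The elastic net tends to keep both correlated columns with similar weights
# (the grouping effect), while the lasso tends to select only one of them.
print("elastic net:", enet.coef_[:2])
print("lasso      :", lasso.coef_[:2])
```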

16,538 citations


Cites background from "Penalized Regressions: The Bridge v..."

  • ...Tibshirani (1996) and Fu (1998) compared the prediction performance of the lasso, ridge and bridge regression (Frank and Friedman, 1993) and found that none of them uniformly dominates the other two....

  • ...Bayesian connections and the Lq-penalty Bridge regression (Frank and Friedman, 1993; Fu, 1998) has J(β) = |β|_q^q = Σ_{j=1}^p |βj|^q in equation (7), which is a generalization of both the lasso (q = 1) and ridge regression (q = 2)....

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
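For standardized predictors and the naive elastic net penalty λ[(1 − α)‖β‖²₂/2 + α‖β‖₁], the single-coordinate update behind such cyclical coordinate descent is commonly written as follows (a sketch in assumed notation, not the paper's exact equation):

\[
% assumes standardized predictors; S is the soft-thresholding operator
\tilde\beta_j \leftarrow
\frac{S\!\left(\frac{1}{N}\sum_{i=1}^{N} x_{ij}\, r_i^{(j)},\; \lambda\alpha\right)}
     {1+\lambda(1-\alpha)},
\qquad
S(z,\gamma)=\operatorname{sign}(z)\,(|z|-\gamma)_{+},
\]

where r_i^{(j)} is the partial residual with predictor j removed from the fit. The whole path over λ is then traced by warm-starting each solution from the previous one.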

13,656 citations


Cites background from "Penalized Regressions: The Bridge v..."

  • ...Early references include Fu (1998), Shevade and Keerthi (2003) and Daubechies et al. (2004)....

Journal ArticleDOI
TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
Abstract: Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...
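The penalty proposed in that article is the SCAD penalty. A small sketch of its derivative, in the form commonly quoted from Fan and Li (2001), with a = 3.7 as their suggested default; the function name is illustrative.

```python
def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lam(theta) of the SCAD penalty for theta >= 0
    (form commonly quoted from Fan and Li, 2001); a = 3.7 is their default."""
    theta = abs(theta)
    if theta <= lam:
        return lam                                   # L1-like near the origin (sparsity)
    return max(a * lam - theta, 0.0) / (a - 1.0)     # tapers to zero for large theta (less bias)
```

The flat tail (zero derivative beyond aλ) is what keeps large coefficients nearly unbiased, while the singularity at the origin still produces exact zeros.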

8,314 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...Tibshirani (1996) proposed an algorithm for solving constrained least squares problems of LASSO, whereas Fu (1998) provided a “shooting algorithm” for LASSO....

  • ...The Lq penalty pλ(|θ|) = λ|θ|^q leads to a bridge regression (Frank and Friedman 1993 and Fu 1998)....

  • ...In all examples in this section, we computed the penalized likelihood estimate with the L1 penalty, referred to as LASSO, by our algorithm rather than those of Tibshirani (1996) and Fu (1998)....

  • ...Here we discuss two methods of estimating λ: fivefold cross-validation and generalized cross-validation, as suggested by Breiman (1995), Tibshirani (1996), and Fu (1998)....

Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...The coordinate descent method is particularly appealing if each one-dimensional optimization problem can be solved analytically. For example, the shooting algorithm (Fu 1998; Wu and Lange 2008) for lasso uses Equation 13....

Journal ArticleDOI
TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
Abstract: Summary. We consider the problem of selecting grouped variables (factors) for accurate prediction in regression. Such a problem arises naturally in many practical situations with the multifactor analysis-of-variance problem as the most important and well-known example. Instead of selecting factors by stepwise backward elimination, we focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection. The lasso, the LARS algorithm and the non-negative garrotte are recently proposed regression methods that can be used to select individual variables. We study and propose efficient algorithms for the extensions of these methods for factor selection and show that these extensions give superior performance to the traditional stepwise backward elimination method in factor selection problems. We study the similarities and the differences between these methods. Simulations and real examples are used to illustrate the methods.
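The group-lasso extension described here updates whole blocks of coefficients at a time. A minimal sketch of one such block update, assuming plain numpy and orthonormal within-group columns (Xj′Xj = I) so that the group-wise soft-thresholding solution quoted in the excerpt below applies; the function name and data layout are illustrative.

```python
import numpy as np

def group_lasso_update(X_groups, beta_groups, y, j, lam):
    """One block-coordinate update for the group lasso (group-wise soft-thresholding).
    Assumes each X_groups[k] has orthonormal columns."""
    resid = y - sum(X_groups[k] @ beta_groups[k]
                    for k in range(len(X_groups)) if k != j)
    S_j = X_groups[j].T @ resid                    # score vector for group j
    p_j = X_groups[j].shape[1]                     # group size
    norm_S = np.linalg.norm(S_j)
    shrink = 0.0 if norm_S == 0 else max(0.0, 1.0 - lam * np.sqrt(p_j) / norm_S)
    beta_groups[j] = shrink * S_j                  # the whole block is zeroed or shrunk together
    return beta_groups[j]
```

Cycling this update over the groups until the coefficients stabilize gives a shooting-style algorithm in which variables enter or leave the model a factor at a time.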

7,400 citations


Cites methods from "Penalized Regressions: The Bridge v..."

  • ...Our implementation of the group lasso is an extension of the shooting algorithm (Fu, 1999) for the lasso....

  • ...It can be easily verified that the solution to expressions (2.2) and (2.3) is βj = (1 − λ√pj/‖Sj‖)+ Sj, (2.4) where Sj = Xj′(Y − Xβ−j), with β−j = (β1′, . . . , βj−1′, 0′, βj+1′, . . . , βJ′)′....

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
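In the notation of this summary (a sketch, not quoted from the paper), the lasso estimate solves

\[
% t >= 0 is the constraint bound; lambda below is its Lagrangian counterpart
\hat\beta = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^{2}
\quad\text{subject to}\quad \sum_{j=1}^{p}|\beta_j|\le t,
\]

or equivalently minimizes the residual sum of squares plus λ Σj |βj| for the value of λ corresponding to t; the kink of the absolute-value constraint at zero is what sets some coefficients exactly to zero.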

40,785 citations


"Penalized Regressions: The Bridge v..." refers background or methods or result in this paper

  • ...The effective number of parameters defined here has an extra compensation term n0 for the lasso (γ = 1) compared to the one in Tibshirani (1996). It also generalizes to accommodate for bridge regression with any γ > 1....

  • ...It also agrees with the results obtained by Tibshirani (1996) through intensive simulations....

  • ...In contrast, the combined quadratic programming method by Tibshirani (1996) has finite-step (2^p) convergence, and potentially has an even better convergence rate....

  • ...Tibshirani (1996) introduced the lasso, which minimizes RSS subject to a constraint Σ|βj| ≤ t, as a special case of the bridge with γ = 1....

  • ...This technique is borrowed here to select the shrinkage parameters λ and γ, as suggested by Tibshirani (1996) for the lasso....

Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
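This is the bootstrap reference behind the standard errors quoted at the end of this page (10,000 bootstrap samples for the bridge estimates). A minimal nonparametric bootstrap standard-error sketch in plain numpy; the function name, resample count, and example statistic are illustrative.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Nonparametric bootstrap standard error of statistic(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.array([statistic(data[rng.integers(0, n, size=n)])   # resample with replacement
                     for _ in range(n_boot)])
    return reps.std(ddof=1)

# Example: bootstrap standard error of a sample median.
x = np.random.default_rng(1).exponential(size=100)
print(bootstrap_se(x, np.median))
```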

37,183 citations

Journal ArticleDOI
TL;DR: In this paper, an estimation procedure based on adding small positive quantities to the diagonal of X′X is proposed, together with the ridge trace, a method for showing in two dimensions the effects of nonorthogonality.
Abstract: In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
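A minimal numpy sketch of the estimator this describes, i.e. adding a positive constant λ to the diagonal of X′X before solving the normal equations; the function name and interface are illustrative.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X + lam*I)^{-1} X'y; lam > 0 is the quantity
    added to the diagonal of X'X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger λ shrinks the coefficients more, trading a little bias for a smaller mean square error when the predictors are far from orthogonal.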

8,091 citations


"Penalized Regressions: The Bridge v..." refers background in this paper

  • ...Detailed discussions can be found in Seber (1977), Sen and Srivastava (1990), Lawson and Hansen (1974), Hoerl and Kennard (1970a, 1970b) and Frank and Friedman (1993). To achieve better prediction, Hoerl and Kennard (1970a, 1970b) introduced ridge regression, which minimizes RSS subject to a constraint Σ|βj|² ≤ t....

Book
01 Jun 1974
TL;DR: Since the lm function provides a lot of features, it is rather complicated, so the function lsfit, which computes only the coefficient estimates and the residuals, is used as a model instead.
Abstract: Since the lm function provides a lot of features it is rather complicated. So we are going to instead use the function lsfit as a model. It computes only the coefficient estimates and the residuals. Now would be a good time to read the help file for lsfit. Note that lsfit supports the fitting of multiple least squares models and weighted least squares. Our function will not, hence we can omit the arguments wt, weights and yname. Also, changing tolerances is a little advanced so we will trust the default values and omit the argument tolerance as well.

6,956 citations

Journal ArticleDOI
TL;DR: Statistical theory attacks the problem from both ends, as discussed by the authors: it provides optimal methods for finding a real signal in a noisy background, and it provides strict checks against the overinterpretation of random patterns.
Abstract: Statistics is the science of learning from experience, especially experience that arrives a little bit at a time. The earliest information science was statistics, originating in about 1650. This century has seen statistical techniques become the analytic methods of choice in biomedical science, psychology, education, economics, communications theory, sociology, genetic studies, epidemiology, and other areas. Recently, traditional sciences like geology, physics, and astronomy have begun to make increasing use of statistical methods as they focus on areas that demand informational efficiency, such as the study of rare and exotic particles or extremely distant galaxies. Most people are not natural-born statisticians. Left to our own devices we are not very good at picking out patterns from a sea of noisy data. To put it another way, we are all too good at picking out non-existent patterns that happen to suit our purposes. Statistical theory attacks the problem from both ends. It provides optimal methods for finding a real signal in a noisy background, and also provides strict checks against the overinterpretation of random patterns.

6,361 citations


"Penalized Regressions: The Bridge v..." refers methods in this paper

  • ...The standard errors for the bridge estimates were computed by 10,000 bootstrap samples (Efron and Tibshirani 1993)....
