Open Access Journal Article (DOI)

False Discoveries Occur Early on the Lasso Path

Weijie J. Su, +2 more
01 Oct 2017 · The Annals of Statistics, Vol. 45, Iss. 5, pp. 2133–2150
TLDR
It is demonstrated that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are.
Abstract
In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity—meaning that the fraction of variables with a nonvanishing effect tends to a constant, however small—this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.
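A minimal simulation sketch of the phenomenon (our illustration, not code from the paper), assuming numpy and scikit-learn: it traces the false discovery proportion (FDP) and true positive proportion (TPP) along the Lasso path for an independent Gaussian design with linear sparsity and strong, equal effects.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p, k = 1000, 1000, 200          # linear sparsity: k/p stays a constant fraction
X = rng.standard_normal((n, p)) / np.sqrt(n)   # stochastically independent design
beta = np.zeros(p)
beta[:k] = 10.0                     # strong, equal-magnitude effects
y = X @ beta + rng.standard_normal(n)

# lasso_path returns a decreasing grid of penalties and the coefficients at each.
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

for alpha, b in zip(alphas, coefs.T):
    selected = np.flatnonzero(b)
    if selected.size == 0:
        continue
    true_pos = int(np.sum(selected < k))
    fdp = 1.0 - true_pos / selected.size    # false discovery proportion
    tpp = true_pos / k                      # true positive proportion
    print(f"alpha={alpha:.4f}  selected={selected.size:4d}  TPP={tpp:.3f}  FDP={fdp:.3f}")
```

On runs like this, the FDP column typically turns positive well before the TPP reaches 1, matching the interspersing of true and null features that the abstract describes.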



Citations
Posted Content (DOI)

Accurate prediction of cell composition, age, smoking consumption and infection serostatus based on blood DNA methylation profiles

TL;DR: This study substantially improves predictions of blood cell composition from methylation profiles, which will be critical in the emerging field of medical epigenomics, by using elastic-net-regularized and stability-selected regression models to predict the circulating levels of 70 blood cell subsets.
Monograph (DOI)

A Unifying Tutorial on Approximate Message Passing

TL;DR: Approximate Message Passing (AMP) algorithms have become extremely popular in various structured high-dimensional statistical problems and have been extended for use in computer science and machine learning.
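For context, the AMP iteration for the lasso in its commonly cited form (supplied here for reference, not quoted from the tutorial):

```latex
% AMP iteration for the lasso: \eta is coordinatewise soft thresholding at
% level \theta_t, \delta = n/p, and \langle \cdot \rangle averages over
% coordinates; the last term is the Onsager correction that distinguishes
% AMP from plain iterative thresholding.
x^{t+1} = \eta\big(x^{t} + A^{\top} z^{t};\, \theta_t\big), \qquad
z^{t} = y - A x^{t} + \tfrac{1}{\delta}\, z^{t-1}
  \big\langle \eta'\big(x^{t-1} + A^{\top} z^{t-1};\, \theta_{t-1}\big) \big\rangle
```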
Posted Content

SLOPE for Sparse Linear Regression: Asymptotics and Optimal Regularization

TL;DR: The SLOPE estimator generalizes the Lasso by penalizing different coordinates of the estimate according to their magnitudes; this article derives its asymptotics and gives a computationally feasible way to optimally design the regularizing sequences so that the fundamental limits are reached.
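In standard notation (supplied here for reference, not quoted from the article), the SLOPE penalty sorts the coefficient magnitudes and applies a nonincreasing sequence of weights:

```latex
% SLOPE: sorted-\ell_1 penalization with nonincreasing weights
% \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0 applied to the
% order statistics |\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}:
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\|y - X\beta\|_2^2
  + \sum_{i=1}^{p} \lambda_i\, |\beta|_{(i)}
```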
Journal Article (DOI)

Nested model averaging on solution path for high-dimensional linear regression

TL;DR: This work proposes to combine model averaging with regularized estimators (e.g., lasso, elastic net, and Sorted L‐One Penalized Estimation [SLOPE]) on the solution path for high‐dimensional linear regression.
Proceedings Article

The Complete Lasso Tradeoff Diagram

TL;DR: In this paper, the tradeoff between false discovery rate (FDR) and power in variable selection was studied in a regime of linear sparsity under random designs, and a complete Lasso tradeoff diagram was proposed.
References
Journal Article (DOI)

Regression Shrinkage and Selection via the Lasso

TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant, is proposed.
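For context, the optimization the TL;DR describes can be written as follows (standard lasso notation, supplied for reference rather than quoted from the paper):

```latex
% Constrained form, as described in the TL;DR above:
\hat{\beta} = \arg\min_{\beta}\; \|y - X\beta\|_2^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t
% Equivalent Lagrangian form, with tuning parameter \lambda \ge 0:
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1
```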
Journal Article (DOI)

Regularization and variable selection via the elastic net

TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
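The elastic net augments the lasso's ℓ1 penalty with a ridge term; in standard notation (supplied for reference, not quoted from the paper):

```latex
% Naive elastic net: the lasso's \ell_1 penalty plus a ridge (\ell_2) term,
% which stabilizes selection among correlated predictors:
\hat{\beta} = \arg\min_{\beta}\; \|y - X\beta\|_2^2
  + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
```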
Journal Article (DOI)

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

TL;DR: In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known.
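The SCAD penalty this line refers to is usually specified through its derivative; the standard form (supplied for reference, not quoted from the paper) is:

```latex
% SCAD penalty, specified through its derivative for t > 0 (a > 2, with
% a = 3.7 a common default); it matches the lasso near zero but flattens
% out, leaving large coefficients nearly unbiased:
p'_{\lambda}(t) = \lambda \left\{ \mathbf{1}(t \le \lambda)
  + \frac{(a\lambda - t)_{+}}{(a - 1)\lambda}\, \mathbf{1}(t > \lambda) \right\}
```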
Journal Article (DOI)

Model selection and estimation in regression with grouped variables

TL;DR: In this paper, instead of selecting factors by stepwise backward elimination, the authors focus on the accuracy of estimation and consider extensions of the lasso, the LARS algorithm and the non-negative garrotte for factor selection.
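In standard notation (supplied for reference, not quoted from the paper), the group lasso penalizes each block of coefficients with an unsquared ℓ2 norm, so whole factors enter or leave the model together:

```latex
% Group lasso: with coefficients partitioned into blocks \beta_g for
% groups g = 1, \dots, G of sizes p_g, an \ell_2 penalty on each block
% selects whole factors at once rather than individual coordinates:
\hat{\beta} = \arg\min_{\beta}\; \tfrac{1}{2}\|y - X\beta\|_2^2
  + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \|\beta_g\|_2
```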