Strong rules for discarding predictors in lasso-type problems

Robert Tibshirani, +6 more

- 09 Nov 2010 -

arXiv: Statistics Theory

TLDR

In this paper, the authors propose strong rules for discarding predictors in lasso regression and related problems to improve computational efficiency. The rules are not foolproof, so they are complemented with simple checks of the Karush-Kuhn-Tucker (KKT) conditions.

Abstract:

We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui et al. (2010) propose "SAFE" rules that guarantee that a coefficient will be zero in the solution, based on the inner products of each predictor with the outcome. In this paper we propose strong rules that are not foolproof but rarely fail in practice. These can be complemented with simple checks of the Karush-Kuhn-Tucker (KKT) conditions to provide safe rules that offer substantial speed and space savings in a variety of statistical convex optimization problems.
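The screen-then-check idea in the abstract can be illustrated with a minimal sketch. The abstract itself does not state the rule, so the code assumes the basic (global) strong rule as it is usually quoted for this paper: for the objective (1/2)||y - Xb||^2 + lam * ||b||_1, discard predictor j when |x_j^T y| < 2*lam - lam_max, where lam_max = max_j |x_j^T y|. The function names, the plain coordinate-descent solver, and the toy data are all hypothetical choices made for illustration, not the authors' implementation.

```python
# Sketch only: basic strong rule plus KKT check, in pure Python.
# Assumed objective: 0.5 * ||y - X b||^2 + lam * ||b||_1.
# X is stored as a list of columns; all names here are illustrative.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def strong_rule_keep(columns, y, lam):
    """Indices that survive the assumed basic strong rule:
    predictor j is discarded when |x_j^T y| < 2*lam - lam_max."""
    scores = [abs(dot(col, y)) for col in columns]
    lam_max = max(scores)
    threshold = 2.0 * lam - lam_max
    return [j for j, s in enumerate(scores) if s >= threshold]

def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(columns, y, lam, active, n_sweeps=200):
    """Coordinate descent restricted to the active set (fixed sweep count)."""
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_sweeps):
        for j in active:
            col = columns[j]
            norm_sq = dot(col, col)
            # Correlation of x_j with the residual that excludes x_j's own fit.
            rho = dot(col, residual) + norm_sq * beta[j]
            new = soft_threshold(rho, lam) / norm_sq
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new
    return beta

def kkt_violations(columns, y, beta, lam, discarded, tol=1e-9):
    """Discarded predictors that break the KKT condition |x_j^T r| <= lam."""
    fitted = [sum(col[i] * b for col, b in zip(columns, beta))
              for i in range(len(y))]
    residual = [yi - fi for yi, fi in zip(y, fitted)]
    return [j for j in discarded if abs(dot(columns[j], residual)) > lam + tol]

def screened_lasso(columns, y, lam):
    """Screen with the strong rule, solve, then re-admit any KKT violators."""
    active = strong_rule_keep(columns, y, lam)
    while True:
        beta = lasso_cd(columns, y, lam, active)
        discarded = [j for j in range(len(columns)) if j not in active]
        violators = kkt_violations(columns, y, beta, lam, discarded)
        if not violators:
            return beta, active
        active = sorted(active + violators)

# Toy data (made up): three orthonormal predictors.
columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 1.0, 0.2]
beta, active = screened_lasso(columns, y, lam=2.5)
```

The loop in `screened_lasso` is what turns the heuristic into a safe procedure in this sketch: any discarded predictor that violates its KKT condition is added back and the problem is solved again, so a screening mistake costs extra work rather than a wrong answer.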

Citations

Dissertation

Accelerating sparse inverse problems using structured approximations

Cassio Fraga Dantas

TL;DR: In this paper, a particular family of dictionaries, written as a sum of Kronecker products, is proposed, and stable screening tests are developed to safely identify and discard useless atoms (columns of the dictionary matrix which do not correspond to the solution support).

Dissertation

Genetic risk score based on statistical learning

Florian Privé

TL;DR: In this paper, the authors used extreme gradient boosting for imputing genotyped variants, feature engineering to capture recessive and dominant effects in penalized regression, and parameter tuning and stacked regressions to improve polygenic prediction.

Machine Learning on Graphs

David Jeremy Eis

TL;DR: The contribution of this thesis is the derivation of the deterministic variational inference update equations for doing inference on the SHDPHMM, an improvement over the Markov Chain Monte Carlo algorithm proposed by Fox as it allows for direct assessment of convergence and can run faster.

Dissertation

Computational Curation of Open Science Data

Maxim Grechkin

TL;DR: Computational Curation of Open Science Data is presented as a probabilistic procedure to estimate the probability that a particular type of data will be chosen for a particular science research project.
