Open Access Journal Article

A Selective Overview of Variable Selection in High Dimensional Feature Space.

Abstract
High-dimensional statistical problems arise in diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discovery. The traditional idea of best subset selection, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood have been successfully developed over the last decade to cope with high dimensionality, and they have been widely applied to simultaneously select important variables and estimate their effects in high-dimensional statistical inference. In this article, we present a brief account of recent developments in theory, methods, and implementations for high-dimensional variable selection. Three questions rapidly drive advances in the field: what limits of dimensionality such methods can handle, what role the penalty function plays, and what statistical properties the methods have. The properties of non-concave penalized likelihood and its role in high-dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high-dimensional variable selection, with emphasis on independence screening and two-scale methods.
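
To make the independence screening idea in the abstract concrete, here is a minimal sketch, not taken from the article. It assumes a linear model and ranks each variable by the absolute value of its marginal sample correlation with the response, keeping the top d. The function name `marginal_screen`, the simulated data, and the choice d = 20 are all hypothetical. The second, moderate-scale stage of a two-scale method (a penalized likelihood fit on the retained columns) is left out.

```python
import numpy as np


def marginal_screen(X, y, d):
    """Return indices of the d columns of X with the largest
    absolute marginal sample correlation with y (hypothetical helper)."""
    Xc = X - X.mean(axis=0)          # center each column
    yc = y - y.mean()                # center the response
    # Sample correlation of every column with the response.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = (Xc.T @ yc) / denom
    # Sort by decreasing |correlation| and keep the first d indices.
    return np.argsort(-np.abs(corr))[:d]


# Simulated example (all sizes are illustrative assumptions):
# n = 200 observations, p = 2000 variables, only the first 3 are active.
rng = np.random.default_rng(0)
n, p = 200, 2000
X = rng.standard_normal((n, p))
true_idx = np.array([0, 1, 2])
y = X[:, true_idx] @ np.array([3.0, 3.0, 3.0]) + rng.standard_normal(n)

kept = marginal_screen(X, y, d=20)
print(sorted(kept.tolist()))
```

In this simulated setting the columns are independent and the signal is strong, so the active variables should survive the screen; with correlated columns or weak signals a marginal ranking like this one can miss important variables.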

