Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together.The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the

/pdf/regularization-and-variable-selection-via-the-elastic-net-4z8q6j5o0i.pdf

Regularization and variable selection via the elastic net

We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.

/pdf/regularization-paths-for-generalized-linear-models-via-39r1t606ac.pdf

Regularization Paths for Generalized Linear Models via Coordinate Descent

Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of ...

/pdf/variable-selection-via-nonconcave-penalized-likelihood-and-ay6pzm02g6.pdf

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

计量经济分析 = Econometric analysis

Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Machine Learning : A Probabilistic Perspective

Data-analytic approaches to regression problems, arising from many scientific disciplines are described in this book. The aim of these nonparametric methods is to relax assumptions on the form of a regression function and to let data search for a suitable function that describes the data well. The use of these nonparametric functions with parametric techniques can yield very powerful data analysis tools. Local polynomial modeling and its applications provides an up-to-date picture on state-of-the-art nonparametric regression techniques. The emphasis of the book is on methodologies rather than on theory, with a particular focus on applications of nonparametric techniques to various statistical problems. High-dimensional data-analytic tools are presented, and the book includes a variety of examples. This will be a valuable reference for research and applied statisticians, and will serve as a textbook for graduate students and others interested in nonparametric regression.

/pdf/local-polynomial-modelling-and-its-applications-1q0qqgxfd8.pdf

Local polynomial modelling and its applications

Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor  log (p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor  log (p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.

/pdf/sure-independence-screening-for-ultrahigh-dimensional-3pasjczpn0.pdf

Sure independence screening for ultrahigh dimensional feature space

Variable selection plays an important role in high dimensional statistical modeling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality $p$, estimation accuracy and computational cost are two top concerns. In a recent paper, Candes and Tao (2007) propose the Dantzig selector using $L_1$ regularization and show that it achieves the ideal risk up to a logarithmic factor $\log p$. Their innovative procedure and remarkable result are challenged when the dimensionality is ultra high as the factor $\log p$ can be large and their uniform uncertainty principle can fail. 
Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method based on a correlation learning, called the Sure Independence Screening (SIS), to reduce dimensionality from high to a moderate scale that is below sample size. In a fairly general asymptotic framework, the correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, an iterative SIS (ISIS) is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as the SCAD, Dantzig selector, Lasso, or adaptive Lasso. The connections of these penalized least-squares methods are also elucidated.

Sure Independence Screening for Ultra-High Dimensional Feature Space

In this article we study the method of nonparametric regression based on a weighted local linear regression. This method has advantages over other popular kernel methods. Moreover, such a regression procedure has the ability of design adaptation: It adapts to both random and fixed designs, to both highly clustered and nearly uniform designs, and even to both interior and boundary points. It is shown that the local linear regression smoothers have high asymptotic efficiency (i.e., can be 100% with a suitable choice of kernel and bandwidth) among all possible linear smoothers, including those produced by kernel, orthogonal series, and spline methods. The finite sample property of the local linear regression smoother is illustrated via simulation studies. Nonparametric regression is frequently used to explore the association between covariates and responses. There are many versions of kernel regression smoothers. Some estimators are not good for random designs, such as in observational studies, and ...

/pdf/design-adaptive-nonparametric-regression-2slys38t62.pdf

Jianqing Fan

Papers

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

Local polynomial modelling and its applications

Sure independence screening for ultrahigh dimensional feature space

Sure Independence Screening for Ultra-High Dimensional Feature Space

Design-adaptive Nonparametric Regression