Model Selection for High-Dimensional
Quadratic Regression via Regularization
Item Type: Article
Authors: Hao, Ning; Feng, Yang; Zhang, Hao Helen
Citation: Ning Hao, Yang Feng & Hao Helen Zhang (2018) Model Selection for High-Dimensional Quadratic Regression via Regularization, Journal of the American Statistical Association, 113:522, 615-625, DOI: 10.1080/01621459.2016.1264956
DOI: 10.1080/01621459.2016.1264956
Publisher: American Statistical Association
Journal: Journal of the American Statistical Association
Rights: © 2018 American Statistical Association.
Item License: http://rightsstatements.org/vocab/InC/1.0/
Version: Final accepted manuscript
Link to Item: http://hdl.handle.net/10150/628664

Model Selection for High Dimensional Quadratic
Regression via Regularization
Ning Hao, Yang Feng, and Hao Helen Zhang
Abstract
Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and interaction effects. Existing regularization methods generally achieve this goal by solving complex optimization problems, which usually demand high computational cost and hence are not feasible for high dimensional data. This paper focuses on scalable regularization methods for model selection in high dimensional QR. We first consider two-stage regularization methods and establish theoretical properties of the two-stage LASSO. Then, a new regularization method, called Regularization Algorithm under Marginality Principle (RAMP), is proposed to compute a hierarchy-preserving regularization solution path efficiently. Both methods are further extended to solve generalized QR models. Numerical results are also shown to demonstrate the performance of the methods.
Keywords: Generalized quadratic regression, Interaction selection, LASSO, Marginality
principle, Variable selection.
Ning Hao is Assistant Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721 (E-mail: nhao@math.arizona.edu). Yang Feng is Associate Professor, Department of Statistics, Columbia University, New York, NY 10027 (E-mail: yangfeng@stat.columbia.edu). Hao Helen Zhang is Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721 (E-mail: hzhang@math.arizona.edu). Ning Hao and Yang Feng contributed equally to this work. The authors are partially supported by NSF grants DMS-1309507 (Hao and Zhang), DMS-1308566 and DMS-1554804 (Feng), DMS-1418172 and NSFC 11571009 (Zhang). The authors are grateful to the editor, AE, and referees for their helpful suggestions.

1 Introduction
Statistical models involving two-way or higher-order interactions have been studied in various contexts, such as linear models and generalized linear models (Nelder, 1977; McCullagh & Nelder, 1989), experimental design (Hamada & Wu, 1992; Chipman et al., 1997), and polynomial regression (Peixoto, 1987). In particular, a quadratic regression (QR) model
formulated as
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta_{1,1} X_1^2 + \beta_{1,2} X_1 X_2 + \cdots + \beta_{p,p} X_p^2 + \varepsilon \qquad (1)$$
has been considered recently to analyze high dimensional data. In (1), $X_1, \ldots, X_p$ are main effects, and order-2 terms $X_j X_k$ ($1 \le j \le k \le p$) include quadratic main effects ($j = k$) and two-way interaction effects ($j \ne k$). A key feature of model (1) is its hierarchical structure, as order-2 terms are derived from the main effects. To reflect their relationship, we call $X_j X_k$ the child of $X_j$ and $X_k$, and $X_j$ and $X_k$ the parents of $X_j X_k$.
Standard techniques such as ordinary least squares can be applied to solve (1) for a small or moderate p. When p is large and variable selection becomes necessary, it is suggested that the selected model keep the hierarchical structure. That is, interaction terms can be selected into the model only if their parents are in the model. This is referred to as the marginality principle (Nelder, 1977). In general, a direct application of variable selection techniques to (1) cannot automatically ensure the hierarchical structure in the final model. Recently, several regularization methods (Zhao et al., 2009; Yuan et al., 2009; Choi et al., 2010; Bien et al., 2013) have been proposed to conduct variable selection for (1) under the marginality principle by designing special forms of penalty functions. These methods are feasible when p is a few hundred or less, and the resulting estimators have oracle properties when p = o(n) (Choi et al., 2010). However, when p is much larger, these methods are not feasible since their implementation requires storing and manipulating the entire $O(p^2) \times n$ design matrix and solving complex constrained optimization problems. The memory and computational cost can be extremely high and prohibitive. Very recently, interaction screening for high-dimensional settings has drawn much attention, and a variety of interaction screening approaches have been proposed for regression and classification problems, including Hao & Zhang (2014a), Fan et al. (2015), and Kong et al. (2016). By contrast, the purpose of this
work is to develop scalable interaction selection approaches under a penalized framework for
high dimensional data analysis.
In this paper, we study regularization methods for model selection and estimation in QR and generalized quadratic regression (GQR) models under the marginality principle. The main focus is the case $p \gg n$, which is a bottleneck for the existing regularization methods. We study theoretical properties of a two-stage regularization method based on the LASSO and propose a new efficient algorithm, RAMP, which produces a hierarchy-preserving solution path. In contrast to existing regularization methods, these procedures avoid storing the $O(p^2) \times n$ design matrix and sidestep complex constraints and penalties, making them feasible for analyzing data with many variables. In particular, our R package RAMP runs well on a desktop for data with $n = 400$ and $p = 10^4$, taking less than 30 seconds (with a 3.4 GHz Intel Core i7 CPU and 32 GB memory) to fit the QR model and obtain the whole solution path. The main contribution of this paper is threefold. First, we establish a variable selection consistency result for the two-stage LASSO procedure in QR and offer new insights on stage-wise selection methods. To the best of our knowledge, this is the first selection consistency result for high dimensional QR. Second, the proposed algorithms are computationally efficient and will make a valuable contribution to interaction selection tools in practice. Third, our methods are extended to interaction selection in GQR models, which are rarely studied in the literature.
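As a pointer for practitioners, below is a hedged usage sketch of the RAMP package on the data simulated above. The argument values and the output fields `fit$mainInd` and `fit$interInd` reflect our reading of the package interface and should be verified against its documentation (`?RAMP`):

```r
# Hedged sketch: fit a hierarchy-preserving QR solution path with RAMP.
# install.packages("RAMP")
library(RAMP)
fit <- RAMP(X, y, family = "gaussian", penalty = "LASSO", hier = "Strong")
fit$mainInd   # indices of selected main effects (assumed field name)
fit$interInd  # labels of selected interactions (assumed field name)
```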
We define the notation used in the paper. Let $X = (\mathbf{x}_1, \ldots, \mathbf{x}_n)^\top$ be the $n \times p$ design matrix of main effects and $\mathbf{y} = (y_1, \ldots, y_n)^\top$ be the $n$-dimensional response vector. The linear term index set is $M = \{1, 2, \ldots, p\}$, and the order-2 index set is $I = \{(j,k) : 1 \le j \le k \le p\}$. The regression coefficient vector is $\beta = (\beta_0, \beta_M^\top, \beta_I^\top)^\top$, where $\beta_M = (\beta_1, \ldots, \beta_p)^\top$ and $\beta_I = (\beta_{1,1}, \beta_{1,2}, \ldots, \beta_{p,p})^\top$. For a subset $A \subset M$, use $\beta_A$ for the subvector of $\beta_M$ indexed by $A$, and $X_A$ for the submatrix of $X$ whose columns are indexed by $A$. In particular, $X_j$ is the $j$th column of $X$. We treat the subscripts $(j,k)$ and $(k,j)$ as identical, i.e., $\beta_{j,k} = \beta_{k,j}$. Let $c_1, c_2, \ldots$ and $C_1, C_2, \ldots$ be positive constants which are independent of the sample size $n$. They are locally defined and their values may vary in different contexts. For a vector $v = (v_1, \ldots, v_p)^\top$, $\|v\| = \sqrt{\sum_{j=1}^p v_j^2}$ and $\|v\|_1 = \sum_{j=1}^p |v_j|$. For a matrix $A$, define $\|A\|_\infty = \max_i \sum_j |A_{ij}|$ and $\|A\|_2 = \sup_{\|v\|_2 = 1} \|Av\|_2$ as the standard operator norm, i.e., the square root of the largest eigenvalue of $A^\top A$.

The rest of the paper is organized as follows. Section 2 considers two-stage regularization methods for model selection in QR and studies theoretical properties of the two-stage LASSO. Section 3 proposes RAMP to compute the entire hierarchy-preserving solution path efficiently. Section 4 extends the proposed methods to generalized QR models. Section 5 presents numerical studies, followed by a discussion. Technical proofs are in the Appendix.
2 Two-stage Regularization Method
Variable selection and estimation via penalization is popular in high dimensional analysis. Examples include the LASSO (Tibshirani, 1996), SCAD (Fan & Li, 2001), the elastic net (Zou & Hastie, 2005), and the minimax concave penalty (MCP) (Zhang, 2010), among many others. Properties such as model selection consistency and oracle properties have been verified (Zhao & Yu, 2006; Wainwright, 2009; Fan & Lv, 2011). A general penalized estimator for linear models is defined as
$$(\hat{\beta}_0, \hat{\beta}_M) = \operatorname*{argmin}_{(\beta_0,\,\beta_M)} \; \frac{1}{2n} \left\| \mathbf{y} - \mathbf{1}\beta_0 - X\beta_M \right\|^2 + \sum_{j=1}^{p} J_{\lambda}(\beta_j), \qquad (2)$$
where $\mathbf{y}$ is the response vector, $X$ is the design matrix, $J_\lambda(\cdot)$ is a penalty function, and $\lambda \ge 0$ is a regularization parameter. The penalty $J(\cdot)$ and $\lambda$ may depend on the index $j$. For ease of presentation, we use the same penalty function and parameter for all $j$ unless stated otherwise.
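For example, taking $J_\lambda(\beta_j) = \lambda|\beta_j|$ in (2) gives the LASSO. A minimal sketch with the glmnet package (the coordinate descent implementation of Friedman et al., which solves (2) for the $\ell_1$ penalty with an unpenalized intercept), applied to the simulated main effects above:

```r
# LASSO instance of (2) on the main effects, over a whole lambda path.
library(glmnet)
fit <- glmnet(X, y, family = "gaussian", alpha = 1)  # alpha = 1: L1 penalty
cvfit <- cv.glmnet(X, y)           # select lambda by cross-validation
coef(cvfit, s = "lambda.min")      # (beta_0, beta_M) at the selected lambda
```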
We consider the problem of variable selection for QR model (1). Define $X^2 = X \circ X$ as an $n \times \frac{p(p+1)}{2}$ matrix consisting of all pairwise column products. That is, for $X = (X_1, \ldots, X_p)$, $X^2 = X \circ X = (X_1 \star X_1, X_1 \star X_2, \ldots, X_p \star X_p)$, where $\star$ denotes the entry-wise product of two column vectors. For an index set $A \subset M$, define $A^2 = A \circ A = \{(j,k) : j \le k;\ j, k \in A\} \subset I$, and $A \circ M = \{(j,k) : j \le k;\ j \text{ or } k \in A\} \subset I$. We use $X^2_A$ as a short notation for $(X_A)^2$, a matrix whose columns are indexed by $A^2$.
Two-stage regularization methods for interaction selection have been considered by Efron et al. (2004) and Wu et al. (2009), among others. However, their theoretical properties are not clearly understood. In the following, we first illustrate the general two-stage procedure for interaction selection.
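To fix ideas before the formal description, here is a minimal sketch of one natural two-stage LASSO variant (an illustration only, not necessarily the exact procedure analyzed below): stage 1 runs the LASSO on the main effects to select a set $\hat{A}$; stage 2 refits the LASSO on $X_{\hat{A}}$ together with the child interactions $X^2_{\hat{A}}$, so the final model preserves the hierarchy by construction.

```r
# Illustration of a two-stage LASSO under strong heredity (assumed variant).
library(glmnet)

# Stage 1: LASSO on main effects only; keep the selected set A-hat.
cv1 <- cv.glmnet(X, y)
b1 <- as.matrix(coef(cv1, s = "lambda.min"))
Ahat <- which(b1[-1, 1] != 0)            # drop the intercept row

# Stage 2: LASSO on X_Ahat plus its child interactions X^2_Ahat
# (build_X2 is the helper defined in the previous sketch).
if (length(Ahat) > 0) {
  Z <- cbind(X[, Ahat, drop = FALSE], build_X2(X[, Ahat, drop = FALSE]))
  cv2 <- cv.glmnet(Z, y)
  coef(cv2, s = "lambda.min")            # hierarchy-preserving fit
}
```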
References

Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348-1360.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22.

McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). London: Chapman & Hall.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267-288.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301-320.