Model Selection for High-Dimensional
Quadratic Regression via Regularization
Item Type Article
Authors Hao, Ning; Feng, Yang; Zhang, Hao Helen
Citation Ning Hao, Yang Feng & Hao Helen Zhang (2018) Model Selection
for High-Dimensional Quadratic Regression via Regularization,
Journal of the American Statistical Association, 113:522, 615-625,
DOI: 10.1080/01621459.2016.1264956
DOI 10.1080/01621459.2016.1264956
Publisher AMER STATISTICAL ASSOC
Journal JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Rights © 2018 American Statistical Association.
Item License http://rightsstatements.org/vocab/InC/1.0/
Version Final accepted manuscript
Link to Item http://hdl.handle.net/10150/628664
Model Selection for High Dimensional Quadratic
Regression via Regularization
Ning Hao, Yang Feng, and Hao Helen Zhang
Abstract
Quadratic regression (QR) models naturally extend linear models by considering
interaction effects between the covariates. To conduct model selection in QR, it is
important to maintain the hierarchical model structure between main effects and in-
teraction effects. Existing regularization methods generally achieve this goal by solving
complex optimization problems, which usually demand high computational cost and
hence are not feasible for high dimensional data. This paper focuses on scalable regular-
ization methods for model selection in high dimensional QR. We first consider two-stage
regularization methods and establish theoretical properties of the two-stage LASSO.
Then, a new regularization method, called Regularization Algorithm under Marginal-
ity Principle (RAMP), is proposed to compute a hierarchy-preserving regularization
solution path efficiently. Both methods are further extended to solve generalized QR
models. Numerical results are presented to demonstrate the performance of the methods.
Keywords: Generalized quadratic regression, Interaction selection, LASSO, Marginality
principle, Variable selection.
Ning Hao is Assistant Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721
(Email: nhao@math.arizona.edu). Yang Feng is Associate Professor, Department of Statistics, Columbia
University, New York, NY 10027 (E-mail: yangfeng@stat.columbia.edu). Hao Helen Zhang is Professor,
Department of Mathematics, University of Arizona, Tucson, AZ 85721 (Email: hzhang@math.arizona.edu).
Ning Hao and Yang Feng contributed equally to this work. The authors are partially supported by NSF
grants DMS-1309507 (Hao and Zhang), DMS-1308566 and DMS-1554804 (Feng), DMS-1418172 and NSFC
11571009 (Zhang). The authors are grateful to the editor, AE, and referees for their helpful suggestions.
1 Introduction
Statistical models involving two-way or higher-order interactions have been studied in
various contexts, such as linear models and generalized linear models (Nelder, 1977; McCul-
lagh & Nelder, 1989), experimental design (Hamada & Wu, 1992; Chipman et al., 1997),
and polynomial regression (Peixoto, 1987). In particular, a quadratic regression (QR) model
formulated as
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \beta_{1,1} X_1^2 + \beta_{1,2} X_1 X_2 + \cdots + \beta_{p,p} X_p^2 + \varepsilon \qquad (1)$$
has been considered recently to analyze high dimensional data. In (1), $X_1, \ldots, X_p$ are main effects, and order-2 terms $X_j X_k$ ($1 \le j \le k \le p$) include quadratic main effects ($j = k$) and two-way interaction effects ($j \ne k$). A key feature of model (1) is its hierarchical structure, as order-2 terms are derived from the main effects. To reflect their relationship, we call $X_j X_k$ the child of $X_j$ and $X_k$, and $X_j$ and $X_k$ the parents of $X_j X_k$.
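As a concrete illustration of this bookkeeping, the order-2 index set and the parent-child relation can be enumerated in a few lines. This is a minimal sketch of ours, not code from the paper; the helper names `order2_terms` and `parents` are our own.

```python
# Enumerate all order-2 terms X_j * X_k (1 <= j <= k <= p) of model (1)
# and record their parents, illustrating the hierarchical structure.
def order2_terms(p):
    """Return the order-2 index set I = {(j, k) : 1 <= j <= k <= p}."""
    return [(j, k) for j in range(1, p + 1) for k in range(j, p + 1)]

def parents(jk):
    """Parents of the child term X_j * X_k are X_j and X_k."""
    j, k = jk
    return {j, k}

p = 3
terms = order2_terms(p)
print(terms)            # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
print(parents((1, 2)))  # {1, 2}: X_1 X_2 is the child of X_1 and X_2
print(parents((2, 2)))  # {2}: the quadratic term X_2^2 has the single parent X_2
```

Note that a quadratic main effect $X_j^2$ has a single parent, so it can enter a hierarchy-preserving model whenever $X_j$ does.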
Standard techniques such as ordinary least squares can be applied to solve (1) for a small
or moderate p. When p is large and variable selection becomes necessary, it is suggested that
the selected model should keep the hierarchical structure. That is, interaction terms can be
selected into the model only if their parents are in the model. This is referred to as the marginality principle (Nelder, 1977). In general, a direct application of variable selection techniques
to (1) cannot automatically ensure the hierarchical structure in the final model. Recently,
several regularization methods (Zhao et al., 2009; Yuan et al., 2009; Choi et al., 2010; Bien
et al., 2013) have been proposed to conduct variable selection for (1) under the marginal-
ity principle by designing special forms of penalty functions. These methods are feasible
when $p$ is a few hundred or less, and the resulting estimators enjoy oracle properties when
p = o(n) (Choi et al., 2010). However, when p is much larger, these methods are not feasible
since their implementation requires storing and manipulating the entire $O(p^2) \times n$ design
matrix and solving complex constrained optimization problems. The memory and compu-
tational cost can be extremely high and prohibitive. Very recently, interaction screening for
high-dimensional settings has drawn much attention, and a variety of interaction screening
approaches have been proposed for regression and classification problems, including Hao &
Zhang (2014a), Fan et al. (2015), and Kong et al. (2016). By contrast, the purpose of this
work is to develop scalable interaction selection approaches under a penalized framework for
high dimensional data analysis.
In this paper, we study regularization methods for model selection and estimation in
QR and generalized quadratic regression (GQR) models under the marginality principle.
The main focus is the case $p \gg n$, which is a bottleneck for the existing regularization
methods. We study theoretical properties of a two-stage regularization method based on
the LASSO and propose a new efficient algorithm, RAMP, which produces a hierarchy-
preserving solution path. In contrast to existing regularization methods, these procedures
avoid storing the $O(p^2) \times n$ design matrix and sidestep complex constraints and penalties, making
them feasible to analyze data with many variables. In particular, our R package RAMP runs
well on a desktop for data with $n = 400$ and $p = 10^4$; it takes less than 30 seconds (with
a 3.4 GHz Intel Core i7 CPU and 32GB memory) to fit the QR model and obtain the whole solution
path. The main contribution of this paper is threefold. First, we establish a variable selection
consistency result of the two-stage LASSO procedure for QR and offer new insights on stagewise selection methods. To the best of our knowledge, this is the first selection consistency result for
high dimensional QR. Second, the proposed algorithms are computationally efficient and will
make a valuable contribution to interaction selection tools in practice. Third, our methods
are extended to interaction selection in GQR models, which are rarely studied in the literature.
We now define the notation used in the paper. Let $X = (x_1, \ldots, x_n)^\top$ be the $n \times p$ design matrix of main effects and $y = (y_1, \ldots, y_n)^\top$ be the $n$-dimensional response vector. The linear term index set is $M = \{1, 2, \ldots, p\}$, and the order-2 index set is $I = \{(j, k) : 1 \le j \le k \le p\}$. The regression coefficient vector is $\beta = (\beta_0, \beta_M^\top, \beta_I^\top)^\top$, where $\beta_M = (\beta_1, \ldots, \beta_p)^\top$ and $\beta_I = (\beta_{1,1}, \beta_{1,2}, \ldots, \beta_{p,p})^\top$. For a subset $A \subset M$, we use $\beta_A$ for the subvector of $\beta_M$ indexed by $A$, and $X_A$ for the submatrix of $X$ whose columns are indexed by $A$. In particular, $X_j$ is the $j$th column of $X$. We treat the subscripts $(j, k)$ and $(k, j)$ as identical, i.e., $\beta_{j,k} = \beta_{k,j}$. Let $c_1, c_2, \ldots$ and $C_1, C_2, \ldots$ be positive constants independent of the sample size $n$; they are defined locally and their values may vary with context. For a vector $v = (v_1, \ldots, v_p)^\top$, $\|v\| = \sqrt{\sum_{j=1}^p v_j^2}$ and $\|v\|_1 = \sum_{j=1}^p |v_j|$. For a matrix $A$, define $\|A\|_\infty = \max_i \sum_j |A_{ij}|$ and $\|A\|_2 = \sup_{\|v\|_2 = 1} \|Av\|_2$, the standard operator norm, i.e., the square root of the largest eigenvalue of $A^\top A$.
The rest of the paper is organized as follows. Section 2 considers two-stage regulariza-
tion methods for model selection in QR and studies theoretical properties of the two-stage
LASSO. Section 3 proposes RAMP to compute the entire hierarchy-preserving solution path
efficiently. Section 4 extends the proposed methods to generalized QR models. Section 5
presents numerical studies, followed by a discussion. Technical proofs are in the Appendix.
2 Two-stage Regularization Method
Variable selection and estimation via penalization is popular in high dimensional analy-
sis. Examples include the LASSO (Tibshirani, 1996), SCAD (Fan & Li, 2001), elastic net
(Zou & Hastie, 2005), minimax concave penalty (MCP) (Zhang, 2010), among many others.
Properties such as model selection consistency and the oracle property have been established (Zhao
& Yu, 2006; Wainwright, 2009; Fan & Lv, 2011). A general penalized estimator for linear
models is defined as
$$(\hat{\beta}_0, \hat{\beta}_M) = \operatorname*{argmin}_{(\beta_0, \beta_M)} \; \frac{1}{2n} \|y - \mathbf{1}\beta_0 - X\beta_M\|^2 + \sum_{j=1}^p J_\lambda(\beta_j), \qquad (2)$$
where $y$ is the response vector, $X$ is the design matrix, $J_\lambda(\cdot)$ is a penalty function, and $\lambda \ge 0$ is a regularization parameter. The penalty $J(\cdot)$ and $\lambda$ may depend on the index $j$. For ease of presentation, we use the same penalty function and parameter for all $j$ unless stated otherwise.
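With the LASSO penalty $J_\lambda(\beta_j) = \lambda|\beta_j|$, criterion (2) can be minimized by coordinate descent with soft-thresholding. The following is a minimal sketch of ours (not the paper's implementation): `lasso_cd` implements exactly the $\frac{1}{2n}$-scaled criterion (2), and the simulated data are arbitrary.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - 1*b0 - X b||^2 + lam * ||b||_1,
    i.e., criterion (2) with the LASSO penalty J_lam(b_j) = lam * |b_j|."""
    n, p = X.shape
    b0, b = y.mean(), np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)     # (1/n) * X_j^T X_j for each column
    r = y - b0 - X @ b                 # current residual
    for _ in range(n_iter):
        b0_new = b0 + r.mean()         # exact update of the intercept
        r -= b0_new - b0
        b0 = b0_new
        for j in range(p):             # exact 1-d update of coordinate j
            z = b[j] + (X[:, j] @ r) / (n * col_sq[j])
            b_new = soft_threshold(z, lam / col_sq[j])
            r -= X[:, j] * (b_new - b[j])
            b[j] = b_new
    return b0, b

# Toy data: two active main effects among p = 10.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0], beta[1] = 2.0, -1.5
y = 1.0 + X @ beta + 0.1 * rng.standard_normal(n)

b0, b = lasso_cd(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(b) > 1e-8)
print(selected)   # the active coordinates 0 and 1 should be recovered
```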
We consider the problem of variable selection for the QR model (1). Define $X^{\circ 2} = X \circ X$ as an $n \times \frac{p(p+1)}{2}$ matrix consisting of all pairwise column products. That is, for $X = (X_1, \ldots, X_p)$, $X^{\circ 2} = X \circ X = (X_1 \star X_1, X_1 \star X_2, \ldots, X_p \star X_p)$, where $\star$ denotes the entry-wise product of two column vectors. For an index set $A \subset M$, define $A^{\circ 2} = A \circ A = \{(j, k) : j \le k;\ j, k \in A\} \subset I$, and $A \circ M = \{(j, k) : j \le k;\ j \text{ or } k \in A\} \subset I$. We use $X_A^{\circ 2}$ as a short notation for $(X_A)^{\circ 2}$, a matrix whose columns are indexed by $A^{\circ 2}$.
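The matrix $X^{\circ 2}$ can be formed column by column from the entry-wise products. A small numpy sketch of ours (the helper name `pairwise_products` is not from the paper):

```python
import numpy as np

def pairwise_products(X):
    """Build X^{o2} = (X_1 * X_1, X_1 * X_2, ..., X_p * X_p): the
    n x p(p+1)/2 matrix of entry-wise column products X_j * X_k, j <= k."""
    n, p = X.shape
    cols, index = [], []
    for j in range(p):
        for k in range(j, p):
            cols.append(X[:, j] * X[:, k])
            index.append((j + 1, k + 1))   # 1-based (j, k), as in the paper
    return np.column_stack(cols), index

X = np.arange(6.0).reshape(3, 2)           # n = 3, p = 2
X2, idx = pairwise_products(X)
print(X2.shape)   # (3, 3): p(p+1)/2 = 3 columns
print(idx)        # [(1, 1), (1, 2), (2, 2)]
```

For large $p$, the point of the methods in this paper is precisely to avoid materializing all $p(p+1)/2$ columns at once; a routine like this would only ever be applied to a small selected subset $X_A$.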
Two-stage regularization methods for interaction selection have been considered in Efron
et al. (2004) and Wu et al. (2009), among others. However, their theoretical properties are not
clearly understood. In the following, we first illustrate the general two-stage procedure for
interaction selection.
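Under our reading of such a procedure (stage 1 selects main effects with the LASSO; stage 2 runs the LASSO again on the selected main effects together with their order-2 children), a rough sketch using scikit-learn's `Lasso` looks as follows. This is our illustration only, not the paper's formal procedure or implementation; the function name and tuning values are ours.

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_interaction_selection(X, y, lam1=0.1, lam2=0.1):
    """Sketch of a two-stage LASSO: the stage-2 candidate set contains only
    children of stage-1 survivors, so the fit preserves the hierarchy."""
    # Stage 1: LASSO on main effects only; A holds the selected indices.
    A = np.flatnonzero(Lasso(alpha=lam1).fit(X, y).coef_)
    if A.size == 0:
        return A, []
    # Stage 2: selected mains plus all products X_j * X_k with j, k in A.
    pairs = [(j, k) for i, j in enumerate(A) for k in A[i:]]
    Z = np.column_stack([X[:, A]] + [X[:, j] * X[:, k] for j, k in pairs])
    coef = Lasso(alpha=lam2).fit(Z, y).coef_
    mains = A[np.flatnonzero(coef[:A.size])]
    inters = [pairs[i] for i in np.flatnonzero(coef[A.size:])]
    return mains, inters

# Toy data with two active mains and one active interaction.
rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.standard_normal((n, p))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1]
     + 0.1 * rng.standard_normal(n))

mains, inters = two_stage_interaction_selection(X, y)
```

Note that only `A.size * (A.size + 1) / 2` interaction columns are ever built, rather than all $p(p+1)/2$, which is what makes the two-stage approach scalable.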