Extended Bayesian Information Criteria for Model
Selection with Large Model Spaces
By JIAHUA CHEN
Department of Statistics, University of British Columbia, Vancouver,
British Columbia, V6T 1Z2 Canada
jhchen@stat.ubc.ca
and ZEHUA CHEN
Department of Statistics and Applied Probability, National University of Singapore,
Singapore 117546
stachenz@nus.edu.sg
Summary
The ordinary Bayes information criterion is too liberal for model selection when
the model space is large. In this article, we re-examine the Bayesian paradigm for
model selection and propose an extended family of Bayes information criteria. The
new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the
number of covariates to increase to infinity with the sample size. Their performance
in various situations is evaluated by simulation studies. It is demonstrated that the
extended Bayes information criteria incur a small loss in the positive selection rate
but tightly control the false discovery rate, a desirable property in many applications.
The extended Bayes information criteria are extremely useful for variable selection in
problems with a moderate sample size but a huge number of covariates, especially in
genome-wide association studies, which are now an active area in genetics research.
Some keywords: Bayesian paradigm; Consistency; Genome-wide association study; Tournament approach; Variable selection.

1. Introduction
In many applications a variable of interest is influenced by a number of unidentified covariates among a large collection of potential covariates, whose number is much larger than the number of observations. For example, in genome-wide association studies, geneticists type tens or hundreds of thousands of single nucleotide polymorphisms spread over the whole genome to identify the handful that are responsible for the genetic variation of a quantitative trait or a disease status; see Marchini et al. (2005). In principle, the statistical issue involved is simply a variable selection problem. However, the sheer number of covariates P and the comparatively small sample size n make the variable-selection problem a great statistical challenge. In such situations, classical criteria such as the Akaike information criterion or aic (Akaike, 1973), the Bayes information criterion or bic (Schwarz, 1978), and other methods such as cross validation and generalized cross validation (Stone, 1974; Craven & Wahba, 1979) are usually too liberal; that is, they tend to select a model with many spurious covariates. This phenomenon has been observed by Broman & Speed (2002), Siegmund (2004) and Bogdan et al. (2004) in their use of bic for quantitative trait loci mapping, and will also be shown later in this article.
Variable selection with large model spaces has drawn increasing attention recently. Meinshausen & Bühlmann (2006) and Zhao & Yu (2006) investigated consistency properties, while Zhang & Huang (2008) studied the sparsity and bias properties, of the Lasso-based variable-selection methods (Tibshirani, 1996). To ensure consistency of the Lasso-based variable-selection procedure, the tuning parameter must be set to an appropriate asymptotic order, and the design matrix must satisfy a sparse Riesz condition.
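As an aside (an illustration of my own, not from the paper): in the special case of an orthonormal design, the Lasso estimate has a closed form, soft-thresholded least-squares coefficients, which makes visible how the tuning parameter acts as a selection threshold and why its asymptotic order governs which covariates are selected.

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution when the design matrix has orthonormal columns:
    each least-squares coefficient is shrunk toward zero by lam and
    set exactly to zero once its magnitude falls below lam."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Only coefficients exceeding the threshold survive into the model.
print(soft_threshold(np.array([3.0, -0.5, 0.1]), 1.0))
```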
In this paper, we propose a class of extended Bayes information criteria to better meet the needs of variable selection for large model spaces. The original bic is an approximate Bayes approach; see Berger & Pericchi (2001) and some details later in this article. The simplicity and effectiveness of the bic have made it very attractive, even when the regularity conditions are not satisfied. More recently, in unpublished work, J. O. Berger has developed a more rigorous Bayes approach called the generalized Bayes information criterion, which adheres more closely to the Bayes paradigm and refines the choice of prior distributions for various parametric models. However, Berger’s criterion still deals mostly with the case where P is not large compared with n.
The extended Bayes information criterion family that we propose is particularly suitable for model selection with large model spaces. It includes the original bic as a special case and retains its simplicity. Under some mild conditions, these new criteria are shown to be consistent. The result is particularly useful even when the covariates are heavily collinear. Furthermore, unlike competitors such as that of Meinshausen & Bühlmann (2006), the extended Bayes information criterion family does not require a data-adaptive tuning procedure in order to be consistent, and hence is easy to use in applications.
2. An extended family of Bayes information criteria
Let {(y_i, x_i) : i = 1, . . . , n} be independent observations. Suppose that the conditional density function of y_i given x_i is f(y_i | x_i, θ), where θ ∈ Θ ⊂ R^P, P being a positive integer. The likelihood function of θ is given by

    L_n(θ) = f(Y; θ) = ∏_{i=1}^{n} f(y_i | x_i, θ),

where Y = (y_1, . . . , y_n). Let s be a subset of {1, . . . , P}. Denote by θ(s) the parameter θ with those components outside s being set to 0 or some prespecified values. The bic proposed by Schwarz (1978) selects the model that minimizes

    bic(s) = −2 log L_n{θ̂(s)} + ν(s) log n,

where θ̂(s) is the maximum likelihood estimator of θ(s), and ν(s) is the number of components in s. Let S be the model space under consideration and let p(s) be the prior probability of model s. Assume that, given s, the prior density of θ(s) is given by π{θ(s)}. The posterior probability of s is obtained as

    p(s | Y) = m(Y | s) p(s) / Σ_{s∈S} p(s) m(Y | s),

where m(Y | s) is the marginal likelihood of model s, given by

    m(Y | s) = ∫ f{Y; θ(s)} π{θ(s)} dθ(s).

Under the Bayes paradigm, a model s* that maximizes the posterior probability is selected. Since Σ_{s∈S} p(s) m(Y | s) is a constant, s* = argmax_{s∈S} m(Y | s) p(s). Under some regularity conditions on f(Y; θ), such as the requirement that s must contain all the nonzero components of θ and have constant dimension, the maximum likelihood estimator of θ(s) is root-n consistent, and −2 log{m(Y | s)} has a Laplace approximation given by bic(s) up to an additive constant. That is, the bic is an approximate Bayes approach, as mentioned in the introduction. An implicit assumption underlying bic is that p(s) is constant for s over S.
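As a concrete illustration (my sketch, not the authors' code): for Gaussian linear regression, maximizing the likelihood over a submodel s and profiling out the error variance reduces bic(s), up to an additive constant, to n log(RSS_s/n) + ν(s) log n, which can be computed directly.

```python
import numpy as np

def bic(y, X, s):
    """bic(s) = -2 log L_n{theta_hat(s)} + nu(s) log n for a Gaussian
    linear model; with the error variance profiled out this equals
    n * log(RSS_s / n) + |s| * log(n), up to an additive constant."""
    n = len(y)
    s = list(s)
    if s:
        Xs = X[:, s]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least-squares fit on s
        rss = float(np.sum((y - Xs @ beta) ** 2))
    else:
        rss = float(np.sum(y ** 2))  # null model: all components of theta set to 0
    return n * np.log(rss / n) + len(s) * np.log(n)
```

Minimizing this over all 2^P subsets implicitly places a uniform prior over individual models, which is exactly the assumption re-examined below.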
It is well known that bic is consistent under some standard conditions, such as P being fixed (Rao & Wu, 1989). In nonregular problems such as change-point analysis, the root-n consistency of θ̂(s) may be violated, yet bic is still consistent (Yao, 1988; Csörgő & Horváth, 1997). Nevertheless, bic is not without drawbacks. The precision of the Laplace approximation is influenced by the specific form of the prior density on θ(s) and the correlation structure between observations. The latter affects the interpretation of the sample size n in the definition of bic(s). The recent unpublished work of Berger and the work of Clyde et al. (2007) have concentrated on these issues. They have focused on the marginal likelihood m(Y | s) and rectified the problems caused by the Laplace approximation. However, they have not targeted the problems that could be caused by large model spaces.
In a typical genome-wide association study with single nucleotide polymorphisms, the number of covariates is of the order of tens or hundreds of thousands while the sample size is only in the hundreds. Suppose the number of covariates under consideration is P = 1000. The class of models containing a single covariate, S_1, has size 1000, while the class of models containing two covariates, S_2, has size 1000 × 999/2. The constant prior behind bic amounts to assigning probabilities to the S_j proportional to their sizes. Thus the probability assigned to S_2 is 999/2 times that assigned to S_1. The size of S_j increases as j increases towards j = P/2 = 500, so that the probability assigned to S_j by the prior increases almost exponentially. Models with a larger number of covariates, say 50 or 100, receive much higher probabilities than models with fewer covariates. This is obviously unreasonable, being strongly against the principle of parsimony.
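The counting argument above is easy to verify numerically; a quick check (mine, not the paper's) of how fast the class sizes grow for P = 1000:

```python
from math import comb

P = 1000
# tau(S_j) = number of models with exactly j covariates = (P choose j)
tau = {j: comb(P, j) for j in (1, 2, 50, 500)}

print(tau[2] / tau[1])   # 499.5: S_2 receives 999/2 times the prior mass of S_1
print(tau[50] > 10**80)  # True: models with 50 covariates vastly outnumber singletons
print(tau[500] == max(comb(P, j) for j in range(P + 1)))  # True: sizes peak at j = P/2
```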
This re-examination of bic naturally prompts us to consider other reasonable priors over the model space in the Bayes approach. Assume that the model space S is partitioned as S = ∪_{j=1}^{P} S_j, such that models within each S_j have equal dimension. Let τ(S_j) be the size of S_j. For example, if S_j is the collection of all models with j covariates, then τ(S_j) is the binomial coefficient (P choose j). We assign the prior distribution over S as follows. For each s in the same subspace S_j, assign an equal probability, i.e., pr(s | S_j) = 1/τ(S_j) for any s ∈ S_j. This is reasonable since all the models in S_j are equally plausible. Then, instead of assigning probabilities pr(S_j) proportional to τ(S_j), as in the ordinary bic,

References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, pp. 267–81.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–4.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88.