Extended Bayesian Information Criteria for Model
Selection with Large Model Spaces
By JIAHUA CHEN
Department of Statistics, University of British Columbia, Vancouver,
British Columbia, V6T 1Z2 Canada
jhchen@stat.ubc.ca
and ZEHUA CHEN
Department of Statistics and Applied Probability, National University of Singapore,
Singapore 117546
stachenz@nus.edu.sg
Summary
The ordinary Bayes information criterion is too liberal for model selection when
the model space is large. In this article, we re-examine the Bayesian paradigm for
model selection and propose an extended family of Bayes information criteria. The
new criteria take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the
number of covariates to increase to infinity with the sample size. Their performance
in various situations is evaluated by simulation studies. It is demonstrated that the
extended Bayes information criteria incur a small loss in the positive selection rate
but tightly control the false discovery rate, a desirable property in many applications.
The extended Bayes information criteria are extremely useful for variable selection in
problems with a moderate sample size but a huge number of covariates, especially in
genome-wide association studies, which are now an active area in genetics research.
Some keywords: Bayesian paradigm; Consistency; Genome-wide association study; Tournament approach; Variable selection.

1. Introduction
In many applications a variable of interest is influenced by a number of unidentified covariates among a large collection of potential covariates, whose number is much larger than the number of observations. For example, in genome-wide association studies, geneticists type tens or hundreds of thousands of single nucleotide polymorphisms spread over the whole genome to identify the handful that are responsible for the genetic variation of a quantitative trait or a disease status; see Marchini et al. (2005). In principle, the statistical issue involved is simply a variable selection problem. However, the sheer number of covariates P and the comparatively small sample size n make the variable-selection problem a great statistical challenge. In such situations, classical criteria such as the Akaike information criterion or aic (Akaike, 1973), the Bayes information criterion or bic (Schwarz, 1978), and other methods such as cross validation and generalized cross validation (Stone, 1974; Craven & Wahba, 1979) are usually too liberal; that is, they tend to select a model with many spurious covariates. This phenomenon has been observed by Broman & Speed (2002), Siegmund (2004) and Bogdan et al. (2004) in their use of bic for quantitative trait loci mapping, and will also be shown later in this article.
Variable selection with large model spaces has drawn increasing attention recently. Meinshausen & Bühlmann (2006) and Zhao & Yu (2006) investigated consistency properties, while Zhang & Huang (2008) studied the sparsity and bias properties, of the Lasso-based variable-selection methods (Tibshirani, 1996). To ensure consistency of the Lasso-based variable-selection procedure, the tuning parameter must be set to an appropriate asymptotic order, and the design matrix must satisfy a sparse Riesz condition.
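As an aside (an illustration of my own, not from the paper): in the special case of an orthonormal design, the Lasso estimate has a closed form, soft-thresholded least-squares coefficients, which makes visible how the tuning parameter acts as a selection threshold and why its asymptotic order governs which covariates are selected.

```python
import numpy as np

def soft_threshold(b, lam):
    """Lasso solution when the design matrix has orthonormal columns:
    each least-squares coefficient is shrunk toward zero by lam and
    set exactly to zero once its magnitude falls below lam."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Only coefficients exceeding the threshold survive into the model.
print(soft_threshold(np.array([3.0, -0.5, 0.1]), 1.0))
```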
In this paper, we propose a class of extended Bayes information criteria to better meet the needs of variable selection for large model spaces. The original bic is an approximate Bayes approach; see Berger & Pericchi (2001) and some details later in this article. The simplicity and effectiveness of the bic have made it very attractive, even when the regularity conditions are not satisfied. More recently, in unpublished work, J. O. Berger has developed a more rigorous Bayes approach called the generalized Bayes information criterion, which adheres more closely to the Bayes paradigm and refines the choice of prior distributions for various parametric models. However, Berger’s criterion still deals mostly with the case where P is not large compared with n.
The extended Bayes information criterion family that we propose is particularly suitable for model selection with large model spaces. It includes the original bic as a special case and retains its simplicity. Under some mild conditions, these new criteria are shown to be consistent. The result is particularly useful even when the covariates are heavily collinear. Furthermore, unlike competitors such as that of Meinshausen & Bühlmann (2006), the extended Bayes information criterion family does not require a data-adaptive tuning procedure in order to be consistent, and hence is easy to use in applications.
2. An extended family of Bayes information criteria
Let {(y_i, x_i) : i = 1, . . . , n} be independent observations. Suppose that the conditional density function of y_i given x_i is f(y_i | x_i, θ), where θ ∈ Θ ⊂ R^P, P being a positive integer. The likelihood function of θ is given by

    L_n(θ) = f(Y; θ) = ∏_{i=1}^{n} f(y_i | x_i, θ),

where Y = (y_1, . . . , y_n). Let s be a subset of {1, . . . , P}. Denote by θ(s) the parameter θ with those components outside s being set to 0 or some prespecified values. The bic proposed by Schwarz (1978) selects the model that minimizes

    bic(s) = −2 log L_n{θ̂(s)} + ν(s) log n,

where θ̂(s) is the maximum likelihood estimator of θ(s), and ν(s) is the number of components in s. Let S be the model space under consideration and let p(s) be the prior probability of model s. Assume that, given s, the prior density of θ(s) is given by π{θ(s)}. The posterior probability of s is obtained as

    p(s | Y) = m(Y | s) p(s) / Σ_{s∈S} p(s) m(Y | s),

where m(Y | s) is the marginal likelihood of model s, given by

    m(Y | s) = ∫ f{Y; θ(s)} π{θ(s)} dθ(s).

Under the Bayes paradigm, a model s* that maximizes the posterior probability is selected. Since Σ_{s∈S} p(s) m(Y | s) is a constant, s* = argmax_{s∈S} m(Y | s) p(s). Under some regularity conditions on f(Y; θ), such as the requirement that s must contain all the nonzero components of θ and have constant dimension, the maximum likelihood estimator of θ(s) is root-n consistent, and −2 log{m(Y | s)} has a Laplace approximation given by bic(s) up to an additive constant. That is, the bic is an approximate Bayes approach, as mentioned in the introduction. An implicit assumption underlying bic is that p(s) is constant for s over S.
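As a concrete illustration (my sketch, not the authors' code): for Gaussian linear regression, maximizing the likelihood over a submodel s and profiling out the error variance reduces bic(s), up to an additive constant, to n log(RSS_s/n) + ν(s) log n, which can be computed directly.

```python
import numpy as np

def bic(y, X, s):
    """bic(s) = -2 log L_n{theta_hat(s)} + nu(s) log n for a Gaussian
    linear model; with the error variance profiled out this equals
    n * log(RSS_s / n) + |s| * log(n), up to an additive constant."""
    n = len(y)
    s = list(s)
    if s:
        Xs = X[:, s]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)  # least-squares fit on s
        rss = float(np.sum((y - Xs @ beta) ** 2))
    else:
        rss = float(np.sum(y ** 2))  # null model: all components of theta set to 0
    return n * np.log(rss / n) + len(s) * np.log(n)
```

Minimizing this over all 2^P subsets implicitly places a uniform prior over individual models, which is exactly the assumption re-examined below.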
It is well known that bic is consistent under some standard conditions, such as P being fixed (Rao & Wu, 1989). In nonregular problems such as change-point analysis, the root-n consistency of θ̂(s) may be violated, yet bic is still consistent (Yao, 1988; Csörgő & Horváth, 1997). Nevertheless, bic is not without drawbacks. The precision of the Laplace approximation is influenced by the specific form of the prior density on θ(s) and the correlation structure between observations. The latter affects the interpretation of the sample size n in the definition of bic(s). The recent unpublished work of Berger and the work of Clyde et al. (2007) have concentrated on these issues. They have focused on the marginal likelihood m(Y | s) and rectified the problems caused by the Laplace approximation. However, they have not targeted the problems that could be caused by large model spaces.
In a typical genome-wide association study with single nucleotide polymorphisms, the number of covariates is of the order of tens or hundreds of thousands while the sample size is only in the hundreds. Suppose the number of covariates under consideration is P = 1000. The class of models containing a single covariate, S_1, has size 1000, while the class of models containing two covariates, S_2, has size 1000 × 999/2. The constant prior behind bic amounts to assigning probabilities to the S_j proportional to their sizes. Thus the probability assigned to S_2 is 999/2 times that assigned to S_1. The size of S_j increases as j increases towards j = P/2 = 500, so that the probability assigned to S_j by the prior increases almost exponentially. Models with a larger number of covariates, say 50 or 100, receive much higher probabilities than models with fewer covariates. This is obviously unreasonable, being strongly against the principle of parsimony.
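The counting argument above is easy to verify numerically; a quick check (mine, not the paper's) of how fast the class sizes grow for P = 1000:

```python
from math import comb

P = 1000
# tau(S_j) = number of models with exactly j covariates = (P choose j)
tau = {j: comb(P, j) for j in (1, 2, 50, 500)}

print(tau[2] / tau[1])   # 499.5: S_2 receives 999/2 times the prior mass of S_1
print(tau[50] > 10**80)  # True: models with 50 covariates vastly outnumber singletons
print(tau[500] == max(comb(P, j) for j in range(P + 1)))  # True: sizes peak at j = P/2
```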
This re-examination of bic naturally prompts us to consider other reasonable priors over the model space in the Bayes approach. Assume that the model space S is partitioned as S = ∪_{j=1}^{P} S_j, such that models within each S_j have equal dimension. Let τ(S_j) be the size of S_j. For example, if S_j is the collection of all models with j covariates, then τ(S_j) is the binomial coefficient (P choose j). We assign the prior distribution over S as follows. For each s in the same subspace S_j, assign an equal probability, i.e., pr(s | S_j) = 1/τ(S_j) for any s ∈ S_j. This is reasonable since all the models in S_j are equally plausible. Then, instead of assigning probabilities pr(S_j) proportional to τ(S_j), as in the ordinary bic,

References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory, pp. 267–81.
Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, 289–300.
Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6, 461–4.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 58, 267–88.