
Order-Based Dependent Dirichlet Processes

Jim E. Griffin and M.F.J. Steel
01 Mar 2006, Vol. 101, Iss. 473, pp. 179-194
Abstract
In this article we propose a new framework for Bayesian nonparametric modeling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions that induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modeling, the modeling of stochastic volatility in time series data, and spatial geostatistical modeling.


Order-Based Dependent Dirichlet Processes
J.E. Griffin and M.F.J. Steel
Abstract
In this paper we propose a new framework for Bayesian nonparametric modelling with continuous
covariates. In particular, we allow the nonparametric distribution to depend on covariates through order-
ing the random variables building the weights in the stick-breaking representation. We focus mostly on
the class of random distributions which induces a Dirichlet process at each covariate value. We derive
the correlation between distributions at different covariate values, and use a point process to imple-
ment a practically useful type of ordering. Two main constructions with analytically known correlation
structures are proposed. Practical and efficient computational methods are introduced. We apply our
framework, though mixtures of these processes, to regression modelling, the modelling of stochastic
volatility in time series data and spatial geostatistical modelling.
Keywords: Bayesian nonparametrics, Markov chain Monte Carlo, Nonparametric Regression, Spatial
Modelling, Stick-breaking Prior, Volatility Modelling.
1 Introduction
Bayesian nonparametric methods have become increasingly popular in empirical studies. The Dirichlet process (Ferguson 1973) has been the dominant mechanism used as the prior for the unknown distribution in the model specification. Some recent examples include applications in econometrics (Chib and Hamilton 2002; Hirano 2002), medicine (Kottas et al. 2002), health (O’Hagan and Stevens 2003), auditing (Laws and O’Hagan 2002), animal breeding (van der Merwe and Pretorius 2003), survival analysis (Doss and Huffer 2003), directional data (Ghosh et al. 2003), meta-analysis (Chung et al. 2002), genetics (Medvedovic and Sivaganesan 2002) and density estimation (Hansen and Lauritzen 2002). However, the relationship between covariates and the unknown distribution cannot be modelled directly with the Dirichlet process as described by Ferguson.

Jim Griffin is Lecturer, Department of Statistics, University of Warwick, CV4 7AL, U.K. (Email: J.E.Griffin@warwick.ac.uk) and Mark Steel is Professor, Department of Statistics, University of Warwick, Coventry, CV4 7AL, U.K. (Email: M.F.Steel@stats.warwick.ac.uk). Both authors were affiliated with the Institute of Mathematics, Statistics and Actuarial Science, University of Kent at Canterbury during the early part of this research. Jim Griffin acknowledges research support from The Nuffield Foundation grant NUF-NAL/00728. We would like to thank Andy Hone for his contribution to some calculations, and we are grateful to Carmen Fernández and Steve MacEachern for helpful discussions and to two referees and the Associate Editor for insightful comments.

CRiSM Paper No. 05-9, www.warwick.ac.uk/go/crism
Therefore, an active area of research is extending these methods to a wider class of models where the unknown distribution depends on covariates. If the covariates have a finite number of levels, the Product of Dirichlet processes model introduced by Cifarelli and Regazzini (1978) allows the modelling of dependent distributions. Dependence is introduced through the use of a parametric regression model as the centring distribution of independent Dirichlet processes at each level of the covariates. These methods have recently been applied to problems in biostatistics (Carota and Parmigiani 2002), econometrics (Griffin and Steel 2004) and survival analysis (Giudici et al. 2003), and a similar idea was proposed in Mallick and Walker (1997). In the present paper we focus on introducing dependence on continuous covariates.
Other approaches to this problem exist in the literature. Müller and Rosner (1998) propose including the covariates in the nonparametric distribution and focusing only on the conditional distribution given the covariates. Since this implies leaving out a factor in the likelihood, Müller et al. (2004) change the prior on the process to counteract this. Finally, the method described by MacEachern et al. (2001) is closest to the approach developed here, as both approaches start from the Sethuraman (1994) representation, introduced in the following subsection.
Here we introduce dependence in nonparametric distributions by making the weights in the Sethuraman representation dependent on the covariates. Each weight is a transformation of i.i.d. random variables. We implement the dependence by inducing an ordering π of these random variables at each covariate value such that distributions for similar covariate values will be associated with similar orderings and, thus, be close. At any covariate value, the random distribution will be a so-called stick-breaking prior. We focus on the special case where we choose the Dirichlet process for this stick-breaking prior, and we shall call the induced class of processes Order-Based Dependent Dirichlet Processes, shortened to πDDP’s.
We derive theoretical properties, such as the correlation between distributions at different covariate values, and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical computational methods based on Markov chain Monte Carlo (MCMC) are introduced. We control the truncation error in an intuitive fashion through truncation of the point process, and we use sequential allocation as an efficient way to avoid the sampler getting stuck in local modes. We apply our basic framework, through mixtures of πDDP’s, in three quite different settings: curve fitting, the modelling of stochastic volatility in time series data, and spatial geostatistical modelling.
Subsection 1.1 describes stick-breaking priors, while Section 2 introduces the ideas underlying
πDDP’s and their practical implementation. Section 3 briefly discusses mixtures of these processes,
and Section 4 concerns elicitation of the prior. Computational issues are dealt with in Section 5 and
Section 6 describes the three applications. The final section concludes.
Proofs will be grouped in Appendix A without explicit mention in the text.

1.1 Stick-breaking priors
The idea of defining random distributions through a stick-breaking construction is developed in Pitman (1996), where its uses in several areas of application are reviewed. The class is discussed by Ishwaran and James (2001) as a prior distribution in nonparametric problems. A random distribution, $F$, has a stick-breaking prior if
$$F \overset{d}{=} \sum_{i=1}^{N} p_i\,\delta_{\theta_i}, \tag{1}$$
where $\delta_z$ denotes a Dirac measure at $z$, $p_i = V_i \prod_{j<i}(1 - V_j)$, where $V_1, \ldots, V_N$ are independent with $V_k \sim \mathrm{Beta}(a_k, b_k)$, and $\theta_1, \ldots, \theta_N$ are independent draws from a distribution $H$. Conventionally, only models with an infinite representation are referred to as nonparametric (see e.g. Bernardo and Smith, 1994, p. 228). Ishwaran and James (2001) give the following condition to determine whether the distribution is well defined for $N = \infty$:
$$\sum_{k=1}^{\infty} p_k = 1 \ \text{a.s.} \iff \sum_{k=1}^{\infty} E[\log(1 - V_k)] = -\infty.$$
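For the Dirichlet process case, $1 - V_k \sim \mathrm{Beta}(M, 1)$, so $E[\log(1 - V_k)] = -1/M$ and the series diverges linearly in $k$. A related quantity that is convenient in practice is the expected stick left after $k$ breaks, $E[\prod_{j \le k}(1 - V_j)] = (M/(M+1))^k$, which follows from independence and $E[1 - V_j] = M/(M+1)$. The Python sketch below (helper names are hypothetical) computes it and the smallest $k$ for which it falls below a tolerance. This is only one possible way to choose a finite truncation; the paper itself controls truncation through the point process.

```python
import math


def expected_leftover(M: float, k: int) -> float:
    """E[prod_{j<=k} (1 - V_j)] for i.i.d. V_j ~ Beta(1, M).

    Uses E[1 - V_j] = M / (M + 1) and independence of the V_j.
    """
    return (M / (M + 1.0)) ** k


def atoms_needed(M: float, eps: float) -> int:
    """Smallest k with expected_leftover(M, k) < eps (hypothetical helper)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    k = math.ceil(math.log(eps) / math.log(M / (M + 1.0)))
    # Guard against floating-point rounding on either side of the boundary.
    while expected_leftover(M, k) >= eps:
        k += 1
    while k > 0 and expected_leftover(M, k - 1) < eps:
        k -= 1
    return k


print(expected_leftover(1.0, 3))  # 0.125
print(atoms_needed(1.0, 0.01))    # 7
```

Larger $M$ spreads the mass over more atoms, so the required $k$ grows roughly like $M \log(1/\varepsilon)$.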
For finite $N$ the condition $\sum_{k=1}^{N} p_k = 1$ is satisfied if $V_N = 1$, so that $p_N = \prod_{j<N}(1 - V_j)$. For $N = \infty$ several interesting processes fall into this class:
1. The Dirichlet process prior (Ferguson 1973), characterised by $MH$, where $M$ is a positive scalar, arises when $V_i$ follows a $\mathrm{Beta}(1, M)$ for all $i$. This was established by Sethuraman (1994).
2. The Pitman-Yor process occurs if $V_i$ follows a $\mathrm{Beta}(1 - a, b + ai)$ with $0 \le a < 1$ and $b > -a$. As special cases we can identify the Dirichlet process for $a = 0$ and the stable law when $b = 0$.
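A minimal Python sketch of drawing the weights in (1) with finite $N$, using the convention $V_N = 1$ described above so that the weights sum to one. The helper name and the use of the standard-library sampler are our assumptions; passing $a_k = 1$, $b_k = M$ gives a truncated Dirichlet process, and $a_k = 1 - a$, $b_k = b + ak$ a truncated Pitman-Yor process.

```python
import random


def stick_breaking_weights(a, b, N, rng):
    """Draw p_1, ..., p_N from the stick-breaking prior (1) with finite N.

    a, b : callables returning the Beta parameters a_k, b_k for k = 1..N-1.
    V_N is fixed at 1 so that the weights sum to one.
    """
    weights = []
    remaining = 1.0  # prod_{j<k} (1 - V_j), the unbroken part of the stick
    for k in range(1, N + 1):
        v = 1.0 if k == N else rng.betavariate(a(k), b(k))
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(1)

# Truncated Dirichlet process: V_k ~ Beta(1, M).
M = 2.0
dp = stick_breaking_weights(lambda k: 1.0, lambda k: M, 50, rng)

# Truncated Pitman-Yor process: V_k ~ Beta(1 - a, b + a k).
a_py, b_py = 0.5, 1.0
py = stick_breaking_weights(lambda k: 1.0 - a_py, lambda k: b_py + a_py * k, 50, rng)

# Each list sums to 1 up to floating-point rounding.
print(sum(dp), sum(py))
```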
This representation will provide the basis for our development of dependent probability measures and, in particular, the development of a dependent Dirichlet process. We will refer to the $\theta_i$'s as locations and the $V_i$'s as masses.
2 Dependent Dirichlet Processes
2.1 General construction
A dependent Dirichlet process is a stochastic process defined on the space of probability measures over
a domain, indexed by time, space or a selection of other covariates in such a way that the marginal
distribution at any point in the domain follows a Dirichlet process. This problem has received little
attention in the Bayesian literature. Some recent work follows MacEachern (1999). The latter paper
considers the possibility of allowing the masses, V, or the locations, θ, of the atoms to follow a stochastic
process defined over the domain. An important constraint imposed by the definition of the Dirichlet process is that the processes for each element of either θ or V must be independent. The work of MacEachern and coauthors concentrates on the “single-p” model, where only the locations, θ, follow stochastic processes. An application to spatial modelling is further developed in Gelfand et al. (2004) by allowing the locations θ to be drawn from a random field (a Gaussian process). The same method to induce dependence is used in De Iorio et al. (2004) to achieve an analysis of variance (ANOVA)-type structure.
In general, such approaches which allow only values of θ to depend on the covariates are subject to
certain problems. In particular, MacEachern notes that the distribution of F can then be expressed as
a mixture of Dirichlet processes. The posterior process will have an updated mass parameter M + n,
where n is the sample size, at all values of the index. This latter fact is counterintuitive, in our view. A
useful property would rather be that the process returns to the prior distribution (with mass parameter M)
at points in the domain “far” from the observed data. This seems a major shortcoming of these single-p
models for general spaces.
In contrast to the models described above, the processes developed in this paper allow the values of the weights $p_i$ in (1) to change over the domain of the covariates. For ease of presentation it will be assumed that each location, $\theta_i$, does not depend on the covariates. However, the ideas that are developed could be extended to allow for the introduction of dependence through the locations (i.e. drawn from independent stochastic processes). MacEachern (2000) has some useful results in this direction.
Definition 1 An Order-based Dependent Stick-Breaking Prior is defined on a space $D$ by a sequence $\{a_k, b_k\}$, centring distribution $H$ and a stochastic process $\{\pi(x)\}_{x \in D}$ for which:
1. $\{\pi_1(x), \ldots, \pi_{n(x)}(x)\} \subseteq \{1, \ldots, N\}$ for some $n(x) \le N$.
2. $\pi_i(x) = \pi_j(x)$ if and only if $i = j$.
Random variables $\theta_1, \ldots, \theta_N$ and $V_1, \ldots, V_{N-1}$ are all independent, $\theta_k \sim H$ and $V_k \sim \mathrm{Beta}(a_k, b_k)$.
The distribution at a point $x \in D$ is defined by
$$F_x \overset{d}{=} \sum_{i=1}^{n(x)} p_i(x)\,\delta_{\theta_{\pi_i(x)}}, \qquad p_i(x) = V_{\pi_i(x)} \prod_{j<i}\big(1 - V_{\pi_j(x)}\big),$$
and, for finite $n(x)$,
$$p_{n(x)}(x) = \prod_{j<n(x)}\big(1 - V_{\pi_j(x)}\big).$$
We will refer to $\pi(x) = (\pi_1(x), \ldots, \pi_{n(x)}(x))$ as the ordering at $x$.
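To make Definition 1 concrete, the following Python sketch computes the weights $p_i(x)$ for a finite ordering. The function name and the dictionary representation of the masses $V_k$ are illustrative assumptions; as in the definition, the last position of a finite ordering receives the leftover stick. The same masses are shared by every index $x$; only the ordering changes.

```python
def order_based_weights(V, ordering):
    """Weights p_i(x) of F_x in Definition 1 for a finite ordering pi(x).

    V        : dict mapping atom index k -> V_k in (0, 1); must contain every
               atom in `ordering` except the last one.
    ordering : list [pi_1(x), ..., pi_n(x)] of distinct atom indices.
    Returns (atom index, weight) pairs in ranked order.
    """
    if len(set(ordering)) != len(ordering):
        raise ValueError("an ordering may not repeat an atom")
    out = []
    remaining = 1.0  # prod_{j<i} (1 - V_{pi_j(x)})
    for i, atom in enumerate(ordering):
        if i == len(ordering) - 1:
            out.append((atom, remaining))  # p_{n(x)}(x) takes the leftover
        else:
            out.append((atom, V[atom] * remaining))
            remaining *= 1.0 - V[atom]
    return out


V = {1: 0.5, 2: 0.25}
print(order_based_weights(V, [1, 2, 3]))  # [(1, 0.5), (2, 0.125), (3, 0.375)]
print(order_based_weights(V, [2, 1, 3]))  # [(2, 0.25), (1, 0.375), (3, 0.375)]
```

With the identity ordering the weights are the ordinary stick-breaking weights of Subsection 1.1; moving atom 2 to the front changes the weights of atoms 1 and 2 while the total mass stays equal to one.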
The stick-breaking prior in Subsection 1.1 is recovered for any given $x \in D$. We obtain the same distribution over the entire space $D$ if $\pi_i(x) = i$ for all $x \in D$ and $i = 1, \ldots, N$. As a more interesting example, the stochastic process $\pi(x)$ could be defined on the space of permutations of $\{1, \ldots, N\}$ (i.e. $n(x) = N$ for all $x \in D$), allowing $F_x$ to change with $x$. However, the definition allows the stochastic process to be defined on more general structures. In particular, some elements of the ordering at a given point need not appear in the ordering at another point. An example of such a process is given in Subsection 2.2.2. In general, this defines a wide class of dependent distributions, both parametric (finite $N$) and nonparametric (infinite $N$). Usually, we will be interested in $N = \infty$ so that $F_x$ can follow a Dirichlet process. However, it is not easy to define stochastic processes $\pi(x)$ for infinite $N$. Therefore, we focus our attention on specific constructions for the stochastic process in this case.
The prior distribution for $F_x$ inherits some properties of stick-breaking priors. For example, the first moment measure is
$$E[F_x(B)] = E\Big[\sum_{i=1}^{n(x)} p_i(x)\,\delta_{\theta_{\pi_i(x)}}(B)\Big] = E\Big[\sum_{i=1}^{n(x)} p_i(x)\, E\big[\delta_{\theta_{\pi_i(x)}}(B)\big]\Big] = H(B),$$
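and
$$\mathrm{Var}[F_x(B)\mid\pi(x)] = H(B)\,(1 - H(B)) \sum_{i=1}^{n(x)} \frac{a_{\pi_i(x)}\big(a_{\pi_i(x)} + 1\big)}{\big(a_{\pi_i(x)} + b_{\pi_i(x)}\big)\big(a_{\pi_i(x)} + b_{\pi_i(x)} + 1\big)} \prod_{j<i} \frac{b_{\pi_j(x)}\big(b_{\pi_j(x)} + 1\big)}{\big(a_{\pi_j(x)} + b_{\pi_j(x)}\big)\big(a_{\pi_j(x)} + b_{\pi_j(x)} + 1\big)}.$$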
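The conditional variance can be evaluated numerically by truncating the sum. The Python sketch below (helper names are illustrative) does this for an arbitrary ranked sequence of Beta parameters; the product over $j < i$ runs over the atoms ranked above position $i$. Specialising to $a_k = 1$, $b_k = M$, each term uses $E[V^2] = 2/\{(M+1)(M+2)\}$ and $E[(1-V)^2] = M/(M+2)$, so the series is geometric with sum $1/(M+1)$ whatever the ordering, which is why the marginal variance given next does not depend on $\pi(x)$.

```python
def beta_second_moments(a, b):
    """Return (E[V^2], E[(1 - V)^2]) for V ~ Beta(a, b)."""
    denom = (a + b) * (a + b + 1.0)
    return a * (a + 1.0) / denom, b * (b + 1.0) / denom


def conditional_variance(HB, params):
    """Truncated Var[F_x(B) | pi(x)] from the displayed formula.

    HB     : H(B), the centring-measure probability of the set B.
    params : (a, b) pairs in ranked order, i.e. (a_{pi_1(x)}, b_{pi_1(x)}),
             (a_{pi_2(x)}, b_{pi_2(x)}), ...; a finite truncation of the sum.
    """
    total = 0.0
    prefix = 1.0  # prod_{j<i} E[(1 - V_{pi_j(x)})^2]
    for a, b in params:
        ev2, e1mv2 = beta_second_moments(a, b)
        total += ev2 * prefix
        prefix *= e1mv2
    return HB * (1.0 - HB) * total


M, HB = 1.0, 0.5
approx = conditional_variance(HB, [(1.0, M)] * 200)
# Compare with the closed form H(B)(1 - H(B)) / (M + 1).
print(approx, HB * (1 - HB) / (M + 1))
```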
In the sequel we will assume that $a_k = 1$, $b_k = M$ and $N = \infty$, so that we recover a Dirichlet process at any $x \in D$ if $n(x) = \infty$. For the marginal variance, we then obtain
$$\mathrm{Var}[F_x(B)] = E_{\pi(x)}\big[\mathrm{Var}[F_x(B)\mid\pi(x)]\big] = \frac{H(B)\,(1 - H(B))}{M + 1}. \tag{2}$$
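Equation (2) can be checked by simulation. The Python sketch below is a Monte Carlo check under assumptions of our own: $H(B) = 0.3$ (for example, $H$ uniform on $(0,1)$ and $B = [0, 0.3)$), so each indicator $\delta_{\theta_k}(B)$ is drawn directly as a Bernoulli$(H(B))$ variable, and the representation is truncated at 50 atoms with the last atom absorbing the leftover stick. A single fixed ordering is used, since in the Dirichlet process case the masses in ranked order are i.i.d. $\mathrm{Beta}(1, M)$ for any $\pi(x)$.

```python
import random
import statistics


def dp_measure_of_set(M, HB, n_atoms, rng):
    """One draw of F(B) from a truncated DP stick-breaking representation.

    Assumes H(B) = HB, so each location theta_k lands in B with probability
    HB; the last retained atom absorbs the leftover stick (V = 1).
    """
    total, remaining = 0.0, 1.0
    for k in range(n_atoms):
        v = 1.0 if k == n_atoms - 1 else rng.betavariate(1.0, M)
        if rng.random() < HB:  # indicator delta_{theta_k}(B)
            total += v * remaining
        remaining *= 1.0 - v
    return total


rng = random.Random(2006)
M, HB = 2.0, 0.3
draws = [dp_measure_of_set(M, HB, 50, rng) for _ in range(10000)]

print(statistics.mean(draws), HB)                           # first moment
print(statistics.variance(draws), HB * (1 - HB) / (M + 1))  # equation (2)
```

With 10,000 draws the sample mean and variance should typically be within about 0.005 of $H(B)$ and $H(B)(1 - H(B))/(M+1)$ respectively.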
The associated subclass of processes will be called Order-Based Dependent Dirichlet Processes, abbreviated as πDDP and characterised by a mass parameter $M$, centring distribution $H$ and a stochastic process $\{\pi(x)\}_{x \in D}$.
The construction in Definition 1 is motivated by the fact that, for our stick-breaking prior, $E[p_i(x)] < E[p_{i-1}(x)]$ for any $x$; thus the influence of an atom diminishes as it moves further down the ranking (i.e. as its position in $\pi(x)$ increases). This allows us to impose “localness” easily, in the following sense. An important improvement over the single-p DDP models is the flexibility to allow the posterior at an index $x^\star$ to tend to the prior as the distance between $x^\star$ and the observed indices tends to infinity, provided $n(x) = \infty$ for all $x$. If we observe $y_1, \ldots, y_n$ at indices $x_1, \ldots, x_n$, posterior updating can be seen as linking the observations to atoms of the distribution at each index through new variables $s_i$ for which $\theta_{\pi_{s_i}(x_i)} = y_i$ and $P(s_i = j) = p_j(x_i)$. Thus, $s_i$ is the ranking of location $y_i$ at index $x_i$.
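The monotonicity of the expected weights invoked above follows from independence of the masses. Taking the ordering to be independent of the masses (our reading of Definition 1), with $a_k = 1$, $b_k = M$, $n(x) = \infty$ and distinct indices $\pi_1(x), \pi_2(x), \ldots$ (condition 2 of Definition 1), the fact that $E[V_k] = 1/(M+1)$ gives

$$E[p_i(x)\mid\pi(x)] = E\big[V_{\pi_i(x)}\big]\prod_{j<i}E\big[1 - V_{\pi_j(x)}\big] = \frac{1}{M+1}\left(\frac{M}{M+1}\right)^{i-1}, \qquad \frac{E[p_i(x)]}{E[p_{i-1}(x)]} = \frac{M}{M+1} < 1.$$

Because the conditional expectation does not depend on $\pi(x)$, the same geometric decay holds unconditionally.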
Conditioning on $s_1, \ldots, s_n$ and $\pi$, there will be a subset of $\{1, \ldots, n\}$, which we call $J$, for which $\pi_{s^\star_j}(x^\star) = \pi_{s_j}(x_j)$, where the variables $s^\star_j$, $j = 1, \ldots, n$, denote the position of location $y_j$ in the ordering at index $x^\star$. This set $J$ groups the observed locations which are in the ordering both at $x_j$ and at $x^\star$. Then
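$$F_{x^\star} \overset{d}{=} \sum_{i=1}^{\min\{\pi_{s^\star_j}(x^\star) \mid j \in J\}} p_i(x^\star)\,\delta_{\theta_{\pi_i(x^\star)}} + \sum_{i=\min\{\pi_{s^\star_j}(x^\star) \mid j \in J\}+1}^{\infty} p_i(x^\star)\,\delta_{\theta_{\pi_i(x^\star)}}.$$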
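The bookkeeping behind $J$ can be sketched in a few lines of Python. The function name, the dictionary of finite (truncated) orderings and the toy atom labels below are all hypothetical; the code only illustrates how $J$, the positions $s^\star_j$ and the set $\{\pi_{s^\star_j}(x^\star) \mid j \in J\}$ are obtained from given orderings and allocations.

```python
def shared_allocations(orderings, obs, x_star):
    """Find J, the positions s*_j, and the atoms {pi_{s*_j}(x*) : j in J}.

    orderings : dict mapping an index x to its (truncated) ordering
                [pi_1(x), pi_2(x), ...] of atom labels.
    obs       : list of (x_j, s_j) pairs, s_j being the 1-based rank of the
                atom that observation j is allocated to at x_j.
    x_star    : index at which F_{x*} is examined.
    """
    # 1-based position of each atom in the ordering at x*.
    pos_star = {atom: r for r, atom in enumerate(orderings[x_star], start=1)}
    J, s_star, updated = [], {}, set()
    for j, (x_j, s_j) in enumerate(obs, start=1):
        atom = orderings[x_j][s_j - 1]  # pi_{s_j}(x_j)
        if atom in pos_star:            # atom also appears in the ordering at x*
            J.append(j)
            s_star[j] = pos_star[atom]
            updated.add(atom)
    return J, s_star, sorted(updated)


orderings = {0.0: [4, 1, 7, 2], 1.0: [1, 4, 2, 9], 5.0: [9, 3, 8, 6]}
obs = [(0.0, 3), (1.0, 2), (5.0, 1)]
print(shared_allocations(orderings, obs, x_star=1.0))
# ([2, 3], {2: 2, 3: 4}, [4, 9])
```

Here observation 1 is allocated to atom 7, which does not appear in the ordering at $x^\star = 1.0$, so it is not in $J$; observations 2 and 3 are allocated to atoms 4 and 9, which sit at positions 2 and 4 of the ordering at $x^\star$.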
The only updated elements of θ and V in the conditional posterior will be those indexed by the elements of $\{\pi_{s^\star_j}(x^\star) \mid j \in J\}$. The first part of the sum involves random variables which have not been updated and

References
Continuous Univariate Distributions. Book.
Green (1995). Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Journal article.
A Bayesian Analysis of Some Nonparametric Problems. Journal article.
Stochastic Geometry and Its Applications. Book.
Bayesian Density Estimation and Inference Using Mixtures. Journal article.