
(Open Access) Order-Based Dependent Dirichlet Processes (2006) | Jim E. Griffin

Order-Based Dependent Dirichlet Processes

J.E. Griffin and M.F.J. Steel∗

Abstract

In this paper we propose a new framework for Bayesian nonparametric modelling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions which induces a Dirichlet process at each covariate value. We derive the correlation between distributions at different covariate values, and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical and efficient computational methods are introduced. We apply our framework, through mixtures of these processes, to regression modelling, the modelling of stochastic volatility in time series data and spatial geostatistical modelling.

Keywords: Bayesian nonparametrics, Markov chain Monte Carlo, Nonparametric Regression, Spatial Modelling, Stick-breaking Prior, Volatility Modelling.

1 Introduction

Bayesian nonparametric methods have become increasingly popular in empirical studies. The Dirichlet process (Ferguson 1973) has been the dominant mechanism used as the prior for the unknown distribution in the model specification. Some recent examples include applications in econometrics (Chib and Hamilton 2002; Hirano 2002), medicine (Kottas et al. 2002), health (O’Hagan and Stevens 2003), auditing (Laws and O’Hagan 2002), animal breeding (van der Merwe and Pretorius 2003), survival analysis (Doss and Huffer 2003), directional data (Ghosh et al. 2003), meta analysis (Chung et al. 2002), genetics (Medvedovic and Sivaganesan 2002) and density estimation (Hansen and Lauritzen 2002). However, modelling the relationship between covariates and the unknown distribution cannot be achieved directly using the Dirichlet process described by Ferguson.

∗ Jim Griffin is Lecturer, Department of Statistics, University of Warwick, CV4 7AL, U.K. (Email: J.E.Griffin@warwick.ac.uk) and Mark Steel is Professor, Department of Statistics, University of Warwick, Coventry, CV4 7AL, U.K. (Email: M.F.Steel@stats.warwick.ac.uk). Both authors were affiliated with the Institute of Mathematics, Statistics and Actuarial Science, University of Kent at Canterbury during the early part of this research. Jim Griffin acknowledges research support from The Nuffield Foundation grant NUF-NAL/00728. We would like to thank Andy Hone for his contribution to some calculations and we are grateful to Carmen Fernández and Steve MacEachern for helpful discussions and to two referees and the Associate Editor for insightful comments.

CRiSM Paper No. 05-9, www.warwick.ac.uk/go/crism

Therefore, an active area of research is extending these methods to a wider class of models where the unknown distribution depends on covariates. If the covariates have a finite number of levels, the Product of Dirichlet processes model introduced by Cifarelli and Regazzini (1978) allows the modelling of dependent distributions. Dependence is introduced through the use of a parametric regression model as the centring distribution of independent Dirichlet processes at each level of the covariates. These methods have recently been applied to problems in biostatistics (Carota and Parmigiani 2002), econometrics (Griffin and Steel 2004) and survival analysis (Guidici et al. 2003) and a similar idea was proposed in Mallick and Walker (1997). In the present paper we focus on introducing dependence on continuous covariates. Other approaches to this problem exist in the literature. Müller and Rosner (1998) propose including the covariates in the nonparametric distribution and focusing on the conditional given the covariates only. Since this implies leaving out a factor in the likelihood, Müller et al. (2004) change the prior on the process to counteract this fact. Finally, the method described by MacEachern et al. (2001) is closest to the approach developed here, as both approaches start from the Sethuraman (1994) representation, mentioned in the following subsection.

Here we introduce dependence in nonparametric distributions by making the weights in the Sethuraman representation dependent on the covariates. Each weight is a transformation of i.i.d. random variables. The way we implement the dependence is by inducing an ordering π of these random variables at each covariate value such that distributions for similar covariate values will be associated with similar orderings and, thus, be close. At any covariate value, the random distribution will be a so-called stick-breaking prior. We focus on the special case where we choose the Dirichlet process for this stick-breaking prior, and we shall call the induced class of processes Order-Based Dependent Dirichlet Processes, shortened to πDDP’s.

We derive theoretical properties, such as the correlation between distributions at different covariate values, and use a point process to implement a practically useful type of ordering. Two main constructions with analytically known correlation structures are proposed. Practical computational methods are introduced, using Markov chain Monte Carlo (MCMC) methods. We control the truncation error in an intuitive fashion through truncation of the point process and we use sequential allocation as an efficient way to avoid the sampler getting stuck in local modes. We apply our basic framework, through mixtures of πDDP’s, in three quite different settings. We use it for curve fitting, the modelling of stochastic volatility in time series data and spatial geostatistical modelling.

Subsection 1.1 describes stick-breaking priors, while Section 2 introduces the ideas underlying πDDP’s and their practical implementation. Section 3 briefly discusses mixtures of these processes, and Section 4 concerns elicitation of the prior. Computational issues are dealt with in Section 5 and Section 6 describes the three applications. The final section concludes.

Proofs will be grouped in Appendix A without explicit mention in the text.


1.1 Stick-breaking priors

The idea of defining random distributions through a stick-breaking construction is developed in Pitman (1996), where its uses in several areas of application are reviewed. The class is discussed by Ishwaran and James (2001) as a prior distribution in nonparametric problems. A random distribution, $F$, has a stick-breaking prior if

$$F \stackrel{d}{=} \sum_{i=1}^{N} p_i\,\delta_{\theta_i}, \qquad (1)$$

where $\delta_z$ denotes a Dirac measure at $z$, $p_i = V_i \prod_{j<i} (1 - V_j)$ where $V_1, \ldots, V_N$ are independent with $V_i \sim \mathrm{Beta}(a_i, b_i)$ and $\theta_1, \ldots, \theta_N$ are independent draws from a distribution $H$. Conventionally, only models with an infinite representation are referred to as nonparametric (see e.g. Bernardo and Smith, 1994, p.228). Ishwaran and James (2001) give the following condition to determine if the distribution is well-defined for $N = \infty$:

$$\sum_{k=1}^{\infty} p_k = 1 \ \text{a.s.} \iff \sum_{k=1}^{\infty} \mathrm{E}(\log(1 - V_k)) = -\infty.$$

For finite $N$ the condition $\sum_{k=1}^{N} p_k = 1$ is satisfied if $V_N = 1$ so that $p_N = \prod_{j<N} (1 - V_j)$. For $N = \infty$ several interesting processes fall into this class:

1. The Dirichlet process prior (Ferguson 1973) characterised by $MH$, where $M$ is a positive scalar, arises when $V_i$ follows a $\mathrm{Beta}(1, M)$ for all $i$. This was established by Sethuraman (1994).

2. The Pitman-Yor process occurs if $V_i$ follows a $\mathrm{Beta}(1 - a, b + ai)$ with $0 \le a < 1$ and $b > -a$. As special cases we can identify the Dirichlet process for $a = 0$ and the stable law when $b = 0$.

This representation will provide the basis for our development of dependent probability measures and, in particular, the development of a dependent Dirichlet process. We will refer to the $\theta_i$’s as locations and the $V_i$’s as masses.
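As an illustration of the stick-breaking representation in (1), the following is a minimal Python sketch, not the authors' implementation. It assumes a finite truncation at N atoms with the last mass set to one, as in the finite-N case above. The function names (`stick_breaking_weights`, `dirichlet_process_weights`, `pitman_yor_weights`) and the parameter values are hypothetical choices made for this example.

```python
import random


def stick_breaking_weights(a, b, rng):
    """Weights p_i = V_i * prod_{j<i} (1 - V_j) for a finite truncation.

    a, b: Beta parameters for the first N-1 masses. The last mass is set
    to 1 (finite-N convention), so the N returned weights sum to one.
    """
    weights = []
    remaining = 1.0  # length of stick that has not been broken off yet
    for a_i, b_i in zip(a, b):
        v = rng.betavariate(a_i, b_i)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # final atom takes whatever mass is left
    return weights


def dirichlet_process_weights(M, N, rng):
    # Dirichlet process case: V_i ~ Beta(1, M) for all i.
    return stick_breaking_weights([1.0] * (N - 1), [M] * (N - 1), rng)


def pitman_yor_weights(a, b, N, rng):
    # Pitman-Yor case: V_i ~ Beta(1 - a, b + a*i), i = 1, ..., N-1.
    return stick_breaking_weights(
        [1.0 - a] * (N - 1), [b + a * i for i in range(1, N)], rng
    )


# Illustrative values only: M = 2 and a truncation at N = 50 atoms.
rng = random.Random(0)
p = dirichlet_process_weights(M=2.0, N=50, rng=rng)
```

Pairing each weight with an independent draw from H (the locations) would then give one truncated realisation of F.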

2 Dependent Dirichlet Processes

2.1 General construction

A dependent Dirichlet process is a stochastic process defined on the space of probability measures over a domain, indexed by time, space or a selection of other covariates in such a way that the marginal distribution at any point in the domain follows a Dirichlet process. This problem has received little attention in the Bayesian literature. Some recent work follows MacEachern (1999). The latter paper considers the possibility of allowing the masses, V, or the locations, θ, of the atoms to follow a stochastic process defined over the domain. An important constraint imposed by the definition of the Dirichlet process is that the processes for each element of either θ or V must be independent. The work of MacEachern and coauthors concentrates on the “single-p” model where only the locations, θ, follow stochastic processes. An application to spatial modelling is further developed in Gelfand et al. (2004) by allowing the locations θ to be drawn from a random field (a Gaussian process). The same method to induce dependence is used in De Iorio et al. (2004) to achieve an analysis of variance (ANOVA)-type structure.

In general, such approaches which allow only values of θ to depend on the covariates are subject to certain problems. In particular, MacEachern notes that the distribution of F can then be expressed as a mixture of Dirichlet processes. The posterior process will have an updated mass parameter M + n, where n is the sample size, at all values of the index. This latter fact is counterintuitive, in our view. A useful property would rather be that the process returns to the prior distribution (with mass parameter M) at points in the domain “far” from the observed data. This seems a major shortcoming of these single-p models for general spaces.

In contrast to the models described above, the processes developed in this paper allow the values of the weights $p_i$ in (1) to change over the domain of the covariates. For ease of presentation it will be assumed that each location, $\theta_i$, does not depend on the covariates. However, the ideas that are developed could be extended to allow for the introduction of dependence through the locations (i.e. drawn from independent stochastic processes). MacEachern (2000) has some useful results in this direction.

Definition 1 An Order-based Dependent Stick-Breaking Prior is defined on a space $D$ by a sequence $\{a_i, b_i\}$, centring distribution $H$ and a stochastic process $\{\pi(x)\}_{x \in D}$ for which:

1. $\{\pi_1(x), \ldots, \pi_{n(x)}(x)\} \subseteq \{1, \ldots, N\}$ for some $n(x) \le N$.

2. $\pi_i(x) = \pi_j(x)$ if and only if $i = j$.

Random variables $\theta_1, \ldots, \theta_N$ and $V_1, \ldots, V_{N-1}$ are all independent, $\theta_i \sim H$ and $V_i \sim \mathrm{Beta}(a_i, b_i)$. The distribution at a point $x \in D$ is defined by

$$F_x \stackrel{d}{=} \sum_{i=1}^{n(x)} p_i(x)\,\delta_{\theta_{\pi_i(x)}},$$

where

$$p_i(x) = V_{\pi_i(x)} \prod_{j<i} \left(1 - V_{\pi_j(x)}\right)$$

and for finite $n(x)$

$$p_{n(x)}(x) = \prod_{j<n(x)} \left(1 - V_{\pi_j(x)}\right).$$

We will refer to $\pi(x) = (\pi_1(x), \ldots, \pi_{n(x)}(x))$ as the ordering at $x$.
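To make Definition 1 concrete, here is a small Python sketch, offered as an illustration rather than as the authors' code. It computes the weights at a point x from a shared set of masses and the ordering at x. The function name `order_based_weights`, the dictionary layout and the numerical masses are assumptions made for the example. The masses are treated as the leading part of a longer sequence, so the finite-n(x) convention for the last weight is not applied and the listed weights need not sum to one.

```python
def order_based_weights(masses, ordering):
    """Weights p_i(x) = V_{pi_i(x)} * prod_{j<i} (1 - V_{pi_j(x)}).

    masses: dict mapping atom label k to its mass V_k (shared by all x).
    ordering: sequence of distinct atom labels, i.e. the ordering pi(x).
    Returns a dict mapping each listed atom label to its weight at x.
    """
    if len(set(ordering)) != len(ordering):
        raise ValueError("an ordering must not repeat an atom label")
    weights = {}
    remaining = 1.0  # mass not yet assigned at this covariate value
    for label in ordering:
        weights[label] = masses[label] * remaining
        remaining *= 1.0 - masses[label]
    return weights


# Hypothetical masses for three atoms, shared across covariate values.
masses = {1: 0.5, 2: 0.2, 3: 0.4}

# Two covariate values with different orderings of the same atoms;
# atom 2 does not appear in the second ordering at all.
w_first = order_based_weights(masses, [1, 2, 3])
w_second = order_based_weights(masses, [3, 1])
```

The same atom can carry a different weight at the two points purely because its rank differs, which is the mechanism the definition uses to induce dependence on x.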

The stick-breaking prior in Subsection 1.1 is recovered for any given $x \in D$. We obtain the same distribution over the entire space $D$ if $\pi_i(x) = i$ for all $x \in D$ and $i = 1, \ldots, N$. As a more interesting example, the stochastic process $\pi(x)$ could be defined on the space of permutations of $\{1, \ldots, N\}$ (i.e. $n(x) = N$ for all $x \in D$), allowing $F_x$ to change with $x$. However, the definition allows the stochastic process to be defined on more general structures. In particular, some elements of the ordering at a given point need not appear in the ordering at another point. An example of such a process is given in Subsection 2.2.2. In general, this defines a wide class of dependent distributions, both parametric (finite $N$) and nonparametric (infinite $N$). Usually, we will be interested in $N = \infty$ so that $F_x$ can follow a Dirichlet process. However, it is not easy to define stochastic processes $\pi(x)$ for infinite $N$. Therefore, we focus our attention on specific constructions for the stochastic process in this case.

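The prior distribution for $F_x$ inherits some properties of stick-breaking priors. For example, the first moment measure is

$$\mathrm{E}[F_x(B)] = \mathrm{E}\left[\sum_{i=1}^{n(x)} p_i(x)\,\delta_{\theta_{\pi_i(x)}}(B)\right] = \mathrm{E}\left[\sum_{i=1}^{n(x)} p_i(x)\right] \mathrm{E}\left[\delta_{\theta_{\pi_i(x)}}(B)\right] = H(B),$$

and

$$\mathrm{Var}[F_x(B) \mid \pi(x)] = H(B)(1 - H(B)) \sum_{i=1}^{n(x)} \frac{a_{\pi_i(x)}\left(a_{\pi_i(x)} + 1\right)}{\left(a_{\pi_i(x)} + b_{\pi_i(x)}\right)\left(a_{\pi_i(x)} + b_{\pi_i(x)} + 1\right)} \prod_{j<i} \frac{b_{\pi_j(x)}\left(b_{\pi_j(x)} + 1\right)}{\left(a_{\pi_j(x)} + b_{\pi_j(x)}\right)\left(a_{\pi_j(x)} + b_{\pi_j(x)} + 1\right)}.$$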

In the sequel we will assume that $a_i = 1$, $b_i = M$ and that $N = \infty$ so that we recover a Dirichlet process at any $x \in D$ if $n(x) = \infty$. For the marginal variance, we then obtain

$$\mathrm{Var}[F_x(B)] = \mathrm{E}_{\pi(x)}\left[\mathrm{Var}[F_x(B) \mid \pi(x)]\right] = \frac{H(B)(1 - H(B))}{M + 1}. \qquad (2)$$
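The factor 1/(M + 1) in (2) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the standard Beta moments E[V^2] = 2/((M + 1)(M + 2)) and E[(1 - V)^2] = M/(M + 2) for V ~ Beta(1, M), sums the resulting geometric series for the expected squared weights, and compares the total with 1/(M + 1). The function name and the number of terms are hypothetical.

```python
def sum_expected_squared_weights(M, n_terms=10_000):
    """Partial sum of E[p_i^2] over i for the Dirichlet process case.

    With V ~ Beta(1, M), E[p_i^2] = E[V^2] * E[(1 - V)^2]^(i - 1),
    which is a geometric series in i.
    """
    first = 2.0 / ((M + 1.0) * (M + 2.0))  # E[V^2]
    ratio = M / (M + 2.0)                  # E[(1 - V)^2]
    total = 0.0
    term = first
    for _ in range(n_terms):
        total += term
        term *= ratio
    return total


# Compare the series with the closed form 1 / (M + 1) for a few M.
for M in (0.5, 1.0, 5.0):
    print(M, sum_expected_squared_weights(M), 1.0 / (M + 1.0))
```

Multiplying the series total by H(B)(1 - H(B)) gives the variance, so agreement with 1/(M + 1) is what (2) requires.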

The associated subclass of processes will be denoted by Order-Based Dependent Dirichlet Processes, abbreviated as πDDP and characterised by a mass parameter $M$, centring distribution $H$ and a stochastic process $\{\pi(x)\}_{x \in D}$.

The construction in Definition 1 is motivated by the fact that for our stick-breaking prior $\mathrm{E}[p_i(x)] < \mathrm{E}[p_{i-1}(x)]$ for any $x$ and thus the influence of an atom diminishes as it gets further down the ranking (i.e. its order in $\pi(x)$ increases). This allows us to easily impose the characteristic of “localness” which can be described as follows. An important improvement over the single-p DDP models is the flexibility to allow the posterior at an index $x^\star$ to tend to the prior as the distance between $x^\star$ and observed indices tends to infinity if $n(x) = \infty$ for all $x$. If we observe $y_1, \ldots, y_n$ at indices $x_1, \ldots, x_n$, posterior updating can be seen as linking the observations to atoms of the distribution at each index by a new variable $s_i$ for which $\theta_{\pi_{s_i}(x_i)} = y_i$ and $P(s_i = j) = p_j(x_i)$. Thus, $s_i$ is the ranking of location $y_i$ at index $x_i$. Conditioning on $s_1, \ldots, s_n$ and $\pi$, there will be a subset of $\{1, \ldots, n\}$, which we call $J$, for which $\pi_{s^\star_j}(x^\star) = \pi_{s_j}(x_j)$, where the variables $s^\star_j$, $j = 1, \ldots, n$ denote the position of location $y_j$ in the ordering at index $x^\star$. This set $J$ groups the observed locations which are in the ordering both at $x_j$ and at $x^\star$. Then

$$F_{x^\star} = \sum_{i=1}^{\min_{j \in J}(s^\star_j) - 1} p_i(x^\star)\,\delta_{\theta_{\pi_i(x^\star)}} + \sum_{i=\min_{j \in J}(s^\star_j)}^{\infty} p_i(x^\star)\,\delta_{\theta_{\pi_i(x^\star)}}.$$

The only updated elements of $\theta$ and $V$ in the conditional posterior will be those indexed by the elements of $\{\pi_{s^\star_j}(x^\star) \mid j \in J\}$. The first part of the sum involves random variables which have not been updated and

