
Showing papers in "Statistics and Computing in 1998"


Journal Article
TL;DR: This report presents an overview of sequential simulation-based methods for Bayesian filtering of nonlinear and non-Gaussian dynamic models and proposes some original developments.
Abstract: In this report, we present an overview of sequential simulation-based methods for Bayesian filtering of nonlinear and non-Gaussian dynamic models. It places within a general framework numerous methods proposed independently in various areas of science, and proposes some original developments.
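The methods surveyed here are now usually called particle filters. As a rough illustration (not code from the report), here is a minimal bootstrap, i.e. sampling-importance-resampling, filter; the AR(1) example model, function names and parameter choices are all hypothetical:

```python
import numpy as np

def bootstrap_particle_filter(y, n_particles, init, transition, loglik, seed=0):
    """Bootstrap (SIR) filter: propagate particles through the transition,
    weight by the observation likelihood, estimate, then resample."""
    rng = np.random.default_rng(seed)
    x = init(n_particles, rng)
    means = []
    for y_t in y:
        x = transition(x, rng)                     # sample p(x_t | x_{t-1})
        logw = loglik(y_t, x)                      # log p(y_t | x_t)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * x)))         # filtered mean E[x_t | y_1:t]
        x = x[rng.choice(n_particles, n_particles, p=w)]   # resample
    return np.array(means)

# Hypothetical example: AR(1) state observed with unit Gaussian noise.
rng = np.random.default_rng(1)
x_true = np.zeros(50)
for t in range(1, 50):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
y = x_true + rng.normal(size=50)
est = bootstrap_particle_filter(
    y, 1000,
    init=lambda n, r: r.normal(size=n),
    transition=lambda x, r: 0.9 * x + r.normal(size=x.size),
    loglik=lambda y_t, x: -0.5 * (y_t - x) ** 2,
)
```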

747 citations


Journal ArticleDOI
TL;DR: This paper aims to summarise the current “state-of-play” for convergence assessment techniques and to motivate directions for future research in this area.
Abstract: MCMC methods have effectively revolutionised the field of Bayesian statistics over the past few years. Such methods provide invaluable tools to overcome problems with analytic intractability inherent in adopting the Bayesian approach to statistical modelling. However, any inference based upon MCMC output relies critically upon the assumption that the Markov chain being simulated has achieved a steady state or “converged”. Many techniques have been developed for trying to determine whether or not a particular Markov chain has converged, and this paper aims to review these methods with an emphasis on the mathematics underpinning these techniques, in an attempt to summarise the current “state-of-play” for convergence assessment techniques and to motivate directions for future research in this area.

464 citations



Journal ArticleDOI
TL;DR: An exact algorithm to compute the location depth in three dimensions in O(n² log n) time and an approximate algorithm that computes the depth of a regression fit in O(mp³ + mpn + mn log n) time are constructed.
Abstract: The location depth (Tukey 1975) of a point θ relative to a p-dimensional data set Z of size n is defined as the smallest number of data points in a closed halfspace with boundary through θ. For bivariate data, it can be computed in O(n log n) time (Rousseeuw and Ruts 1996). In this paper we construct an exact algorithm to compute the location depth in three dimensions in O(n² log n) time. We also give an approximate algorithm to compute the location depth in p dimensions in O(mp³ + mpn) time, where m is the number of p-subsets used. Recently, Rousseeuw and Hubert (1996) defined the depth of a regression fit. The depth of a hyperplane with coefficients (θ1,…,θp) is the smallest number of residuals that need to change sign to make (θ1,…,θp) a nonfit. For bivariate data (p=2) this depth can be computed in O(n log n) time as well. We construct an algorithm to compute the regression depth of a plane relative to a three-dimensional data set in O(n² log n) time, and another that deals with p=4 in O(n³ log n) time. For data sets with large n and/or p we propose an approximate algorithm that computes the depth of a regression fit in O(mp³ + mpn + mn log n) time. For all of these algorithms, actual implementations are made available.
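For intuition, the definition itself gives a simple brute-force procedure in the bivariate case, since the halfspace count is piecewise constant in the direction angle and can only change where the direction is orthogonal to some z_i − θ. The following O(n²) sketch checks the definition directly; it is not the paper's O(n log n) bivariate algorithm or its three-dimensional one:

```python
import numpy as np

def location_depth_2d(theta, Z, eps=1e-9):
    """Tukey location depth of theta relative to rows of Z (n x 2):
    min over unit u of #{i : (z_i - theta) . u >= 0}. The count changes only
    where u is orthogonal to some z_i - theta, so testing directions just
    either side of every such critical angle is sufficient."""
    d = Z - np.asarray(theta, float)
    phi = np.arctan2(d[:, 1], d[:, 0])
    crit = np.concatenate([phi + np.pi / 2, phi - np.pi / 2])
    cand = np.concatenate([crit + eps, crit - eps])
    u = np.stack([np.cos(cand), np.sin(cand)])     # 2 x 4n candidate directions
    return int((d @ u >= 0).sum(axis=0).min())
```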

204 citations


Journal ArticleDOI
D. Nilsson1
TL;DR: This paper analyses Dawid's (1992) ‘flow propagation’ algorithm for finding the most probable configuration of the joint distribution in a probabilistic expert system, and shows how it can be combined with a clever partitioning scheme to formulate an efficient method for finding the M most probable configurations.
Abstract: A probabilistic expert system provides a graphical representation of a joint probability distribution which enables local computations of probabilities. Dawid (1992) provided a ‘flow propagation’ algorithm for finding the most probable configuration of the joint distribution in such a system. This paper analyses that algorithm in detail, and shows how it can be combined with a clever partitioning scheme to formulate an efficient method for finding the M most probable configurations. The algorithm is a divide-and-conquer technique that iteratively identifies the M most probable configurations.
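On a chain-structured model (a hidden Markov model), finding the single most probable configuration by propagation reduces to the familiar max-product (Viterbi) recursion, sketched below for orientation. This is only an illustrative special case, not Nilsson's partitioning scheme for the M best configurations:

```python
import numpy as np

def most_probable_configuration(init, trans, emit, obs):
    """Max-product (Viterbi) recursion on a chain: init (K,), trans (K, K) with
    trans[i, j] = P(s_t = j | s_{t-1} = i), emit (K, V), obs a sequence of
    observed symbol indices. Returns the most probable hidden state path."""
    T, K = len(obs), len(init)
    delta = np.log(init) + np.log(emit[:, obs[0]])   # best log-score per state
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(trans)      # score of each transition
        back[t] = scores.argmax(axis=0)              # best predecessor
        delta = scores.max(axis=0) + np.log(emit[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                    # backtrack
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```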

162 citations


Journal ArticleDOI
TL;DR: It is argued that the cusum path plot can bring out, more effectively than the sequential plot, those aspects of a Markov sampler which tell the user how quickly or slowly the sampler is moving around in its sample space, in the direction of the summary statistic.
Abstract: In this paper, we propose to monitor a Markov chain sampler using the cusum path plot of a chosen one-dimensional summary statistic. We argue that the cusum path plot can bring out, more effectively than the sequential plot, those aspects of a Markov sampler which tell the user how quickly or slowly the sampler is moving around in its sample space, in the direction of the summary statistic. The proposal is then illustrated in four examples representing situations where the cusum path plot works well and where it does not. Moreover, a rigorous analysis is given for one of the examples. We conclude that the cusum path plot is an effective tool for convergence diagnostics of a Markov sampler and for comparing different Markov samplers.
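The construction is short: with draws T_1, …, T_n of the summary statistic, the cusum path is S_t = Σ_{i≤t} (T_i − T̄), and a smooth, slowly meandering path signals slow movement in that direction. A minimal sketch, with a hypothetical sticky AR(1) chain standing in for a slowly mixing sampler:

```python
import numpy as np
import matplotlib.pyplot as plt

def cusum_path(draws):
    """Cusum path S_t = sum_{i<=t} (T_i - mean(T)) of a summary statistic."""
    d = np.asarray(draws, float)
    return np.cumsum(d - d.mean())

# A well-mixing sampler gives a hairy path oscillating about zero; a sticky
# chain (here a hypothetical AR(1) with coefficient 0.99) gives a smooth one.
rng = np.random.default_rng(1)
fast = rng.normal(size=2000)
slow = np.zeros(2000)
for t in range(1, 2000):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
plt.plot(cusum_path(fast), label="well-mixing")
plt.plot(cusum_path(slow), label="slowly mixing")
plt.legend()
plt.show()
```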

142 citations


Journal ArticleDOI
TL;DR: This paper introduces a modification of the sampling technique, defining a hybrid Markov chain in which, after each Gibbs sampling cycle, a Metropolis step is carried out along a direction of constant likelihood.
Abstract: Bayesian inference for the multinomial probit model, using the Gibbs sampler with data augmentation, has been recently considered by some authors. The present paper introduces a modification of the sampling technique, by defining a hybrid Markov chain in which, after each Gibbs sampling cycle, a Metropolis step is carried out along a direction of constant likelihood. Examples with simulated data sets motivate and illustrate the new technique. A proof of the ergodicity of the hybrid Markov chain is also given.
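The constant-likelihood direction comes from the multinomial probit structure and is model-specific, but the hybrid construction is generic: one Gibbs cycle, then one Metropolis move along a chosen direction. In this hedged skeleton, gibbs_cycle, constant_likelihood_dir and log_post are hypothetical placeholders for the model-specific pieces:

```python
import numpy as np

def metropolis_along_direction(x, direction, log_post, step=1.0, rng=None):
    """One Metropolis move along a fixed direction with a symmetric step.
    If the likelihood is constant along `direction`, as in the paper, the
    likelihood terms cancel and log_post can be replaced by the log prior."""
    rng = rng or np.random.default_rng()
    prop = x + rng.normal(scale=step) * direction
    if np.log(rng.uniform()) < log_post(prop) - log_post(x):
        return prop
    return x

# Hybrid chain skeleton (gibbs_cycle, constant_likelihood_dir, log_post are
# hypothetical, model-specific placeholders):
# def hybrid_cycle(x):
#     x = gibbs_cycle(x)
#     return metropolis_along_direction(x, constant_likelihood_dir(x), log_post)
```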

76 citations


Journal ArticleDOI
TL;DR: A global approach to both Bayesian and likelihood treatments of the estimation of the parameters of a hidden Markov model in the cases of normal and Poisson distributions is synthesized.
Abstract: This paper synthesizes a global approach to both Bayesian and likelihood treatments of the estimation of the parameters of a hidden Markov model in the cases of normal and Poisson distributions. The first step of this global method is to construct a non-informative prior based on a reparameterization of the model; this prior is to be considered as a penalizing and bounding factor from a likelihood point of view. The second step takes advantage of the special structure of the posterior distribution to build up a simple Gibbs algorithm. The maximum likelihood estimator is then obtained by an iterative procedure replicating the original sample until the corresponding Bayes posterior expectation stabilizes on a local maximum of the original likelihood function.

68 citations


Journal ArticleDOI
TL;DR: This paper proposes the use of auxiliary simulations to estimate the numerical values needed to apply a known convergence-bound theorem, making it possible to compute quantitative convergence bounds for models for which the requisite analytical computations would be prohibitively difficult or impossible.
Abstract: Markov chain Monte Carlo (MCMC) methods, including the Gibbs sampler and the Metropolis–Hastings algorithm, are very commonly used in Bayesian statistics for sampling from complicated, high-dimensional posterior distributions. A continuing source of uncertainty is how long such a sampler must be run in order to converge approximately to its target stationary distribution. A method has previously been developed to compute rigorous theoretical upper bounds on the number of iterations required to achieve a specified degree of convergence in total variation distance by verifying drift and minorization conditions. We propose the use of auxiliary simulations to estimate the numerical values needed in this theorem. Our simulation method makes it possible to compute quantitative convergence bounds for models for which the requisite analytical computations would be prohibitively difficult or impossible. On the other hand, although our method appears to perform well in our example problems, it cannot provide the guarantees offered by analytical proof.

55 citations


Journal ArticleDOI
TL;DR: This paper explores the problem of reducing a mixture of conjugate priors to a smaller mixture, using a distance measure between priors; it emerges that for mixtures of β-distributions a simple moment-matching reduction procedure is optimal, and that it remains very good for the more general case of Dirichlet mixtures.
Abstract: This paper explores the problem of reducing a mixture of conjugate priors to a smaller mixture, from the perspective of the application of a distance measure between priors. The analysis focuses on mixtures of Dirichlet priors, but it has wider applicability. Under the proposed scheme, it emerges that for mixtures of β-distributions a simple moment-matching reduction procedure is optimal, and that it remains very good for the more general case of Dirichlet mixtures.
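In its simplest form, moment matching collapses a mixture of beta priors to a single beta by equating the mixture's first two moments. The paper reduces to a smaller mixture rather than to one component, so the sketch below covers only that one-component special case:

```python
import numpy as np

def beta_mixture_to_single_beta(weights, alphas, betas):
    """Collapse a mixture of Beta(a_i, b_i) priors to one Beta(a, b) by
    matching the mixture's first two moments."""
    w = np.asarray(weights, float)
    a = np.asarray(alphas, float)
    b = np.asarray(betas, float)
    m1 = np.sum(w * a / (a + b))                             # mixture mean
    m2 = np.sum(w * a * (a + 1) / ((a + b) * (a + b + 1)))   # mixture E[X^2]
    var = m2 - m1 ** 2
    c = m1 * (1 - m1) / var - 1                              # implied a + b
    return m1 * c, (1 - m1) * c
```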

46 citations


Journal ArticleDOI
TL;DR: This work proposes a guided walk Metropolis algorithm which suppresses some of the random walk behavior in the Markov chain, and shows that it performs better in terms of efficiency and convergence time.
Abstract: The random walk Metropolis algorithm is a simple Markov chain Monte Carlo scheme which is frequently used in Bayesian statistical problems. We propose a guided walk Metropolis algorithm which suppresses some of the random walk behavior in the Markov chain. This alternative algorithm is no harder to implement than the random walk Metropolis algorithm, but empirical studies show that it performs better in terms of efficiency and convergence time.
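A common description of the guided walk is: carry a direction variable p ∈ {−1, +1} alongside the state, take a positive (half-normal) step in direction p, keep p on acceptance and reverse it on rejection, so the chain keeps moving the same way until a rejection occurs. A one-dimensional sketch along those lines; the step distribution and other details are assumptions rather than quotations from the paper:

```python
import numpy as np

def guided_walk_metropolis(log_target, x0, n_iter, scale=1.0, seed=0):
    """Guided walk Metropolis, one-dimensional sketch: half-normal steps in
    direction p, with p kept on acceptance and reversed on rejection."""
    rng = np.random.default_rng(seed)
    x, p = float(x0), 1.0
    out = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + p * abs(rng.normal(scale=scale))
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop          # accepted: keep moving the same way
        else:
            p = -p            # rejected: turn around
        out[i] = x
    return out

# e.g. draws = guided_walk_metropolis(lambda x: -0.5 * x * x, 0.0, 10_000)
```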

Journal ArticleDOI
TL;DR: This paper demonstrates the power and flexibility of Markov chain Monte Carlo methods for fitting such classes of models to circular data, with applications to multivariate and time series data of wind directions.
Abstract: One approach to defining models for circular data and processes has been to take a standard Euclidean model and to wrap it around the circle. This generates rich families of circular models but creates difficulties for inference. Using data augmentation ideas which have previously been applied to this problem in the framework of an EM algorithm, we demonstrate the power and flexibility of Markov chain Monte Carlo methods to fit such classes of models to circular data. The precision of the technique is confirmed through simulated examples, and then applications are given to multivariate and time series data of wind directions.
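In the simplest instance, wrapping a normal distribution onto the circle gives the density f(θ) = Σ_k N(θ + 2πk; μ, σ²); the data augmentation mentioned above treats the unobserved wrapping index k as latent. A small sketch that evaluates the density by truncating the sum (the truncation level is an assumption):

```python
import numpy as np

def wrapped_normal_pdf(theta, mu, sigma, k_max=10):
    """Density of a normal wrapped onto the circle, evaluated by truncating
    f(theta) = sum_k N(theta + 2*pi*k; mu, sigma^2) at |k| <= k_max.
    A modest k_max suffices unless sigma is large relative to 2*pi."""
    theta = np.atleast_1d(np.asarray(theta, float))
    k = np.arange(-k_max, k_max + 1)
    z = (theta[None, :] + 2 * np.pi * k[:, None] - mu) / sigma
    return np.exp(-0.5 * z ** 2).sum(axis=0) / (sigma * np.sqrt(2 * np.pi))
```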

Journal ArticleDOI
TL;DR: This paper develops a quantitative measure of smoothness which can be associated with any given cusum path, and shows how this measure may be used to obtain an estimate of the burn-in length for any given sampler.
Abstract: Yu (1995) provides a novel convergence diagnostic for Markov chain Monte Carlo (MCMC) which provides a qualitative measure of mixing for Markov chains via a cusum path plot for univariate parameters of interest. The method is based upon the output of a single replication of an MCMC sampler and is therefore widely applicable and simple to use. One criticism of the method is that it is subjective in its interpretation, since it is based upon a graphical comparison of two cusum path plots. In this paper, we develop a quantitative measure of smoothness which we can associate with any given cusum path, and show how we can use this measure to obtain a quantitative measure of mixing. In particular, we derive the large sample distribution of this smoothness measure, so that objective inference is possible. In addition, we show how this quantitative measure may also be used to provide an estimate of the burn-in length for any given sampler. We discuss the utility of this quantitative approach, and highlight a problem which may occur if the chain is able to remain in any one state for some period of time. We provide a more general implementation of the method to overcome the problem in such cases.
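One natural statistic in this spirit is the proportion of interior points at which the cusum path changes direction: it is near 1/2 for a well-mixing chain, with a simple Binomial-type large-sample distribution, and well below 1/2 for a smooth, slowly mixing one. The sketch below illustrates that style of measure; the paper's exact statistic and null distribution should be taken from the paper itself:

```python
import numpy as np

def hairiness(draws):
    """Proportion of interior points at which the cusum path changes
    direction (a local maximum or minimum). Values near 1/2 suggest good
    mixing; values far below 1/2 suggest slow mixing. Illustrative only."""
    d = np.asarray(draws, float)
    s = np.cumsum(d - d.mean())
    turn = (s[1:-1] - s[:-2]) * (s[2:] - s[1:-1]) < 0   # direction change
    return float(turn.mean())
```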

Journal ArticleDOI
TL;DR: Some conditional models to deal with binary longitudinal responses are proposed, extending random effects models to include serial dependence of Markovian form, and hence allowing for quite general association structures between repeated observations recorded on the same individual.
Abstract: Some conditional models to deal with binary longitudinal responses are proposed, extending random effects models to include serial dependence of Markovian form, and hence allowing for quite general association structures between repeated observations recorded on the same individual. The presence of both these components implies a form of dependence between them, and so a complicated expression for the resulting likelihood. To handle this problem, we introduce, as a first instance, what Follmann and Wu (1995) called, in a different setting, an approximate conditional model, which represents an optimal choice for the general framework of categorical longitudinal responses. Then we define two more formally correct models for the binary case, with no assumption about the distribution of the random effect. All of the discussed models are estimated by means of an EM algorithm for nonparametric maximum likelihood. The algorithm, an adaptation of that used by Aitkin (1996) for the analysis of overdispersed generalized linear models, is initially derived as a form of Gaussian quadrature, and then extended to a completely unknown mixing distribution. A large-scale simulation study is described to explore the behaviour of the proposed approaches in a number of different situations.

Journal ArticleDOI
TL;DR: This work considers an efficient Bayesian approach to estimating integration-based posterior summaries from a separate Bayesian application, and extends Bayes-Hermite quadrature by allowing the approximating function g(·) to be chosen from two wider classes of functions.
Abstract: We consider an efficient Bayesian approach to estimating integration-based posterior summaries from a separate Bayesian application. In Bayesian quadrature we model an intractable posterior density function f(·) as a Gaussian process, using an approximating function g(·), and find a posterior distribution for the integral of f(·), conditional on a few evaluations of f(·) at selected design points. Bayesian quadrature using normal g(·) is called Bayes-Hermite quadrature. We extend this theory by allowing g(·) to be chosen from two wider classes of functions. One is a family of skew densities and the other is the family of finite mixtures of normal densities. For the family of skew densities we describe an iterative updating procedure to select the most suitable approximation and apply the method to two simulated posterior density functions.

Journal ArticleDOI
TL;DR: It is shown that a steepest descent flow on the manifold of orthogonal matrices can naturally be formulated, with two important implications: the weighted orthogonal Procrustes problem can be solved as an initial value problem by any available numerical integrator, and the first order and second order optimality conditions can be derived.
Abstract: The weighted orthogonal Procrustes problem, an important class of data matching problems in multivariate data analysis, is reconsidered in this paper. It is shown that a steepest descent flow on the manifold of orthogonal matrices can naturally be formulated. This formulation has two important implications: that the weighted orthogonal Procrustes problem can be solved as an initial value problem by any available numerical integrator and that the first order and the second order optimality conditions can also be derived. The proposed approach is illustrated by numerical examples.
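For contrast, the classical unweighted problem min_Q ‖AQ − B‖_F over orthogonal Q has a closed-form solution via the SVD of AᵀB; it is the weighting that destroys this closed form and motivates a descent flow on the orthogonal manifold. A sketch of the unweighted case only:

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Closed-form solution of the unweighted problem min_Q ||A Q - B||_F over
    orthogonal Q: with A.T @ B = U S V', the minimizer is Q = U V'."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt
```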

Journal ArticleDOI
TL;DR: In this paper, the authors describe the application of tools from statistical mechanics to analyse the dynamics of various classes of supervised learning rules in perceptrons, and present a coherent and self-contained picture of the basics of this field, to explain the ideas and tricks, to show how the predictions of the theory compare with (simulation) experiments, and to bring together scattered results.
Abstract: We describe the application of tools from statistical mechanics to analyse the dynamics of various classes of supervised learning rules in perceptrons. The character of this paper is mostly that of a cross between a biased non-encyclopaedic review and lecture notes: we try to present a coherent and self-contained picture of the basics of this field, to explain the ideas and tricks, to show how the predictions of the theory compare with (simulation) experiments, and to bring together scattered results. Technical details are given explicitly in an appendix. In order to avoid distraction we concentrate the references in a final section. In addition this paper contains some new results: (i) explicit solutions of the macroscopic equations that describe the error evolution for on-line and batch learning rules; (ii) an analysis of the dynamics of arbitrary macroscopic observables (for complete and incomplete training sets), leading to a general Fokker–Planck equation; and (iii) the macroscopic laws describing batch learning with complete training sets. We close the paper with a preliminary exposé of ongoing research on the dynamics of learning for the case where the training set is incomplete (i.e. where the number of examples scales linearly with the network size).
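The basic simulation that this theory describes is easy to set up: a student perceptron learns on-line from examples labelled by a teacher, and one tracks the generalization error ε = arccos(R)/π, where R is the normalized student-teacher overlap. A small illustrative experiment (all parameter choices hypothetical), whose error curve is the kind of quantity the macroscopic equations predict in the large-N limit:

```python
import numpy as np

def online_perceptron_learning(N=100, steps=5000, eta=0.1, seed=0):
    """Student perceptron learning on-line from a teacher: update on mistakes
    and track the generalization error eps = arccos(R) / pi, where R is the
    normalized student-teacher overlap."""
    rng = np.random.default_rng(seed)
    teacher = rng.normal(size=N)
    teacher /= np.linalg.norm(teacher)
    student = rng.normal(size=N)
    errs = np.empty(steps)
    for i in range(steps):
        xi = rng.normal(size=N)                    # random example
        label = np.sign(teacher @ xi)
        if np.sign(student @ xi) != label:
            student += eta * label * xi            # perceptron rule on errors
        R = student @ teacher / np.linalg.norm(student)
        errs[i] = np.arccos(np.clip(R, -1.0, 1.0)) / np.pi
    return errs
```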

Journal ArticleDOI
TL;DR: This paper examines a new approach to this problem based on Akaike Information Criterion (AIC) pruning of data-driven mixture models obtained from resampled data sets.
Abstract: Given i.i.d. observations x1,x2,x3,…,xn drawn from a mixture of normal terms, one is often interested in determining the number of terms in the mixture and their defining parameters. Although the problem of determining the number of terms is intractable under the most general assumptions, there is hope of elucidating the mixture structure given appropriate caveats on the underlying mixture. This paper examines a new approach to this problem based on the use of Akaike Information Criterion (AIC) based pruning of data-driven mixture models which are obtained from resampled data sets. Results of the application of this procedure to artificially generated data sets and a real world data set are provided.
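The AIC selection step on its own is straightforward: fit normal mixtures of increasing order and keep the order that minimizes AIC. The sketch below uses scikit-learn's GaussianMixture as a stand-in and omits the paper's resampling and pruning machinery:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_order_by_aic(x, k_max=8, seed=0):
    """Fit univariate normal mixtures with 1..k_max components and return the
    fit minimizing AIC, together with the AIC trace. This is only the
    selection step; the paper additionally prunes models fitted to
    resampled data sets."""
    x = np.asarray(x, float).reshape(-1, 1)
    fits = [GaussianMixture(n_components=k, random_state=seed).fit(x)
            for k in range(1, k_max + 1)]
    aics = [f.aic(x) for f in fits]
    return fits[int(np.argmin(aics))], aics
```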

Journal ArticleDOI
TL;DR: This work considers two methods of making use of the coaching variables in order to improve the prediction of Y from x1,x2,..., xp.
Abstract: In a regression or classification setting where we wish to predict Y from x1, x2, …, xp, we suppose that an additional set of ‘coaching’ variables z1, z2, …, zm are available in our training sample. These might be variables that are difficult to measure, and they will not be available when we predict Y from x1, x2, …, xp in the future. We consider two methods of making use of the coaching variables in order to improve the prediction of Y from x1, x2, …, xp. The relative merits of these approaches are discussed and compared in a number of examples.

Journal ArticleDOI
TL;DR: This article addresses the problem of identification of partly destroyed human melanoma cancer cells in confocal microscopy imaging by doing simultaneous inference for both the natural and destructive deformation field, and is able to obtain reliable estimates of the outline in addition to where on the boundary the cell is destroyed.
Abstract: This article addresses the problem of identification of partly destroyed human melanoma cancer cells in confocal microscopy imaging. Complete cancer cells are nearly circular and most of them have a nearly homogeneous boundary and interior region. A deformable template (Grenander, 1993) is well suited for these complete cells and models a cell as a natural deformed template or prototype. We will in this article focus on the remaining cells which have lost parts of the boundary region most probably due to a ‘capping’ phenomenon. We can interpret these cells as being partly destroyed, where in our statistical model the lost part of the boundary region is generated by a destructive deformation field acting and living on the cell or template. By doing simultaneous inference for both the natural and destructive deformation field, we are able to obtain reliable estimates of the outline in addition to where on the boundary the cell is destroyed. We apply our model to identifying partly destroyed human melanoma cancer cells with good results.

Journal ArticleDOI
TL;DR: Algebraic calculations that depend upon a full partition can be automated through the use of an operator P for deriving such a partition; calculations that require repeated use of P are automated by simply iterating the operator.
Abstract: Algebraic calculations that depend upon a full partition can be automated through the use of an operator P for the derivation of such a partition. Calculations that require the repeated use of P are automated by simply iterating the operator. The resulting output is general and contains sufficient structure to identify the result of a calculation for a variety of settings.

Journal ArticleDOI
TL;DR: Bayesian models in which the interpolant has a smoothness that varies spatially are described, emphasizing the importance, in practical implementation, of the concept of ‘conditional convexity’ when designing models with many hyperparameters.
Abstract: A traditional interpolation model is characterized by the choice of regularizer applied to the interpolant, and the choice of noise model. Typically, the regularizer has a single regularization constant α, and the noise model has a single parameter β. The ratio α/β alone is responsible for determining globally all these attributes of the interpolant: its ‘complexity’, ‘flexibility’, ‘smoothness’, ‘characteristic scale length’, and ‘characteristic amplitude’. We suggest that interpolation models should be able to capture more than just one flavour of simplicity and complexity. We describe Bayesian models in which the interpolant has a smoothness that varies spatially. We emphasize the importance, in practical implementation, of the concept of ‘conditional convexity’ when designing models with many hyperparameters. We apply the new models to the interpolation of neuronal spike data and demonstrate a substantial improvement in generalization error.
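A worked instance of the single-ratio limitation: in kernel ridge interpolation with objective β‖y − Kw‖² + α wᵀKw, the solution is w = (K + (α/β)I)⁻¹y, so the one ratio α/β globally trades data fit against smoothness. A minimal sketch (the Gaussian kernel and its scale are assumptions):

```python
import numpy as np

def rbf_interpolant(x, y, alpha_over_beta, scale=1.0):
    """Interpolant f(t) = sum_j w_j k(t, x_j) with Gaussian kernel k and a
    quadratic regularizer: minimizing beta*||y - K w||^2 + alpha*w'K w gives
    w = (K + (alpha/beta) I)^{-1} y. The single ratio alpha/beta is the only
    smoothness knob, acting globally; the paper's point is to let it vary
    spatially instead."""
    kern = lambda a, b: np.exp(-0.5 * ((a[:, None] - b[None, :]) / scale) ** 2)
    K = kern(x, x)
    w = np.linalg.solve(K + alpha_over_beta * np.eye(len(x)), y)
    return lambda t: kern(np.atleast_1d(t), x) @ w
```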

Journal ArticleDOI
TL;DR: The matching pursuit shows in an iterative fashion how localized dictionary elements (scale and position) account for residual variation, and in particular emphasizes differences in construction for varying parts of the series.
Abstract: We describe how to formulate a matching pursuit algorithm which successively approximates a periodic non-stationary time series with orthogonal projections onto elements of a suitable dictionary. We discuss how to construct such dictionaries derived from the maximal overlap (undecimated) discrete wavelet transform (MODWT). Unlike the standard discrete wavelet transform (DWT), the MODWT is equivariant under circular shifts and may be computed for an arbitrary length time series, not necessarily a multiple of a power of 2. We point out that when using the MODWT and continuing past the level where the filters are wrapped, the norms of the dictionary elements may, depending on N, deviate from the required value of unity and require renormalization. We analyse a time series of subtidal sea levels from Crescent City, California. The matching pursuit shows in an iterative fashion how localized dictionary elements (scale and position) account for residual variation, and in particular emphasizes differences in construction for varying parts of the series.
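The matching pursuit loop itself is compact: repeatedly project the residual onto the dictionary atom with the largest absolute inner product and subtract that component. A generic sketch assuming unit-norm atoms stored as columns; constructing the MODWT-derived dictionary used in the paper is a separate step not shown here:

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10):
    """Greedy matching pursuit: at each step pick the dictionary atom (a
    unit-norm column) with the largest |inner product| with the residual,
    and subtract its projection. Returns the approximation and the picks."""
    r = np.asarray(signal, float).copy()
    approx = np.zeros_like(r)
    picks = []
    for _ in range(n_iter):
        scores = dictionary.T @ r
        j = int(np.argmax(np.abs(scores)))
        approx += scores[j] * dictionary[:, j]
        r -= scores[j] * dictionary[:, j]
        picks.append((j, float(scores[j])))
    return approx, picks
```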

Journal ArticleDOI
TL;DR: This paper proposes new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models, and proposes that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately.
Abstract: A two-way contingency table in which both variables have the same categories is termed a symmetric table. In many applications, because of the social processes involved, most of the observations lie on the main diagonal and the off-diagonal counts are small. For these tables, the model of independence is implausible and interest is then focussed on the off-diagonal cells and the models of quasi-independence and quasi-symmetry. For ordinal variables, a linear-by-linear association model can be used to model the interaction structure. For sparse tables, large-sample goodness-of-fit tests are often unreliable and one should use an exact test. In this paper, we review exact tests and the computing problems involved. We propose new recursive algorithms for exact goodness-of-fit tests of quasi-independence, quasi-symmetry, linear-by-linear association and some related models. We propose that all computations be carried out using symbolic computation and rational arithmetic in order to calculate the exact p-values accurately and describe how we implemented our proposals. Two examples are presented.

Journal ArticleDOI
TL;DR: It is shown that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, but if and only if the expected error averaged over all targets decreases with training set size.
Abstract: In this paper we analyse the average behaviour of the Bayes-optimal and Gibbs learning algorithms. We do this both for off-training-set error and conventional IID (independent identically distributed) error (for which test sets overlap with training sets). For the IID case we provide a major extension to one of the better known results. We also show that expected IID test set error is a non-increasing function of training set size for either algorithm. On the other hand, as we show, the expected off-training-set error for both learning algorithms can increase with training set size, for non-uniform sampling distributions. We characterize the relationship the sampling distribution must have with the prior for such an increase. We show in particular that for uniform sampling distributions and either algorithm, the expected off-training-set error is a non-increasing function of training set size. For uniform sampling distributions, we also characterize the priors for which the expected error of the Bayes-optimal algorithm stays constant. In addition we show that for the Bayes-optimal algorithm, expected off-training-set error can increase with training set size when the target function is fixed, but if and only if the expected error averaged over all targets decreases with training set size. Our results hold for arbitrary noise and arbitrary loss functions.


Journal ArticleDOI
TL;DR: This paper describes the experience of statisticians with a massively parallel computer in a problem of image correlation spectroscopy, and it is shown that some of the algorithms of interest can be made parallel.
Abstract: Massively parallel computing is a computing environment with thousands of subprocessors. It requires some special programming methods, but is well suited to certain imaging problems. One such statistical example is discussed in this paper. In addition there are other natural statistical problems for which this technology is well suited. This paper describes our experience, as statisticians, with a massively parallel computer in a problem of image correlation spectroscopy. Even with this computing environment some direct computations would still take on the order of a year to finish. It is shown that some of the algorithms of interest can be made parallel.

Journal ArticleDOI
TL;DR: An alternative procedure for finding interesting features is proposed that is based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed.
Abstract: The most common techniques for graphically presenting a multivariate dataset involve projection onto a one- or two-dimensional subspace. Interpretation of such plots is not always straightforward because projections are smoothing operations: structure can be obscured by projection but never enhanced. In this paper an alternative procedure for finding interesting features is proposed, based on locating the modes of an induced hyperspherical density function, and a simple algorithm for this purpose is developed. Emphasis is placed on identifying non-linear effects, such as clustering, so the data are first sphered to remove all of the location, scale and correlational structure. A set of simulated bivariate data and data on artistic qualities of painters are used as examples.
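Sphering is the one fully specified step: centre the data, then transform so that the sample covariance is the identity. A minimal sketch using the symmetric inverse square root of the covariance:

```python
import numpy as np

def sphere(X):
    """Centre the data and map it by the inverse symmetric square root of its
    covariance, so the result has zero mean and identity covariance; this
    removes location, scale and correlational structure before mode hunting."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # cov^{-1/2}, assumes full rank
    return Xc @ W
```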

Journal ArticleDOI
TL;DR: This paper proposes a new approach to detecting shapes in images of the first type, using a specially designed filter function that iteratively identifies the outline pixels of the head, and shows how the cascade algorithm introduced by Jubb and Jennison (1991) can improve and considerably speed up a method proposed by Storvik (1994) for the second type.
Abstract: We discuss the detection of a connected shape in a noisy image. Two types of image are considered: in the first a degraded outline of the shape is visible, while in the second the data are a corrupted version of the shape itself. In the first type the shape is defined by a thin outline of pixels with records that are different from those at pixels inside and outside the shape, while in the second type the shape is defined by its edge and pixels inside and outside the shape have different records. Our motivation is the identification of cross-sectional head shapes in ultrasound images of human fetuses. We describe and discuss a new approach to detecting shapes in images of the first type that uses a specially designed filter function that iteratively identifies the outline pixels of the head. We then suggest a way, based on the cascade algorithm introduced by Jubb and Jennison (1991), of improving and considerably increasing the speed of a method proposed by Storvik (1994) for detecting edges in images of the second type.

Journal ArticleDOI
TL;DR: The significance of various types of ‘dimension’ for obtaining uniform convergence results in probability theory is highlighted, and it is demonstrated how these results lead to certain notions of generalization for classes of binary-valued and real-valued functions.
Abstract: In this largely expository article, we highlight the significance of various types of ‘dimension’ for obtaining uniform convergence results in probability theory and we demonstrate how these results lead to certain notions of generalization for classes of binary-valued and real-valued functions. We also present new results on the generalization ability of certain types of artificial neural networks with real output.