
Showing papers in "Statistics and Computing in 2004"


Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
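
As a concrete illustration of SV function estimation in practice, here is a minimal regression sketch using scikit-learn's SVR (an off-the-shelf implementation of the ideas the tutorial surveys, not the tutorial's own code; the kernel and hyperparameter choices are arbitrary):

```python
# Fit an epsilon-insensitive SV regression to noisy samples of sin(x).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80))[:, None]          # inputs, shape (n, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1)       # eps-insensitive loss
model.fit(X, y)
print("support vectors used:", len(model.support_))  # sparse solution
```

The sparsity of the fitted model (only a subset of training points become support vectors) is one of the properties the tutorial emphasizes.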

10,696 citations


Journal ArticleDOI
TL;DR: Test results are provided, along with recommendations for the most efficient algorithms for single and double precision computations, and a generalization of Plackett's formula is derived for bivariate and trivariate t probabilities.
Abstract: Algorithms for the computation of bivariate and trivariate normal and t probabilities for rectangles are reviewed. The algorithms use numerical integration to approximate transformed probability distribution integrals. A generalization of Plackett's formula is derived for bivariate and trivariate t probabilities. New methods are described for the numerical computation of bivariate and trivariate t probabilities. Test results are provided, along with recommendations for the most efficient algorithms for single and double precision computations.
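
For orientation, a bivariate normal rectangle probability can be assembled from CDF evaluations by inclusion-exclusion; the sketch below uses SciPy's numerical-integration-based CDF (a generic route, not necessarily the specific algorithms recommended in the paper):

```python
# P(lo < X <= hi) for a correlated bivariate normal, via inclusion-exclusion.
from scipy.stats import multivariate_normal

mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.5], [0.5, 1.0]])

def rect_prob(lo, hi):
    return (mvn.cdf(hi)
            - mvn.cdf([lo[0], hi[1]])
            - mvn.cdf([hi[0], lo[1]])
            + mvn.cdf(lo))

print(rect_prob([-1.0, -1.0], [1.0, 1.0]))
```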

291 citations


Journal ArticleDOI
TL;DR: The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than the particle filter algorithm of Chen and Liu, and in many situations it outperforms a Gibbs sampler.
Abstract: We consider the analysis of data under mixture models where the number of components in the mixture is unknown. We concentrate on mixture Dirichlet process models, and in particular we consider such models under conjugate priors. This conjugacy enables us to integrate out many of the parameters in the model, and to discretize the posterior distribution. Particle filters are particularly well suited to such discrete problems, and we propose the use of the particle filter of Fearnhead and Clifford for this problem. The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than the particle filter algorithm of Chen and Liu. In many situations it outperforms a Gibbs sampler. We also show how models without the required amount of conjugacy can be efficiently analyzed by the same particle filter algorithm.

157 citations


Journal ArticleDOI
TL;DR: A novel perspective on the max-product algorithm is provided, based on reparameterizing the distribution in terms of so-called pseudo-max-marginals on nodes and edges of the graph; this viewpoint yields conceptual insight into the algorithm's behaviour on graphs with cycles.
Abstract: Finding the maximum a posteriori (MAP) assignment of a discrete-state distribution specified by a graphical model requires solving an integer program. The max-product algorithm, also known as the max-plus or min-sum algorithm, is an iterative method for (approximately) solving such a problem on graphs with cycles. We provide a novel perspective on the algorithm, which is based on the idea of reparameterizing the distribution in terms of so-called pseudo-max-marginals on nodes and edges of the graph. This viewpoint provides conceptual insight into the max-product algorithm in application to graphs with cycles. First, we prove the existence of max-product fixed points for positive distributions on arbitrary graphs. Next, we show that the approximate max-marginals computed by max-product are guaranteed to be consistent, in a suitable sense to be defined, over every tree of the graph. We then turn to characterizing the nature of the approximation to the MAP assignment computed by max-product. We generalize previous work by showing that for any graph, the max-product assignment satisfies a particular optimality condition with respect to any subgraph containing at most one cycle per connected component. We use this optimality condition to derive upper bounds on the difference between the log probability of the true MAP assignment, and the log probability of a max-product assignment. Finally, we consider extensions of the max-product algorithm that operate over higher-order cliques, and show how our reparameterization analysis extends in a natural manner.
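
To fix ideas, the sketch below runs max-product on a three-node chain with toy potentials, a cycle-free case where the algorithm is exact and reduces to Viterbi-style dynamic programming (the paper's contribution concerns the harder case of graphs with cycles):

```python
# Max-product (Viterbi) on a 3-node chain with 3 states per node.
import numpy as np

node_pot = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.6, 0.3],
                     [0.3, 0.3, 0.4]])      # psi_i(x_i), toy values
edge_pot = np.array([[0.90, 0.05, 0.05],
                     [0.05, 0.90, 0.05],
                     [0.05, 0.05, 0.90]])   # psi(x_i, x_{i+1}), shared

m = node_pot[0].copy()                       # forward max-messages
back = []                                    # argmax pointers for backtracking
for i in range(1, len(node_pot)):
    scores = m[:, None] * edge_pot * node_pot[i][None, :]
    back.append(scores.argmax(axis=0))
    m = scores.max(axis=0)

assignment = [int(m.argmax())]               # backtrack the MAP assignment
for b in reversed(back):
    assignment.append(int(b[assignment[-1]]))
assignment.reverse()
print("MAP assignment:", assignment)
```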

155 citations


Journal ArticleDOI
TL;DR: This paper proposes a construction of the reversible jump Markov chain Monte Carlo algorithm for a simplified multivariate Gaussian mixture model in which the covariance matrices of all components share a common eigenvector matrix.
Abstract: This paper is a contribution to the methodology of fully Bayesian inference in a multivariate Gaussian mixture model using the reversible jump Markov chain Monte Carlo algorithm. To follow the constraints of preserving the first two moments before and after the split or combine moves, we concentrate on a simplified multivariate Gaussian mixture model, in which the covariance matrices of all components share a common eigenvector matrix. We then propose an approach to the construction of the reversible jump Markov chain Monte Carlo algorithm for this model. Experimental results on several data sets demonstrate the efficacy of our algorithm.
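
The moment-preservation constraint on a split move w → (w₁, w₂) can be written as follows (a sketch in the style of Richardson-and-Green-type samplers; the paper's exact moves additionally exploit the shared eigenvector structure):

```latex
\begin{aligned}
w &= w_1 + w_2, \\
w\,\mu &= w_1\mu_1 + w_2\mu_2, \\
w\,(\Sigma + \mu\mu^{\top}) &= w_1(\Sigma_1 + \mu_1\mu_1^{\top})
                             + w_2(\Sigma_2 + \mu_2\mu_2^{\top}).
\end{aligned}
```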

89 citations


Journal ArticleDOI
TL;DR: Although the solutions provided by these stochastic algorithms are more often degenerate, it is concluded that SEM and MCMC may display faster convergence and improve the ability to locate the global maximum of the likelihood function.
Abstract: We compare EM, SEM, and MCMC algorithms to estimate the parameters of the Gaussian mixture model. We focus on problems in estimation arising from the likelihood function having a sharp ridge or saddle points. We use both synthetic and empirical data with those features. The comparison includes Bayesian approaches with different prior specifications and various procedures to deal with label switching. Although the solutions provided by these stochastic algorithms are more often degenerate, we conclude that SEM and MCMC may display faster convergence and improve the ability to locate the global maximum of the likelihood function.
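
For reference, the deterministic EM baseline in such comparisons looks like the following for a two-component univariate Gaussian mixture (a minimal sketch; SEM replaces the E-step responsibilities by a stochastic assignment, and MCMC samples the posterior instead):

```python
# Plain EM for a two-component 1-D Gaussian mixture.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(100):
    dens = w * norm.pdf(x[:, None], mu, sd)      # E-step: responsibilities
    r = dens / dens.sum(axis=1, keepdims=True)
    nk = r.sum(axis=0)                           # M-step: weighted moments
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
print(w, mu, sd)
```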

86 citations


Journal ArticleDOI
TL;DR: This article proposes a simple EM-based ML estimation procedure to estimate parameters of the distribution when the subclass is known, regardless of the dimensionality; the method relies on the ability to numerically evaluate modified Bessel functions of the third kind and their logarithms, which is made possible by currently available software.
Abstract: Generalized Hyperbolic distribution (Barndorff-Nielsen 1977) is a variance-mean mixture of a normal distribution with the Generalized Inverse Gaussian distribution. Recently subclasses of these distributions (e.g., the hyperbolic distribution and the Normal Inverse Gaussian distribution) have been applied to construct stochastic processes in turbulence and particularly in finance, where multidimensional problems are of special interest. Parameter estimation for these distributions based on an i.i.d. sample is a difficult task even for a specified one-dimensional subclass (a subclass being uniquely defined by λ) and relies on numerical methods. For the hyperbolic subclass (λ = 1), the computer program ‘hyp’ (Blaesild and Sorensen 1992) estimates parameters via ML when the dimensionality is less than or equal to three. To the best of the author's knowledge, no successful attempts have been made to fit any given subclass when the dimensionality is greater than three. This article proposes a simple EM-based (Dempster, Laird and Rubin 1977) ML estimation procedure to estimate parameters of the distribution when the subclass is known, regardless of the dimensionality. Our method relies on the ability to numerically evaluate modified Bessel functions of the third kind and their logarithms, which is made possible by currently available software. The method is applied to fit the five-dimensional Normal Inverse Gaussian distribution to a series of returns on foreign exchange rates.
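
The numerical ingredient mentioned at the end, stable evaluation of log K_λ(x), is indeed routine with current software; for instance, SciPy's exponentially scaled Bessel function avoids underflow for large arguments (a sketch, not the paper's code):

```python
# log K_lambda(x) without underflow: kve(lam, x) = exp(x) * K_lambda(x),
# so log K_lambda(x) = log(kve(lam, x)) - x.
import numpy as np
from scipy.special import kve

def log_bessel_k(lam, x):
    return np.log(kve(lam, x)) - x

print(log_bessel_k(1.0, 500.0))   # K_1(500) itself underflows to 0.0
```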

80 citations


Journal ArticleDOI
TL;DR: A Bayesian approach to inference about the parameters of t-mixture models is presented, together with two efficient EM-type algorithms for computing the joint posterior mode with the observed data and an incomplete future vector as the sample.
Abstract: A finite mixture model using the multivariate t distribution has been shown to be a robust extension of normal mixtures. In this paper, we present a Bayesian approach for inference about parameters of t-mixture models. The specifications of prior distributions are weakly informative to avoid causing nonintegrable posterior distributions. We present two efficient EM-type algorithms for computing the joint posterior mode with the observed data and an incomplete future vector as the sample. Markov chain Monte Carlo sampling schemes are also developed to obtain the target posterior distribution of parameters. The advantages of the Bayesian approach over the maximum likelihood method are demonstrated via a set of real data.

43 citations


Journal ArticleDOI
TL;DR: Various parallel MCMC algorithms for Bayesian inference for latent spatial Gaussian models are proposed and their performance is discussed with respect to a simulation study, which demonstrates the increase in speed with which the algorithms explore the posterior distribution as a function of the number of processors.
Abstract: Markov chain Monte Carlo (MCMC) implementations of Bayesian inference for latent spatial Gaussian models are very computationally intensive, and restrictions on storage and computation time are limiting their application to large problems. Here we propose various parallel MCMC algorithms for such models. The algorithms' performance is discussed with respect to a simulation study, which demonstrates the increase in speed with which the algorithms explore the posterior distribution as a function of the number of processors. We also discuss how feasible problem size is increased by use of these algorithms.
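
The simplest parallel scheme of this kind, independent chains on separate processors, can be sketched as follows (a toy Metropolis target stands in for the latent spatial Gaussian models studied in the paper; the paper's algorithms also parallelize within a chain):

```python
# Run four independent Metropolis chains in parallel processes.
import numpy as np
from multiprocessing import Pool

def run_chain(seed, n_iter=10_000):
    rng = np.random.default_rng(seed)
    x, out = 0.0, []
    for _ in range(n_iter):
        prop = x + rng.normal(scale=0.5)
        # Accept/reject targeting a standard normal.
        if np.log(rng.uniform()) < 0.5 * (x * x - prop * prop):
            x = prop
        out.append(x)
    return np.array(out)

if __name__ == "__main__":
    with Pool(4) as pool:
        chains = pool.map(run_chain, range(4))
    print(np.mean([c.mean() for c in chains]))
```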

34 citations


Journal ArticleDOI
TL;DR: The structure of recently proposed random number generators based on special types of linear recurrences with small coefficients, which allow fast implementations, is studied, and their weaknesses are pointed out.
Abstract: We study the structure and point out weaknesses of recently proposed random number generators based on special types of linear recurrences with small coefficients, which allow fast implementations. Our theoretical analysis is complemented by the results of simple empirical statistical tests that the generators fail decisively. Directions for improvement and alternative generators are also pointed out.

34 citations


Journal ArticleDOI
TL;DR: A strategy is proposed to initialize the EM algorithm in the multivariate Gaussian mixture context by randomly drawing, with a low computational cost in many situations, initial mixture parameters in an appropriate space including all possible EM trajectories.
Abstract: A strategy is proposed to initialize the EM algorithm in the multivariate Gaussian mixture context. It consists of randomly drawing, with a low computational cost in many situations, initial mixture parameters in an appropriate space including all possible EM trajectories. This space is simply defined by two relations between the first two empirical moments and the mixture parameters satisfied by any EM iteration. An experimental study on simulated and real data sets clearly shows that this strategy outperforms classical methods, since it has the nice property of widely exploring local maxima of the likelihood function.
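
For a Gaussian mixture, the two moment relations in question can be written as follows (a sketch: after any M-step the mixture's first two moments match the empirical mean x̄ and covariance S, which is what confines the EM trajectories):

```latex
\bar{x} = \sum_{k=1}^{K} p_k \mu_k,
\qquad
S = \sum_{k=1}^{K} p_k \left( \Sigma_k
    + (\mu_k - \bar{x})(\mu_k - \bar{x})^{\top} \right).
```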

Journal ArticleDOI
TL;DR: This work considers inference for queues based on inter-departure time data, and demonstrates how a likelihood recursion can be used to calculate this likelihood efficiently for the specific cases of M/G/1 and Er/G/1 queues.
Abstract: We consider inference for queues based on inter-departure time data. Calculating the likelihood for such models is difficult, as the likelihood involves summing over the (exponentially large) space of realisations of the arrival process. We demonstrate how a likelihood recursion can be used to calculate this likelihood efficiently for the specific cases of M/G/1 and Er/G/1 queues. We compare the sampling properties of the MLEs to the sampling properties of estimators, based on indirect inference, which have previously been suggested for this problem.

Journal ArticleDOI
TL;DR: It is shown that the problem of simulating from the complex Bingham distribution reduces to simulation from a truncated multivariate exponential distribution.
Abstract: The complex Bingham distribution is relevant for the shape analysis of landmark data in two dimensions. In this paper it is shown that the problem of simulating from this distribution reduces to simulation from a truncated multivariate exponential distribution. Several simulation methods are described and their efficiencies are compared.
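
The building block that the reduction leads to is easy to sample by inversion; a sketch for a rate-λ exponential truncated to [0, t] (the paper's schemes handle the multivariate, constrained version):

```python
# Inverse-CDF sampling from Exp(rate) conditioned on [0, t].
import numpy as np

def rtrunc_exp(rate, t, size, rng):
    u = rng.uniform(size=size)
    return -np.log(1.0 - u * (1.0 - np.exp(-rate * t))) / rate

rng = np.random.default_rng(1)
print(rtrunc_exp(2.0, 1.0, 5, rng))
```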

Journal ArticleDOI
TL;DR: A broader evaluation is undertaken of two strategies that can potentially improve Markov chain Monte Carlo algorithms: using derivative evaluations of the target density, and suppressing random walk behaviour in the chain.
Abstract: Two strategies that can potentially improve Markov Chain Monte Carlo algorithms are to use derivative evaluations of the target density, and to suppress random walk behaviour in the chain. The use of one or both of these strategies has been investigated in a few specific applications, but neither is used routinely. We undertake a broader evaluation of these techniques, with a view to assessing their utility for routine use. In addition to comparing different algorithms, we also compare two different ways in which the algorithms can be applied to a multivariate target distribution. Specifically, the univariate version of an algorithm can be applied repeatedly to one-dimensional conditional distributions, or the multivariate version can be applied directly to the target distribution.
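
The canonical way to use derivative information is a Langevin-type proposal, which also damps random walk behaviour (a sketch of one member of the family compared; the step size ε is a tuning parameter):

```latex
\theta' = \theta \;+\; \frac{\epsilon^{2}}{2}\,\nabla \log \pi(\theta)
          \;+\; \epsilon\, z,
\qquad z \sim N(0, I),
```

with θ′ then accepted or rejected by the usual Metropolis-Hastings ratio.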

Journal ArticleDOI
TL;DR: The paper considers the multivariate gamma distribution, for which the method of moments has been regarded as the only feasible method of estimation due to the complexity of the likelihood function; the new Bayesian methods are illustrated using artificial data for a trivariate gamma distribution and an application to technical inefficiency estimation.
Abstract: The paper considers the multivariate gamma distribution for which the method of moments has been considered as the only method of estimation due to the complexity of the likelihood function. With a non-conjugate prior, practical Bayesian analysis can be conducted using Gibbs sampling with data augmentation. The new methods are illustrated using artificial data for a trivariate gamma distribution as well as an application to technical inefficiency estimation.

Journal ArticleDOI
TL;DR: A novel parameterization of shape change is introduced that allows the parsimonious description of facial motion and permits a distinction between static facial shape and dynamic facial motion.
Abstract: The movement of landmarks on the human face can be recorded in 3D using motion capture equipment. We describe methods for the analysis of data collected on groups of subjects with a view to describing and assessing the differences between the facial motions of those groups. We focus on the smile motion in particular. The methods presented can be used more generally for continuous shape change data. We introduce a novel parameterization of shape change that allows the parsimonious description of facial motion. We allow for a distinction between static facial shape and dynamic facial motion. We describe statistical methods for modeling differences in facial motion including a comparison of mean motions, principal components for describing the variation in motion and linear models for describing the effects of predictors.

Journal ArticleDOI
TL;DR: This article presents an optimal discretization of nonparametric covariogram estimators for isotropic stationary stochastic processes, providing an answer to an issue raised by Hall, Fisher and Hoffmann (1994).
Abstract: In this article, we describe the discretization of nonparametric covariogram estimators for isotropic stationary stochastic processes. The use of nonparametric estimators is important to avoid the difficulties in selecting a parametric model. The key property the isotropic covariogram must satisfy is to be positive definite and thus have the form characterized by Yaglom's representation of Bochner's theorem. We present an optimal discretization of the latter in the sense that the resulting nonparametric covariogram estimators are guaranteed to be smooth and positive definite in the continuum. This provides an answer to an issue raised by Hall, Fisher and Hoffmann (1994). Furthermore, from a practical viewpoint, our result is important because a nonlinear constrained algorithm can sometimes be avoided and the solution can be found by least squares. Some numerical results are presented for illustration.
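
For reference, Yaglom's representation expresses a valid isotropic covariogram in R^d as a scale mixture of Bessel-type kernels (quoted from memory as a sketch; the paper discretizes the mixing measure G):

```latex
C(h) = \int_{0}^{\infty} \Omega_d(t h)\, dG(t),
\qquad
\Omega_d(x) = \left(\frac{2}{x}\right)^{(d-2)/2}
              \Gamma\!\left(\frac{d}{2}\right) J_{(d-2)/2}(x),
```

with G nondecreasing and bounded and J a Bessel function of the first kind.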

Journal ArticleDOI
TL;DR: This paper proposes two automatic methods for choosing the order of the low-order polynomial as well as the wavelet thresholding value, and evaluates both methods via numerical experiments.
Abstract: In Oh, Naveau and Lee (2001) a simple method is proposed for reducing the bias at the boundaries for wavelet thresholding regression. The idea is to model the regression function as a sum of wavelet basis functions and a low-order polynomial. The latter is expected to account for the boundary problem. Practical implementation of this method requires the choice of the order of the low-order polynomial, as well as the wavelet thresholding value. This paper proposes two automatic methods for making such choices. Finite sample performances of these two methods are evaluated via numerical experiments.

Journal ArticleDOI
TL;DR: A general framework for the analysis of count data (with covariates) is proposed using formulations for the transition rates of a state-dependent birth process; computation of the resulting probabilities leads to model estimation using a penalized likelihood function.
Abstract: A general framework for the analysis of count data (with covariates) is proposed using formulations for the transition rates of a state-dependent birth process. The form for the transition rates incorporates covariates proportionally, with the residual distribution determined from a smooth non-parametric state-dependent form. Computation of the resulting probabilities is discussed, leading to model estimation using a penalized likelihood function. Two data sets are used as illustrative examples, one representing underdispersed Poisson-like data and the other overdispersed binomial-like data.

Journal ArticleDOI
TL;DR: The results show that the best point is not necessarily the posterior mode, but rather a point compromising between high density and low Hessian, and a variance reduction approach is introduced to ease the tension caused by data sparseness.
Abstract: Computing marginal probabilities is an important and fundamental issue in Bayesian inference. We present a simple method which arises from a likelihood identity for computation. The likelihood identity, called Candidate's formula, sets the marginal probability as the ratio of the prior times the likelihood to the posterior density. Based on Markov chain Monte Carlo output simulated from the posterior distribution, a nonparametric kernel estimate is used to estimate the posterior density contained in that ratio. This derived nonparametric Candidate's estimate requires only one evaluation of the posterior density estimate at a point. The optimal point for such evaluation can be chosen to minimize the expected mean square relative error. The results show that the best point is not necessarily the posterior mode, but rather a point compromising between high density and low Hessian. For high dimensional problems, we introduce a variance reduction approach to ease the tension caused by data sparseness. A simulation study is presented.
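
A minimal sketch of the estimator, in a conjugate normal model where the exact marginal is available for checking (the evaluation point here is simply the posterior mean, not the optimal compromise point derived in the paper):

```python
# Candidate's formula: m(y) = f(y | theta) * pi(theta) / pi(theta | y),
# with the posterior density estimated by a kernel smoother of MCMC draws.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=50)                 # data, known unit variance
post_var = 1.0 / (1.0 / 100.0 + len(y))           # prior theta ~ N(0, 10^2)
post_mean = post_var * y.sum()
draws = rng.normal(post_mean, np.sqrt(post_var), 20_000)  # stand-in for MCMC

theta0 = post_mean                                # one evaluation point
log_num = norm.logpdf(y, theta0, 1.0).sum() + norm.logpdf(theta0, 0.0, 10.0)
log_den = np.log(gaussian_kde(draws)(theta0)[0])
print("log marginal likelihood estimate:", log_num - log_den)
```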

Journal ArticleDOI
TL;DR: This study proposes a fuzzy clustering algorithm (FCA) based on both the maximum penalized likelihood (MPL) for the latent class model and the modified penalty fuzzy c-means (PFCM) for normal mixtures.
Abstract: The expectation maximization (EM) algorithm is a widely used approach for estimating the parameters of multivariate multinomial mixtures in a latent class model. However, this approach has unsatisfactory computing efficiency. This study proposes a fuzzy clustering algorithm (FCA) based on both the maximum penalized likelihood (MPL) for the latent class model and the modified penalty fuzzy c-means (PFCM) for normal mixtures. Numerical examples confirm that the FCA-MPL algorithm is more efficient (that is, requires fewer iterations) and more computationally effective (measured by the approximate relative ratio of accurate classification) than the EM algorithm.

Journal ArticleDOI
TL;DR: This paper reconsiders the well-known oblique Procrustes problem where the usual least-squares objective function is replaced by a more robust discrepancy measure, based on the ℓ1 norm or smooth approximations of it, and proposes two approaches to the solution of this problem.
Abstract: In this paper, we reconsider the well-known oblique Procrustes problem where the usual least-squares objective function is replaced by a more robust discrepancy measure, based on the ℓ1 norm or smooth approximations of it. We propose two approaches to the solution of this problem. One approach is based on convex analysis and uses the structure of the problem to permit a solution to the ℓ1 norm problem. An alternative approach is to smooth the problem by working with smooth approximations to the ℓ1 norm, and this leads to a solution process based on the solution of ordinary differential equations on manifolds. The general weighted Procrustes problem (both orthogonal and oblique) can also be solved by the latter approach. Numerical examples to illustrate the algorithms which have been developed are reported and analyzed.
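
The robust objective in question can be stated compactly (a sketch: T ranges over oblique transformations, i.e. matrices with unit-length columns):

```latex
\min_{T \,:\, \operatorname{diag}(T^{\top} T) = I}
\; \lVert A T - B \rVert_{1}
\;=\; \sum_{i,j} \bigl| (A T - B)_{ij} \bigr| .
```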

Journal ArticleDOI
TL;DR: A genetic algorithm-based approach is presented, which merely has to be added to the influence diagram evaluation algorithm in use, and whose codification is straightforward, showing favourable results over existing heuristics.
Abstract: Influence diagrams are powerful tools for representing and solving complex inference and decision-making problems under uncertainty. They are directed acyclic graphs with nodes and arcs that have a precise meaning. The algorithm for evaluating an influence diagram deletes nodes from the graph in a particular order given by the position of each node and its arcs with respect to the value node. In many cases, however, there is more than one possible node deletion sequence. They all lead to the optimal solution of the problem, but may involve different computational efforts, which is a primary issue when facing real-size models. Finding the optimal deletion sequence is an NP-hard problem. The proposals given in the literature have proven to require complex transformations of the influence diagram. In this paper, we present a genetic algorithm-based approach, which merely has to be added to the influence diagram evaluation algorithm we use, and whose codification is straightforward. The experiments, varying parameters like crossover and mutation operators, population sizes and mutation rates, are analysed statistically, showing favourable results over existing heuristics.

Journal ArticleDOI
TL;DR: Bayesian methods for obtaining inferences in pairwise interacting point processes are proposed, together with the use of importance sampling techniques within Markov chain Monte Carlo (MCMC) to estimate an intractable likelihood ratio.
Abstract: Pairwise interacting point processes are commonly used to model spatial point patterns. To perform inference, the established frequentist methods can produce good point estimates when the interaction in the data is moderate, but some methods may produce severely biased estimates when the interaction is strong. Furthermore, because the sampling distributions of the estimates are unclear, interval estimates are typically obtained by parametric bootstrap methods. In the current setting, however, the behavior of such estimates is not well understood. In this article we propose Bayesian methods for obtaining inferences in pairwise interacting point processes. The requisite application of Markov chain Monte Carlo (MCMC) techniques is complicated by an intractable function of the parameters in the likelihood. The acceptance probability in a Metropolis-Hastings algorithm involves the ratio of two likelihoods evaluated at differing parameter values. The intractable functions do not cancel, and hence an intractable ratio r must be estimated within each iteration of a Metropolis-Hastings sampler. We propose the use of importance sampling techniques within MCMC to address this problem. While r may be estimated by other methods, these, in general, are not readily applied in a Bayesian setting. We demonstrate the validity of our importance sampling approach with a small simulation study. Finally, we analyze the Swedish pine sapling dataset (Strand 1972) and contrast the results with those in the literature.
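
The importance sampling identity used for the intractable ratio can be sketched as follows: writing the likelihood as h(x; θ)/c(θ) with c(θ) unknown,

```latex
\frac{c(\theta')}{c(\theta)}
  = \mathbb{E}_{\theta}\!\left[ \frac{h(X;\theta')}{h(X;\theta)} \right]
  \approx \frac{1}{M} \sum_{m=1}^{M} \frac{h(x_m;\theta')}{h(x_m;\theta)},
\qquad x_m \sim \text{the process at } \theta,
```

so the ratio r in the acceptance probability can be estimated from simulated point patterns.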

Journal ArticleDOI
TL;DR: This work considers the problem of testing for additivity and joint effects in multivariate nonparametric regression when the data are modelled as observations of an unknown response function observed on a d-dimensional lattice and contaminated with additive Gaussian noise.
Abstract: We consider the problem of testing for additivity and joint effects in multivariate nonparametric regression when the data are modelled as observations of an unknown response function observed on a d-dimensional (d ≥ 2) lattice and contaminated with additive Gaussian noise. We propose tests for additivity and joint effects, appropriate for both homogeneous and inhomogeneous response functions, using the particular structure of the data expanded in tensor product Fourier or wavelet bases studied recently by Amato and Antoniadis (2001) and Amato, Antoniadis and De Feis (2002). The corresponding tests are constructed by applying the adaptive Neyman truncation and wavelet thresholding procedures of Fan (1996), for testing a high-dimensional Gaussian mean, to the resulting empirical Fourier and wavelet coefficients. As a consequence, asymptotic normality of the proposed test statistics under the null hypothesis and lower bounds of the corresponding powers under a specific alternative are derived. We use several simulated examples to illustrate the performance of the proposed tests, and we make comparisons with other tests available in the literature.

Journal ArticleDOI
TL;DR: A Monte Carlo algorithm which computes accurate approximations of smooth functions on multidimensional Tchebychef polynomials is improved by using quasi-random sequences, leading to a Quasi-Monte Carlo method with an increased rate of convergence for numerical integration.
Abstract: We improve a Monte Carlo algorithm which computes accurate approximations of smooth functions on multidimensional Tchebychef polynomials by using quasi-random sequences. We first show that the convergence of the previous algorithm is twice faster using these sequences. Then, we slightly modify this algorithm to make it work from a single set of random or quasi-random points. This especially leads to a Quasi-Monte Carlo method with an increased rate of convergence for numerical integration.
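
The rate-of-convergence gain from quasi-random sequences is easy to see on a toy integrand (a sketch using SciPy's scrambled Sobol' generator, not the paper's Tchebychef construction):

```python
# Plain Monte Carlo vs. quasi-Monte Carlo on [0,1]^2; exact value (e - 1)/2.
import numpy as np
from scipy.stats import qmc

f = lambda u: np.cos(np.pi * u[:, 0]) ** 2 * np.exp(u[:, 1])
n = 2 ** 14
rng = np.random.default_rng(2)

mc = f(rng.random((n, 2))).mean()
qmc_est = f(qmc.Sobol(d=2, scramble=True, seed=2).random(n)).mean()

exact = (np.e - 1) / 2
print("MC error: ", abs(mc - exact))
print("QMC error:", abs(qmc_est - exact))
```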

Journal ArticleDOI
TL;DR: A correlated probit model approximation for conditional probabilities (Mendell and Elston 1974) is used to estimate the variance for binary matched pairs data by maximum likelihood, and shows a substantial advantage over other approximations.
Abstract: A correlated probit model approximation for conditional probabilities (Mendell and Elston 1974) is used to estimate the variance for binary matched pairs data by maximum likelihood. Using asymptotic data, the bias of the estimates is shown to be small for a wide range of intra-class correlations and incidences. This approximation is also compared with other recently published, or implemented, improved approximations. For the small sample examples presented, it shows a substantial advantage over other approximations. The method is extended to allow covariates for each observation, and fitting by iteratively reweighted least squares.

Journal ArticleDOI
TL;DR: Two new methods for computing with hypergeometric distributions on lattice points are presented that use Fourier analysis and Gröbner bases in the Weyl algebra for log-linear models that are graphical or non-graphical.
Abstract: Two new methods for computing with hypergeometric distributions on lattice points are presented. One uses Fourier analysis, and the other uses Gröbner bases in the Weyl algebra. Both are very general and apply to log-linear models that are graphical or non-graphical.

Journal ArticleDOI
TL;DR: An algorithm is developed which uses back buttons to achieve essentially any limiting distribution on the state space, corresponding to spending the desired total fraction of time at each web page.
Abstract: As a simple model for browsing the World Wide Web, we consider Markov chains with the option of moving “back” to the previous state. We develop an algorithm which uses back buttons to achieve essentially any limiting distribution on the state space. This corresponds to spending the desired total fraction of time at each web page. On finite state spaces, our algorithm always succeeds. On infinite state spaces the situation is more complicated, and is related to both the tail behaviour of the distributions, and the properties of convolution equations.

Journal ArticleDOI
TL;DR: It is shown that Liu's simplex median is tractable, and has distinctive desirable properties that recommend it for use in data analysis.
Abstract: While much attention has recently focussed on the use of multivariate medians for estimating the centre of a data set, a particular median based upon random simplexes proposed by Liu has been overshadowed by progress on other multivariate medians. The purpose of this paper is to redress the balance, and to show that Liu's simplex median is tractable, and has distinctive desirable properties that recommend it for use in data analysis.
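
Liu's simplex median is the point maximizing simplicial depth, the fraction of data triangles containing the point; a brute-force planar sketch (O(n³) in the number of points, for illustration only):

```python
# Simplicial depth in the plane and the deepest sample point.
import numpy as np
from itertools import combinations

def _sign(o, a, b):
    return np.sign((a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0]))

def in_triangle(p, a, b, c):
    s = [_sign(a, b, p), _sign(b, c, p), _sign(c, a, p)]
    return all(v >= 0 for v in s) or all(v <= 0 for v in s)

def simplicial_depth(p, pts):
    tri = list(combinations(pts, 3))
    return sum(in_triangle(p, *t) for t in tri) / len(tri)

rng = np.random.default_rng(3)
pts = rng.normal(size=(30, 2))
depths = [simplicial_depth(p, pts) for p in pts]
print("deepest sample point:", pts[int(np.argmax(depths))])
```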