
Showing papers in "Statistics and Computing in 1996"


Journal ArticleDOI
Jun Liu
TL;DR: In this paper, a special Metropolis-Hastings algorithm, Metropolized independent sampling, proposed first in Hastings (1970), is studied in full detail and shown to be superior to rejection sampling in two respects: asymptotic efficiency and ease of computation.
Abstract: Although Markov chain Monte Carlo methods have been widely used in many disciplines, exact eigen analysis for such generated chains has been rare. In this paper, a special Metropolis-Hastings algorithm, Metropolized independent sampling, proposed first in Hastings (1970), is studied in full detail. The eigenvalues and eigenvectors of the corresponding Markov chain, as well as a sharp bound for the total variation distance between the nth updated distribution and the target distribution, are provided. Furthermore, the relationships between this scheme, rejection sampling, and importance sampling are studied with emphasis on their relative efficiencies. It is shown that Metropolized independent sampling is superior to rejection sampling in two respects: asymptotic efficiency and ease of computation.

596 citations
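As a concrete illustration of the sampler analysed above, here is a minimal sketch of a Metropolized independence sampler in Python. The Gamma target, normal proposal and tuning values are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative target and independence proposal (both are assumptions).
log_target = lambda x: stats.gamma(a=3.0).logpdf(x)   # pi(x)
proposal = stats.norm(loc=3.0, scale=2.0)              # q(x), independent of the current state

def metropolized_independence_sampler(n_iter, x0=3.0):
    """Metropolis-Hastings with a state-independent proposal.

    The acceptance ratio uses the importance weights w(x) = pi(x)/q(x):
    accept y with probability min(1, w(y)/w(x)).
    """
    log_w = lambda z: log_target(z) - proposal.logpdf(z)
    x = x0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        y = proposal.rvs(random_state=rng)
        if np.log(rng.uniform()) < log_w(y) - log_w(x):
            x = y
        samples[t] = x
    return samples

draws = metropolized_independence_sampler(5000)
print(draws.mean(), draws.var())   # compare with Gamma(3): mean 3, variance 3
```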


Journal ArticleDOI
TL;DR: Five different parametrizations for variance-covariance matrices that ensure positive definiteness, thus leaving the estimation problem unconstrained in maximum likelihood and restricted maximum likelihood estimation in linear and non-linear mixed-effects models are described.
Abstract: The estimation of variance-covariance matrices through optimization of an objective function, such as a log-likelihood function, is usually a difficult numerical problem. Since the estimates should be positive semi-definite matrices, we must use constrained optimization, or employ a parametrization that enforces this condition. We describe here five different parametrizations for variance-covariance matrices that ensure positive definiteness, thus leaving the estimation problem unconstrained. We compare the parametrizations based on their computational efficiency and statistical interpretability. The results described here are particularly useful in maximum likelihood and restricted maximum likelihood estimation in linear and non-linear mixed-effects models, but are also applicable to other areas of statistics.

457 citations
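One parametrization of this kind, the log-Cholesky form in which the logarithms of the Cholesky factor's diagonal are used so that any unconstrained vector maps to a positive-definite matrix, can be sketched as follows; the mapping is a generic illustration rather than the paper's exact formulation.

```python
import numpy as np

def theta_to_cov(theta, d):
    """Map an unconstrained vector (length d*(d+1)//2) to a positive-definite
    d x d covariance matrix via a Cholesky factor with positive diagonal."""
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = theta
    # Exponentiate the diagonal so it is strictly positive, making L a valid
    # Cholesky factor; L @ L.T is then positive definite for any theta.
    L[np.diag_indices(d)] = np.exp(np.diag(L))
    return L @ L.T

def cov_to_theta(S):
    """Inverse map: covariance matrix to unconstrained parameters."""
    L = np.linalg.cholesky(S)
    L[np.diag_indices(len(S))] = np.log(np.diag(L))
    return L[np.tril_indices(len(S))]

# Round trip on an arbitrary covariance matrix
S = np.array([[2.0, 0.3], [0.3, 1.0]])
theta = cov_to_theta(S)
print(np.allclose(theta_to_cov(theta, 2), S))   # True
```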


Journal ArticleDOI
TL;DR: A new Markov chain sampling method appropriate for distributions with isolated modes that uses a series of distributions that interpolate between the distribution of interest and a distribution for which sampling is easier, with the advantage that it does not require approximate values for the normalizing constants of these distributions.
Abstract: I present a new Markov chain sampling method appropriate for distributions with isolated modes. Like the recently developed method of ‘simulated tempering’, the ‘tempered transition’ method uses a series of distributions that interpolate between the distribution of interest and a distribution for which sampling is easier. The new method has the advantage that it does not require approximate values for the normalizing constants of these distributions, which are needed for simulated tempering, and can be tedious to estimate. Simulated tempering performs a random walk along the series of distributions used. In contrast, the tempered transitions of the new method move systematically from the desired distribution, to the easily-sampled distribution, and back to the desired distribution. This systematic movement avoids the inefficiency of a random walk, an advantage that is unfortunately cancelled by an increase in the number of interpolating distributions required. Because of this, the sampling efficiency of the tempered transition method in simple problems is similar to that of simulated tempering. On more complex distributions, however, simulated tempering and tempered transitions may perform differently. Which is better depends on the ways in which the interpolating distributions are ‘deceptive’.

406 citations
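A rough sketch of a single tempered-transition update, as the construction reads above: random-walk Metropolis moves up and then back down a ladder of powered densities, with the up and down density ratios accumulated into one accept/reject decision. The bimodal target, ladder of powers and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative bimodal target (an assumption, not from the paper)
def log_pi(x):
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = np.array([1.0, 0.6, 0.3, 0.1])        # beta_0 = 1 is the target
log_p = lambda i, x: betas[i] * log_pi(x)     # tempered densities p_i proportional to pi^beta_i

def metropolis(x, i, step, n=5):
    """A few random-walk Metropolis moves leaving p_i invariant."""
    for _ in range(n):
        y = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_p(i, y) - log_p(i, x):
            x = y
    return x

def tempered_transition(x0, step=2.0):
    n = len(betas) - 1
    log_w, x = 0.0, x0
    for i in range(1, n + 1):                 # up the ladder
        log_w += log_p(i, x) - log_p(i - 1, x)
        x = metropolis(x, i, step)
    for i in range(n, 0, -1):                 # back down
        x = metropolis(x, i, step)
        log_w += log_p(i - 1, x) - log_p(i, x)
    return (x, True) if np.log(rng.uniform()) < log_w else (x0, False)

x, accepted = 0.0, 0
for _ in range(2000):
    x, ok = tempered_transition(x)
    accepted += ok
print("acceptance rate:", accepted / 2000)
```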


Journal ArticleDOI
TL;DR: A multivariate Hastings-within-Gibbs update step for generating latent data and bin boundary parameters jointly, instead of individually from their respective full conditionals, substantially improves Gibbs sampler convergence for large datasets.
Abstract: The ordinal probit, univariate or multivariate, is a generalized linear model (GLM) structure that arises frequently in such disparate areas of statistical applications as medicine and econometrics. Despite the straightforwardness of its implementation using the Gibbs sampler, the ordinal probit may present challenges in obtaining satisfactory convergence. We present a multivariate Hastings-within-Gibbs update step for generating latent data and bin boundary parameters jointly, instead of individually from their respective full conditionals. When the latent data are parameters of interest, this algorithm substantially improves Gibbs sampler convergence for large datasets. We also discuss Monte Carlo Markov chain (MCMC) implementation of cumulative logit (proportional odds) and cumulative complementary log-log (proportional hazards) models with latent data.

217 citations
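For context, a minimal sketch of the standard one-block latent-variable update for an ordinal probit (truncated-normal draws between the current bin boundaries), which is the kind of componentwise scheme whose slow convergence motivates the paper's joint Hastings-within-Gibbs step. The simulated data and the fixed cut-points are illustrative assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

# Illustrative ordinal data with 3 categories and one covariate (assumptions)
n = 200
x = rng.normal(size=n)
cuts = np.array([-np.inf, 0.0, 1.0, np.inf])           # fixed bin boundaries
z_true = 0.8 * x + rng.normal(size=n)
y = np.searchsorted(cuts, z_true, side="right") - 1     # categories 0, 1, 2

def update_latent(beta):
    """Draw z_i ~ N(x_i * beta, 1) truncated to (cuts[y_i], cuts[y_i + 1])."""
    mean = beta * x
    lo = cuts[y] - mean        # standardized lower bounds
    hi = cuts[y + 1] - mean    # standardized upper bounds
    return truncnorm.rvs(lo, hi, loc=mean, scale=1.0, random_state=rng)

def update_beta(z):
    """Conjugate normal draw for beta given latent z (flat prior, unit variance)."""
    prec = x @ x
    return rng.normal((x @ z) / prec, np.sqrt(1.0 / prec))

beta = 0.0
for _ in range(500):
    z = update_latent(beta)
    beta = update_beta(z)
print("posterior draw of beta near", beta)
```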


Journal ArticleDOI
TL;DR: An EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion is presented; it is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it accommodates a completely unknown mixing distribution, giving a straightforward method for fully non-parametric ML estimation of that distribution.
Abstract: This paper presents an EM algorithm for maximum likelihood estimation in generalized linear models with overdispersion. The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully non-parametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters may be sensitive to the specification of a parametric form for the mixing distribution. A listing of a GLIM4 algorithm for fitting the overdispersed binomial logit model is given in an appendix.

202 citations
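A cut-down sketch in the same spirit: an EM algorithm that estimates a non-parametric mixing distribution for an overdispersed binomial intercept by placing mass points on a fixed grid and updating only their masses. The paper's algorithm also moves the mass points and handles covariates; the data and grid here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import expit

rng = np.random.default_rng(3)

# Illustrative overdispersed binomial data: a random intercept on the logit scale
m = 20                                      # binomial denominator
u = rng.normal(0.0, 1.0, size=300)          # unobserved heterogeneity
y = rng.binomial(m, expit(0.5 + u))

grid = np.linspace(-3.0, 3.0, 15)           # fixed mass points on the logit scale
mass = np.full(grid.size, 1.0 / grid.size)  # mixing masses to be estimated

for _ in range(200):                        # EM iterations
    # E-step: responsibility of mass point k for observation i
    ll = binom.logpmf(y[:, None], m, expit(grid)[None, :])        # n x K
    w = mass[None, :] * np.exp(ll - ll.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # M-step: masses are the average responsibilities
    mass = w.mean(axis=0)

print(np.round(mass, 3))   # estimated non-parametric masses on the fixed grid
```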


Journal ArticleDOI
TL;DR: Both the mean and variance are modelled using semi-parametric additive models, fitted by a successive relaxation algorithm, allowing flexible and interactive modelling of variance heterogeneity in a normal error model.
Abstract: This paper presents a flexible model for variance heterogeneity in a normal error model. Specifically, both the mean and variance are modelled using semi-parametric additive models. We call this model a ‘Mean And Dispersion Additive Model’ (MADAM). A successive relaxation algorithm for fitting the model is described and justified as maximizing a penalized likelihood function with penalties for lack of smoothness in the additive non-parametric functions in both mean and variance models. The algorithm is implemented in GLIM4, allowing flexible and interactive modelling of variance heterogeneity. Two data sets are used for demonstration.

97 citations
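A minimal sketch of the successive-relaxation idea: alternately fit a smoother for the mean using weights from the current variance model, then a smoother for the log-variance using the squared residuals. A weighted polynomial fit stands in for the additive smoothers and the data are simulated, so everything beyond the alternation itself is an assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated data with smoothly varying mean and variance (assumptions)
x = np.sort(rng.uniform(-2, 2, 300))
y = np.sin(2 * x) + rng.normal(scale=np.exp(0.5 * x), size=x.size)

def poly_smooth(x, target, weights, deg=5):
    """Weighted polynomial fit, used here as a stand-in for an additive smoother."""
    coef = np.polyfit(x, target, deg, w=np.sqrt(weights))
    return np.polyval(coef, x)

log_var = np.zeros_like(x)                  # start with constant variance
for _ in range(20):                         # successive relaxation
    mu = poly_smooth(x, y, weights=np.exp(-log_var))          # mean given variance
    resid2 = (y - mu) ** 2
    # Log-variance given mean: smooth the log squared residuals (small offset
    # guards against log(0)); a gamma-GLM smoother would be the fuller choice.
    log_var = poly_smooth(x, np.log(resid2 + 1e-8), weights=np.ones_like(x))

print("fitted mean range:", np.round([mu.min(), mu.max()], 2),
      "fitted log-variance range:", np.round([log_var.min(), log_var.max()], 2))
```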


Journal ArticleDOI
TL;DR: A hierarchical Bayes model which is related to the usual empirical Bayes formulation of James-Stein estimators is analysed, and it is proved that the Gibbs sampler will fail to converge, and the associated posterior distribution is non-normalizable.
Abstract: We analyse a hierarchical Bayes model which is related to the usual empirical Bayes formulation of James-Stein estimators. We consider running a Gibbs sampler on this model. Using previous results about convergence rates of Markov chains, we provide rigorous, numerical, reasonable bounds on the running time of the Gibbs sampler, for a suitable range of prior distributions. We apply these results to baseball data from Efron and Morris (1975). For a different range of prior distributions, we prove that the Gibbs sampler will fail to converge, and use this information to prove that in this case the associated posterior distribution is non-normalizable.

77 citations
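A small sketch of a Gibbs sampler for a hierarchical normal-means model of the James-Stein type (known sampling variance, flat prior on the grand mean, inverse-gamma prior on the between-group variance). The particular priors and data are assumptions chosen only to make the code run, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(5)

# y_i | theta_i ~ N(theta_i, V);  theta_i | mu, A ~ N(mu, A)
y = np.array([0.40, 0.38, 0.36, 0.33, 0.31, 0.27, 0.26, 0.22])   # illustrative data
V = 0.01
p = len(y)
a0, b0 = 2.0, 0.02          # assumed inverse-gamma(a0, b0) prior on A

mu, A = y.mean(), y.var()
theta_draws = []
for _ in range(5000):
    # theta_i | rest ~ N(shrinkage mean, shrinkage variance)
    prec = 1 / V + 1 / A
    theta = rng.normal((y / V + mu / A) / prec, np.sqrt(1 / prec))
    # mu | rest ~ N(mean(theta), A / p)   (flat prior on mu)
    mu = rng.normal(theta.mean(), np.sqrt(A / p))
    # A | rest ~ Inv-Gamma(a0 + p/2, b0 + sum (theta - mu)^2 / 2)
    A = 1 / rng.gamma(a0 + p / 2, 1 / (b0 + 0.5 * np.sum((theta - mu) ** 2)))
    theta_draws.append(theta)

print("posterior means:", np.mean(theta_draws[1000:], axis=0).round(3))
```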


Journal ArticleDOI
TL;DR: The method introduced here represents an initial step in wavelet thresholding when coefficients are kept in the original order.
Abstract: Previous proposals in data dependent wavelet threshold selection have used only the magnitudes of the wavelet coefficients in choosing a threshold for each level. Since a jump (or other unusual feature) in the underlying function results in several non-zero coefficients which are adjacent to each other, it is possible to use change-point approaches to take advantage of the information contained in the relative position of the coefficients as well as their magnitudes. The method introduced here represents an initial step in wavelet thresholding when coefficients are kept in the original order.

72 citations
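A toy illustration of using coefficient positions as well as magnitudes: within one resolution level, scan the coefficients, kept in their original order, with a simple cumulative-sum-type statistic to locate a run of large coefficients and retain only that run. The statistic and the simulated coefficients are assumptions, not the authors' proposal.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated coefficients at one level: noise everywhere except a short run of
# large coefficients near a jump in the underlying function (assumption).
d = rng.normal(scale=1.0, size=64)
d[30:34] += np.array([6.0, 8.0, 7.0, 5.0])

def best_segment(d):
    """Return the (start, end) of the segment whose mean energy most exceeds
    the overall mean, a crude CUSUM-style change-point scan on |d|^2."""
    e = d ** 2
    best, seg = -np.inf, (0, len(d))
    for i in range(len(d)):
        for j in range(i + 1, len(d) + 1):
            stat = (e[i:j].mean() - e.mean()) * np.sqrt(j - i)
            if stat > best:
                best, seg = stat, (i, j)
    return seg

start, end = best_segment(d)
kept = np.zeros_like(d)
kept[start:end] = d[start:end]      # keep the located run, zero the rest
print("kept coefficients", start, "to", end - 1)
```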


Journal ArticleDOI
TL;DR: Partitioned algorithms in effect replace the original estimation problem with a series of problems of lower dimension, and some of the circumstances under which this process of dimension reduction leads to significant benefits are characterized.
Abstract: There are a variety of methods in the literature which seek to make iterative estimation algorithms more manageable by breaking the iterations into a greater number of simpler or faster steps. Those algorithms which deal at each step with a proper subset of the parameters are called in this paper partitioned algorithms. Partitioned algorithms in effect replace the original estimation problem with a series of problems of lower dimension. The purpose of the paper is to characterize some of the circumstances under which this process of dimension reduction leads to significant benefits.

63 citations
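The idea of a partitioned algorithm can be illustrated with a generic block-coordinate ascent, in which the full maximization is replaced by alternating lower-dimensional maximizations over parameter subsets. The heteroscedastic regression likelihood below is only an example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

# Example problem: y_i ~ N(beta * x_i, exp(2 * gamma * x_i)), with the
# parameters partitioned into the blocks {beta} and {gamma}.
x = rng.normal(size=200)
y = 1.5 * x + np.exp(0.4 * x) * rng.normal(size=200)

def negloglik(beta, gamma):
    w = np.exp(-2 * gamma * x)                         # inverse variances
    return 0.5 * np.sum(w * (y - beta * x) ** 2) + gamma * np.sum(x)

beta, gamma = 0.0, 0.0
for _ in range(50):
    # block 1: maximize over beta given gamma (weighted least squares, closed form)
    w = np.exp(-2 * gamma * x)
    beta = np.sum(w * x * y) / np.sum(w * x ** 2)
    # block 2: maximize over gamma given beta (one-dimensional numerical search)
    gamma = minimize_scalar(lambda g: negloglik(beta, g),
                            bounds=(-2.0, 2.0), method="bounded").x

print("partitioned estimates:", round(beta, 3), round(gamma, 3))
```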


Journal ArticleDOI
TL;DR: A faster alternative to the EM algorithm in finite mixture distributions is described, which alternates EM iterations with Gauss-Newton iterations using the observed information matrix to reduce the computing time required and provide asymptotic standard errors at convergence.
Abstract: A faster alternative to the EM algorithm in finite mixture distributions is described, which alternates EM iterations with Gauss-Newton iterations using the observed information matrix. At the expense of modest additional analytical effort in obtaining the observed information, the hybrid algorithm reduces the computing time required and provides asymptotic standard errors at convergence. The algorithm is illustrated on the two-component normal mixture.

62 citations
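A hedged sketch of the alternation for a two-component normal mixture with unit variances and equal weights, where only the means are unknown. A finite-difference observed information stands in for the analytic one used in the paper, and a Newton step is only accepted when it improves the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(8)

# Two-component normal mixture, unit variances, equal weights; unknown means only.
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

def loglik(mu):
    comp0 = -0.5 * (y - mu[0]) ** 2
    comp1 = -0.5 * (y - mu[1]) ** 2
    return np.sum(np.logaddexp(comp0, comp1) + np.log(0.5) - 0.5 * np.log(2 * np.pi))

def em_step(mu):
    # E-step: responsibilities; M-step: weighted means
    d0 = np.exp(-0.5 * (y - mu[0]) ** 2)
    d1 = np.exp(-0.5 * (y - mu[1]) ** 2)
    r = d0 / (d0 + d1)
    return np.array([np.sum(r * y) / np.sum(r), np.sum((1 - r) * y) / np.sum(1 - r)])

def newton_step(mu, h=1e-4):
    """Newton update using finite-difference gradient and observed information
    (a stand-in for the analytic observed information used in the paper)."""
    e = np.eye(2) * h
    grad = np.array([(loglik(mu + e[i]) - loglik(mu - e[i])) / (2 * h) for i in range(2)])
    H = np.array([[(loglik(mu + e[i] + e[j]) - loglik(mu + e[i] - e[j])
                    - loglik(mu - e[i] + e[j]) + loglik(mu - e[i] - e[j])) / (4 * h * h)
                   for j in range(2)] for i in range(2)])
    return mu - np.linalg.solve(H, grad), -np.linalg.inv(H)   # step and asymptotic covariance

mu = np.array([-0.5, 0.5])
for _ in range(20):
    mu = em_step(mu)                        # EM iteration
    cand, cov = newton_step(mu)             # Newton iteration on the observed information
    if loglik(cand) >= loglik(mu):          # guard against a divergent step
        mu = cand

print("means:", mu.round(3), " standard errors:", np.sqrt(np.diag(cov)).round(3))
```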


Journal ArticleDOI
TL;DR: An alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis, is considered; it is extended to the finite mixture situation, maximum likelihood estimates are obtained for the population parameters, and computation is shown to be feasible for an arbitrary number of variables.
Abstract: One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.

Journal ArticleDOI
TL;DR: An approach to non-linear principal components using radially symmetric kernel basis functions is described and can be related to the homogeneity analysis approach of Gifi through the minimization of a loss function.
Abstract: An approach to non-linear principal components using radially symmetric kernel basis functions is described. The procedure consists of two steps. The first is a projection of the data set to a reduced dimension using a non-linear transformation whose parameters are determined by the solution of a generalized symmetric eigenvector equation. This is achieved by demanding a maximum variance transformation subject to a normalization condition (Hotelling's approach) and can be related to the homogeneity analysis approach of Gifi through the minimization of a loss function. The transformed variables are the principal components whose values define contours, or more generally hypersurfaces, in the data space. The second step of the procedure defines the fitting surface, the principal surface, in the data space (again as a weighted sum of kernel basis functions) using the definition of self-consistency of Hastie and Stuetzle. The parameters of this principal surface are determined by a singular value decomposition, and cross-validation is used to obtain the kernel bandwidths. The approach is assessed on four data sets.

Journal ArticleDOI
TL;DR: An analysis of data on immunity after Rubella vaccinations which results in a slow-mixing Gibbs sampler is presented and shows that the decline in antibodies after Rubella vaccination is relatively shallow compared to the decline which has been shown after Hepatitis B vaccination.
Abstract: Bayesian random effects models may be fitted using Gibbs sampling, but the Gibbs sampler can be slow mixing due to what might be regarded as lack of model identifiability. This slow mixing substantially increases the number of iterations required during Gibbs sampling. We present an analysis of data on immunity after Rubella vaccinations which results in a slow-mixing Gibbs sampler. We show that this problem of slow mixing can be resolved by transforming the random effects and then, if desired, expressing their joint prior distribution as a sequence of univariate conditional distributions. The resulting analysis shows that the decline in antibodies after Rubella vaccination is relatively shallow compared to the decline in antibodies which has been shown after Hepatitis B vaccination.

Journal ArticleDOI
TL;DR: Here it is shown how a full Bayesian posterior computation is made possible by novel Monte Carlo methods that approximate random increments of the posterior process.
Abstract: In the context of Bayesian non-parametric statistics, the distribution of a stochastic process serves as a prior over the class of functions indexed by its sample paths. Dykstra and Laud (1981) defined a stochastic process whose sample paths can be used to index monotone hazard rates. Although they gave a mathematical description of the corresponding posterior process, numerical evaluations of useful posterior summaries were not feasible for realistic sample sizes. Here we show how a full Bayesian posterior computation is made possible by novel Monte Carlo methods that approximate random increments of the posterior process.

Journal ArticleDOI
TL;DR: A stopping rule is provided for the backward elimination process suggested by Krzanowski (1987a) for selecting variables to preserve data structure based on perturbation theory for Procrustes statistics.
Abstract: A stopping rule is provided for the backward elimination process suggested by Krzanowski (1987a) for selecting variables to preserve data structure. The stopping rule is based on perturbation theory for Procrustes statistics, and a small simulation study verifies its suitability. Some illustrative examples are also provided and discussed.
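For orientation, a sketch of the Procrustes comparison that underlies Krzanowski's backward elimination: compare the principal component configuration of the full data with that of the data minus one variable via the orthogonal Procrustes statistic M^2, and flag the variable whose removal disturbs the configuration least. The stopping rule itself, which is the paper's contribution, is not reproduced, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(9)

X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 0] + 0.05 * rng.normal(size=100)    # a nearly redundant variable

def pc_scores(X, k=2):
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def procrustes_m2(Y, Z):
    """Orthogonal Procrustes statistic: residual sum of squares after
    rotating/reflecting Z to best match Y (both configurations centred)."""
    s = np.linalg.svd(Y.T @ Z, compute_uv=False)
    return np.trace(Y.T @ Y) + np.trace(Z.T @ Z) - 2 * s.sum()

full = pc_scores(X)
m2 = [procrustes_m2(full, pc_scores(np.delete(X, j, axis=1))) for j in range(X.shape[1])]
print("M^2 when deleting each variable:", np.round(m2, 3))
print("candidate for removal:", int(np.argmin(m2)))   # smallest change in structure
```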

Journal ArticleDOI
TL;DR: Two modifications to classical importance sampling are discussed—a weighted average estimate and a mixture design distribution that make importance sampling robust and allow moments to be estimated from the same bootstrap simulation used to estimate quantiles.
Abstract: Importance sampling and control variates have been used as variance reduction techniques for estimating bootstrap tail quantiles and moments, respectively. We adapt each method to apply to both quantiles and moments, and combine the methods to obtain variance reductions by factors from 4 to 30 in simulation examples.

Journal ArticleDOI
TL;DR: This work extends the concept of the k principal points of a random vector X to k principal points with respect to a loss function L, and presents an algorithm for their computation in the univariate case.
Abstract: In Flury (1990) the k principal points of a random vector X are defined as the points p(1), ..., p(k) minimizing the expected squared distance from X to the nearest of these points, E[min_i ∥X − p(i)∥²]. We extend this concept to that of k principal points with respect to a loss function L, and present an algorithm for their computation in the univariate case.
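A small sketch of the alternating scheme one might use in the univariate case: given a large Monte Carlo sample standing in for X, alternate between assigning each point to its nearest representative under the loss and re-optimizing each representative to minimize the average loss in its cell. The absolute-error loss and normal X are assumptions; with squared error this reduces to k-means and the principal points of Flury (1990).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(10)

sample = rng.normal(size=20000)              # Monte Carlo stand-in for X ~ N(0, 1)
loss = lambda u: np.abs(u)                   # illustrative loss L (absolute error)
k = 3

p = np.quantile(sample, (np.arange(k) + 0.5) / k)    # starting points
for _ in range(50):
    # assignment step: nearest representative under the loss
    labels = np.argmin(loss(sample[:, None] - p[None, :]), axis=1)
    # update step: each representative minimizes the average loss in its cell
    for j in range(k):
        cell = sample[labels == j]
        if cell.size:
            p[j] = minimize_scalar(lambda c: loss(cell - c).mean(),
                                   bounds=(cell.min(), cell.max()),
                                   method="bounded").x

print("principal points under absolute-error loss:", np.round(np.sort(p), 3))
```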

Journal ArticleDOI
TL;DR: This paper presents some technical background, followed by a review of the literature that relates parallel computing to statistics, and focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques.
Abstract: Parallel computers differ from conventional serial computers in that they can, in a variety of ways, perform more than one operation at a time. Parallel processing, the application of parallel computers, has been successfully utilized in many fields of science and technology. The purpose of this paper is to review efforts to use parallel processing for statistical computing. We present some technical background, followed by a review of the literature that relates parallel computing to statistics. The review material focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques. Thus, most of the review material is drawn from statistics publications. We conclude by discussing the nature of the review material and considering some possibilities for the future.

Journal ArticleDOI
TL;DR: A 1024 CPU parallel computer is used to obtain simulated genotypes in the Tristan da Cunha pedigree using random local updating methods and a four-colour theorem is invoked to justify simultaneous updating.
Abstract: A 1024 CPU parallel computer is used to obtain simulated genotypes in the Tristan da Cunha pedigree using random local updating methods. A four-colour theorem is invoked to justify simultaneous updating. Multiple copies of the program are run simultaneously. These results are used to infer the source of the B allele of the ABO blood group that is present in the population.

Journal ArticleDOI
TL;DR: The problem of the logical implication between two hierarchical log-linear models is proved to be equivalent to the problem of deciding whether their generating classes satisfy the graph-theoretic condition of hinging.
Abstract: The problem of the logical implication between two hierarchical log-linear models is proved to be equivalent to the problem of deciding whether their generating classes satisfy the graph-theoretic condition of hinging. Moreover, a polynomial-time procedure is worked out to test implication.

Journal ArticleDOI
TL;DR: The Fisher information matrix and its inverse about the parameters of any graphical Gaussian model is derived and the method is shown to produce almost identical inference to likelihood ratio methods in the example considered.
Abstract: The comparison of an estimated parameter to its standard error, the Wald test, is a well known procedure of classical statistics. Here we discuss its application to graphical Gaussian model selection. First we derive the Fisher information matrix and its inverse about the parameters of any graphical Gaussian model. Both the covariance matrix and its inverse are considered and a comparative analysis of the asymptotic behaviour of their maximum likelihood estimators (m.l.e.s) is carried out. Then we give an example of model selection based on the standard errors. The method is shown to produce almost identical inference to likelihood ratio methods in the example considered.

Journal ArticleDOI
TL;DR: A bootstrap technique for generating pseudo-samples from survival data containing censored observations is proposed and a constrained bootstrap technique is developed in which every pseudo-sample has the same distribution of covariate values as does the original, observed data.
Abstract: We propose a bootstrap technique for generating pseudo-samples from survival data containing censored observations. This simulation selects a survival time with replacement from the data and then assigns a covariate according to the model of proportional hazards. We also develop a constrained bootstrap technique in which every pseudo-sample has the same distribution of covariate values as does the original, observed data. We use these simulation techniques to estimate the bias and variance of regression coefficients and to approximate the significance levels of goodness-of-fit statistics for testing the assumption of the proportional hazards model.

Journal ArticleDOI
TL;DR: A new test for the presence of a normal mixture distribution, based on the posterior Bayes factor of Aitkin (1991), which does not require the computation of the MLEs of the parameters or a search for multiple maxima, but requires computations based on ‘classification likelihood’ assignments of observations to mixture components.
Abstract: We present a new test for the presence of a normal mixture distribution, based on the posterior Bayes factor of Aitkin (1991). The new test has slightly lower power than the likelihood ratio test. It does not require the computation of the MLEs of the parameters or a search for multiple maxima, but requires computations based on ‘classification likelihood’ assignments of observations to mixture components.

Journal ArticleDOI
TL;DR: A bootstrap Monte Carlo algorithm for computing the power function of the generalized correlation coefficient and Monte Carlo experiments indicate that the proposed algorithm is reliable and compares well with the asymptotic values.
Abstract: We present a bootstrap Monte Carlo algorithm for computing the power function of the generalized correlation coefficient. The proposed method makes no assumptions about the form of the underlying probability distribution and may be used with observed data to approximate the power function and pilot data for sample size determination. In particular, the bootstrap power functions of the Pearson product moment correlation and the Spearman rank correlation are examined. Monte Carlo experiments indicate that the proposed algorithm is reliable and compares well with the asymptotic values. An example which demonstrates how this method can be used for sample size determination and power calculations is provided.
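A short sketch of the bootstrap power computation: resample pairs from pilot data with replacement at a candidate sample size, apply the correlation test to each pseudo-sample, and estimate power as the fraction of rejections. The pilot data, sample sizes and the choice of the Spearman test are illustrative assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(11)

# Illustrative pilot data with a moderate monotone association (assumption)
n_pilot = 40
x = rng.normal(size=n_pilot)
y = 0.5 * x + rng.normal(size=n_pilot)
pilot = np.column_stack([x, y])

def bootstrap_power(pilot, n, alpha=0.05, B=2000):
    """Estimate power of the Spearman test at sample size n by resampling
    pairs (rows) from the pilot data with replacement."""
    reject = 0
    for _ in range(B):
        idx = rng.integers(0, len(pilot), size=n)
        _, pval = spearmanr(pilot[idx, 0], pilot[idx, 1])
        reject += pval < alpha
    return reject / B

for n in (20, 40, 80, 160):
    print(f"n = {n:3d}   estimated power = {bootstrap_power(pilot, n):.2f}")
```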

Journal ArticleDOI
TL;DR: This paper implements non-binary splits into Gelfand et al.'s modification of the CART algorithm to better reflect the structure of data than binary splits.
Abstract: The main objective of this paper is to implement non-binary splits into Gelfand et al.'s modification of the CART algorithm. Multi-way splits can sometimes better reflect the structure of data than binary splits. Methods for the construction and comparison of such splits are described.

Journal ArticleDOI
TL;DR: This paper focuses particularly on the metadata model devised to harmonize semantically related data sources as well as the table model providing the principal data structure of the proposed system.
Abstract: Concerning the task of integrating census and survey data from different sources as it is carried out by supranational statistical agencies, a formal metadata approach is investigated which supports data integration and table processing simultaneously. To this end, a metadata model is devised such that statistical query processing is accomplished by means of symbolic reasoning on machine-readable, operative metadata. As in databases, statistical queries are stated as formal expressions specifying declaratively what the intended output is; the operations necessary to retrieve appropriate available source data and to aggregate source data into the requested macrodata are derived mechanically. Using simple mathematics, this paper focuses particularly on the metadata model devised to harmonize semantically related data sources as well as the table model providing the principal data structure of the proposed system. Only an outline of the general design of a statistical information system based on the proposed metadata model is given and the state of development is summarized briefly.

Journal ArticleDOI
TL;DR: Modified explicit inversive congruential pseudorandom numbers have some attractive properties at least regarding their two-dimensional discrepancy, and lower and upper bounds for the discrepancy of the corresponding two-dimensional point sets are established.
Abstract: A modified version of the explicit inversive congruential method with power of 2 modulus for generating uniform pseudorandom numbers is introduced. The statistical independence behaviour of the generated sequences is studied based on the distribution of all pairs of successive pseudorandom numbers over the entire period. Lower and upper bounds for the discrepancy of the corresponding two-dimensional point sets are established. These results certainly play only a minor part in studying the statistical independence behaviour of the generated sequences, but they show that modified explicit inversive congruential pseudorandom numbers have some attractive properties at least regarding their two-dimensional discrepancy. The method of proof relies heavily on a thorough analysis of certain exponential sums.
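To show what an inversive generator looks like, the sketch below implements the plainest member of the family, the explicit inversive congruential generator with a prime modulus, x_n = (a n + b)^(-1) mod p. The paper's modified method uses a power-of-2 modulus and a modification not reproduced here, so this is purely an illustration of the inversive construction; the modulus, multiplier and increment are arbitrary choices.

```python
# Explicit inversive congruential generator with prime modulus p:
#   x_n = (a * n + b)^(-1) mod p   (with the convention 0^(-1) = 0),
#   u_n = x_n / p in [0, 1).

p = 2_147_483_647           # a Mersenne prime modulus (illustrative choice)
a, b = 1_288_490_188, 1     # illustrative multiplier and increment

def eicg(n_values, a=a, b=b, p=p):
    out = []
    for n in range(n_values):
        t = (a * n + b) % p
        x = pow(t, -1, p) if t else 0       # modular inverse (Python 3.8+)
        out.append(x / p)
    return out

print([round(v, 6) for v in eicg(10)])
```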

Journal ArticleDOI
TL;DR: This article focuses on reducing the computation for the determinantal ratio measures by making use of upper and lower bounds on the influence to limit the number of subsets for which the actual influence must be explicitly determined.
Abstract: One of the important goals of regression diagnostics is the detection of cases or groups of cases which have an inordinate impact on the regression results. Such observations are generally described as ‘influential’. A number of influence measures have been proposed, each focusing on a different aspect of the regression. For single cases, these measures are relatively simple and inexpensive to calculate. However, the detection of multiple-case or joint influence is more difficult on two counts. First, calculation of influence for a single subset is more involved than for an individual case, and second, the sheer number of subsets of cases makes the computation overwhelming for all but the smallest data sets. This article focuses on reducing the computation for the determinantal ratio measures by using upper and lower bounds on the influence to limit the number of subsets for which the actual influence must be explicitly determined.

Journal ArticleDOI
TL;DR: A computationally efficient method to determine the interaction structure in a multidimensional binary sample using an interaction model based on orthogonal functions, and a result on independence properties in this model is given.
Abstract: We develop a computationally efficient method to determine the interaction structure in a multidimensional binary sample. We use an interaction model based on orthogonal functions, and give a result on independence properties in this model. Using this result we develop an efficient approximation algorithm for estimating the parameters in a given undirected model. To find the best model, we use a heuristic search algorithm in which the structure is determined incrementally. We also give an algorithm for reconstructing the causal directions, if such exist. We demonstrate that together these algorithms are capable of discovering almost all of the true structure for a problem with 121 variables, including many of the directions.

Journal ArticleDOI
TL;DR: This work focuses on embedded integration rules which are an attractive numerical integration tool and present theoretical results which justify their use in a Bayesian integration strategy.
Abstract: Numerical approximations are often used to implement the Bayesian paradigm in analytically intractable parametric models. We focus on embedded integration rules which are an attractive numerical integration tool and present theoretical results which justify their use in a Bayesian integration strategy.
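As a one-dimensional illustration of an embedded quadrature pair inside a Bayesian calculation, an adaptive Gauss-Kronrod integrator (which pairs a Gauss rule with an embedded Kronrod extension to estimate its own error) can supply the normalizing constant and posterior mean of a non-conjugate posterior. The Cauchy likelihood and normal prior below are assumptions, and the paper's embedded rules are purpose-built multivariate constructions rather than this off-the-shelf routine.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative non-conjugate model: y_i ~ Cauchy(theta, 1), theta ~ N(0, 2^2)
y = np.array([-1.2, 0.4, 0.9, 1.5, 4.0])

def unnorm_post(theta):
    loglik = -np.sum(np.log1p((y - theta) ** 2))    # Cauchy log-likelihood (up to constants)
    logprior = -0.5 * (theta / 2.0) ** 2
    return np.exp(loglik + logprior)

# quad() applies adaptive Gauss-Kronrod rules, i.e. embedded quadrature pairs
# that return an error estimate alongside each integral.
Z, _ = quad(unnorm_post, -np.inf, np.inf)
mean, _ = quad(lambda t: t * unnorm_post(t), -np.inf, np.inf)
print("normalizing constant:", Z, " posterior mean:", mean / Z)
```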