
Showing papers in "Journal of the American Statistical Association in 1994"


Journal ArticleDOI
TL;DR: In this paper, a new class of semiparametric estimators, based on inverse probability weighted estimating equations, is proposed for the parameter vector α0 of a conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled.
Abstract: In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model by first showing that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0, and then deriving a representation for the efficient score...

2,638 citations
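
A minimal Python sketch (using numpy) of the inverse probability weighting idea for the simplest case the abstract covers: a linear conditional mean model with a regressor missing at random and known missingness probabilities. The data-generating model and all variable names are illustrative; this shows the basic weighting principle, not the paper's optimal semiparametric estimator.

# Illustrative sketch: inverse probability weighting for E[Y | X, Z] = b0 + b1*X + b2*Z
# when X is missing at random given the always-observed (Y, Z) and the
# observation probabilities are known. All names and values are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)
X = 0.5 * Z + rng.normal(size=n)
Y = 1.0 + 2.0 * X - 1.0 * Z + rng.normal(size=n)

# Missingness of X depends only on observed (Y, Z): missing at random.
pi = 1.0 / (1.0 + np.exp(-(0.5 + 0.8 * Z - 0.5 * Y)))   # P(X observed | Y, Z)
R = rng.uniform(size=n) < pi                              # observation indicator

def wls(design, y, w):
    """Weighted least squares via the weighted normal equations."""
    W = w[:, None]
    return np.linalg.solve(design.T @ (W * design), design.T @ (w * y))

D = np.column_stack([np.ones(n), X, Z])

# Naive complete-case estimate (generally biased under this missingness).
cc = wls(D[R], Y[R], np.ones(R.sum()))

# Inverse-probability-weighted estimating equations: weight each complete
# case by 1/pi so the weighted complete cases represent the full sample.
ipw = wls(D[R], Y[R], 1.0 / pi[R])

print("complete case:", np.round(cc, 3))
print("IPW          :", np.round(ipw, 3))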


Journal ArticleDOI
TL;DR: In this paper, the stationary bootstrap is introduced as a resampling technique for calculating standard errors of estimators and constructing confidence regions for parameters based on weakly dependent stationary observations.
Abstract: This article introduces a resampling procedure called the stationary bootstrap as a means of calculating standard errors of estimators and constructing confidence regions for parameters based on weakly dependent stationary observations. Previously, a technique based on resampling blocks of consecutive observations was introduced to construct confidence intervals for a parameter of the m-dimensional joint distribution of m consecutive observations, where m is fixed. This procedure has been generalized by constructing a “blocks of blocks” resampling scheme that yields asymptotically valid procedures even for a multivariate parameter of the whole (i.e., infinite-dimensional) joint distribution of the stationary sequence of observations. These methods share the construction of resampling blocks of observations to form a pseudo-time series, so that the statistic of interest may be recalculated based on the resampled data set. But in the context of applying this method to stationary data, it is natural...

2,418 citations
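
A hedged sketch of the stationary bootstrap resampling scheme described above: blocks of random, geometrically distributed length are drawn with wrap-around and concatenated into a pseudo-time series, and the replicates give a standard error for a statistic. The AR(1) example, the choice p = 0.1 (expected block length 10), and the number of replicates are illustrative.

# Illustrative stationary bootstrap: geometric block lengths, circular wrapping.
import numpy as np

rng = np.random.default_rng(1)

def stationary_bootstrap_indices(n, p, rng):
    """Return n resampling indices generated by the stationary bootstrap."""
    idx = np.empty(n, dtype=int)
    t = 0
    while t < n:
        start = rng.integers(n)                  # a new block starts anywhere
        length = rng.geometric(p)                # geometric block length, mean 1/p
        block = (start + np.arange(length)) % n  # wrap around the circle
        take = min(length, n - t)
        idx[t:t + take] = block[:take]
        t += take
    return idx

# AR(1) series as a weakly dependent example.
n = 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

reps = np.array([x[stationary_bootstrap_indices(n, 0.1, rng)].mean()
                 for _ in range(2000)])
print("bootstrap s.e. of the sample mean:", reps.std(ddof=1))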


Journal ArticleDOI
TL;DR: In this article, the authors consider the application of two families of nonlinear autoregressive models, the logistic (LSTAR) and exponential (ESTAR) models, and discuss specification of the model based on simple statistical tests: testing linearity against smooth transition autoregression, determining the delay parameter, and choosing between LSTAR and ESTAR models.
Abstract: This article considers the application of two families of nonlinear autoregressive models, the logistic (LSTAR) and exponential (ESTAR) autoregressive models. Specification of the model based on simple statistical tests is discussed: testing linearity against smooth transition autoregression, determining the delay parameter, and choosing between LSTAR and ESTAR models. Estimation by nonlinear least squares is considered, as well as evaluation of the properties of the estimated model. The proposed techniques are illustrated by examples using both simulated and real time series.

1,883 citations
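
A short illustration, assuming the delay parameter is fixed at d = 1, of estimating a first-order LSTAR model by nonlinear least squares with scipy; it skips the specification stage (linearity testing, delay selection, choosing between LSTAR and ESTAR) that the abstract describes. Parameter names and starting values are made up.

# Illustrative LSTAR(1) fit by nonlinear least squares, delay fixed at 1.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)

def transition(z, gamma, c):
    """Logistic transition function of the LSTAR model."""
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

# Simulate an LSTAR(1) series with invented parameter values.
true = dict(a0=0.5, a1=0.7, b0=-1.0, b1=-0.9, gamma=5.0, c=0.0)
n = 600
y = np.zeros(n)
for t in range(1, n):
    G = transition(y[t - 1], true["gamma"], true["c"])
    y[t] = (true["a0"] + true["a1"] * y[t - 1]
            + (true["b0"] + true["b1"] * y[t - 1]) * G
            + 0.3 * rng.normal())

ylag, ycur = y[:-1], y[1:]

def residuals(theta):
    a0, a1, b0, b1, gamma, c = theta
    G = transition(ylag, gamma, c)
    return ycur - (a0 + a1 * ylag + (b0 + b1 * ylag) * G)

start = np.array([0.0, 0.5, 0.0, -0.5, 1.0, 0.0])
fit = least_squares(residuals, start)
print("estimates:", np.round(fit.x, 2))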


Journal ArticleDOI
TL;DR: Davidson and MacKinnon as discussed by the authors have written an outstanding textbook for graduates in econometrics, covering both basic and advanced topics and using geometrical proofs throughout for clarity of exposition.
Abstract: Davidson and MacKinnon have written an outstanding textbook for graduates in econometrics, covering both basic and advanced topics and using geometrical proofs throughout for clarity of exposition. The book offers a unified theoretical perspective, and emphasizes the practical applications of modern theory.

1,512 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications; in principle, a panacea is provided by the standard Bayesian formalism that averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities.
Abstract: We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism that averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximizing predictive ability. But this has not been used in practice, because computing the posterior model probabilities is hard and the number of models is very large (often greater than 1...

1,313 citations
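
A toy sketch of the averaging formula in the abstract: a quantity of interest is averaged over candidate models with weights given by (approximate) posterior model probabilities. For brevity the candidate models here are small linear regressions and the probabilities are approximated with BIC weights, a generic stand-in rather than the paper's contingency-table methodology.

# Toy Bayesian model averaging with BIC-approximated model probabilities.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 3))                     # candidate predictors x1, x2, x3
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)    # only x1 matters

def fit_model(cols):
    """OLS fit; return the coefficient of x1 (0 if excluded) and the BIC."""
    D = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    rss = np.sum((y - D @ beta) ** 2)
    k = D.shape[1]
    bic = n * np.log(rss / n) + k * np.log(n)
    coef_x1 = beta[1 + cols.index(0)] if 0 in cols else 0.0
    return coef_x1, bic

models = [list(c) for r in range(4) for c in combinations(range(3), r)]
coefs, bics = zip(*(fit_model(c) for c in models))
bics = np.array(bics)
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()                        # approximate P(model | data)

print("model-averaged coefficient of x1:", np.sum(weights * np.array(coefs)))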


Journal ArticleDOI
TL;DR: In this article, the authors proposed an iterated cumulative sum of squares (ICSS) algorithm to detect variance changes in a sequence of independent observations, and compared the results of the ICSS algorithm to those obtained by a Bayesian approach or by likelihood ratio tests.
Abstract: This article studies the problem of multiple change points in the variance of a sequence of independent observations. We propose a procedure to detect variance changes based on an iterated cumulative sums of squares (ICSS) algorithm. We study the properties of the centered cumulative sum of squares function and give an intuitive basis for the ICSS algorithm. For series of moderate size (i.e., 200 observations and beyond), the ICSS algorithm offers results comparable to those obtained by a Bayesian approach or by likelihood ratio tests, without the heavy computational burden required by these approaches. Simulation results comparing the ICSS algorithm to other approaches are presented.

1,261 citations
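
A simplified sketch of the centered cumulative sum of squares statistic D_k = C_k/C_T - k/T and of splitting a series recursively where sqrt(T/2)*|D_k| exceeds its asymptotic 5% critical value (about 1.358). The published ICSS algorithm adds refinement passes that this sketch omits, and the simulated series is illustrative.

# Simplified variance change-point detection with the centered cumulative
# sum of squares; the full ICSS algorithm includes extra refinement steps.
import numpy as np

def split_point(x, crit=1.358):
    """Return the most likely variance change point in x, or None."""
    T = len(x)
    C = np.cumsum(x ** 2)
    k = np.arange(1, T + 1)
    D = C / C[-1] - k / T
    stat = np.sqrt(T / 2.0) * np.abs(D)
    kmax = int(np.argmax(stat))
    return kmax + 1 if stat[kmax] > crit else None   # change just after index kmax

def find_change_points(x, offset=0):
    """Recursively locate variance change points (simplified ICSS)."""
    cp = split_point(x)
    if cp is None or cp <= 1 or cp >= len(x) - 1:
        return []
    return (find_change_points(x[:cp], offset)
            + [offset + cp]
            + find_change_points(x[cp:], offset + cp))

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1.0, 300),
                    rng.normal(0, 2.0, 300),
                    rng.normal(0, 0.7, 300)])
print("estimated variance change points:", find_change_points(x))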


Journal ArticleDOI
TL;DR: This article introduces an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights, and in many applications this new procedure works very well without the need for iterations.
Abstract: For missing data problems, Tanner and Wong have described a data augmentation procedure that approximates the actual posterior distribution of the parameter vector by a mixture of complete data posteriors. Their method of constructing the complete data sets is closely related to the Gibbs sampler. Both require iterations, and, similar to the EM algorithm, convergence can be slow. We introduce in this article an alternative procedure that involves imputing the missing data sequentially and computing appropriate importance sampling weights. In many applications this new procedure works very well without the need for iterations. Sensitivity analysis, influence analysis, and updating with new data can be performed cheaply. Bayesian prediction and model selection can also be incorporated. Examples taken from a wide range of applications are used for illustration.

1,166 citations



Journal ArticleDOI
TL;DR: In this article, the authors propose tests for unit root and other forms of nonstationarity that are asymptotically locally most powerful against a certain class of alternatives and have asymptotic critical values given by the chi-squared distribution.
Abstract: This article proposes tests for unit root and other forms of nonstationarity that are asymptotically locally most powerful against a certain class of alternatives and have asymptotic critical values given by the chi-squared distribution. Many existing unit root tests do not share these properties. The alternatives include fractionally and seasonally fractionally differenced processes. There is considerable flexibility in our choice of null hypothesis, which can entail one or more integer or fractional roots of arbitrary order anywhere on the unit circle in the complex plane. For example, we can test for a fractional degree of integration of order 1/2; this can be interpreted as a test for nonstationarity against stationarity. “Overdifferencing” stationary null hypotheses can also be tested. The test statistic is derived via the score principle and is conveniently expressed in the frequency domain. The series tested are regression errors, which, when the hypothesized differencing is correct, are w...

892 citations


Journal ArticleDOI
TL;DR: In this article, a simulation-based method of inference for parametric measurement error models in which the measurement error variance is known or at least well estimated is described, and the method entails adding add...
Abstract: We describe a simulation-based method of inference for parametric measurement error models in which the measurement error variance is known or at least well estimated. The method entails adding add...

724 citations


Journal ArticleDOI
TL;DR: Nonparametric versions of discriminant analysis are obtained by replacing linear regression by any nonparametric regression method so that any multiresponse regression technique can be postprocessed to improve its classification performance.
Abstract: Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are "optimal" for separating the groups. With two such functions, one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This article is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multiresponse linear regression using optimal scorings to represent the groups. In this paper, we obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multiresponse regression technique (such as MARS or neural networks) can be postprocessed to improve its classification performance.
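
A minimal numpy sketch of the post-processing idea: regress class-indicator responses on the features, then apply a simple discriminant step (nearest class mean under the pooled within-class covariance) in the space of fitted values. Plain linear regression stands in here for the nonparametric regressor (such as MARS or a neural network) that the abstract has in mind; the simulated classes and all names are illustrative.

# Illustrative post-processing of a multiresponse regression for classification.
import numpy as np

rng = np.random.default_rng(5)

# Three Gaussian classes in five dimensions (invented data).
n_per, n_class, p = 100, 3, 5
means = rng.normal(scale=2.0, size=(n_class, p))
X = np.vstack([rng.normal(size=(n_per, p)) + means[g] for g in range(n_class)])
y = np.repeat(np.arange(n_class), n_per)

# Step 1: multiresponse regression of the class-indicator matrix on X.
Y = np.eye(n_class)[y]                          # one-hot class indicators
D = np.column_stack([np.ones(len(X)), X])
B, *_ = np.linalg.lstsq(D, Y, rcond=None)
F = D @ B                                       # fitted values = new features

# Step 2: discriminant step in the fitted-value space.
class_means = np.array([F[y == g].mean(axis=0) for g in range(n_class)])
centered = F - class_means[y]
within = centered.T @ centered / (len(X) - n_class)
W_inv = np.linalg.pinv(within)                  # fitted values are rank-deficient

def classify(f_row):
    d2 = [(f_row - m) @ W_inv @ (f_row - m) for m in class_means]
    return int(np.argmin(d2))

pred = np.array([classify(f) for f in F])
print("training accuracy:", (pred == y).mean())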

Journal ArticleDOI
Jun Liu
TL;DR: In this paper, the Gibbs sampler is used to detect common binding sites in unaligned DNA sequences, and a procedure of calculating the posterior odds ratio via the collapsed Gibbs sampling when incomplete observations are involved is presented.
Abstract: This article describes a method of “grouping” and “collapsing” in using the Gibbs sampler and proves from an operator theory viewpoint that the method is in general beneficial. The norms of the forward operators associated with the corresponding nonreversible Markov chains are used to discriminate among different simulation schemes. When applied to Bayesian missing data problems, the idea of collapsing suggests skipping the steps of sampling parameter(s) values in standard data augmentation. By doing this, we obtain a predictive update version of the Gibbs sampler. A procedure of calculating the posterior odds ratio via the collapsed Gibbs sampler when incomplete observations are involved is presented. As an illustration of possible applications, three examples, along with a Bayesian treatment for identifying common protein binding sites in unaligned DNA sequences, are provided.

Journal ArticleDOI
TL;DR: In this paper, a "Gibbs sampler" algorithm is developed to compute a nonparametric Bayesian estimate of a vector of normal means under a Dirichlet process prior, and the resulting estimator is compared to parametric empirical Bayes (PEB) and nonparametric empirical Bayes (NPEB) estimators in a Monte Carlo study.
Abstract: In this article, the Dirichlet process prior is used to provide a nonparametric Bayesian estimate of a vector of normal means. In the past there have been computational difficulties with this model. This article solves the computational difficulties by developing a “Gibbs sampler” algorithm. The estimator developed in this article is then compared to parametric empirical Bayes estimators (PEB) and nonparametric empirical Bayes estimators (NPEB) in a Monte Carlo study. The Monte Carlo study demonstrates that in some conditions the PEB is better than the NPEB and in other conditions the NPEB is better than the PEB. The Monte Carlo study also shows that the estimator developed in this article produces estimates that are about as good as the PEB when the PEB is better and produces estimates that are as good as the NPEB estimator when that method is better.

Journal ArticleDOI
TL;DR: Univariate partial least squares (PLS) as mentioned in this paper is a method of modeling relationships between a Y variable and other explanatory variables, and it can be used with any number of explanatory variables.
Abstract: Univariate partial least squares (PLS) is a method of modeling relationships between a Y variable and other explanatory variables. It may be used with any number of explanatory variables, even far more than the number of observations. A simple interpretation is given that shows the method to be a straightforward and reasonable way of forming prediction equations. Its relationship to multivariate PLS, in which there are two or more Y variables, is examined, and an example is given in which it is compared by simulation with other methods of forming prediction equations. With univariate PLS, linear combinations of the explanatory variables are formed sequentially and related to Y by ordinary least squares regression. It is shown that these linear combinations, here called components, may be viewed as weighted averages of predictors, where each predictor holds the residual information in an explanatory variable that is not contained in earlier components, and the quantity to be predicted is the vecto...
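
A sketch, in Python, of the sequential construction described above: each component is a linear combination of the (deflated) explanatory variables weighted by their covariance with y, and y is then related to the components by ordinary least squares. This is a NIPALS-style implementation written for illustration; the notation is not the article's.

# Illustrative univariate PLS: sequential components formed by deflation.
import numpy as np

def univariate_pls(X, y, n_components):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = []                              # the PLS components
    for _ in range(n_components):
        w = Xc.T @ yc                        # weights: covariance with y
        w /= np.linalg.norm(w)
        t = Xc @ w                           # new component
        scores.append(t)
        # Deflate so later components use only the residual information.
        p_load = Xc.T @ t / (t @ t)
        Xc = Xc - np.outer(t, p_load)
        yc = yc - t * (t @ yc) / (t @ t)
    T = np.column_stack(scores)
    coef, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    return T, coef

rng = np.random.default_rng(6)
n, p = 50, 80                                # more predictors than observations
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.array([1.0, -1.0, 0.5, 0.5, 2.0]) + 0.1 * rng.normal(size=n)

T, coef = univariate_pls(X, y, n_components=3)
fitted = y.mean() + T @ coef
print("R^2 with 3 components:",
      1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2))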


Journal ArticleDOI
TL;DR: Three main topics are discussed: bootstrap methods for missing data, these methods' relationship to the theory of multiple imputation, and computationally efficient ways of executing them.
Abstract: Missing data refers to a class of problems made difficult by the absence of some portions of a familiar data structure. For example, a regression problem might have some missing values in the predi...

Journal ArticleDOI
TL;DR: In this article, a unified approach for the distribution theory of runs based on a finite Markov chain imbedding technique is presented, which covers both identical and non-identical Bernoulli trials.
Abstract: The statistics of the number of success runs in a sequence of Bernoulli trials have been used in many statistical areas. For almost a century, even in the simplest case of independent and identically distributed Bernoulli trials, the exact distributions of many run statistics still remain unknown. Departing from the traditional combinatorial approach, in this article we present a simple unified approach for the distribution theory of runs based on a finite Markov chain imbedding technique. Our results cover not only the identical Bernoulli trials, but also the nonidentical Bernoulli trials. As a byproduct, our results also yield the exact distribution of the waiting time for the mth occurrence of a specific run.
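
A small illustration of the finite Markov chain imbedding technique for one run statistic: the exact probability that a success run of length at least k occurs in n Bernoulli trials with possibly unequal success probabilities, obtained by multiplying one-step transition matrices over a chain that tracks the current run length. The particular statistic and the Monte Carlo check are illustrative.

# Illustrative Markov chain imbedding: P(some success run has length >= k).
import numpy as np

def prob_run_at_least(k, probs):
    """Exact probability of at least one success run of length >= k."""
    m = k + 1                          # states 0..k-1: current run length; k: absorbed
    dist = np.zeros(m)
    dist[0] = 1.0
    for p in probs:
        M = np.zeros((m, m))
        for s in range(k):
            M[s, 0] = 1.0 - p          # a failure resets the run
            M[s, s + 1] = p            # a success extends the run (state k absorbs)
        M[k, k] = 1.0
        dist = dist @ M
    return dist[k]

n, p, k = 50, 0.5, 6
exact = prob_run_at_least(k, [p] * n)

# Quick Monte Carlo check for the identical-trials case.
rng = np.random.default_rng(7)
def longest_run(x):
    best = cur = 0
    for v in x:
        cur = cur + 1 if v else 0
        best = max(best, cur)
    return best

sims = np.mean([longest_run(rng.uniform(size=n) < p) >= k for _ in range(20000)])
print("exact:", round(exact, 4), " simulated:", round(sims, 4))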

Journal ArticleDOI
TL;DR: In this paper, a method for estimating the parameters β and γ of this type of semiparametric model using a quasi-likelihood function is presented, and the asymptotic distribution theory for the estimators is developed.
Abstract: Suppose the expected value of a response variable Y may be written h(Xβ + γ(T)), where X and T are covariates, each of which may be vector-valued, β is an unknown parameter vector, γ is an unknown smooth function, and h is a known function. In this article, we outline a method for estimating the parameters β and γ of this type of semiparametric model using a quasi-likelihood function. Algorithms for computing the estimates are given, and the asymptotic distribution theory for the estimators is developed. The generalization of this approach to the case in which Y is a multivariate response is also considered. The methodology is illustrated on two data sets and the results of a small Monte Carlo study are presented.

Journal ArticleDOI
Michael Friendly
TL;DR: Extensions of the mosaic display to highlight patterns of deviations from various models for categorical data are discussed, and the use of color and shading to represent sign and magnitude of standardized residuals from a specified model is introduced.
Abstract: Mosaic displays represent the counts in a contingency table by tiles whose size is proportional to the cell count. This graphical display for categorical data generalizes readily to multi-way tables. This article discusses extensions of the mosaic display to highlight patterns of deviations from various models for categorical data. First, we introduce the use of color and shading to represent sign and magnitude of standardized residuals from a specified model. For unordered categorical variables, we show how the perception of patterns of association can be enhanced by reordering the categories. Second, we introduce sequential mosaics of marginal subtables, together with sequential models for these tables. For a class of sequential models of joint independence, the individual mosaics provide a graphic representation of a partition of the overall likelihood ratio G2 for complete independence in the full table into portions attributable to hypotheses about the marginal subtables.
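
A short numpy sketch of the quantity the proposed shading encodes: standardized (Pearson) residuals from a specified model, here the model of independence for a two-way table, together with the likelihood ratio statistic G2 mentioned in the abstract. The table values are invented and the drawing of the mosaic itself is omitted.

# Standardized residuals from the independence model (the shading quantity).
import numpy as np

table = np.array([[50, 30, 20],
                  [20, 40, 40]], dtype=float)   # invented two-way table

total = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / total
residuals = (table - expected) / np.sqrt(expected)

# Likelihood ratio statistic G2 for complete independence in the table.
G2 = 2.0 * np.sum(table * np.log(table / expected))

print("standardized residuals:\n", np.round(residuals, 2))
print("G2 for independence:", round(G2, 2))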

Journal ArticleDOI
TL;DR: In this paper, a modification of the formal definition of a p value was proposed, which restricted the maximization to a confidence set for the nuisance parameter, and gave various examples to show how this new method gave improved results for 2 × 2 tabl...
Abstract: For testing problems of the form H0: v = v0 with unknown nuisance parameter θ, various methods are used to deal with θ. The simplest approach is exemplified by the t test, where the unknown variance is replaced by the sample variance and the t distribution accounts for estimation of the variance. In other problems, such as the 2 × 2 contingency table, one conditions on a sufficient statistic for θ and proceeds as in Fisher's exact test. Because neither of these standard methods is appropriate for all situations, this article suggests a new method for handling the unknown θ. This new method is a simple modification of the formal definition of a p value that involves taking a maximum over the nuisance parameter space of a p value obtained for the case when θ is known. The suggested modification is to restrict the maximization to a confidence set for the nuisance parameter. After giving a brief justification, we give various examples to show how this new method gives improved results for 2 × 2 tabl...
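
A hedged sketch of the proposed modification for the 2 × 2 setting (two independent binomials, H0: p1 = p2): for each value of the nuisance parameter in a confidence set for the common success probability, an exact p value is computed by enumeration, and the reported p value is the maximum over that set plus the complement β of the confidence level. The pooled Z statistic, the Clopper-Pearson interval, the grid size, and β = 0.001 are illustrative choices, not the article's worked examples.

# Illustrative confidence-set p value for comparing two binomial proportions.
import numpy as np
from scipy import stats

def z_stat(x, m, y, n):
    """Pooled Z statistic for H0: p1 = p2 (0 when the pooled estimate is degenerate)."""
    p_hat = (x + y) / (m + n)
    denom = np.sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    diff = x / m - y / n
    return np.where(denom > 0, np.abs(diff) / np.where(denom > 0, denom, 1.0), 0.0)

def p_value_given_p(z_obs, m, n, p):
    """Exact P(|Z| >= z_obs) when both groups have success probability p."""
    xs = np.arange(m + 1)[:, None]
    ys = np.arange(n + 1)[None, :]
    prob = stats.binom.pmf(xs, m, p) * stats.binom.pmf(ys, n, p)
    return prob[z_stat(xs, m, ys, n) >= z_obs - 1e-12].sum()

def confidence_set_p_value(x, m, y, n, beta=0.001, grid=200):
    z_obs = float(z_stat(x, m, y, n))
    # Clopper-Pearson (1 - beta) interval for the pooled success probability.
    s, N = x + y, m + n
    lo = stats.beta.ppf(beta / 2, s, N - s + 1) if s > 0 else 0.0
    hi = stats.beta.ppf(1 - beta / 2, s + 1, N - s) if s < N else 1.0
    ps = np.linspace(lo, hi, grid)
    return max(p_value_given_p(z_obs, m, n, p) for p in ps) + beta

print("confidence-set p value:", round(confidence_set_p_value(12, 20, 4, 20), 4))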


Journal ArticleDOI
TL;DR: The Logic of Maximum Likelihood, as discussed by the authors, provides a general modeling framework for estimation using maximum likelihood methods, covering basic estimation techniques, empirical examples, and additional likelihoods.
Abstract: Contents: Introduction; The Logic of Maximum Likelihood; A General Modeling Framework Using Maximum Likelihood Methods; An Introduction to Basic Estimation Techniques; Further Empirical Examples; Additional Likelihoods; Conclusions.
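
A basic worked example in the spirit of the book's opening chapters: write down the log-likelihood of a model and maximize it numerically. The gamma model, the simulated data, and the optimizer settings are illustrative and are not taken from the book.

# Illustrative maximum likelihood estimation for a gamma model.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(8)
data = rng.gamma(shape=2.5, scale=1.8, size=500)

def neg_log_likelihood(params):
    shape, scale = np.exp(params)        # work on the log scale to keep both positive
    return -np.sum(stats.gamma.logpdf(data, a=shape, scale=scale))

fit = optimize.minimize(neg_log_likelihood, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
shape_hat, scale_hat = np.exp(fit.x)
print("MLE shape, scale:", round(shape_hat, 2), round(scale_hat, 2))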

Journal ArticleDOI
TL;DR: In this paper, a random field model for the mean temperature over the region in the northern United States covering eastern Montana through the Dakotas and northern Nebraska up to the Canadian border is developed.
Abstract: In this article we develop a random field model for the mean temperature over the region in the northern United States covering eastern Montana through the Dakotas and northern Nebraska up to the Canadian border. The readings are temperatures at the stations in the U.S. historical climatological network. The stochastic structure is modeled by a stationary spatial-temporal Gaussian random field. For this region, we find little evidence of temporal dependence while the spatial structure is temporally stable. The approach strives to incorporate the uncertainty in estimating the covariance structure into the predictive distributions and the final inference. As an application of the model, we derive posterior distributions of the areal mean over time. A posterior distribution for the static areal mean is presented as a basis for calibrating temperature shifts by the historical record. For this region and season, the distribution indicates that under the scenario of a gradual increase of 5°F over 50 ye...

Journal ArticleDOI
Art B. Owen
TL;DR: A method is proposed that appears to produce correlations of order Op(n^(-3/2)) under Latin hypercube sampling with n integrand evaluations, and an analysis of the algorithm indicates that it cannot be expected to do better than n^(-3/2).
Abstract: Monte Carlo integration is competitive for high-dimensional integrands. Latin hypercube sampling is a stratification technique that reduces the variance of the integral. Previous work has shown that the additive part of the integrand is integrated with error Op(n^(-1/2)) under Latin hypercube sampling with n integrand evaluations. A bilinear part of the integrand is more accurately estimated if the sample correlations among input variables are negligible. Other authors have proposed an algorithm for controlling these correlations. We show that their method reduces the correlations by roughly a factor of 3 for 10 ≤ n ≤ 500. We propose a method that, based on simulations, appears to produce correlations of order Op(n^(-3/2)). An analysis of the algorithm indicates that it cannot be expected to do better than n^(-3/2).
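
A brief sketch of plain Latin hypercube sampling and of the quantity the article studies, the off-diagonal sample correlations among the input columns, which the proposed method drives toward order Op(n^(-3/2)). The correlation-reduction algorithm itself is not reproduced; dimensions and sample size are illustrative.

# Illustrative Latin hypercube sample and its input correlations.
import numpy as np

def latin_hypercube(n, d, rng):
    """n points in [0,1]^d with one point in each of n equal slices per axis."""
    u = rng.uniform(size=(n, d))
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + u) / n

rng = np.random.default_rng(9)
X = latin_hypercube(100, 5, rng)

corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(5, dtype=bool)]
print("max |off-diagonal correlation|:", round(np.abs(off_diag).max(), 3))

# Example integrand: the additive part of a smooth function is integrated
# with error Op(n^(-1/2)) under LHS, per the abstract.
print("LHS estimate of E[sum(X)]:", round(X.sum(axis=1).mean(), 3), "(true value 2.5)")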

Journal ArticleDOI
TL;DR: In this paper, an extension of the bivariate model suggested by Dale is proposed for the analysis of dependent ordinal categorical data, constructed by first generalizing the bivariate Plackett distribution to any number of dimensions.
Abstract: An extension of the bivariate model suggested by Dale is proposed for the analysis of dependent ordinal categorical data. The so-called multivariate Dale model is constructed by first generalizing the bivariate Plackett distribution to any dimensions. Because the approach is likelihood based, it satisfies properties that are not fulfilled by other popular methods, such as the generalized estimating equations approach. The proposed method models both the marginal and the association structure in a flexible way. The attractiveness of the multivariate Dale model is illustrated in three key examples, covering areas such as crossover trials, longitudinal studies with patients dropping out from the study, and discriminant analysis applications. The differences and similarities with the generalized estimating approach are highlighted.
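
A small sketch of the building block named in the abstract, the bivariate Plackett distribution: for marginal probabilities u and v and a global cross-ratio ψ, the joint distribution function is the root of a quadratic, and dichotomizing both margins reproduces ψ as the 2 × 2 odds ratio. The numbers are illustrative; the multivariate generalization and the likelihood machinery of the Dale model are not shown.

# Illustrative bivariate Plackett distribution and its odds-ratio property.
import numpy as np

def plackett_cdf(u, v, psi):
    """Joint CDF of the bivariate Plackett distribution with cross-ratio psi."""
    if np.isclose(psi, 1.0):
        return u * v                                  # independence
    s = 1.0 + (psi - 1.0) * (u + v)
    return (s - np.sqrt(s ** 2 - 4.0 * psi * (psi - 1.0) * u * v)) / (2.0 * (psi - 1.0))

# Dichotomize both margins at cut points with marginal probabilities u and v.
u, v, psi = 0.4, 0.7, 3.0
p11 = plackett_cdf(u, v, psi)                 # P(X <= x, Y <= y)
p10 = u - p11
p01 = v - p11
p00 = 1.0 - u - v + p11
print("implied odds ratio:", round((p11 * p00) / (p10 * p01), 3))   # equals psi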

Journal ArticleDOI
TL;DR: The debate is turned to where it ultimately matters—namely, the precision of prediction based on real data.
Abstract: In disciplines such as soil science, ecology, meteorology, water resources, mining engineering, and forestry, spatial prediction is of central interest. A sparsely sampled spatial process yields imperfect knowledge of a resource, from which prediction of unobserved parts of the process are to be made. A popular stochastic method that solves this problem is kriging. But the appropriateness of kriging—and, for that matter, of any method based on probabilistic models for spatial data—has been frequently questioned. A number of nonstochastic methods have also been proposed, the leading contender of which appears to be splines. There has been some debate as to which of kriging and splines is better—a debate that has centered largely on operational issues, because the two methods are based on different models for the process. In this article the debate is turned to where it ultimately matters—namely, the precision of prediction based on real data. By dividing data sets into modeling sets and prediction...

Journal ArticleDOI
TL;DR: A new method for making stochastic population forecasts that provide consistent probability intervals is presented and implemented; it combines mathematical demography and statistical time series methods with the theory of random-matrix products to forecast various demographic measures and their associated probability intervals to 2065.
Abstract: Conventional population projections use “high,” “medium,” and “low” scenarios to indicate uncertainty, but probability interpretations are rarely given, and in any event the resulting ranges for vital rates, births, deaths, age groups sizes, age ratios, and population size cannot possibly be probabilistically consistent with one another. This article presents and implements a new method for making stochastic population forecasts that provide consistent probability intervals. We blend mathematical demography and statistical time series methods to estimate stochastic models of fertility and mortality based on U.S. data back to 1900 and then use the theory of random-matrix products to forecast various demographic measures and their associated probability intervals to the year 2065. Our expected total population sizes agree quite closely with the Census medium projections, and our 95 percent probability intervals are close to the Census high and low scenarios. But Census intervals in 2065 for ages 65...
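
A toy sketch of the random-matrix-product idea mentioned in the abstract: an age-structured population is projected forward by multiplying Leslie matrices whose vital rates receive random shocks each year, and percentile intervals of total population size are read off the simulations. Every rate here is invented; nothing is calibrated to the paper's U.S. data or to its fertility and mortality time series models.

# Toy stochastic projection via products of randomly perturbed Leslie matrices.
import numpy as np

rng = np.random.default_rng(10)

base_fertility = np.array([0.0, 0.9, 0.4])    # births per person by age class (invented)
base_survival = np.array([0.95, 0.90])        # survival to the next age class (invented)
pop0 = np.array([100.0, 80.0, 60.0])
horizon, n_sims = 50, 2000

totals = np.empty((n_sims, horizon))
for s in range(n_sims):
    pop = pop0.copy()
    for t in range(horizon):
        shock = np.exp(0.05 * rng.normal(size=2))     # multiplicative vital-rate shocks
        L = np.zeros((3, 3))
        L[0, :] = base_fertility * shock[0]
        L[1, 0] = min(base_survival[0] * shock[1], 1.0)
        L[2, 1] = min(base_survival[1] * shock[1], 1.0)
        pop = L @ pop                                 # one year of the random-matrix product
        totals[s, t] = pop.sum()

lo, med, hi = np.percentile(totals[:, -1], [2.5, 50, 97.5])
print(f"total population after {horizon} years: median {med:.0f}, 95% interval ({lo:.0f}, {hi:.0f})")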

Journal ArticleDOI
TL;DR: A few repeats of a simple forward search from a random starting point are shown to provide sufficiently robust parameter estimates to reveal masked multiple outliers, and the stability of the patterns obtained is exhibited by the stalactite plot.
Abstract: A few repeats of a simple forward search from a random starting point are shown to provide sufficiently robust parameter estimates to reveal masked multiple outliers. The stability of the patterns obtained is exhibited by the stalactite plot. The robust estimators used are least median of squares for regression and the minimum volume ellipsoid for multivariate outliers. The forward search also has potential as an algorithm for calculation of these parameter estimates. For large problems, parallel computing provides appreciable reduction in computational time.
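
A compact, simplified sketch of a forward search for masked regression outliers: starting from a small random subset, fit by ordinary least squares, repeatedly grow the subset with the best-fitting observations, and monitor the residual scale as observations enter. The paper's robust starting estimators (least median of squares, and the minimum volume ellipsoid in the multivariate case) and the stalactite plot are not implemented; the contaminated data are simulated for illustration.

# Simplified forward search for regression outliers.
import numpy as np

rng = np.random.default_rng(11)

n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -1.0])
y = X @ beta + 0.5 * rng.normal(size=n)
y[:10] += 8.0                                  # a cluster of outliers (indices 0-9)

def forward_search(X, y, rng):
    n, k = X.shape
    subset = rng.choice(n, size=k + 1, replace=False)   # random starting subset
    order_entered, max_abs_resid = [], []
    for m in range(k + 1, n + 1):
        b, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
        resid2 = (y - X @ b) ** 2
        new_subset = np.argsort(resid2)[:m]    # the m best-fitting observations
        entered = np.setdiff1d(new_subset, subset)
        order_entered.extend(entered.tolist())
        max_abs_resid.append(np.sqrt(resid2[new_subset].max()))
        subset = new_subset
    return np.array(order_entered), np.array(max_abs_resid)

entered, scale = forward_search(X, y, rng)
# Outliers typically enter last; a jump in the monitored scale flags them.
print("last 10 observations to enter:", entered[-10:])
print("scale over the final steps:", np.round(scale[-12:], 2))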

Journal ArticleDOI
Philip E. Cheng
TL;DR: In this article, a distribution-free estimation procedure for a basic pattern of missing data that often arises from the well-known double sampling in survey methodology is considered, where kernel regression estimators are used to estimate mean functionals through empirical estimation of the missing pattern.
Abstract: This article considers a distribution-free estimation procedure for a basic pattern of missing data that often arises from the well-known double sampling in survey methodology. Without parametric modeling of the missing mechanism or the joint distribution, kernel regression estimators are used to estimate mean functionals through empirical estimation of the missing pattern. A generalization of the method of Cheng and Wei is verified under the assumption of missing at random. Asymptotic distributions are derived for estimating the mean of the incomplete data and for estimating the mean treatment difference in a nonrandomized observational study. The nonparametric method is compared with a naive pairwise deletion method and a linear regression method via the asymptotic relative efficiencies and a simulation study. The comparison shows that the proposed nonparametric estimators attain reliable performances in general.
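
A sketch of the estimation idea for the double-sampling pattern: when the response is missing at random given a fully observed covariate, estimate E[Y | X] by Nadaraya-Watson kernel regression on the validation (complete) cases and average the fitted values over all X to estimate the mean of the incomplete variable. The Gaussian kernel, the bandwidth, and the missingness model are illustrative.

# Illustrative kernel-regression estimate of a mean with responses missing at random.
import numpy as np

rng = np.random.default_rng(12)
n = 2000
X = rng.uniform(-2, 2, size=n)
Y = np.sin(2 * X) + 2 * X + rng.normal(scale=0.5, size=n)

# Missingness of Y depends only on X: missing at random.
p_obs = 1.0 / (1.0 + np.exp(-1.5 * X))
R = rng.uniform(size=n) < p_obs

def kernel_regression(x_eval, x_obs, y_obs, bandwidth=0.2):
    """Nadaraya-Watson estimate of E[Y | X = x] with a Gaussian kernel."""
    weights = np.exp(-0.5 * ((x_eval[:, None] - x_obs[None, :]) / bandwidth) ** 2)
    return weights @ y_obs / weights.sum(axis=1)

m_hat = kernel_regression(X, X[R], Y[R])
print("true E[Y] (given these X)  :", round(np.mean(np.sin(2 * X) + 2 * X), 3))
print("complete-case mean (naive) :", round(Y[R].mean(), 3))
print("kernel-imputation estimate :", round(m_hat.mean(), 3))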