
Showing papers by "Donald B. Rubin" published in 1992


Journal ArticleDOI
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.
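As a concrete illustration, here is a minimal sketch of the paper's potential scale reduction factor for a single scalar estimand computed from several independent sequences; it omits the degrees-of-freedom correction in the published version, and the function name and example data are illustrative.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin diagnostic (R-hat) for one scalar estimand.

    `chains` is an (m, n) array: m independent sequences of length n,
    started from an overdispersed distribution.  Values near 1 suggest
    the sequences have mixed; values well above 1 suggest continuing.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    B = n * chains.mean(axis=1).var(ddof=1)   # between-sequence variance
    W = chains.var(axis=1, ddof=1).mean()     # within-sequence variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)

# Four chains drawn from the same target should give a value near 1.
rng = np.random.default_rng(0)
print(potential_scale_reduction(rng.normal(size=(4, 1000))))
```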

13,884 citations


Journal ArticleDOI
TL;DR: In this paper, the authors provide simple but accurate methods for comparing correlation coefficients between a dependent variable and a set of independent variables. The methods extend Dunn & Clark's (1969) work using the Fisher z transformation and include a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k ≥ 2 correlated correlations.
Abstract: The purpose of this article is to provide simple but accurate methods for comparing correlation coefficients between a dependent variable and a set of independent variables. The methods are simple extensions of Dunn & Clark's (1969) work using the Fisher z transformation and include a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k (> 2) correlated correlations. Also briefly discussed is why the traditional Hotelling's t test for comparing correlated correlations is generally not appropriate in practice.
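For the simplest case, comparing two correlated correlations, the paper's Fisher z based test can be sketched as below; the function name is mine, and the formula follows the published test (with the factor f capped at 1).

```python
import numpy as np
from scipy import stats

def compare_correlated_correlations(r_y1, r_y2, r_12, n):
    """Test whether corr(y, x1) equals corr(y, x2) on the same sample of
    n cases, where r_12 = corr(x1, x2).  Sketch of the two-correlation test."""
    z1, z2 = np.arctanh(r_y1), np.arctanh(r_y2)    # Fisher z transforms
    r_sq = (r_y1**2 + r_y2**2) / 2                 # average squared correlation
    f = min((1 - r_12) / (2 * (1 - r_sq)), 1.0)    # capped at 1, per the paper
    h = (1 - f * r_sq) / (1 - r_sq)
    z = (z1 - z2) * np.sqrt((n - 3) / (2 * (1 - r_12) * h))
    return z, 2 * stats.norm.sf(abs(z))            # statistic, two-sided p-value

print(compare_correlated_correlations(0.50, 0.30, 0.40, 103))
```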

2,300 citations


Journal ArticleDOI
TL;DR: In this paper, a procedure based on the complete-data log likelihood ratio was proposed for obtaining significance levels from multiply-imputed data; it does not require access to the completed-data point estimates and variance-covariance matrices.
Abstract: Existing procedures for obtaining significance levels from multiply-imputed data either (i) require access to the completed-data point estimates and variance-covariance matrices, which may not be available in practice when the dimensionality of the estimand is high, or (ii) directly combine p-values with less satisfactory results. Taking advantage of the well-known relationship between the Wald and log likelihood ratio test statistics, we propose a complete-data log likelihood ratio based procedure. It is shown that, for any number of multiple imputations, the proposed procedure is equivalent in large samples to the existing procedure based on the point estimates and the variance-covariance matrices, yet it only requires the point estimates and evaluations of the complete-data log likelihood ratio statistic as a function of these estimates and the completed data. The proposed procedure, therefore, is especially attractive with highly multiparameter incomplete-data problems since it does not involve the computation of any matrices.
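The combining rule itself is short. Below is a sketch of it (function name mine), assuming the two averaged likelihood-ratio statistics have already been computed; the combined statistic is referred to an F distribution with k numerator degrees of freedom, with denominator degrees of freedom as given in the paper.

```python
def mi_likelihood_ratio_statistic(d_bar, d_tilde, m, k):
    """Combine complete-data likelihood ratio statistics over m imputations.

    d_bar:   average of the LR statistics, each evaluated at its own
             completed-data estimates.
    d_tilde: average of the LR statistics re-evaluated at the point
             estimates averaged over the m imputations.
    k:       dimension of the estimand (complete-data test df).
    Returns the combined statistic D and the missing-information term r.
    """
    r = (m + 1) / (k * (m - 1)) * (d_bar - d_tilde)
    D = d_tilde / (k * (1 + r))
    return D, r
```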

323 citations


Journal ArticleDOI
TL;DR: This paper showed that matching on estimated rather than population propensity scores can lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible.
Abstract: Matched sampling is a standard technique for controlling bias in observational studies due to specific covariates. Since Rosenbaum & Rubin (1983), multivariate matching methods based on estimated propensity scores have been used with increasing frequency in medical, educational, and sociological applications. We obtain analytic expressions for the effect of matching using linear propensity score methods with normal distributions. These expressions cover cases where the propensity score is either known, or estimated using either discriminant analysis or logistic regression, as is typically done in current practice. The results show that matching using estimated propensity scores not only reduces bias along the population propensity score, but also controls variation of components orthogonal to it. Matching on estimated rather than population propensity scores can therefore lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible. Approximations are given for the magnitude of this variance reduction, which can be computed using estimates obtained from the matching pools. Related expressions for bias reduction are also presented which suggest that, in difficult matching situations, the use of population scores leads to greater bias reduction than the use of estimated scores.
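For concreteness, here is a sketch of the practice the paper analyzes: estimating the propensity score by logistic regression and matching each treated unit to the nearest control on the linear score. scikit-learn is assumed purely for the logistic fit, and the function name is mine.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_on_estimated_propensity(X_treated, X_control):
    """Nearest-neighbour matching without replacement on an estimated
    linear propensity score.  Returns, for each treated unit, the index
    of its matched control."""
    X = np.vstack([X_treated, X_control])
    z = np.r_[np.ones(len(X_treated)), np.zeros(len(X_control))]
    score = LogisticRegression().fit(X, z).decision_function(X)  # linear score
    treated, control = score[z == 1], score[z == 0]
    available = list(range(len(control)))
    matches = []
    for s in treated:                       # greedy closest match, no reuse
        j = min(available, key=lambda i: abs(control[i] - s))
        matches.append(j)
        available.remove(j)
    return matches
```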

197 citations


Journal ArticleDOI
TL;DR: In this article, a general theoretical framework for studying the performance of matching methods with ellipsoidal distributions is presented, which decomposes the effects of matching into one subspace containing the best linear discriminant, and the subspace of variables uncorrelated with the discriminant.
Abstract: Matched sampling is a common technique used for controlling bias in observational studies. We present a general theoretical framework for studying the performance of such matching methods. Specifically, results are obtained concerning the performance of affinely invariant matching methods with ellipsoidal distributions, which extend previous results on equal percent bias reducing methods. Additional extensions cover conditionally affinely invariant matching methods for covariates with conditionally ellipsoidal distributions. These results decompose the effects of matching into one subspace containing the best linear discriminant, and the subspace of variables uncorrelated with the discriminant. This characterization of the effects of matching provides a theoretical foundation for understanding the performance of specific methods such as matched sampling using estimated propensity scores. Calculations for such methods are given in subsequent articles.
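The decomposition at the heart of these results can be made concrete with a few lines of linear algebra. This sketch (names mine) splits each covariate vector into a best-linear-discriminant score and a residual uncorrelated with it, assuming a common covariance matrix for the two groups.

```python
import numpy as np

def discriminant_split(mu_t, mu_c, sigma, X):
    """Split rows of X into a best-linear-discriminant component and a
    residual uncorrelated with the discriminant.

    mu_t, mu_c: treated and control covariate means; sigma: common
    covariance matrix.  Cov(residual, score) is exactly zero."""
    delta = np.linalg.solve(sigma, mu_t - mu_c)   # discriminant coefficients
    score = X @ delta                             # discriminant scores
    proj = np.outer(score / (delta @ sigma @ delta), sigma @ delta)
    return score, X - proj                        # score, uncorrelated residual
```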

139 citations


Journal ArticleDOI
TL;DR: In contrast to the average effect sizes of literature synthesis, the proper estimand is an effect-size surface, which is a function only of scientifically relevant factors and which can only be estimated by extrapolating a response surface of observed effect sizes to a region of ideal studies.
Abstract: A traditional meta-analysis can be thought of as a literature synthesis, in which a collection of observed studies is analyzed to obtain summary judgments about overall significance and size of effects. Many aspects of the current set of statistical tools for meta-analysis are highly useful—for example, the development of clear and concise effect-size indicators with associated standard errors. I am less happy, however, with more esoteric statistical techniques and their implied objects of estimation (i.e., their estimands), which are tied to the conceptualization of average effect sizes, weighted or otherwise, in a population of studies. In contrast to these average effect sizes of literature synthesis, I believe that the proper estimand is an effect-size surface, which is a function only of scientifically relevant factors, and which can only be estimated by extrapolating a response surface of observed effect sizes to a region of ideal studies. This effect-size surface perspective is presented and contrasted with the literature-synthesis perspective.
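The extrapolation idea admits a toy illustration: regress observed effect sizes on study characteristics and predict at the characteristics of an ideal study. Everything below (variable names, the linear form, the numbers) is an invented illustration, not the article's data or model.

```python
import numpy as np

# Toy effect-size surface: observed effect sizes vary with a study-quality
# score and a scientific factor (e.g. dose); extrapolate to quality = 1.
effects = np.array([0.30, 0.25, 0.45, 0.10, 0.38])   # observed effect sizes
quality = np.array([0.6, 0.5, 0.9, 0.3, 0.8])        # design-quality score
dose    = np.array([10., 10., 20., 5., 20.])         # scientifically relevant factor

X = np.column_stack([np.ones_like(effects), quality, dose])
beta, *_ = np.linalg.lstsq(X, effects, rcond=None)   # fit the response surface
ideal_study = np.array([1.0, 1.0, 20.0])             # intercept, quality -> 1, dose 20
print(ideal_study @ beta)                            # extrapolated effect size
```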

75 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed and evaluated two methods of reweighting preliminary data to obtain estimates more closely approximating those derived from the final data set, and demonstrated the value of propensity modeling, a general-purpose methodology that can be applied to a wide range of problems including adjustment for unit nonresponse and frame undercoverage as well as statistical matching.
Abstract: This article proposes and evaluates two new methods of reweighting preliminary data to obtain estimates more closely approximating those derived from the final data set. In our motivating example, the preliminary data are an early sample of tax returns, and the final data set is the sample after all tax returns have been processed. The new methods estimate a predicted propensity for late filing for each return in the advance sample and then poststratify based on these propensity scores. Using advance and complete sample data for 1982, we demonstrate that the new methods produce advance estimates generally much closer to the final estimates than those derived from the current advance estimation techniques. The results demonstrate the value of propensity modeling, a general-purpose methodology that can be applied to a wide range of problems, including adjustment for unit nonresponse and frame undercoverage as well as statistical matching.
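A generic sketch of the recipe (not the article's code): fit a logistic model for appearing in the advance sample, cut the fitted propensities into quantile strata, and weight each advance unit up to its stratum's full-sample size. scikit-learn is assumed for the logistic fit, and the names are mine.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_poststratification_weights(X, in_advance, n_strata=5):
    """X: covariates for every unit in the frame; in_advance: boolean array
    marking units already observed in the advance sample.  Returns weights
    for the advance units; assumes every stratum contains advance units."""
    p = LogisticRegression().fit(X, in_advance).predict_proba(X)[:, 1]
    edges = np.quantile(p, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, n_strata - 1)
    w = np.zeros(len(X))
    for s in range(n_strata):
        in_s = strata == s
        w[in_s & in_advance] = in_s.sum() / (in_s & in_advance).sum()
    return w[in_advance]
```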

67 citations


Journal ArticleDOI
TL;DR: This review attempts to provide an introduction to some of the newer computational methods that can be applied to draw inferences with random effects and longitudinal models by describing them as extensions of the EM algorithm, currently a standard tool for the analysis of longitudinal and random effects models.
Abstract: Random effects and longitudinal models are becoming increasingly popular in the analysis of many types of data, including medical and biopharmaceutical, because of their richness and flexibility. They can be, however, difficult to fit using traditional statistical tools. Fortunately, there now exists a burgeoning collection of newer computational methods that can be applied to draw inferences with such models. This review attempts to provide an introduction to some of these techniques by describing them as extensions of the EM algorithm, currently a standard tool for the analysis of longitudinal and random effects models. For clarity of exposition, the extensions are classified into three types: large-sample iterative, large-sample simulation, and small-sample simulation.
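As a reference point for the extensions the review classifies, here is a minimal EM fit (a sketch, names mine) for the balanced one-way random-effects model y[i, j] = mu + b_i + e_ij, with b_i ~ N(0, tau2) and e_ij ~ N(0, sig2).

```python
import numpy as np

def em_random_intercept(y, n_iter=200):
    """EM for a balanced random-intercept model; y is an (m, n) array of
    n measurements on each of m subjects.  Returns (mu, tau2, sig2)."""
    n = y.shape[1]
    mu, tau2, sig2 = y.mean(), y.var(), y.var()
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each random effect b_i
        v = 1.0 / (n / sig2 + 1.0 / tau2)
        b = v * (n / sig2) * (y.mean(axis=1) - mu)
        # M-step: maximize the expected complete-data log likelihood
        mu = (y - b[:, None]).mean()
        tau2 = np.mean(b**2 + v)
        sig2 = np.mean((y - mu - b[:, None])**2) + v
    return mu, tau2, sig2
```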

12 citations


Journal ArticleDOI
TL;DR: This article appears in an issue of Statistical Science (1992; 7(4)) devoted to the convergence of MCMC samples, together with commentaries on the issue's two main articles.
Abstract: This article is from a volume of Statistical Science (1992; 7(4)) on the convergence of MCMC samples, published alongside the volume's two main articles and a set of commentaries on them.

10 citations