scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Multiple Imputation After 18+ Years

01 Jun 1996-Journal of the American Statistical Association (American Statistical Association)-Vol. 91, Iss: 434, pp 473-489
TL;DR: A description of the assumed context and objectives of multiple imputation is provided, and a review of the multiple imputations framework and its standard results are reviewed.
Abstract: Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results. Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alt...

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI) are presented and may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations


Cites background from "Multiple Imputation After 18+ Years..."

  • ...Reviews of MI have been published by Rubin (1996) and Schafer (1997, 1999a)....

    [...]

  • ...Properties of MI when the imputer’s and analyst’s models differ have been explored theoretically by Meng (1994) and Rubin (1996) and from a practical standpoint by Collins, Schafer, and Kam (2001)....

    [...]

Journal ArticleDOI
TL;DR: Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations


Cites background or methods from "Multiple Imputation After 18+ Years..."

  • ...Multiple imputation (Rubin 1987, 1996) is the method of choice for complex incomplete data problems....

    [...]

  • ...Theory and applications of multiple imputation have been developed in Rubin (1987) and Rubin (1996). van Buuren (2012) introduces multiple imputation from an applied perspective....

    [...]

Book
21 Mar 2002
TL;DR: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data is as discussed by the authors, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models Multivariate techniques, including classification and ordination, are then introduced.
Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sample programs or analyse the resulting data The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models Multivariate techniques, including classification and ordination, are then introduced Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature The book is supported by a website that provides all data sets, questions for each chapter and links to software

9,509 citations

Journal ArticleDOI
TL;DR: The principles of the method and how to impute categorical and quantitative variables, including skewed variables, are described and shown and the practical analysis of multiply imputed data is described, including model building and model checking.
Abstract: Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.

6,349 citations

Journal ArticleDOI
TL;DR: Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.
Abstract: In recent years, multiple imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.

3,387 citations

References
More filters
Book
01 Jan 1993
TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Abstract: This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.

37,183 citations

Book
01 Jan 1987
TL;DR: This work states that maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse and large-Sample Inference Based on Maximum Likelihood Estimates is likely to be high.
Abstract: Preface.PART I: OVERVIEW AND BASIC APPROACHES.Introduction.Missing Data in Experiments.Complete-Case and Available-Case Analysis, Including Weighting Methods.Single Imputation Methods.Estimation of Imputation Uncertainty.PART II: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA.Theory of Inference Based on the Likelihood Function.Methods Based on Factoring the Likelihood, Ignoring the Missing-Data Mechanism.Maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse.Large-Sample Inference Based on Maximum Likelihood Estimates.Bayes and Multiple Imputation.PART III: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA: APPLICATIONS TO SOME COMMON MODELS.Multivariate Normal Examples, Ignoring the Missing-Data Mechanism.Models for Robust Estimation.Models for Partially Classified Contingency Tables, Ignoring the Missing-Data Mechanism.Mixed Normal and Nonnormal Data with Missing Values, Ignoring the Missing-Data Mechanism.Nonignorable Missing-Data Models.References.Author Index.Subject Index.

18,201 citations


"Multiple Imputation After 18+ Years..." refers methods in this paper

  • ...Certain ad hoc methods of handling missing data, such as "complete-case analysis," "available-case analysis," and "fill-in with means" (e.g., see Little and Rubin 1987, part I), satisfy this basic objective and so have a certain appeal....

    [...]

Book
01 Jan 1995
TL;DR: Detailed notes on Bayesian Computation Basics of Markov Chain Simulation, Regression Models, and Asymptotic Theorems are provided.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE Probability and Inference Single-Parameter Models Introduction to Multiparameter Models Asymptotics and Connections to Non-Bayesian Approaches Hierarchical Models FUNDAMENTALS OF BAYESIAN DATA ANALYSIS Model Checking Evaluating, Comparing, and Expanding Models Modeling Accounting for Data Collection Decision Analysis ADVANCED COMPUTATION Introduction to Bayesian Computation Basics of Markov Chain Simulation Computationally Efficient Markov Chain Simulation Modal and Distributional Approximations REGRESSION MODELS Introduction to Regression Models Hierarchical Linear Models Generalized Linear Models Models for Robust Inference Models for Missing Data NONLINEAR AND NONPARAMETRIC MODELS Parametric Nonlinear Models Basic Function Models Gaussian Process Models Finite Mixture Models Dirichlet Process Models APPENDICES A: Standard Probability Distributions B: Outline of Proofs of Asymptotic Theorems C: Computation in R and Stan Bibliographic Notes and Exercises appear at the end of each chapter.

16,079 citations

Book
01 Jan 1987
TL;DR: In this article, a survey of drinking behavior among men of retirement age was conducted and the results showed that the majority of the participants reported that they did not receive any benefits from the Social Security Administration.
Abstract: Tables and Figures. Glossary. 1. Introduction. 1.1 Overview. 1.2 Examples of Surveys with Nonresponse. 1.3 Properly Handling Nonresponse. 1.4 Single Imputation. 1.5 Multiple Imputation. 1.6 Numerical Example Using Multiple Imputation. 1.7 Guidance for the Reader. 2. Statistical Background. 2.1 Introduction. 2.2 Variables in the Finite Population. 2.3 Probability Distributions and Related Calculations. 2.4 Probability Specifications for Indicator Variables. 2.5 Probability Specifications for (X,Y). 2.6 Bayesian Inference for a Population Quality. 2.7 Interval Estimation. 2.8 Bayesian Procedures for Constructing Interval Estimates, Including Significance Levels and Point Estimates. 2.9 Evaluating the Performance of Procedures. 2.10 Similarity of Bayesian and Randomization--Based Inferences in Many Practical Cases. 3. Underlying Bayesian Theory. 3.1 Introduction and Summary of Repeated--Imputation Inferences. 3.2 Key Results for Analysis When the Multiple Imputations are Repeated Draws from the Posterior Distribution of the Missing Values. 3.3 Inference for Scalar Estimands from a Modest Number of Repeated Completed--Data Means and Variances. 3.4 Significance Levels for Multicomponent Estimands from a Modest Number of Repeated Completed--Data Means and Variance--Covariance Matrices. 3.5 Significance Levels from Repeated Completed--Data Significance Levels. 3.6 Relating the Completed--Data and Completed--Data Posterior Distributions When the Sampling Mechanism is Ignorable. 4. Randomization--Based Evaluations. 4.1 Introduction. 4.2 General Conditions for the Randomization--Validity of Infinite--m Repeated--Imputation Inferences. 4.3Examples of Proper and Improper Imputation Methods in a Simple Case with Ignorable Nonresponse. 4.4 Further Discussion of Proper Imputation Methods. 4.5 The Asymptotic Distibution of (Qm,Um,Bm) for Proper Imputation Methods. 4.6 Evaluations of Finite--m Inferences with Scalar Estimands. 4.7 Evaluation of Significance Levels from the Moment--Based Statistics Dm and Dm with Multicomponent Estimands. 4.8 Evaluation of Significance Levels Based on Repeated Significance Levels. 5. Procedures with Ignorable Nonresponse. 5.1 Introduction. 5.2 Creating Imputed Values under an Explicit Model. 5.3 Some Explicit Imputation Models with Univariate YI and Covariates. 5.4 Monotone Patterns of Missingness in Multivariate YI. 5.5 Missing Social Security Benefits in the Current Population Survey. 5.6 Beyond Monotone Missingness. 6. Procedures with Nonignorable Nonresponse. 6.1 Introduction. 6.2 Nonignorable Nonresponse with Univariate YI and No XI. 6.3 Formal Tasks with Nonignorable Nonresponse. 6.4 Illustrating Mixture Modeling Using Educational Testing Service Data. 6.5 Illustrating Selection Modeling Using CPS Data. 6.6 Extensions to Surveys with Follow--Ups. 6.7 Follow--Up Response in a Survey of Drinking Behavior Among Men of Retirement Age. References. Author Index. Subject Index. Appendix I. Report Written for the Social Security Administration in 1977. Appendix II. Report Written for the Census Bureau in 1983.

14,574 citations

Journal ArticleDOI
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normal- ity after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.

13,884 citations