The use of multiple imputation for the analysis of missing data.

doi:10.1037/1082-989X.6.4.317


Journal Article•DOI•


Sandip Sinharay¹, Hal S. Stern, Daniel Russell•Institutions (1)

Iowa State University¹

01 Dec 2001-Psychological Methods (American Psychological Association)-Vol. 6, Iss: 4, pp 317-329

TL;DR: The idea behind MI, the advantages of MI over existing techniques for addressing missing data, how to do MI for real problems, the software available to implement MI, and the results of a simulation study aimed at finding out how assumptions regarding the imputation model affect the parameter estimates provided by MI are discussed.


Abstract: This article provides a comprehensive review of multiple imputation (MI), a technique for analyzing data sets with missing values. Formally, MI is the process of replacing each missing data point with a set of m > 1 plausible values to generate m complete data sets. These complete data sets are then analyzed by standard statistical software, and the results combined, to give parameter estimates and standard errors that take into account the uncertainty due to the missing data values. This article introduces the idea behind MI, discusses the advantages of MI over existing techniques for addressing missing data, describes how to do MI for real problems, reviews the software available to implement MI, and discusses the results of a simulation study aimed at finding out how assumptions regarding the imputation model affect the parameter estimates provided by MI.
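The combining step described in the abstract (analyze each of the m completed data sets, then merge the results) is commonly carried out with Rubin's rules. The following is a minimal sketch under stated assumptions, not the article's own code: the function name `pool_rubin` is hypothetical, and it assumes a scalar parameter whose point estimate and squared standard error have already been computed on each completed data set.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m completed-data results for a scalar parameter (Rubin's rules).

    estimates: point estimates, one per completed data set
    variances: squared standard errors, one per completed data set
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m > 1 estimates and one variance per estimate")

    # Pooled point estimate: average of the m completed-data estimates.
    q_bar = sum(estimates) / m
    # Within-imputation variance: average of the m completed-data variances.
    u_bar = sum(variances) / m
    # Between-imputation variance: spread of the estimates across imputations.
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    # Total variance adds the missing-data uncertainty to the within part.
    t = u_bar + (1 + 1 / m) * b
    return q_bar, t, math.sqrt(t)
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` averages the three estimates and inflates the within-imputation variance of 0.5 by the between-imputation term, which is how the pooled standard error comes to reflect the uncertainty due to the missing values.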


Citations


Journal Article•DOI•

Missing data: Our view of the state of the art.


Joseph L. Schafer, John W. Graham

01 Jun 2002-Psychological Methods

TL;DR: The authors frame the missing-data problem, review methods, and present 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer procedures, including some for missing data that are not MAR, may eventually extend these methods.

Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.


10,568 citations

Cites methods from "The use of multiple imputation for ..."

...A gentle introduction and tutorial on the use of MI under a multivariate normal model was provided by Schafer and Olsen (1998; also see Graham et al., in press; Sinharay et al., 2001)....
[...]

Journal Article•DOI•

Statistical Analysis with Missing Data


Martin G. Gibson

01 Mar 1989-The Statistician

3,152 citations

The handbook of research synthesis and meta-analysis, 2nd ed.


Harris Cooper¹, Larry V. Hedges, Jeffrey C. Valentine²•Institutions (2)

Duke University¹, University of Louisville²

01 Dec 2009

2,243 citations

Journal Article•DOI•

Multiple imputation of discrete and continuous data by fully conditional specification


Stef van Buuren¹•Institutions (1)

Utrecht University¹

01 Jun 2007-Statistical Methods in Medical Research

TL;DR: FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable, but its statistical properties are difficult to establish.


Abstract: The goal of multiple imputation is to provide valid inferences for statistical estimates from incomplete data. To achieve that goal, imputed values should preserve the structure in the data, as well as the uncertainty about this structure, and include any knowledge about the process that generated the missing data. Two approaches for imputing multivariate data exist: joint modeling (JM) and fully conditional specification (FCS). JM is based on parametric statistical theory, and leads to imputation procedures whose statistical properties are known. JM is theoretically sound, but the joint model may lack flexibility needed to represent typical data features, potentially leading to bias. FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable. FCS provides tremendous flexibility and is easy to apply, but its statistical properties are difficult to establish. Simulation work shows that FCS behaves very well in ...
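The FCS idea in this abstract (one conditional model per incomplete variable, cycled repeatedly) can be illustrated with a small sketch. This is a simplified illustration under loud assumptions, not the cited paper's procedure or any package's API: the function name `fcs_impute` is hypothetical, every variable is treated as continuous with a linear-regression conditional model, each column is assumed to have at least one observed value, and the regression parameters are held at their least-squares estimates rather than drawn from a posterior, so the imputations understate uncertainty somewhat.

```python
import numpy as np


def fcs_impute(data, n_iter=10, rng=None):
    """Return one completed copy of `data` (2-D float array, NaN = missing)."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()

    # Initialise every missing cell with the observed mean of its column.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_iter):
        for j in range(data.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # complete variable: no conditional model needed
            obs_j = ~miss_j

            # Conditional model for column j: intercept plus all other columns,
            # using the current imputations of those other columns.
            others = np.delete(filled, j, axis=1)
            X = np.column_stack([np.ones(len(filled)), others])
            beta, *_ = np.linalg.lstsq(X[obs_j], filled[obs_j, j], rcond=None)

            # Residual standard deviation estimated from the observed rows.
            resid = filled[obs_j, j] - X[obs_j] @ beta
            dof = max(int(obs_j.sum()) - X.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Replace missing cells with prediction plus random noise.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = X[miss_j] @ beta + noise
    return filled
```

Calling the function m times with different random generators, for instance `[fcs_impute(data, rng=np.random.default_rng(s)) for s in range(5)]`, would give m completed data sets to analyze and pool. A fuller implementation would choose a conditional model suited to each variable's type (for example logistic regression for binary variables), which is the flexibility the abstract attributes to FCS.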


2,119 citations

Journal Article•DOI•

Principled missing data methods for researchers.


Yiran Dong¹, Chao-Ying Joanne Peng¹•Institutions (1)

Indiana University¹

14 May 2013-SpringerPlus

TL;DR: Quality of research will be enhanced if researchers explicitly acknowledge missing data problems and the conditions under which they occurred, principled methods are employed to handle missing data, and the appropriate treatment of missing data is incorporated into review standards of manuscripts submitted for publication.


Abstract: The impact of missing data on quantitative research can be serious, leading to biased estimates of parameters, loss of information, decreased statistical power, increased standard errors, and weakened generalizability of findings. In this paper, we discussed and demonstrated three principled missing data methods: multiple imputation, full information maximum likelihood, and expectation-maximization algorithm, applied to a real-world data set. Results were contrasted with those obtained from the complete data set and from the listwise deletion method. The relative merits of each method are noted, along with common features they share. The paper concludes with an emphasis on the importance of statistical assumptions, and recommendations for researchers. Quality of research will be enhanced if (a) researchers explicitly acknowledge missing data problems and the conditions under which they occurred, (b) principled methods are employed to handle missing data, and (c) the appropriate treatment of missing data is incorporated into review standards of manuscripts submitted for publication.


1,457 citations

Cites background from "The use of multiple imputation for ..."

...Another advantage of MI over ML-based methods is its computational simplicity (Sinharay et al. 2001)....
[...]
...Unfortunately, the importance of assessing the convergence was rarely mentioned in articles that reviewed the theory and application of MCMC (Schafer 1999; Schafer and Graham 2002; Schlomer et al. 2010; Sinharay et al. 2001)....
[...]
...The distribution of Y depending on the unknown parameter θ can be therefore written as: $P(Y \mid \theta) = P(Y_{obs}, Y_{mis} \mid \theta) = P(Y_{obs} \mid \theta)\, P(Y_{mis} \mid Y_{obs}, \theta)$ (13). Equation 13 can be written as a likelihood function as Equation 14: $L(\theta \mid Y) = L(\theta \mid Y_{obs}, Y_{mis}) = c\, L(\theta \mid Y_{obs})\, P(Y_{mis} \mid Y_{obs}, \theta)$ (14), where c…...
[...]


References


Journal Article•DOI•

Maximum likelihood from incomplete data via the EM algorithm


Arthur P. Dempster¹, Nan M. Laird¹, Donald B. Rubin¹•Institutions (1)

Harvard University¹

01 Sep 1977-Journal of the royal statistical society series b-methodological

49,597 citations

Book•

Statistical Analysis with Missing Data


Roderick J. A. Little, Donald B. Rubin

01 Jan 1987

TL;DR: This book covers basic approaches to missing data (complete-case and available-case analysis, single imputation), likelihood-based approaches including maximum likelihood for general patterns of missing data with ignorable nonresponse, Bayes and multiple imputation, and applications to some common models.

Abstract: Preface. PART I: OVERVIEW AND BASIC APPROACHES. Introduction. Missing Data in Experiments. Complete-Case and Available-Case Analysis, Including Weighting Methods. Single Imputation Methods. Estimation of Imputation Uncertainty. PART II: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA. Theory of Inference Based on the Likelihood Function. Methods Based on Factoring the Likelihood, Ignoring the Missing-Data Mechanism. Maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse. Large-Sample Inference Based on Maximum Likelihood Estimates. Bayes and Multiple Imputation. PART III: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA: APPLICATIONS TO SOME COMMON MODELS. Multivariate Normal Examples, Ignoring the Missing-Data Mechanism. Models for Robust Estimation. Models for Partially Classified Contingency Tables, Ignoring the Missing-Data Mechanism. Mixed Normal and Nonnormal Data with Missing Values, Ignoring the Missing-Data Mechanism. Nonignorable Missing-Data Models. References. Author Index. Subject Index.

18,201 citations

Book•

Bayesian Data Analysis

Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin¹•Institutions (1)

University of California, Irvine¹

01 Jan 1995

TL;DR: This book covers the fundamentals of Bayesian inference and Bayesian data analysis, advanced computation including Markov chain simulation, regression models (including models for missing data), and nonlinear and nonparametric models.

Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data. NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models. APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan. Bibliographic Notes and Exercises appear at the end of each chapter.

16,079 citations

Book•

Multiple imputation for nonresponse in surveys


Donald B. Rubin

01 Jan 1987

TL;DR: This book develops multiple imputation for nonresponse in surveys, covering the underlying Bayesian theory, randomization-based evaluations, and procedures with ignorable and nonignorable nonresponse, with examples that include missing Social Security benefits in the Current Population Survey and follow-up response in a survey of drinking behavior among men of retirement age.

Abstract: Tables and Figures. Glossary. 1. Introduction. 1.1 Overview. 1.2 Examples of Surveys with Nonresponse. 1.3 Properly Handling Nonresponse. 1.4 Single Imputation. 1.5 Multiple Imputation. 1.6 Numerical Example Using Multiple Imputation. 1.7 Guidance for the Reader. 2. Statistical Background. 2.1 Introduction. 2.2 Variables in the Finite Population. 2.3 Probability Distributions and Related Calculations. 2.4 Probability Specifications for Indicator Variables. 2.5 Probability Specifications for (X,Y). 2.6 Bayesian Inference for a Population Quantity. 2.7 Interval Estimation. 2.8 Bayesian Procedures for Constructing Interval Estimates, Including Significance Levels and Point Estimates. 2.9 Evaluating the Performance of Procedures. 2.10 Similarity of Bayesian and Randomization-Based Inferences in Many Practical Cases. 3. Underlying Bayesian Theory. 3.1 Introduction and Summary of Repeated-Imputation Inferences. 3.2 Key Results for Analysis When the Multiple Imputations are Repeated Draws from the Posterior Distribution of the Missing Values. 3.3 Inference for Scalar Estimands from a Modest Number of Repeated Completed-Data Means and Variances. 3.4 Significance Levels for Multicomponent Estimands from a Modest Number of Repeated Completed-Data Means and Variance-Covariance Matrices. 3.5 Significance Levels from Repeated Completed-Data Significance Levels. 3.6 Relating the Completed-Data and Completed-Data Posterior Distributions When the Sampling Mechanism is Ignorable. 4. Randomization-Based Evaluations. 4.1 Introduction. 4.2 General Conditions for the Randomization-Validity of Infinite-m Repeated-Imputation Inferences. 4.3 Examples of Proper and Improper Imputation Methods in a Simple Case with Ignorable Nonresponse. 4.4 Further Discussion of Proper Imputation Methods. 4.5 The Asymptotic Distribution of (Qm,Um,Bm) for Proper Imputation Methods. 4.6 Evaluations of Finite-m Inferences with Scalar Estimands. 4.7 Evaluation of Significance Levels from the Moment-Based Statistics Dm and Dm with Multicomponent Estimands. 4.8 Evaluation of Significance Levels Based on Repeated Significance Levels. 5. Procedures with Ignorable Nonresponse. 5.1 Introduction. 5.2 Creating Imputed Values under an Explicit Model. 5.3 Some Explicit Imputation Models with Univariate YI and Covariates. 5.4 Monotone Patterns of Missingness in Multivariate YI. 5.5 Missing Social Security Benefits in the Current Population Survey. 5.6 Beyond Monotone Missingness. 6. Procedures with Nonignorable Nonresponse. 6.1 Introduction. 6.2 Nonignorable Nonresponse with Univariate YI and No XI. 6.3 Formal Tasks with Nonignorable Nonresponse. 6.4 Illustrating Mixture Modeling Using Educational Testing Service Data. 6.5 Illustrating Selection Modeling Using CPS Data. 6.6 Extensions to Surveys with Follow-Ups. 6.7 Follow-Up Response in a Survey of Drinking Behavior Among Men of Retirement Age. References. Author Index. Subject Index. Appendix I. Report Written for the Social Security Administration in 1977. Appendix II. Report Written for the Census Bureau in 1983.

14,574 citations

Book•

Analysis of Incomplete Multivariate Data


J.L. Schafer¹•Institutions (1)

Pennsylvania State University¹

01 Aug 1997

TL;DR: This book covers the assumptions underlying the analysis of incomplete multivariate data, EM and inference by data augmentation, and methods for normal, categorical, and mixed data, including loglinear models.

Abstract: Introduction; Assumptions; EM and Inference by Data Augmentation; Methods for Normal Data; More on the Normal Model; Methods for Categorical Data; Loglinear Models; Methods for Mixed Data; Further Topics; Appendices; References; Index.

6,704 citations

"The use of multiple imputation for ..." refers background in this paper

...Recently, however, with the advent of faster computers, MI has become quite popular in survey and nonsurvey contexts (Rubin, 1996; Schafer, 1997a; Schafer & Olsen, 1998)....
[...]