Journal ArticleDOI

Small-sample degrees of freedom with multiple imputation

01 Dec 1999-Biometrika (Oxford University Press)-Vol. 86, Iss: 4, pp 948-955
TL;DR: The authors derive an adjusted repeated-imputation degrees of freedom, ν̃_m, with the property that, for fixed m and estimated fraction of missing information, the adjusted degrees of freedom increase monotonically in ν_com.
Abstract: An appealing feature of multiple imputation is the simplicity of the rules for combining the multiple complete-data inferences into a final inference, the repeated-imputation inference (Rubin, 1987). This inference is based on a t distribution and is derived from a Bayesian paradigm under the assumption that the complete-data degrees of freedom, ν_com, are infinite, but the number of imputations, m, is finite. When ν_com is small and there is only a modest proportion of missing data, the calculated repeated-imputation degrees of freedom, ν_m, for the t reference distribution can be much larger than ν_com, which is clearly inappropriate. Following the Bayesian paradigm, we derive an adjusted degrees of freedom, ν̃_m, with the following three properties: for fixed m and estimated fraction of missing information, ν̃_m monotonically increases in ν_com; ν̃_m is always less than or equal to ν_com; and ν̃_m equals ν_m when ν_com is infinite. A small simulation study demonstrates the superior frequentist performance when using ν̃_m rather than ν_m.
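The adjustment summarized above can be sketched numerically. Below is an illustrative Python version of the small-sample formula as it is usually stated in the multiple-imputation literature; the function and variable names are mine, not the paper's, and the inputs are the usual Rubin's-rules quantities:

```python
def barnard_rubin_df(m, b, ubar, nu_com):
    """Small-sample adjusted degrees of freedom for multiple imputation.

    m      : number of imputations
    b      : between-imputation variance
    ubar   : average within-imputation variance
    nu_com : complete-data degrees of freedom
    Assumes b > 0 (some missing information).
    """
    t = ubar + (1 + 1 / m) * b           # total variance (Rubin's rules)
    gamma = (1 + 1 / m) * b / t          # estimated fraction of missing information
    nu_old = (m - 1) / gamma ** 2        # Rubin (1987) df; can exceed nu_com
    # observed-data component, shrunk toward nu_com for large nu_com
    nu_obs = ((nu_com + 1) / (nu_com + 3)) * nu_com * (1 - gamma)
    # harmonic-style combination: always <= nu_obs <= nu_com
    return 1 / (1 / nu_old + 1 / nu_obs)
```

With ν_com large the ν_obs term dominates the reciprocal sum only weakly, so the result approaches Rubin's ν_m, matching the third property in the abstract.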
Citations
Journal ArticleDOI
TL;DR: Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations


Cites methods from "Small-sample degrees of freedom wit..."

  • ...By default the number of degrees of freedom is calculated using the method of Barnard and Rubin (1999)....


Journal ArticleDOI
TL;DR: The principles of the method and how to impute categorical and quantitative variables, including skewed variables, are described and shown and the practical analysis of multiply imputed data is described, including model building and model checking.
Abstract: Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.
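The chained-equations idea described in this abstract can be illustrated in a few lines of Python: each incomplete column is regressed on the other columns and its missing entries are redrawn from the fitted model plus noise, cycling until the imputations stabilize. This is a toy sketch with a plain least-squares draw, not the algorithm as implemented in any particular package:

```python
import numpy as np

def mice_sketch(X, n_iter=10, rng=None):
    """Minimal sketch of imputation by chained equations (illustrative only).

    X : 2-D float array with np.nan marking missing entries.
    """
    rng = np.random.default_rng(rng)
    X = X.copy()
    miss = np.isnan(X)
    # initialize missing entries with column means
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(len(X)), others])  # intercept + predictors
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            sigma = resid.std(ddof=A.shape[1]) if obs.sum() > A.shape[1] else resid.std()
            # redraw missing values from the fitted regression plus noise
            X[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(0, sigma, miss[:, j].sum())
    return X
```

A real implementation would add predictive mean matching, proper handling of categorical variables, and draws of the regression parameters themselves, as the cited papers describe.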

6,349 citations


Cites background from "Small-sample degrees of freedom wit..."

  • ...Wald-type significance tests and confidence intervals for a univariate estimand can be obtained in the usual way from a t-distribution; degrees of freedom are given in references [7, 8]....


Journal ArticleDOI
TL;DR: Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.
Abstract: In recent years, multiple imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.

3,387 citations

Book
29 Mar 2012
TL;DR: Covers the problem of missing data; the concepts of MCAR, MAR and MNAR; simple solutions that do not (always) work; multiple imputation in a nutshell; and some dangers, some do's and some don'ts.
Abstract: Basics: Introduction (the problem of missing data; concepts of MCAR, MAR and MNAR; simple solutions that do not (always) work; multiple imputation in a nutshell; goal of the book; what the book does not cover; structure of the book; exercises); Multiple imputation (historic overview; incomplete-data concepts; why and when multiple imputation works; statistical intervals and tests; evaluation criteria; when to use multiple imputation; how many imputations?; exercises); Univariate missing data (how to generate multiple imputations; imputation under the normal linear model; imputation under non-normal distributions; predictive mean matching; categorical data; other data types; classification and regression trees; multilevel data; non-ignorable methods; exercises); Multivariate missing data (missing data pattern; issues in multivariate imputation; monotone data imputation; joint modeling; fully conditional specification; FCS and JM; conclusion; exercises); Imputation in practice (overview of modeling choices; ignorable or non-ignorable?; model form and predictors; derived variables; algorithmic options; diagnostics; conclusion; exercises); Analysis of imputed data (what to do with the imputed data?; parameter pooling; statistical tests for multiple imputation; stepwise model selection; conclusion; exercises). Case studies: Measurement issues (too many columns; sensitivity analysis; correct prevalence estimates from self-reported data; enhancing comparability; exercises); Selection issues (correcting for selective drop-out; correcting for non-response; exercises); Longitudinal data (long and wide format; SE Fireworks Disaster Study; time raster imputation; conclusion; exercises). Extensions: Conclusion (some dangers, some do's and some don'ts; reporting; other applications; future developments; exercises). Appendices: software (R, S-Plus, Stata, SAS, SPSS, other software); references; author index; subject index.

2,156 citations


Cites background from "Small-sample degrees of freedom wit..."

  • ...Technical improvements for the degrees of freedom were suggested by Barnard and Rubin (1999) and Reiter (2007). Iterative algorithms for multivariate missing data with general missing data patterns were proposed by Rubin (1987, p. 192), Schafer (1997), Van Buuren et al. (1999), Raghunathan et al. (2001) and King et al. (2001). Additional work on the choice of the number of imputations was done by Royston et al. (2004), Graham et al. (2007) and Bodner (2008). In the 1990s, multiple imputation came under fire from various sides....

Journal ArticleDOI
TL;DR: This article describes an implementation for Stata of the MICE method of multiple multivariate imputation described by van Buuren, Boshuizen, and Knook (1999): five ado-files that create multiple multivariate imputations, plus utilities to interconvert datasets created by mvis and by the miset program from John Carlin and colleagues.
Abstract: Following the seminal publications of Rubin about thirty years ago, statisticians have become increasingly aware of the inadequacy of "complete-case" analysis of datasets with missing observations. In medicine, for example, observations may be missing in a sporadic way for different covariates, and a complete-case analysis may omit as many as half of the available cases. Hotdeck imputation was implemented in Stata in 1999 by Mander and Clayton. However, this technique may perform poorly when many rows of data have at least one missing value. This article describes an implementation for Stata of the MICE method of multiple multivariate imputation described by van Buuren, Boshuizen, and Knook (1999). MICE stands for multivariate imputation by chained equations. The basic idea of data analysis with multiple imputation is to create a small number (e.g., 5-10) of copies of the data, each of which has the missing values suitably imputed, and analyze each complete dataset independently. Estimates of parameters of interest are averaged across the copies to give a single estimate. Standard errors are computed according to the "Rubin rules", devised to allow for the between- and within-imputation components of variation in the parameter estimates. This article describes five ado-files. mvis creates multiple multivariate imputations. uvis imputes missing values for a single variable as a function of several covariates, each with complete data. micombine fits a wide variety of regression models to a multiply imputed dataset, combining the estimates using Rubin's rules, and supports survival analysis models (stcox and streg), categorical data models, generalized linear models, and more. Finally, misplit and mijoin are utilities to interconvert datasets created by mvis and by the miset program from John Carlin and colleagues. The use of the routines is illustrated with an example of prognostic modeling in breast cancer.
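The "Rubin rules" mentioned in this abstract combine one point estimate and one standard error per imputed dataset into a single inference. An illustrative Python sketch (the function name is mine):

```python
import math

def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's combining rules (sketch).

    estimates : point estimates, one per imputed dataset
    variances : corresponding squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    qbar = sum(estimates) / m                                # pooled point estimate
    ubar = sum(variances) / m                                # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)    # between-imputation variance
    t = ubar + (1 + 1 / m) * b                               # total variance
    return qbar, math.sqrt(t)
```

The total variance T = ū + (1 + 1/m)B is what feeds the t reference distribution whose degrees of freedom the Barnard and Rubin paper adjusts.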

2,132 citations

References
More filters
Journal ArticleDOI
TL;DR: It is shown that ignoring the process that causes missing data when making sampling-distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random; such inferences are then generally conditional on the observed pattern of missing data.
Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.

8,197 citations

Book
01 Aug 1997
TL;DR: A book-length treatment of incomplete multivariate data, covering EM and data-augmentation algorithms together with imputation methods for normal, categorical (including loglinear models), and mixed data.
Abstract: Contents: Introduction; Assumptions; EM and Inference by Data Augmentation; Methods for Normal Data; More on the Normal Model; Methods for Categorical Data; Loglinear Models; Methods for Mixed Data; Further Topics; Appendices; References; Index.

6,704 citations

Journal ArticleDOI
TL;DR: Describes the assumed context and objectives of multiple imputation, and reviews the multiple-imputation framework and its standard results.
Abstract: Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results. Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alt...

3,495 citations

Journal ArticleDOI
TL;DR: When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputation models is to incorporate appropriate importance weights into the standard combining rules.
Abstract: Conducting sample surveys, imputing incomplete observations, and analyzing the resulting data are three indispensable phases of modern practice with public-use data files and with many other statistical applications. Each phase inherits different input, including the information preceding it and the intellectual assessments available, and aims to provide output that is one step closer to arriving at statistical inferences with scientific relevance. However, the role of the imputation phase has often been viewed as merely providing computational convenience for users of data. Although facilitating computation is very important, such a viewpoint ignores the imputer's assessments and information inaccessible to the users. This view underlies the recent controversy over the validity of multiple-imputation inference when a procedure for analyzing multiply imputed data sets cannot be derived from (is "uncongenial" to) the model adopted for multiple imputation. Given sensible imputations and complete-data analysis procedures, inferences from standard multiple-imputation combining rules are typically superior to, and thus different from, users' incomplete-data analyses. The latter may suffer from serious nonresponse biases because such analyses often must rely on convenient but unrealistic assumptions about the nonresponse mechanism. When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules. These points are reviewed and explored by simple examples and general theory, from both Bayesian and frequentist perspectives, particularly from the randomization perspective. Some convenient terms are suggested for facilitating communication among researchers from different perspectives when evaluating multiple-imputation inferences with uncongenial sources of input.

790 citations

Journal ArticleDOI
TL;DR: In this paper, several multiple imputation techniques for simple random samples with ignorable nonresponse on a scalar outcome variable are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean.
Abstract: Several multiple imputation techniques are described for simple random samples with ignorable nonresponse on a scalar outcome variable. The methods are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean. Using m = 2 imputations per missing value gives accurate coverages in common cases and is clearly superior to single imputation (m = 1) in all cases. The performances of the methods for various m can be predicted well by linear interpolation in 1/(m − 1) between the results for m = 2 and m = ∞. As a rough guide, to assure coverages of interval estimates within 2% of the nominal level when using the preferred methods, the number of imputations per missing value should increase from 2 to 3 as the nonresponse rate increases from 10% to 60%.
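The interpolation rule in this abstract amounts to a one-line function. An illustrative sketch, where the two coverage endpoints are hypothetical inputs rather than values from the paper:

```python
def interpolate_coverage(c2, cinf, m):
    """Predict interval coverage for m imputations by linear interpolation
    in 1/(m - 1) between the m = 2 and m = infinity results (illustrative).

    c2   : coverage observed with m = 2 imputations
    cinf : coverage in the m = infinity limit
    """
    w = 1 / (m - 1)              # w = 1 at m = 2, w -> 0 as m -> infinity
    return cinf + w * (c2 - cinf)
```

For example, with hypothetical coverages of 0.93 at m = 2 and 0.95 at m = ∞, the rule predicts roughly 0.94 at m = 3.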

725 citations