Topic

Imputation (statistics)

About: Imputation (statistics) is a research topic. Over its lifetime, 8,203 publications have been published within this topic, receiving 315,547 citations. The topic is also known as: data imputation.


Papers
Journal Article
TL;DR: In this paper, the effects of missing values are illustrated for a linear model, and a series of recommendations is provided. Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions.
Abstract: Less than optimum strategies for missing values can produce biased estimates, distorted statistical power, and invalid conclusions. After reviewing traditional approaches (listwise, pairwise, and mean substitution), selected alternatives are covered including single imputation, multiple imputation, and full information maximum likelihood estimation. The effects of missing values are illustrated for a linear model, and a series of recommendations is provided. When missing values cannot be avoided, multiple imputation and full information methods offer substantial improvements over traditional approaches. Selected results using SPSS, NORM, Stata (mvis/micombine), and Mplus are included as is a table of available software and an appendix with examples of programs for Stata and Mplus.

1,687 citations
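
As a rough illustration of two of the traditional approaches named in the abstract above, the following minimal Python sketch applies listwise deletion and mean substitution to a single variable. The sample values and the function names `listwise_delete` and `mean_substitute` are assumptions made for this example and do not come from the paper. The sketch shows one familiar drawback: mean substitution preserves the observed mean but shrinks the variance.

```python
from statistics import mean, variance

# Hypothetical sample; None marks a missing value.
values = [2.0, 4.0, None, 8.0, None, 6.0]

def listwise_delete(xs):
    """Drop every case that has a missing value."""
    return [x for x in xs if x is not None]

def mean_substitute(xs):
    """Replace each missing value with the mean of the observed values."""
    observed_mean = mean(listwise_delete(xs))
    return [observed_mean if x is None else x for x in xs]

complete_cases = listwise_delete(values)
mean_imputed = mean_substitute(values)

# Print sample size, mean, and sample variance for each approach.
print(len(complete_cases), mean(complete_cases), variance(complete_cases))
print(len(mean_imputed), mean(mean_imputed), variance(mean_imputed))
```

Listwise deletion loses cases (and therefore power), while mean substitution keeps the sample size but understates variability, which is one reason the abstract recommends multiple imputation or full information methods instead.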

Book
18 Nov 2004
TL;DR: The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
Abstract: The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis. Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization. Offers extensive coverage of the R statistical programming language. Contains 280 end-of-chapter exercises. Includes a companion website with further resources for all readers, and PowerPoint slides, a solutions manual, and suggested projects for instructors who adopt the book.

1,637 citations

Journal Article
TL;DR: The key ideas of multiple imputation are reviewed, the software programs currently available are discussed, and their use on data from the Adolescent Alcohol Prevention Trial is demonstrated.
Abstract: Analyses of multivariate data are frequently hampered by missing values. Until recently, the only missing-data methods available to most data analysts have been relatively ad hoc practices such as listwise deletion. Recent dramatic advances in theoretical and computational statistics, however, have produced a new generation of flexible procedures with a sound statistical basis. These procedures involve multiple imputation (Rubin, 1987), a simulation technique that replaces each missing datum with a set of m > 1 plausible values. The m versions of the complete data are analyzed by standard complete-data methods, and the results are combined using simple rules to yield estimates, standard errors, and p-values that formally incorporate missing-data uncertainty. New computational algorithms and software described in a recent book (Schafer, 1997a) allow us to create proper multiple imputations in complex multivariate settings. This article reviews the key ideas of multiple imputation, discusses the software programs currently available, and demonstrates their use on data from the Adolescent Alcohol Prevention Trial (Hansen & Graham, 1991).

1,541 citations
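
The "simple rules" for combining the m complete-data analyses mentioned in the abstract above are commonly known as Rubin's rules. The following Python sketch pools a point estimate and its variance under those rules; the function name `pool_rubin` and the input numbers are assumptions made for illustration, and the degrees-of-freedom calculation needed for p-values is omitted.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m complete-data estimates and their squared standard errors.

    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m > 1 estimates with matching variances")
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total, sqrt(total)

# Hypothetical results from m = 3 imputed data sets.
q_bar, total_var, pooled_se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total_var, pooled_se)
```

The between-imputation term is what carries the missing-data uncertainty into the pooled standard error; a single imputation has no such term and therefore tends to understate uncertainty.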

Journal Article
TL;DR: Quality of research will be enhanced if researchers explicitly acknowledge missing data problems and the conditions under which they occurred, principled methods are employed to handle missing data, and the appropriate treatment of missing data is incorporated into review standards of manuscripts submitted for publication.
Abstract: The impact of missing data on quantitative research can be serious, leading to biased estimates of parameters, loss of information, decreased statistical power, increased standard errors, and weakened generalizability of findings. In this paper, we discussed and demonstrated three principled missing data methods: multiple imputation, full information maximum likelihood, and expectation-maximization algorithm, applied to a real-world data set. Results were contrasted with those obtained from the complete data set and from the listwise deletion method. The relative merits of each method are noted, along with common features they share. The paper concludes with an emphasis on the importance of statistical assumptions, and recommendations for researchers. Quality of research will be enhanced if (a) researchers explicitly acknowledge missing data problems and the conditions under which they occurred, (b) principled methods are employed to handle missing data, and (c) the appropriate treatment of missing data is incorporated into review standards of manuscripts submitted for publication.

1,457 citations

Journal Article
TL;DR: In this article, a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM-algorithm is proposed to deal with missing data in social and behavioral sciences, and the asymptotic efficiencies of different estimators are compared under various assump...
Abstract: Survey and longitudinal studies in the social and behavioral sciences generally contain missing data. Mean and covariance structure models play an important role in analyzing such data. Two promising methods for dealing with missing data are a direct maximum-likelihood and a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM-algorithm. Typical assumptions under these two methods are ignorable nonresponse and normality of data. However, data sets in social and behavioral sciences are seldom normal, and experience with these procedures indicates that normal theory based methods for nonnormal data very often lead to incorrect model evaluations. By dropping the normal distribution assumption, we develop more accurate procedures for model inference. Based on the theory of generalized estimating equations, a way to obtain consistent standard errors of the two-stage estimates is given. The asymptotic efficiencies of different estimators are compared under various assump...

1,412 citations
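
To give a feel for what "unstructured mean and covariance estimates obtained by the EM-algorithm" involves, here is a minimal Python sketch for the simplest case: two variables, with x fully observed and y partly missing, under a bivariate normal model with ignorable nonresponse. This is not the procedure developed in the paper (which drops the normality assumption); the function name `em_bivariate`, the starting values, the fixed iteration count, and the data are all assumptions made for this example.

```python
def em_bivariate(x, y, n_iter=500):
    """EM estimates of means and (co)variances; y uses None for missing."""
    n = len(x)
    observed = [i for i in range(n) if y[i] is not None]

    # x is complete, so its ML mean and variance never change.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n

    # Starting values: observed-case moments for y, zero covariance.
    mu_y = sum(y[i] for i in observed) / len(observed)
    s_yy = sum((y[i] - mu_y) ** 2 for i in observed) / len(observed)
    s_xy = 0.0

    for _ in range(n_iter):
        beta = s_xy / s_xx                 # slope of y on x
        resid_var = s_yy - beta * s_xy     # Var(y | x)
        sum_y = sum_yy = sum_xy = 0.0
        for i in range(n):
            if y[i] is None:
                # E-step: expected y and y^2 given x and current parameters.
                ey = mu_y + beta * (x[i] - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey = y[i]
                eyy = y[i] * y[i]
            sum_y += ey
            sum_yy += eyy
            sum_xy += x[i] * ey
        # M-step: update parameters from the expected sufficient statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y

    return mu_x, mu_y, s_xx, s_xy, s_yy

# Hypothetical data with two missing y values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, None, None, 12.1]
mu_x, mu_y, s_xx, s_xy, s_yy = em_bivariate(x, y)
print(mu_x, mu_y, s_xx, s_xy, s_yy)
```

In a two-stage approach, estimates like these would then be passed to a mean and covariance structure model in a second step; the sketch covers only the first stage.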


Network Information
Related Topics (5)
Regression analysis: 31K papers, 1.7M citations, 86% related
Inference: 36.8K papers, 1.3M citations, 85% related
Linear regression: 21.3K papers, 1.2M citations, 82% related
Estimator: 97.3K papers, 2.6M citations, 82% related
Single-nucleotide polymorphism: 40.1K papers, 1.1M citations, 78% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2024    4
2023    740
2022    1,477
2021    852
2020    724
2019    603