Topic

Missing data

About: Missing data is a research topic. Over its lifetime, 21,363 publications have been published within this topic, receiving 784,923 citations.

Papers published on a yearly basis

Papers

Posted Content

Principles of data mining

David J. Hand, Heikki Mannila, Padhraic Smyth

01 Jan 2001

TL;DR: This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.

Abstract: The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically, different aspects of data mining have been addressed independently by different disciplines. This is the first truly interdisciplinary text on data mining, blending the contributions of information science, computer science, and statistics. The book consists of three sections. The first, foundations, provides a tutorial overview of the principles underlying data mining algorithms and their application. The presentation emphasizes intuition rather than rigor. The second section, data mining algorithms, shows how algorithms are constructed to solve specific problems in a principled manner. The algorithms covered include trees and rules for classification and regression, association rules, belief networks, classical statistical models, nonlinear models such as neural networks, and local "memory-based" models. The third section shows how all of the preceding analysis fits together when applied to real-world data mining problems. Topics include the role of metadata, how to handle missing data, and data preprocessing.

3,765 citations

Journal Article

The Relative Performance of Full Information Maximum Likelihood Estimation for Missing Data in Structural Equation Models

Craig K. Enders¹, Deborah L. Bandalos²

University of Miami¹, University of Nebraska–Lincoln²

01 Jul 2001-Structural Equation Modeling

TL;DR: A Monte Carlo simulation examined the performance of 4 missing data methods in structural equation models and found that full information maximum likelihood (FIML) estimation was superior across all conditions of the design.

Abstract: A Monte Carlo simulation examined the performance of 4 missing data methods in structural equation models: full information maximum likelihood (FIML), listwise deletion, pairwise deletion, and similar response pattern imputation. The effects of 3 independent variables were examined (factor loading magnitude, sample size, and missing data rate) on 4 outcome measures: convergence failures, parameter estimate bias, parameter estimate efficiency, and model goodness of fit. Results indicated that FIML estimation was superior across all conditions of the design. Under ignorable missing data conditions (missing completely at random and missing at random), FIML estimates were unbiased and more efficient than the other methods. In addition, FIML yielded the lowest proportion of convergence failures and provided near-optimal Type 1 error rates across both simulations.

3,748 citations
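
Two of the methods compared in the abstract above, listwise and pairwise deletion, can be illustrated with a small sketch. This is not the simulation code from the paper: the function names `listwise_cov` and `pairwise_cov` are hypothetical, and the example only shows how each rule decides which observations enter a covariance estimate.

```python
import numpy as np

def listwise_cov(X):
    """Covariance using only rows that have no missing values at all."""
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    return np.cov(complete, rowvar=False)

def pairwise_cov(X):
    """Covariance where each entry uses every row observed on that pair of columns."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    cov = np.full((p, p), np.nan)
    for a in range(p):
        for b in range(a, p):
            both = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            n = both.sum()
            if n > 1:
                xa, xb = X[both, a], X[both, b]
                value = np.sum((xa - xa.mean()) * (xb - xb.mean())) / (n - 1)
                cov[a, b] = cov[b, a] = value
    return cov

# Toy data: the third row is missing its second variable.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]]
print(listwise_cov(X))   # built from the 3 complete rows only
print(pairwise_cov(X))   # variance of column 0 uses all 4 rows
```

Because pairwise deletion estimates each entry from a different subset of rows, the resulting matrix is not guaranteed to be positive semidefinite, which is one plausible source of the convergence problems such studies track.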

Journal Article

Missing value estimation methods for DNA microarrays.

Olga G. Troyanskaya¹, Michael N. Cantor¹, Gavin Sherlock¹, Patrick O. Brown¹, Trevor Hastie¹, Robert Tibshirani¹, David Botstein¹, Russ B. Altman¹

Stanford University¹

01 Jun 2001-Bioinformatics

TL;DR: It is shown that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros).

Abstract: Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data. Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions. Availability: The software is available at http://smi-web.

3,542 citations
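
The weighted K-nearest-neighbors and row-average approaches named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' software: the function names, the Euclidean-style distance over shared observed columns, and the inverse-distance weights are all assumptions made for the example, with rows standing in for genes and columns for arrays.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill each NaN with a distance-weighted average over the k nearest rows."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        # Distance from row i to every other row, over columns observed in both.
        dists = np.full(X.shape[0], np.inf)
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                dists[j] = np.sqrt(np.mean(diff ** 2))
        for col in np.where(missing[i])[0]:
            # Neighbours must have an observed value in the target column.
            candidates = np.where(~missing[:, col] & np.isfinite(dists))[0]
            if candidates.size == 0:
                continue  # nothing to borrow from; the value stays NaN
            nearest = candidates[np.argsort(dists[candidates])[:k]]
            weights = 1.0 / (dists[nearest] + 1e-9)  # assumed weighting scheme
            filled[i, col] = np.sum(weights * X[nearest, col]) / np.sum(weights)
    return filled

def row_average_impute(X):
    """Baseline: replace each NaN with the mean of the observed values in its row."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    filled[rows, cols] = row_means[rows]
    return filled

# Toy matrix: the second row is missing its last value.
X = [[1.0, 2.0, 3.0], [1.0, 2.0, np.nan], [10.0, 20.0, 30.0]]
print(knn_impute(X, k=1))
print(row_average_impute(X))
```

In the toy matrix the incomplete row matches the first row on its observed columns, so the neighbor-based estimate borrows from that row, while the row average only sees the two observed values in the incomplete row itself.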

Book

Applied Survival Analysis: Regression Modeling of Time-to-Event Data

David W. Hosmer¹, Stanley Lemeshow², Susanne May³

University of Massachusetts Amherst¹, Ohio State University², University of California, San Diego³

07 Mar 2008

TL;DR: Applied Survival Analysis, Second Edition is an ideal book for graduate-level courses in biostatistics, statistics, and epidemiologic methods and serves as a valuable reference for practitioners and researchers in any health-related field or for professionals in insurance and government.

Abstract: THE MOST PRACTICAL, UP-TO-DATE GUIDE TO MODELLING AND ANALYZING TIME-TO-EVENT DATA, NOW IN A VALUABLE NEW EDITION. Since publication of the first edition nearly a decade ago, analyses using time-to-event methods have increased considerably in all areas of scientific inquiry, mainly as a result of model-building methods available in modern statistical software packages. However, there has been minimal coverage in the available literature to guide researchers, practitioners, and students who wish to apply these methods to health-related areas of study. Applied Survival Analysis, Second Edition provides a comprehensive and up-to-date introduction to regression modeling for time-to-event data in medical, epidemiological, biostatistical, and other health-related research. This book places a unique emphasis on the practical and contemporary applications of regression modeling rather than the mathematical theory. It offers a clear and accessible presentation of modern modeling techniques supplemented with real-world examples and case studies. Key topics covered include: variable selection, identification of the scale of continuous covariates, the role of interactions in the model, assessment of fit and model assumptions, regression diagnostics, recurrent event models, frailty models, additive models, competing risk models, and missing data.
Features of the Second Edition include: expanded coverage of interactions and the covariate-adjusted survival functions; the use of the Worcester Heart Attack Study as the main modeling data set for illustrating discussed concepts and techniques; new discussion of variable selection with multivariable fractional polynomials; further exploration of time-varying covariates, complete with examples; additional treatment of the exponential, Weibull, and log-logistic parametric regression models; increased emphasis on interpreting and using results, as well as utilizing multiple imputation methods to analyze data with missing values; and new examples and exercises at the end of each chapter. Analyses throughout the text are performed using Stata Version 9, and an accompanying FTP site contains the data sets used in the book. Applied Survival Analysis, Second Edition is an ideal book for graduate-level courses in biostatistics, statistics, and epidemiologic methods. It also serves as a valuable reference for practitioners and researchers in any health-related field or for professionals in insurance and government.

3,507 citations

Journal Article

Multiple imputation: a primer

Joseph L Schafer¹

Pennsylvania State University¹

01 Feb 1999-Statistical Methods in Medical Research

TL;DR: Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.

Abstract: In recent years, multiple imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.

3,387 citations

Network Information

Performance

Metrics

Papers: 24,297
Citations: 908,648

No. of papers in the topic in previous years
Year	Papers
2025	2
2024	2
2023	931
2022	2,020
2021	1,639
2020	1,642
