Journal Article

Inference and missing data

01 Dec 1976, Biometrika (Oxford University Press), Vol. 63, Iss. 3, pp. 581-592
TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.
Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.
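
A small simulation helps make the missing-at-random condition concrete. In the sketch below, y is missing with a probability that depends only on the always-observed x. The sample size, regression coefficients and logistic missingness rule are illustrative assumptions, not taken from the paper. Under this MAR mechanism the raw complete-case mean of y is biased. The complete-case regression of y on x, which ignores the missingness process, still recovers the true intercept and slope.

```python
import numpy as np


def simulate_mar(n=200_000, seed=0):
    """Draw (x, y) pairs and a MAR response indicator for y (illustrative values)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # always observed
    y = 1.0 + 2.0 * x + rng.normal(size=n)      # true E[y] = 1, true slope = 2
    # MAR: P(y missing) depends only on the observed x, never on y itself
    p_missing = 1.0 / (1.0 + np.exp(-2.0 * x))
    observed = rng.uniform(size=n) >= p_missing
    return x, y, observed


def complete_case_summaries(x, y, observed):
    """Naive complete-case mean of y and least-squares fit of y on x."""
    x_obs, y_obs = x[observed], y[observed]
    slope, intercept = np.polyfit(x_obs, y_obs, 1)  # highest degree first
    return y_obs.mean(), slope, intercept


x, y, observed = simulate_mar()
cc_mean, slope, intercept = complete_case_summaries(x, y, observed)
print(f"complete-case mean of y: {cc_mean:.3f} (truth 1.0)")
print(f"complete-case fit: intercept {intercept:.3f}, slope {slope:.3f} (truth 1.0, 2.0)")
```

Units with large x are more often missing y, so the complete-case mean is pulled well below 1. The regression conditions on x, the only variable the mechanism depends on, so it is unaffected.
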
Citations
Journal Article
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations
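
The exchangeable structure mentioned in the abstract above assumes that every pair of observations on the same subject shares one correlation, α. A simple moment estimator averages the products of all within-subject pairs of Pearson residuals, (y - μ)/sqrt(v(μ)), and divides that average by the average squared residual. The function below is an illustrative version with a hypothetical name. It omits any correction for the number of regression parameters that the paper's estimator may apply. In a full fit it would typically be alternated with updates of the regression coefficients.

```python
def exchangeable_alpha(residuals_by_subject):
    """Moment estimate of a common within-subject correlation (exchangeable structure).

    residuals_by_subject: list of sequences, one per subject, of Pearson residuals
    evaluated at the current regression estimates.
    """
    sum_sq, n_obs = 0.0, 0
    sum_pairs, n_pairs = 0.0, 0
    for r in residuals_by_subject:
        r = list(r)
        for j, r_j in enumerate(r):
            sum_sq += r_j * r_j
            n_obs += 1
            for r_k in r[j + 1:]:
                sum_pairs += r_j * r_k   # each distinct pair within one subject
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("need at least one subject with two or more observations")
    phi = sum_sq / n_obs                 # scale (dispersion) estimate
    return (sum_pairs / n_pairs) / phi
```

Residuals that move together within subjects, such as `[[1.0, 1.0], [-1.0, -1.0]]`, give α = 1. Residuals that alternate, such as `[[1.0, -1.0], [1.0, -1.0]]`, give α = -1.
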

Journal Article
TL;DR: Two general approaches that come highly recommended, maximum likelihood (ML) and Bayesian multiple imputation (MI), are presented, along with newer procedures (some for data that are not MAR) that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations
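
Multiple imputation, one of the two recommended approaches, ends by pooling the m completed-data analyses. The combining rules usually credited to Rubin are sketched below for a single scalar estimate. The function name and interface are illustrative assumptions, and refinements such as small-sample degrees-of-freedom adjustments are omitted.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m completed-data analyses of one scalar quantity."""
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)                 # pooled point estimate
    u_bar = mean(within_variances)          # average within-imputation variance
    b = variance(estimates)                 # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b             # total variance
    if b == 0:
        df = math.inf                       # no between-imputation spread
    else:
        r = u_bar / ((1 + 1 / m) * b)
        df = (m - 1) * (1 + r) ** 2         # large-sample degrees of freedom
    return q_bar, t, df
```

Take estimates 1, 2 and 3, each with within-imputation variance 0.5. The pooled estimate is 2.0, and the total variance is 0.5 + (4/3)(1.0), about 1.83.
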

Journal Article
TL;DR: mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations
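
The chained-equations idea behind mice can be sketched in a few lines. First fill the gaps with starting values, then repeatedly re-impute each incomplete variable from a regression on all the others. The sketch below is a simplified Python illustration and does not reproduce mice's algorithm or API; the function name is hypothetical. It uses ordinary least squares plus normal residual noise for every variable. It also does not draw the regression parameters from their posterior, so its output is not a "proper" multiple imputation. Running it with m different seeds would give m completed data sets.

```python
import numpy as np


def chained_equations(data, n_iter=10, seed=0):
    """Fill NaNs in a 2-D float array by cycling regressions over the columns.

    Simplified sketch: OLS on all other columns plus normal residual noise.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)           # work on a copy
    missing = np.isnan(x)
    n, p = x.shape
    # Starting values: each column's observed mean
    x = np.where(missing, np.nanmean(x, axis=0), x)
    for _ in range(n_iter):
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue                      # nothing to impute in this column
            obs_j = ~miss_j
            # Predictors: intercept plus the current values of every other column
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[obs_j], x[obs_j, j], rcond=None)
            resid = x[obs_j, j] - design[obs_j] @ coef
            k = design.shape[1]
            sigma = resid.std(ddof=k) if obs_j.sum() > k else 0.0
            # Draw imputations around the regression prediction
            x[miss_j, j] = design[miss_j] @ coef + rng.normal(0.0, sigma, size=int(miss_j.sum()))
    return x
```

Observed entries are never overwritten. Only the cells that were NaN in the input change from one sweep to the next.
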

Journal Article
TL;DR: A class of generalized estimating equations (GEEs) for the regression parameters is proposed. The equations extend those used in quasi-likelihood methods, and their solutions are consistent and asymptotically Gaussian even when the time dependence is misspecified, as the authors expect it often will be.
Abstract: Longitudinal data sets are comprised of repeated observations of an outcome and a set of covariates for each of many subjects. One objective of statistical analysis is to describe the marginal expectation of the outcome variable as a function of the covariates while accounting for the correlation among the repeated observations for a given subject. This paper proposes a unifying approach to such analysis for a variety of discrete and continuous outcomes. A class of generalized estimating equations (GEEs) for the regression parameters is proposed. The equations are extensions of those used in quasi-likelihood (Wedderburn, 1974, Biometrika 61, 439-447) methods. The GEEs have solutions which are consistent and asymptotically Gaussian even when the time dependence is misspecified as we often expect. A consistent variance estimate is presented. We illustrate the use of the GEE approach with longitudinal data from a study of the effect of mothers' stress on children's morbidity.

7,080 citations
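
Take the simplest setting: a continuous outcome with identity link and an independence working correlation. There the GEEs reduce to the ordinary least-squares normal equations. A consistent variance estimate then takes the familiar "sandwich" form, summing score contributions subject by subject so that within-subject correlation need not be modeled. The sketch below covers only that special case, under a hypothetical function name and with no small-sample correction. Other links and working correlations change both the estimating equations and their weights.

```python
import numpy as np


def gee_independence_identity(X, y, groups):
    """Linear GEE with independence working correlation and a sandwich covariance.

    X: (N, p) design matrix, y: (N,) outcomes, groups: (N,) subject labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    bread = np.linalg.inv(X.T @ X)          # inverse of the model-based information
    beta = bread @ (X.T @ y)                # solves sum_i X_i'(y_i - X_i beta) = 0
    resid = y - X @ beta
    p = X.shape[1]
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        rows = groups == g
        u_i = X[rows].T @ resid[rows]       # subject i's estimating-equation contribution
        meat += np.outer(u_i, u_i)          # empirical covariance across subjects
    return beta, bread @ meat @ bread
```

Consider an intercept-only model with y = 1, 2, 3, 4 in two subjects of two observations each. The estimate is 2.5 and the sandwich variance is 0.5. The independence model variance s²/N would instead ignore the positive within-subject correlation of the residuals.
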

References
Book
01 Jun 1973
TL;DR: This book treats Bayesian inference in statistical analysis, including the effect of non-normality on inferences about a population mean, the comparison of variances, random-effect models, and inference about means with information from more than one source.
Abstract (contents): Nature of Bayesian Inference; Standard Normal Theory Inference Problems; Bayesian Assessment of Assumptions: Effect of Non-Normality on Inferences About a Population Mean with Generalizations; Bayesian Assessment of Assumptions: Comparison of Variances; Random Effect Models; Analysis of Cross Classification Designs; Inference About Means with Information from More than One Source: One-Way Classification and Block Designs; Some Aspects of Multivariate Analysis; Estimation of Common Regression Coefficients; Transformation of Data; Tables; References; Indexes.

3,896 citations

Journal Article
TL;DR: In this paper, the authors give an approach to derive maximum likelihood estimates of parameters of multivariate normal distributions in cases where some observations are missing (Edgett [2] and Lord [3], [4]).
Abstract: Several authors recently have derived maximum likelihood estimates of parameters of multivariate normal distributions in cases where some observations are missing (Edgett [2] and Lord [3], [4]). The purpose of this note is to give an approach to these problems that indicates the estimates with a minimum of mathematical manipulation; this approach can easily be applied to other cases. (The technique bears some resemblance to that of Cochran and Bliss in a different problem [1].) The method will be indicated by treating the simplest case involving a bivariate normal distribution. Suppose $x$ and $y$ have a bivariate normal distribution with means $\mu_x$ and $\mu_y$, variances $\sigma_x^2$ and $\sigma_y^2$, and correlation coefficient $\rho$. We shall indicate the density by $n(x, y \mid \mu_x, \mu_y; \sigma_x^2, \sigma_y^2; \rho)$. Suppose $n$ observations are made on the pair $(x, y)$ and $N - n$ observations are made on $x$; that is, $N - n$ observations on $y$ are missing. The data are

563 citations
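
The bivariate normal case described in this note can be sketched with the standard factored-likelihood construction. Estimate the marginal distribution of x from all N observations, estimate the regression of y on x from the n complete pairs, then combine the two. The function below is an illustrative implementation, not the note's own notation; its name and dictionary return format are assumptions. It assumes the first n units are the complete pairs and uses maximum likelihood variances, with divisor N for x and n for the regression.

```python
import numpy as np


def anderson_bivariate_mle(x_all, y_obs):
    """ML estimates for a bivariate normal when some y values are missing.

    x_all: all N observed values of x; the first n units also have y observed.
    y_obs: the n observed values of y (the remaining N - n y values are missing).
    """
    x_all = np.asarray(x_all, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    n = len(y_obs)
    x_c = x_all[:n]                          # x values of the complete pairs

    # Marginal of x: use all N observations (divisor N, as in ML)
    mu_x = x_all.mean()
    var_x = x_all.var()

    # Regression of y on x: use only the n complete pairs
    xbar_c, ybar_c = x_c.mean(), y_obs.mean()
    s_xx = np.mean((x_c - xbar_c) ** 2)
    s_xy = np.mean((x_c - xbar_c) * (y_obs - ybar_c))
    beta = s_xy / s_xx
    resid_var = np.mean((y_obs - ybar_c - beta * (x_c - xbar_c)) ** 2)

    # Combine the two factors to recover the joint parameters
    mu_y = ybar_c + beta * (mu_x - xbar_c)
    var_y = resid_var + beta ** 2 * var_x
    cov_xy = beta * var_x
    return {
        "mu_x": float(mu_x),
        "mu_y": float(mu_y),
        "var_x": float(var_x),
        "var_y": float(var_y),
        "rho": float(cov_xy / np.sqrt(var_x * var_y)),
    }
```

Take x = 0, 1, 2, 3, 4 with y = 1, 3, 5 observed on the first three units. The complete-case mean of y is 3, while the estimate that also uses the two extra x values is 3 + 2(2 - 1) = 5.
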

Journal Article
TL;DR: This paper reviews the literature on handling multivariate data with observations missing on some or all of the variables under study, examining the ways statisticians have devised to estimate means, variances, correlations and linear regression functions from such data.
Abstract: In this paper we review the literature on the problem of handling multivariate data with observations missing on some or all of the variables under study. We examine the ways that statisticians have devised to estimate means, variances, correlations and linear regression functions from such data and refer to specific computer programs for carrying out the estimation. We show how the estimation problems can be simplified if the missing data follows certain patterns. Finally, we outline the statistical properties of the various estimators.

303 citations
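
The abstract notes that estimation simplifies when the missing data follow certain patterns. One commonly discussed example is the monotone (nested) pattern. There, the variables can be ordered so that a unit missing one variable is also missing every later one. Whether this is the pattern the review emphasizes is an assumption. The hypothetical helper below checks a missingness mask for such an ordering.

```python
import numpy as np


def monotone_order(missing):
    """Return a column order that makes the missingness pattern monotone, or None.

    missing: (n_units, n_vars) boolean array, True where a value is missing.
    """
    missing = np.asarray(missing, dtype=bool)
    # If a monotone order exists, the sets of missing rows are nested, so
    # sorting variables from least to most missing recovers it.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    ordered = missing[:, order].astype(int)
    # Monotone: along each row, an observed (0) entry never follows a missing (1) one.
    if np.all(np.diff(ordered, axis=1) >= 0):
        return order.tolist()
    return None
```

For multivariate normal data with such an ordering, the likelihood factors into the marginal distribution of the first variable and a sequence of regressions of each later variable on the ones before it. This generalizes the bivariate construction above.
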