Journal Article

Inference and missing data

Donald B. Rubin
01 Dec 1976, Vol. 63, Issue 3, pp. 581-592
Abstract
Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.
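A small simulation can make the "missing at random" condition concrete. The sketch below is illustrative only and is not taken from the paper. It assumes a bivariate normal model y = 2 + 1.5x + e in which x is always observed, deletes y values through a hypothetical logistic missingness mechanism, and compares the complete-case mean of y with the maximum-likelihood mean obtained by ignoring the mechanism (the factored-likelihood estimate for this monotone pattern). The helper names `simulate` and `estimates`, the coefficients, and the logistic mechanism are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate(n, mechanism, seed=0):
    """Draw (x, y) from an assumed model y = 2 + 1.5*x + e, then delete some y values.

    Illustrative assumption, not from the paper:
      "MAR"  -> P(y missing) depends only on the always-observed x
      "MNAR" -> P(y missing) depends on the value of y itself
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)
        driver = x if mechanism == "MAR" else y - 2.0
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * driver))  # hypothetical logistic mechanism
        xs.append(x)
        ys.append(None if rng.random() < p_missing else y)
    return xs, ys


def estimates(xs, ys):
    """Return (complete-case mean of y, ML mean of y that ignores the missingness mechanism)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    cx = [x for x, _ in pairs]
    cy = [y for _, y in pairs]
    mx, my = statistics.fmean(cx), statistics.fmean(cy)
    # Least-squares slope of y on x among units with y observed.
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in cx)
    # Factored-likelihood estimate for the bivariate normal: shift the respondent
    # mean of y by slope * (full-sample mean of x - respondent mean of x).
    ml_mean = my + slope * (statistics.fmean(xs) - mx)
    return my, ml_mean


if __name__ == "__main__":
    for mech in ("MAR", "MNAR"):
        cc, ml = estimates(*simulate(100_000, mech))
        print(f"{mech}: complete-case mean = {cc:.3f}, "
              f"mechanism-ignoring ML mean = {ml:.3f} (true mean 2.0)")
```

Under the MAR mechanism the likelihood-based estimate should land close to the true mean of 2 while the complete-case mean, which discards the observed x values of nonrespondents, does not. Under the MNAR mechanism, where missingness depends on y itself, ignoring the mechanism biases the likelihood estimate as well.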


Citations
Book Chapter

Multiple Imputation of Multilevel Data

TL;DR: In the early days of multilevel analysis, Goldstein wrote: “The authors shall require and assume that all the necessary data at each level are available” (Goldstein, 1987), and this requirement is still dominant today.
Journal Article

Helpers in a cooperatively breeding cichlid stay and pay or disperse and breed, depending on ecological constraints

Abstract: The theory of family-group dynamics predicts that group structure, helping behaviour and social interactions among group members should vary with the opportunities of subordinates to breed independently. We investigated experimentally whether unrelated mature helpers in the cooperatively breeding cichlid Neolamprologus pulcher reduce costly social and cooperative behaviour and choose to disperse and breed independently when offered vacant breeding sites. As predicted by the ecological constraints hypothesis, when breeding substrate was available, (i) helpers spent more time in dispersal areas and it was mainly large helpers that left the group to breed independently; (ii) all helpers invested less in costly submissive behaviours towards other group members and large helpers reduced help, supporting the ‘pay-to-stay’ hypothesis; and (iii) large helpers, particularly those that dispersed and bred, increased more in body mass in the treatment than those without breeding options, suggesting status-dependent strategic growth of helpers. We conclude that helpers of N. pulcher decide whether to stay and pay or disperse and breed in response to constraints on independent breeding.
Journal Article

Piecing together the past: statistical insights into paleoclimatic reconstructions

TL;DR: This article considers the challenge of inferring a climate process through space and time from overlapping instrumental and climate sensitive proxy time series that are assumed to be well dated – an assumption that is likely only reasonable for certain proxies over at most the last few millennia.
Journal Article

Statistical properties of randomization in clinical trials

TL;DR: This paper presents definitions and discussions of the statistical properties of randomization procedures as they relate both to the design of a clinical trial and to the statistical analysis of trial results, as well as the expected selection bias associated with a randomization procedure.
Journal Article

Statistical paradises and paradoxes in big data (I): Law of large populations, big data paradox, and the 2016 US presidential election

TL;DR: This article suggests a framework to address a question: “Which one should I trust more: a 1% survey with 60% response rate or a self-reported administrative dataset covering 80% of the population?”