
Showing papers by "Donald B. Rubin" published in 2019



Journal Article
TL;DR: This work considers a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses, and illustrates the approach with an example examining the effect of parental smoking on children's lung function, using data collected from families living in East Boston in the 1970s.
Abstract: Consider a statistical analysis that draws causal inferences from an observational dataset, inferences that are presented as being valid in the standard frequentist senses; i.e., the analysis produces: (1) consistent point estimates, (2) valid p-values, valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (3) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements, the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that hypothetical randomized data set. This multistage effort with thought-provoking tasks involves: (1) a purely conceptual stage that precisely formulates the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that approximates a randomized experiment before any outcome data are observed; (3) a statistical analysis stage comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects. Stages 2 and 3 may rely on modern computing to implement the effort, whereas Stage 1 demands careful scientific argumentation to make the embedding plausible to scientific readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. The conceptually most demanding tasks are often the most scientifically interesting to the dedicated researcher and readers of the resulting statistical analyses. This perspective is rarely implemented with any rigor, for example, completely eschewing the first stage.
We illustrate our approach using an example examining the effect of parental smoking on children's lung function collected in families living in East Boston in the 1970s.

36 citations


Journal Article
29 Sep 2019
TL;DR: Much of what is written about causal inference is found to be mathematically inapposite in one of two senses: the descriptions either include irrelevant clutter or omit conditions required for the correctness of the assertions.
Abstract: Causal inference refers to the process of inferring what would happen in the future if we change what we are doing, or inferring what would have happened in the past, if we had done something diffe...

34 citations


01 Jan 2019
TL;DR: Blocking is commonly used in randomized experiments to increase efficiency of estimation; a generalization of blocking is to remove allocations with imbalance in covariates between treated and control units.
Abstract: Blocking is commonly used in randomized experiments to increase efficiency of estimation. A generalization of blocking is to remove allocations with imbalance in covariates between treated and cont ...

5 citations