
Showing papers by "Donald B. Rubin" published in 2017


Posted Content
TL;DR: It is proved in the key result that, given the same expected number of rerandomizations and under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time.
Abstract: The seminal work of Morgan and Rubin (2012) considers rerandomization for all the units at one time. In practice, however, experimenters may have to rerandomize units sequentially. For example, a clinician studying a rare disease may be unable to wait to perform an experiment until all the experimental units are recruited. Our work offers a mathematical framework for sequential rerandomization designs, where the experimental units are enrolled in groups. We formulate an adaptive rerandomization procedure for balancing treatment/control assignments over some continuous or binary covariates, using Mahalanobis distance as the imbalance measure. We prove in our key result, Theorem 3, that given the same number of rerandomizations (in expected value), under certain mild assumptions, sequential rerandomization achieves better covariate balance than rerandomization at one time.
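As a rough illustration of the rerandomization step underlying this framework, the sketch below repeatedly draws treatment/control assignments for one group of units and accepts the first draw whose Mahalanobis distance between group covariate means falls below a threshold. The threshold, group size, and covariates are hypothetical; this is a minimal sketch of rerandomization with a Mahalanobis-distance criterion, not the paper's full sequential procedure.

```python
import numpy as np

def mahalanobis_imbalance(X, w):
    """Mahalanobis distance between treatment and control covariate means."""
    diff = X[w == 1].mean(axis=0) - X[w == 0].mean(axis=0)
    n1, n0 = (w == 1).sum(), (w == 0).sum()
    # Plug-in covariance of the mean difference
    cov = np.cov(X, rowvar=False) * (1.0 / n1 + 1.0 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def rerandomize(X, n_treat, a, rng, max_draws=10_000):
    """Redraw assignments until the Mahalanobis imbalance falls below a."""
    n = X.shape[0]
    for _ in range(max_draws):
        w = np.zeros(n, dtype=int)
        w[rng.choice(n, size=n_treat, replace=False)] = 1
        if mahalanobis_imbalance(X, w) < a:
            return w
    raise RuntimeError("acceptance threshold too strict")

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))          # hypothetical covariates for one group
w = rerandomize(X, n_treat=20, a=2.0, rng=rng)
print(mahalanobis_imbalance(X, w))
```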

31 citations


Journal ArticleDOI
TL;DR: It is shown, using an extensive simulation, that some highly advocated methods have poor operating characteristics, and that in many conditions matching for the point estimate, combined with within-group matching for sampling variance estimation, appears to be the most efficient valid method.
Abstract: The estimation of causal effects in nonrandomized studies should comprise two distinct phases: design, with no outcome data available; and analysis of the outcome data according to a specified protocol. Here, we review and compare point and interval estimates of common statistical procedures for estimating causal effects (i.e. matching, subclassification, weighting, and model-based adjustment) with a scalar continuous covariate and a scalar continuous outcome. We show, using an extensive simulation, that some highly advocated methods have poor operating characteristics. In many conditions, matching for the point estimate combined with within-group matching for sampling variance estimation, with or without covariance adjustment, appears to be the most efficient valid method of those evaluated. These results provide new conclusions and advice regarding the merits of currently used procedures.
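As a loose illustration of one of the procedures compared (matching on a scalar continuous covariate), the sketch below pairs each treated unit with its nearest control on the covariate and estimates the treatment effect as the mean within-pair outcome difference. The data and variable names are hypothetical; the paper's within-group matching variance estimator is not reproduced here.

```python
import numpy as np

def matched_pairs_estimate(x, y, w):
    """Nearest-neighbor matching (with replacement) on a scalar covariate x.

    Each treated unit is matched to the control unit closest in x, and the
    effect is estimated by the mean within-pair outcome difference.
    """
    treated = np.flatnonzero(w == 1)
    controls = np.flatnonzero(w == 0)
    diffs = []
    for i in treated:
        j = controls[np.argmin(np.abs(x[controls] - x[i]))]
        diffs.append(y[i] - y[j])
    return float(np.mean(diffs))

# Hypothetical data: outcome depends on the covariate plus a constant treatment effect
rng = np.random.default_rng(1)
x = rng.normal(size=200)
w = rng.binomial(1, 0.4, size=200)
y = 2.0 * x + 1.0 * w + rng.normal(size=200)
print(matched_pairs_estimate(x, y, w))   # should be near the true effect of 1.0
```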

27 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose definitions of partially MAR and ignorability for a subvector of the parameters of particular substantive interest, for direct likelihood-based inferences from data with missing values.
Abstract: For likelihood-based inferences from data with missing values, models are generally needed for both the data and the missing-data mechanism. However, modeling the mechanism can be challenging, and parameters are often poorly identified. Rubin in 1976 showed that for likelihood and Bayesian inference, sufficient conditions for ignoring the missing-data mechanism are (a) the missing data are missing at random (MAR), in the sense that missingness does not depend on the missing values after conditioning on the observed data, and (b) the parameters of the data model and the missingness mechanism are distinct, that is, there are no a priori ties, via parameter-space restrictions or prior distributions, between these two sets of parameters. These conditions are sufficient but not always necessary, and they relate to the full vector of parameters of the data model. We propose definitions of partially MAR and ignorability for a subvector of the parameters of particular substantive interest, for direct likelihood-based inferences from data with missing values.
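A worked restatement of the two sufficient conditions (following Rubin, 1976) may help; the notation below is standard rather than taken from this particular article: Y_obs and Y_mis denote the observed and missing parts of the data, M the missingness indicators, theta the data-model parameters, and psi the missingness-mechanism parameters.

```latex
% Full-data likelihood:
L(\theta, \psi \mid Y_{\mathrm{obs}}, M)
  = \int f(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta)\,
         g(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi)\, dY_{\mathrm{mis}} .

% (a) Missing at random: missingness does not depend on the missing values
%     given the observed data,
g(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi) = g(M \mid Y_{\mathrm{obs}}, \psi)
  \quad \text{for all } Y_{\mathrm{mis}} .

% (b) Distinctness of \theta and \psi. Under (a) and (b) the likelihood factors as
L(\theta, \psi \mid Y_{\mathrm{obs}}, M)
  = g(M \mid Y_{\mathrm{obs}}, \psi)\, f(Y_{\mathrm{obs}} \mid \theta),

% so likelihood or Bayesian inference about \theta may ignore the mechanism.
```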

24 citations


Journal ArticleDOI
TL;DR: This workshop addressed challenges of clinical research in neurosurgery by considering possible solutions, such as statistical methods for demonstrating causality using observational data, characteristics required of a registry supporting effectiveness research, and trial designs combining advantages of observational studies and RCTs.
Abstract: This workshop addressed challenges of clinical research in neurosurgery. Randomized controlled clinical trials (RCTs) have high internal validity, but often insufficiently generalize to real-world practice. Observational studies are inclusive but often lack sufficient rigor. The workshop considered possible solutions, such as (1) statistical methods for demonstrating causality using observational data; (2) characteristics required of a registry supporting effectiveness research; (3) trial designs combining advantages of observational studies and RCTs; and (4) equipoise, an identified challenge for RCTs. In the future, advances in information technology potentially could lead to creation of a massive database where clinical data from all neurosurgeons are integrated and analyzed, ending the separation of clinical research and practice and leading to a new "science of practice."

24 citations


Journal ArticleDOI
TL;DR: An experimental design, randomization to randomization probabilities (R2R), is proposed, which significantly improves estimates of treatment effects under actual conditions of use by manipulating participant expectations about receiving treatment.
Abstract: Blinded randomized controlled trials (RCTs) require participants to be uncertain whether they are receiving a treatment or placebo. Although uncertainty is ideal for isolating the treatment effect from all other potential effects, it is poorly suited for estimating the treatment effect under actual conditions of intended use, when individuals are certain that they are receiving a treatment. We propose an experimental design, randomization to randomization probabilities (R2R), which significantly improves estimates of treatment effects under actual conditions of use by manipulating participant expectations about receiving treatment. In the R2R design, participants are first randomized to a value, π, denoting their probability of receiving treatment (vs. placebo). Subjects are then told their value of π and randomized to either treatment or placebo with probabilities π and 1-π, respectively. Analysis of the treatment effect includes statistical controls for π (necessary for causal inference) and typically a π-by-treatment interaction. Random assignment of subjects to π and disclosure of its value to subjects manipulates subject expectations about receiving the treatment without deception. This method offers a better treatment-effect estimate under actual conditions of use than does a conventional RCT. Design properties, guidelines for power analyses, and limitations of the approach are discussed. We illustrate the design by implementing an RCT of caffeine effects on mood and vigilance and show that some of the actual effects of caffeine differ by the expectation that one is receiving the active drug.
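A compact simulation can make the two-stage randomization and the analysis with a π main effect and a π-by-treatment interaction concrete. The probability levels, sample size, and effect sizes below are assumed for illustration and are not taken from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 600

# Stage 1: randomize each participant to a treatment probability pi and disclose it
pi = rng.choice([0.2, 0.5, 0.8], size=n)

# Stage 2: randomize to treatment (1) or placebo (0) with probability pi
z = rng.binomial(1, pi)

# Hypothetical outcome: expectation (pi) shifts the response, and the
# treatment effect itself varies with the disclosed probability
y = 0.5 * pi + 1.0 * z + 0.8 * pi * z + rng.normal(size=n)

df = pd.DataFrame({"y": y, "z": z, "pi": pi})

# Analysis controls for pi and includes the pi-by-treatment interaction
fit = smf.ols("y ~ z * pi", data=df).fit()
print(fit.params)
```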

10 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer.
Abstract: Several statistical agencies have started to use multiply-imputed synthetic microdata to create public-use data in major surveys. The purpose of doing this is to protect the confidentiality of respondents’ identities and sensitive attributes, while allowing standard complete-data analyses of microdata. A key challenge, faced by advocates of synthetic data, is demonstrating that valid statistical inferences can be obtained from such synthetic data for non-confidential questions. Large discrepancies between observed-data and synthetic-data analytic results for such questions may arise because of uncongeniality; that is, differences in the types of inputs available to the imputer, who has access to the actual data, and to the analyst, who has access only to the synthetic data. Here, we discuss a simple, but possibly canonical, example of uncongeniality when using multiple imputation to create synthetic data, which specifically addresses the choices made by the imputer. An initial, unanticipated but not surprising, conclusion is that non-confidential design information used to impute synthetic data should be released with the synthetic data to allow users of synthetic data to avoid possible grossly conservative inferences.
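To make the setting concrete, the sketch below generates multiply-imputed synthetic microdata for a single variable by drawing from the posterior predictive distribution of a simple normal model, and then computes the analyst's point estimate from each synthetic copy. The model, prior, and number of copies are assumptions for illustration; the paper's combining rules and its specific uncongenial example are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

# "Actual" confidential data held by the imputer (hypothetical)
y = rng.normal(loc=10.0, scale=2.0, size=500)
n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)

def synthetic_copy(rng):
    """One synthetic dataset from the posterior predictive of a normal model.

    Uses the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2.
    """
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)      # draw sigma^2 | y
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))        # draw mu | sigma^2, y
    return rng.normal(mu, np.sqrt(sigma2), size=n)    # draw synthetic values

m = 10                                                # number of synthetic copies
estimates = [synthetic_copy(rng).mean() for _ in range(m)]
print(np.mean(estimates), np.var(estimates, ddof=1))  # analyst's m point estimates
```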

8 citations



Book ChapterDOI
18 Jul 2017
TL;DR: This proposal uses principal stratification as the key statistical tool and is applied to initial data from an actual experiment to illustrate important ideas.
Abstract: Although randomized controlled trials (RCTs) are generally considered the gold standard for estimating causal effects, for example of pharmaceutical treatments, the valid analysis of RCTs is more complicated with human units than with plants and other such objects. One potential complication that arises with human subjects is the possible existence of placebo effects in RCTs with placebo controls, where a treatment, say a new drug, is compared to a placebo, and for approval, the treatment must demonstrate better outcomes than the placebo. In such trials, the causal estimand of interest is the medical effect of the drug compared to placebo. But in practice, when a drug is prescribed by a doctor and the patient is aware of the prescription received, the patient can be expected to receive both a placebo effect and the active effect of the drug. An important issue for practice concerns how to disentangle the medical effect of the drug from the placebo effect of being treated using data arising in a placebo-controlled RCT. Our proposal uses principal stratification as the key statistical tool. The method is applied to initial data from an actual experiment to illustrate important ideas.
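The core device can be written in standard potential-outcomes notation (generic, not taken from the chapter itself): with Z the assignment to drug versus placebo, S an intermediate post-treatment variable, and Y the outcome, principal strata are defined by the joint potential values of S.

```latex
% Potential values under assignment Z = 0 (placebo) and Z = 1 (drug):
%   intermediate variable S(z), outcome Y(z).

% Principal strata: groups of units defined by the pair of potential values
% of the intermediate variable, which is unaffected by the realized assignment.
U = \bigl(S(0),\, S(1)\bigr)

% Principal causal effects: comparisons of potential outcomes within a stratum u,
\tau(u) = E\bigl[\,Y(1) - Y(0) \mid U = u\,\bigr],

% which are well-defined causal effects because U, unlike the observed S,
% is not itself affected by treatment assignment.
```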

2 citations


Posted Content
TL;DR: In this paper, the authors propose a different approach to examining causal effects of environmental exposures on health outcomes from observational data; the approach, based on insights from classical experimental design, involves four stages and relies on modern computing to implement the effort in two of the four stages.
Abstract: The health effects of environmental exposures have been studied for decades, typically using standard regression models to assess exposure-outcome associations found in observational non-experimental data. We propose and illustrate a different approach to examine causal effects of environmental exposures on health outcomes from observational data. Our strategy attempts to structure the observational data to approximate data from a hypothetical, but realistic, randomized experiment. This approach, based on insights from classical experimental design, involves four stages and relies on modern computing to implement the effort in two of the four stages. More specifically, our strategy involves: (1) a conceptual stage that involves the precise formulation of the causal question in terms of a hypothetical randomized experiment where the exposure is assigned to units; (2) a design stage that attempts to reconstruct (or approximate) a randomized experiment before any outcome data are observed; (3) a statistical analysis comparing the outcomes of interest in the exposed and non-exposed units of the hypothetical randomized experiment; and (4) a summary stage providing conclusions about statistical evidence for the sizes of possible causal effects of the exposure on outcomes. We illustrate our approach using an example examining the effect of parental smoking on children's lung function, with data collected from families living in East Boston in the 1970s. To complement traditional, purely model-based approaches, our strategy, which includes outcome-free matched sampling, provides workable tools to quantify possible detrimental exposure effects on human health outcomes, especially because it also includes transparent diagnostics to assess the assumptions of the four-stage statistical approach being applied.
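As an illustration of the design stage described here (reconstructing an approximate randomized experiment before any outcomes are examined), the sketch below matches exposed to non-exposed units on estimated propensity scores using covariates only and reports a simple balance diagnostic. The data and variable names are hypothetical; this is a generic outcome-free matching step, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 1000

# Hypothetical covariates and exposure indicator (e.g., parental smoking)
X = rng.normal(size=(n, 4))
exposure = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1]))))

# Design stage: estimate propensity scores from covariates only (no outcomes used)
ps = LogisticRegression().fit(X, exposure).predict_proba(X)[:, 1]

exposed = np.flatnonzero(exposure == 1)
unexposed = np.flatnonzero(exposure == 0)

# Match each exposed unit to the nearest unexposed unit on the propensity score
matches = {i: unexposed[np.argmin(np.abs(ps[unexposed] - ps[i]))] for i in exposed}

# Transparent diagnostic: covariate mean differences before vs. after matching
before = X[exposed].mean(axis=0) - X[unexposed].mean(axis=0)
after = X[exposed].mean(axis=0) - X[list(matches.values())].mean(axis=0)
print("imbalance before:", np.round(before, 2))
print("imbalance after: ", np.round(after, 2))
```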