Showing papers by "Donald B. Rubin published in 2015"
•
06 Apr 2015
TL;DR: In this paper, two world-renowned experts present statistical methods for studying questions that are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed?
Abstract: Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding to the outcome that would be realized if a subject were exposed to a particular treatment or regime. In this approach, causal effects are comparisons of such potential outcomes. The fundamental problem of causal inference is that we can only observe one of the potential outcomes for a particular subject. The authors discuss how randomized experiments allow us to assess causal effects and then turn to observational studies. They lay out the assumptions needed for causal inference and describe the leading analysis methods, including matching, propensity-score methods, and instrumental variables. Many detailed applications are included, with special focus on practical aspects for the empirical researcher.
1,129 citations
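To make the potential-outcomes idea in the abstract above concrete, here is a minimal simulation sketch (the data are invented, not from the book): each unit has two potential outcomes but reveals only one, yet complete randomization lets a simple difference in means recover the average effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical potential outcomes for every unit (never both observed in practice).
y0 = rng.normal(10, 2, size=n)           # outcome if assigned control
y1 = y0 + 3 + rng.normal(0, 1, size=n)   # outcome if assigned treatment

w = rng.binomial(1, 0.5, size=n)          # completely randomized assignment
y_obs = np.where(w == 1, y1, y0)          # only one potential outcome is revealed

true_ate = np.mean(y1 - y0)
est_ate = y_obs[w == 1].mean() - y_obs[w == 0].mean()
print(f"true average effect {true_ate:.2f}, difference in means {est_ate:.2f}")
```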
•
01 Jan 2015
TL;DR: In this paper, two world-renowned experts present statistical methods for studying questions that are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed?
Abstract: Most questions in social and biomedical sciences are causal in nature: what would happen to individuals, or to groups, if part of their environment were changed? In this groundbreaking text, two world-renowned experts present statistical methods for studying such questions. This book starts with the notion of potential outcomes, each corresponding to the outcome that would be realized if a subject were exposed to a particular treatment or regime. In this approach, causal effects are comparisons of such potential outcomes. The fundamental problem of causal inference is that we can only observe one of the potential outcomes for a particular subject. The authors discuss how randomized experiments allow us to assess causal effects and then turn to observational studies. They lay out the assumptions needed for causal inference and describe the leading analysis methods, including matching, propensity-score methods, and instrumental variables. Many detailed applications are included, with special focus on practical aspects for the empirical researcher.
728 citations
••
01 Apr 2015
378 citations
••
01 Apr 2015
TL;DR: In this paper, the authors relax the unconfoundedness assumption without replacing it with additional assumptions, and so do not focus on obtaining point estimates of the causal estimands of interest.
Abstract: INTRODUCTION Part IV of this text focused on estimation and inference under regular assignment mechanisms, that is, ones that are individualistic with probabilistic assignment, as well as unconfounded. In Part V we study methods that confront the unconfoundedness assumption. In Chapter 21 we discussed methods to assess the plausibility of this assumption by combining it with additional assumptions. In the current chapter we relax the unconfoundedness assumption without replacing it with additional assumptions, and so do not focus on obtaining point estimates of the causal estimands of interest. Instead we end up with ranges of plausible values for these estimands, with the width of these ranges corresponding to the extent to which we allow the unconfoundedness assumption to be violated. We consider two approaches that have much in common. The first, developed by Manski in a series of studies (e.g., Manski, 1990, 1996, 2003, 2013), allows for arbitrarily large violations of the unconfoundedness assumption. This bounds or partial identification approach, as it is called, leads to sharp results, but at the same time will be seen to limit severely the types of inferences about causal effects that can be drawn from observational data. The second approach, following work in this area by Rosenbaum and Rubin (1983) and Rosenbaum (1995), with important antecedents in the work by Cornfield et al. (1959), works from the other extreme in the sense that unconfoundedness is the starting point, and only limited violations from it need to be considered. If we allow for large violations in the Rosenbaum-Rubin approach, it will often lead to essentially the same results as the Manski bounds approach, but with limited violations of the unconfoundedness assumption, the sensitivity approach results in narrower ranges for the estimands than the partial identification approach. The key to any sensitivity analysis will be how to assess the magnitude of violations from unconfoundedness. The setup in the current chapter assumes that unconfoundedness is satisfied conditional on an additional, unobserved covariate. If, conditional on the other, observed, covariates, this unobserved covariate is independent of the potential outcomes, or if, again conditional on the observed covariates, it is independent of treatment assignment, unconfoundedness holds even without conditioning on this additional covariate.
313 citations
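The bounds (partial identification) idea summarized above can be sketched for the simplest case of a binary outcome: without unconfoundedness, the unobserved potential outcomes are allowed to take any value in [0, 1], which yields an interval of width one for the average treatment effect. This is a simplified, hypothetical illustration in the spirit of the Manski bounds, not the book's notation or derivation.

```python
import numpy as np

def manski_bounds_binary(y_obs, w):
    """No-assumption bounds on the average treatment effect for a binary outcome."""
    y_obs, w = np.asarray(y_obs, dtype=float), np.asarray(w, dtype=int)
    p = w.mean()                           # share of treated units
    m1 = y_obs[w == 1].mean()              # observed mean outcome among treated
    m0 = y_obs[w == 0].mean()              # observed mean outcome among controls
    # E[Y(1)]: the controls' unobserved Y(1) can be anywhere in [0, 1].
    ey1_lo, ey1_hi = p * m1, p * m1 + (1 - p)
    # E[Y(0)]: the treated units' unobserved Y(0) can be anywhere in [0, 1].
    ey0_lo, ey0_hi = (1 - p) * m0, (1 - p) * m0 + p
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

rng = np.random.default_rng(1)
w = rng.binomial(1, 0.4, size=500)
y = rng.binomial(1, np.where(w == 1, 0.6, 0.4))
print(manski_bounds_binary(y, w))          # an interval of width 1 that contains 0
```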
••
TL;DR: This work illustrates how rerandomization could have improved the design of an already conducted randomized experiment on vocabulary and mathematics training programs, then provides a rerandomization procedure for covariates that vary in importance, and offers other extensions for rerandomization, including methods addressing computational efficiency.
Abstract: When conducting a randomized experiment, if an allocation yields treatment groups that differ meaningfully with respect to relevant covariates, groups should be rerandomized. The process involves specifying an explicit criterion for whether an allocation is acceptable, based on a measure of covariate balance, and rerandomizing units until an acceptable allocation is obtained. Here, we illustrate how rerandomization could have improved the design of an already conducted randomized experiment on vocabulary and mathematics training programs, then provide a rerandomization procedure for covariates that vary in importance, and finally offer other extensions for rerandomization, including methods addressing computational efficiency. When covariates vary in a priori importance, better balance should be required for more important covariates. Rerandomization based on Mahalanobis distance preserves the joint distribution of covariates, but balances all covariates equally. Here, we propose rerandomizing based on Ma...
80 citations
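A minimal sketch of the rerandomization loop described above, assuming a Mahalanobis-distance balance criterion with an illustrative acceptance threshold (the covariates, threshold, and sample size are invented for the example):

```python
import numpy as np

def mahalanobis_balance(x, w):
    """Mahalanobis distance between treated and control covariate means."""
    diff = x[w == 1].mean(axis=0) - x[w == 0].mean(axis=0)
    n1, n0 = (w == 1).sum(), (w == 0).sum()
    cov = np.cov(x, rowvar=False) * (1 / n1 + 1 / n0)
    return float(diff @ np.linalg.solve(cov, diff))

def rerandomize(x, n_treated, threshold, rng, max_draws=10_000):
    """Redraw complete randomizations until the balance criterion is met."""
    n = x.shape[0]
    for _ in range(max_draws):
        w = np.zeros(n, dtype=int)
        w[rng.choice(n, size=n_treated, replace=False)] = 1
        if mahalanobis_balance(x, w) <= threshold:
            return w
    raise RuntimeError("no acceptable allocation found; loosen the threshold")

rng = np.random.default_rng(2)
x = rng.normal(size=(100, 3))                     # three pre-treatment covariates
w = rerandomize(x, n_treated=50, threshold=1.0, rng=rng)
print("accepted allocation with M =", round(mahalanobis_balance(x, w), 3))
```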
••
TL;DR: In this paper, a framework for causal inference from two-level factorial designs is proposed, which uses potential outcomes to define causal effects, and explores the effect of nonadditivity of unit level treatment effects on Neyman's repeated sampling approach for estimation of causal effects and on Fisher's randomization tests on sharp null hypotheses in these designs.
Abstract: A framework for causal inference from two-level factorial designs is proposed, which uses potential outcomes to define causal effects. The paper explores the effect of non-additivity of unit level treatment effects on Neyman's repeated sampling approach for estimation of causal effects and on Fisher's randomization tests on sharp null hypotheses in these designs. The framework allows for statistical inference from a finite population, permits definition and estimation of estimands other than ‘average factorial effects’ and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model.
80 citations
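As a rough illustration of the factorial estimands discussed above, the sketch below simulates a 2^2 design with +1/-1 coding and estimates the two main effects and the interaction as differences in mean outcomes between the +1 and -1 levels of each contrast; the data-generating values are invented, and this is not the paper's finite-population analysis.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
levels = np.array(list(product([-1, 1], repeat=2)))   # the 4 treatment combinations
n_per_cell = 25

rows, y = [], []
for a, b in levels:
    rows += [(a, b)] * n_per_cell
    # Hypothetical response: main effects 2 and 1, interaction 0.5, plus noise.
    y += list(2 * a + 1 * b + 0.5 * a * b + rng.normal(0, 1, n_per_cell))
rows, y = np.array(rows), np.array(y)

def factorial_effect(contrast):
    """Difference between mean outcomes at +1 and -1 of the given contrast."""
    return y[contrast == 1].mean() - y[contrast == -1].mean()

eff_a = factorial_effect(rows[:, 0])
eff_b = factorial_effect(rows[:, 1])
eff_ab = factorial_effect(rows[:, 0] * rows[:, 1])
print(f"A: {eff_a:.2f}  B: {eff_b:.2f}  AB: {eff_ab:.2f}")   # roughly 4, 2, 1
```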
••
TL;DR: In this article, the concept of missingness at random in incomplete data analysis is clarified and the implication of the more restrictive missing-always-at-random assumption when coupled with full unit-exchangeability for the matrix of the variables of interest and the missingness indicators is discussed.
Abstract: We clarify the key concept of missingness at random in incomplete data analysis. We first distinguish between data being missing at random and the missingness mechanism being a missing-at-random one, which we call missing always at random and which is more restrictive. We further discuss how, in general, neither of these conditions is a statement about conditional independence. We then consider the implication of the more restrictive missing-always-at-random assumption when coupled with full unit-exchangeability for the matrix of the variables of interest and the missingness indicators: the conditional distribution of the missingness indicators for any variable that can have a missing value can depend only on variables that are always fully observed. We discuss implications of this for modelling missingness mechanisms.
77 citations
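The distinction drawn above can be illustrated with a toy missing-always-at-random mechanism in which the probability that Y is missing depends only on a fully observed covariate X (the functional form and data are assumptions made for this sketch): complete-case analysis of Y is then biased, while adjusting for X recovers its mean.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)                 # always fully observed
y = 2 * x + rng.normal(size=n)         # variable that can be missing

# Missing-always-at-random mechanism: P(missing) depends only on the observed x.
p_miss = 1 / (1 + np.exp(-(x - 0.5)))
missing = rng.binomial(1, p_miss).astype(bool)
y_obs = np.where(missing, np.nan, y)

# The complete-case mean of y is biased, but a model using x can recover E[y].
print("true mean of y:", round(y.mean(), 3))
print("complete-case mean:", round(np.nanmean(y_obs), 3))
slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
print("regression-adjusted mean:", round(np.mean(intercept + slope * x), 3))
```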
••
TL;DR: In this article, the causal effects of military interventions on the homicide rates in certain problematic regions in Mexico were analyzed using the Rubin causal model to compare the post-intervention homicide rate in each intervened region to the hypothetical homicide rate for that same year had the military intervention not taken place.
Abstract: We analyze publicly available data to estimate the causal effects of military interventions on the homicide rates in certain problematic regions in Mexico. We use the Rubin causal model to compare the post-intervention homicide rate in each intervened region to the hypothetical homicide rate for that same year had the military intervention not taken place. Because the effect of a military intervention is not confined to the municipality subject to the intervention, a nonstandard definition of units is necessary to estimate the causal effect of the intervention under the standard stable unit treatment value assumption (SUTVA) of no interference. Donor pools are created for each missing potential outcome under no intervention, thereby allowing for the estimation of unit-level causal effects. A multiple imputation approach accounts for uncertainty about the missing potential outcomes.
55 citations
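A heavily simplified sketch of the donor-pool idea described above (the similarity measure, pool size, and data are illustrative assumptions, not the paper's actual construction): for each treated region, the missing no-intervention outcome is repeatedly drawn from comparable untreated regions, and the draws are combined across imputations.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical regions: pre-period homicide rate, intervention indicator, post-period rate.
pre = rng.gamma(shape=5, scale=2, size=200)
treated = rng.binomial(1, 0.2, size=200).astype(bool)
post = 1.05 * pre + rng.normal(0, 1, size=200) + np.where(treated, 2.0, 0.0)

donors = np.where(~treated)[0]             # untreated regions serve as potential donors

def draw_control_outcome(i, k=5):
    """Draw a plausible no-intervention outcome for treated region i from the
    k untreated regions with the closest pre-period rates."""
    nearest = donors[np.argsort(np.abs(pre[donors] - pre[i]))[:k]]
    return post[rng.choice(nearest)]

treated_idx = np.where(treated)[0]
effects = []
for _ in range(100):                       # multiple imputations of the missing outcomes
    y0_draws = np.array([draw_control_outcome(i) for i in treated_idx])
    effects.append(np.mean(post[treated_idx] - y0_draws))
print(f"estimated effect {np.mean(effects):.2f} (between-imputation SD {np.std(effects):.2f})")
```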
••
TL;DR: This work shows that the new 'multiple-imputation using two subclassification splines' method appears to be the most efficient and has coverage levels that are closest to nominal, and can estimate finite population average causal effects as well as non-linear causal estimands.
Abstract: Estimation of causal effects in non-randomized studies comprises two distinct phases: design, without outcome data, and analysis of the outcome data according to a specified protocol. Recently, Gutman and Rubin (2013) proposed a new analysis-phase method for estimating treatment effects when the outcome is binary and there is only one covariate, which viewed causal effect estimation explicitly as a missing data problem. Here, we extend this method to situations with continuous outcomes and multiple covariates and compare it with other commonly used methods (such as matching, subclassification, weighting, and covariance adjustment). We show, using an extensive simulation, that of all methods considered, and in many of the experimental conditions examined, our new 'multiple-imputation using two subclassification splines' method appears to be the most efficient and has coverage levels that are closest to nominal. In addition, it can estimate finite population average causal effects as well as non-linear causal estimands. This type of analysis also allows the identification of subgroups of units for which the effect appears to be especially beneficial or harmful.
54 citations
••
TL;DR: Analyses of health data archives must be conducted in such a way that individuals' privacy is not compromised; one important aspect of protecting individuals' privacy is protecting the confidentiality of their data.
Abstract: Health and medical data are increasingly being generated, collected, and stored in electronic form in healthcare facilities and administrative agencies. Such data hold a wealth of information vital to effective health policy development and evaluation, as well as to enhanced clinical care through evidence-based practice and safety and quality monitoring. These initiatives are aimed at improving individuals' health and well-being. Nevertheless, analyses of health data archives must be conducted in such a way that individuals' privacy is not compromised. One important aspect of protecting individuals' privacy is protecting the confidentiality of their data. It is the purpose of this paper to provide a review of a number of approaches to reducing disclosure risk when making data available for research, and to present a taxonomy for such approaches. Some of these methods are widely used, whereas others are still in development. It is important to have a range of methods available because there is also a range of data-use scenarios, and it is important to be able to choose between methods suited to differing scenarios. In practice, it is necessary to find a balance between allowing the use of health and medical data for research and protecting confidentiality. This balance is often presented as a trade-off between disclosure risk and data utility, because methods that reduce disclosure risk, in general, also reduce data utility.
35 citations
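As a toy illustration of the risk-utility trade-off discussed above, the sketch below perturbs a hypothetical sensitive variable with multiplicative noise and reports a crude utility measure; the noise model and metric are illustrative choices, not methods endorsed by the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
income = rng.lognormal(mean=10, sigma=0.5, size=10_000)   # hypothetical sensitive variable

def add_noise(values, relative_sd):
    """Multiplicative noise: larger noise lowers disclosure risk but also data utility."""
    return values * rng.normal(1.0, relative_sd, size=values.shape)

for sd in (0.01, 0.05, 0.20):
    released = add_noise(income, sd)
    utility_loss = abs(released.mean() - income.mean()) / income.mean()
    print(f"noise sd {sd:.2f}: relative error in released mean {utility_loss:.4f}")
```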
••
TL;DR: The anomalous magnetic moment of the muon is one of the most precisely measured quantities in experimental particle physics as mentioned in this paper, and its latest measurement at Brookhaven National Laboratory deviates from the Standard Model expectation by approximately 3.5 standard deviations.
Abstract: The anomalous magnetic moment of the muon is one of the most precisely measured quantities in experimental particle physics. Its latest measurement at Brookhaven National Laboratory deviates from the Standard Model expectation by approximately 3.5 standard deviations. The goal of the new experiment, E989, now under construction at Fermilab, is a fourfold improvement in precision. Here, we discuss the details of the future measurement and its current status.
•
TL;DR: In this paper, the authors propose a class of finite population causal estimands that depend on conditional distributions of the potential outcomes, and provide an interpretable summary of causal effects when no scale is available.
Abstract: Many outcomes of interest in the social and health sciences, as well as in modern applications in computational social science and experimentation on social media platforms, are ordinal and do not have a meaningful scale. Causal analyses that leverage this type of data, termed ordinal non-numeric, require careful treatment, as much of the classical potential outcomes literature is concerned with estimation and hypothesis testing for outcomes whose relative magnitudes are well defined. Here, we propose a class of finite population causal estimands that depend on conditional distributions of the potential outcomes, and provide an interpretable summary of causal effects when no scale is available. We formulate a relaxation of the Fisherian sharp null hypothesis of constant effect that accommodates the scale-free nature of ordinal non-numeric data. We develop a Bayesian procedure to estimate the proposed causal estimands that leverages the rank likelihood. We illustrate these methods with an application to educational outcomes in the General Social Survey.
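One simple scale-free comparison in the spirit of the estimands described above is the probability that a treated unit's ordinal outcome exceeds a control unit's (counting ties as one half); the sketch below computes it from two simulated marginal distributions and is a simplification of the paper's finite-population, rank-likelihood-based machinery. The categories and probabilities are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
categories = np.arange(5)                        # ordered categories 0 < 1 < ... < 4

# Hypothetical ordinal outcomes; treatment shifts mass toward higher categories.
y_control = rng.choice(categories, size=300, p=[0.30, 0.30, 0.20, 0.15, 0.05])
y_treated = rng.choice(categories, size=300, p=[0.10, 0.20, 0.25, 0.25, 0.20])

def prob_treated_higher(y1, y0):
    """P(treated outcome > control outcome) + 0.5 * P(tie), a scale-free summary."""
    diff = y1[:, None] - y0[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

print(round(prob_treated_higher(y_treated, y_control), 3))
```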
••
01 Apr 2015
TL;DR: Fisher Exact P-values (FEPs), as discussed in this chapter, assess the sharp null hypothesis of no effect of the active versus control treatment for any unit in a randomized experiment, a hypothesis distinct from the weaker one that the average treatment effect across all units is zero.
Abstract: INTRODUCTION As discussed in Chapter 2, Fisher appears to have been the first to grasp fully the importance of physical randomization for credibly assessing causal effects (1925, 1936). A few years earlier, Neyman (1923) had introduced the language and the notation of potential outcomes, using this notation to define causal effects as if the assignments were determined by random draws from an urn, but he did not take the next logical step of appreciating the importance of actually randomizing. It was instead Fisher who made this leap. Given data from a completely randomized experiment, Fisher was intent on assessing the sharp null hypothesis (or exact null hypothesis, Fisher, 1935) of no effect of the active versus control treatment, that is, the null hypothesis under which, for each unit in the experiment, both values of the potential outcomes are identical. In this setting, Fisher developed methods for calculating “p-values.” We refer to them as Fisher Exact P-values (FEPs), although we use them more generally than Fisher originally proposed. Note that Fisher's null hypothesis of no effect of the treatment versus control whatsoever is distinct from the possibly more practical question of whether the typical (e.g., average) treatment effect across all units is zero. The latter is a weaker hypothesis, because the average treatment effect may be zero even when for some units the treatment effect is positive, as long as for some others the effect is negative. We discuss the testing of hypotheses on, and inference for, average treatment effects in Chapter 6. Under Fisher's null hypothesis, and under sharp null hypotheses more generally, for units with either potential outcome observed, the other potential outcome is known; and so, under such a sharp null hypothesis, both potential outcomes are “known” for each unit in the sample – being either directly observed or inferred through the sharp null hypothesis. Consider any test statistic T: a function of the stochastic assignment vector, W; the observed outcomes, Y^obs; and any pre-treatment variables, X.
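A minimal sketch of the FEP calculation described above, using the absolute difference in means as the test statistic and Monte Carlo draws over assignments rather than full enumeration (the data and the choice of statistic are illustrative):

```python
import numpy as np

def fisher_exact_p(y_obs, w, n_draws=10_000, seed=0):
    """Randomization p-value for the sharp null of no effect for any unit."""
    rng = np.random.default_rng(seed)
    n, n_treated = len(y_obs), int(w.sum())
    t_obs = abs(y_obs[w == 1].mean() - y_obs[w == 0].mean())
    count = 0
    for _ in range(n_draws):
        w_new = np.zeros(n, dtype=int)
        w_new[rng.choice(n, size=n_treated, replace=False)] = 1
        # Under the sharp null, y_obs is what we would have seen under w_new too.
        t_new = abs(y_obs[w_new == 1].mean() - y_obs[w_new == 0].mean())
        count += t_new >= t_obs
    return count / n_draws

rng = np.random.default_rng(8)
w = rng.binomial(1, 0.5, 40)
y = rng.normal(0, 1, 40) + 0.8 * w          # a real treatment effect of 0.8
print("FEP:", fisher_exact_p(y, w))
```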
••
TL;DR: This article examined the possible consequences of a change in law school admissions in the United States from an affirmative action system based on race to one based on socioeconomic class and showed that class-based affirmative action is insufficient to maintain racial diversity in prestigious law schools.
Abstract: We examine the possible consequences of a change in law school admissions in the United States from an affirmative action system based on race to one based on socioeconomic class. Using data from the 1991-1996 Law School Admission Council Bar Passage Study, students were reassigned attendance by simulation to law school tiers by transferring the affirmative action advantage for black students to students from low socioeconomic backgrounds. The hypothetical academic outcomes for the students were then multiply-imputed to quantify the uncertainty of the resulting estimates. The analysis predicts dramatic decreases in the numbers of black students in top law school tiers, suggesting that class-based affirmative action is insufficient to maintain racial diversity in prestigious law schools. Furthermore, there appear to be no statistically significant changes in the graduation and bar passage rates of students in any demographic group. The results thus provide evidence that, other than increasing their representation in upper tiers, current affirmative action policies relative to a socioeconomic-based system neither substantially help nor harm minority academic outcomes, contradicting the predictions of the "mismatch" hypothesis, which asserts otherwise.
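The multiple-imputation step mentioned above is conventionally summarized with Rubin's combining rules; the sketch below applies those rules to made-up per-imputation estimates and within-imputation variances (the numbers are invented, not from the study).

```python
import numpy as np

def combine_rubins_rules(estimates, variances):
    """Combine point estimates and within-imputation variances across m imputations."""
    estimates, variances = np.asarray(estimates), np.asarray(variances)
    m = len(estimates)
    q_bar = estimates.mean()                     # pooled point estimate
    u_bar = variances.mean()                     # average within-imputation variance
    b = estimates.var(ddof=1)                    # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b
    return q_bar, total_var

# Hypothetical per-imputation estimates of a rate difference and their variances.
est = [0.031, 0.028, 0.035, 0.030, 0.026]
var = [0.00012, 0.00011, 0.00013, 0.00012, 0.00011]
q, t = combine_rubins_rules(est, var)
print(f"pooled estimate {q:.3f}, standard error {np.sqrt(t):.4f}")
```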
••
01 Apr 2015
TL;DR: In this article, the authors discuss three key notions underlying causal inference: potential outcomes, the utility of the related stability assumption, and the central role of the assignment mechanism, which is crucial for inferring causal effects.
Abstract: INTRODUCTION In this introductory chapter we set out our basic framework for causal inference. We discuss three key notions underlying our approach. The first notion is that of potential outcomes, each corresponding to one of the levels of a treatment or manipulation, following the dictum “no causation without manipulation” (Rubin, 1975, p. 238). Each of these potential outcomes is a priori observable, in the sense that it could be observed if the unit were to receive the corresponding treatment level. But, a posteriori, that is, once a treatment is applied, at most one potential outcome can be observed. Second, we discuss the necessity, when drawing causal inferences, of observing multiple units, and the utility of the related stability assumption, which we use throughout most of this book to exploit the presence of multiple units. Finally, we discuss the central role of the assignment mechanism, which is crucial for inferring causal effects, and which serves as the organizing principle for this book. POTENTIAL OUTCOMES In everyday life, causal language is widely used in an informal way. One might say: “My headache went away because I took an aspirin,” or “She got a good job last year because she went to college,” or “She has long hair because she is a girl.” Such comments are typically informed by observations on past exposures, for example, of headache outcomes after taking aspirin or not, or of characteristics of jobs of people with or without college educations, or the typical hair length of boys and girls. As such, these observations generally involve informal statistical analyses, drawing conclusions from associations between measurements of different quantities that vary from individual to individual, commonly called variables or random variables – language apparently first used by Yule (1897). Nevertheless, statistical theory has been relatively silent on questions of causality. Many, especially older, textbooks avoid any mention of the term other than in settings of randomized experiments.
••
01 Apr 2015
TL;DR: In this article, a method for estimating causal effects given a regular assignment mechanism, based on subclassification on the estimated propensity score, is discussed, under the assumptions of individualistic assignment and unconfoundedness.
Abstract: INTRODUCTION In this chapter we discuss a method for estimating causal effects given a regular assignment mechanism, based on subclassification on the estimated propensity score. We also refer to this method as blocking or stratification. Given the assumptions of individualistic assignment and unconfoundedness, the definition of the propensity score in Chapter 3 implies that the super-population propensity score equals the conditional probability of receiving the treatment given the observed covariates. As shown in Chapter 12, the propensity score is a member of a class of functions of the covariates, collectively called balancing scores, that share an important property: within subpopulations with the same value of a balancing score, the super-population distribution of the covariates is identical in the treated and control subpopulations. This, in turn, was shown to imply that, under the assumption of super-population unconfoundedness, systematic biases in comparisons of outcomes for treated and control units associated with observed covariates can be eliminated entirely by adjusting solely for differences between treated and control units on a balancing score. The practical relevance of this result stems from the fact that a balancing score may be of lower dimension than the original covariates. (By definition, the covariates themselves form a balancing score, but one that has no dimension reduction.) When a balancing score is of lower dimension than the full set of covariates, adjustments for differences in this balancing score may be easier to implement than adjusting for differences in all covariates, because it avoids high-dimensional considerations. Within the class of balancing scores, the propensity score, as well as strictly monotonic transformations of it (such as the linearized propensity score or log odds ratio), have a special place. All balancing scores b(x) satisfy the property that if for two covariate values x and x′, b(x) = b(x′), then it must be the case that e(x) = e(x′).
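A hedged sketch of the subclassification estimator the chapter describes, using a logistic regression for the propensity score, quintile subclasses, and subclass-size weighting; the simulated data, the choice of five subclasses, and the use of scikit-learn are assumptions made for this illustration, not prescriptions from the book.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 4000
x = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-(0.8 * x[:, 0] - 0.5 * x[:, 1])))        # true propensity score
w = rng.binomial(1, p)
y = x[:, 0] + 0.5 * x[:, 1] + 2.0 * w + rng.normal(size=n)    # true effect 2.0

e_hat = LogisticRegression().fit(x, w).predict_proba(x)[:, 1]  # estimated score

# Subclassify on quintiles of the estimated propensity score.
edges = np.quantile(e_hat, [0.2, 0.4, 0.6, 0.8])
subclass = np.digitize(e_hat, edges)

estimates, weights = [], []
for j in range(5):
    idx = subclass == j
    if w[idx].sum() == 0 or (1 - w[idx]).sum() == 0:
        continue                                 # skip subclasses lacking both groups
    estimates.append(y[idx & (w == 1)].mean() - y[idx & (w == 0)].mean())
    weights.append(idx.mean())
ate = np.average(estimates, weights=weights)
naive = y[w == 1].mean() - y[w == 0].mean()
print(f"subclassification estimate {ate:.2f} (naive difference {naive:.2f})")
```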
••
01 Apr 2015
TL;DR: In this article, the authors assess the degree of overlap in the covariate distributions between treatment and control groups, and hence the severity of the statistical challenge of adjusting for differences in covariates, in order to decide on the appropriate methods to estimate causal effects under the assumption of unconfoundedness.
Abstract: INTRODUCTION When a researcher wishes to proceed to estimate causal effects under the assumption of unconfoundedness, there are various statistical methods that can be used to attempt to adjust for differences in covariate distributions. These methods include simple linear regression, which is adequate in simple situations. They also include more sophisticated methods involving subclassification on the propensity score and matching, the latter two possibly in combination with model-based imputation methods, which can work well even in complicated situations. In order to decide on the appropriate methods, it is important first to assess the severity of the statistical challenge to adjust for the differences in covariates. In other words, it is useful to assess how different the covariate distributions are in the treatment and control groups. If the covariate distributions are similar, as they would be, in expectation, in the setting of a completely randomized experiment, there is less reason to be concerned about the sensitivity of estimates to the specific method chosen than if these distributions are substantially different. On the other hand, even if unconfoundedness holds, it may be that there are regions of the covariate space with relatively few treated units or relatively few control units, and, as a result, inferences for such regions rely largely on extrapolation and are therefore less credible than inferences for regions with substantial overlap in covariate distributions. In this chapter we address the problem of assessing the degree of overlap in the covariate distributions – or, in other words, the covariate balance between the treated and control samples prior to any analyses to adjust for these differences. These assessments do not involve the outcome data and therefore do not introduce any systematic biases in subsequent analyses. In principle we are interested in the comparison of two multivariate distributions, the distributions of the covariates in the treated and control subsamples. We wish to explore how different the measures of central tendency are, and how much overlap there is in the tails of the distributions.
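A minimal sketch of one balance diagnostic of the kind discussed above: normalized differences in covariate means, computed before looking at any outcome data (the data are simulated for the example).

```python
import numpy as np

def normalized_differences(x, w):
    """Difference in covariate means scaled by the pooled standard deviation."""
    xt, xc = x[w == 1], x[w == 0]
    pooled_sd = np.sqrt((xt.var(axis=0, ddof=1) + xc.var(axis=0, ddof=1)) / 2)
    return (xt.mean(axis=0) - xc.mean(axis=0)) / pooled_sd

rng = np.random.default_rng(10)
n = 1000
x = rng.normal(size=(n, 3))
# Treatment probability depends on the first covariate, so it will look imbalanced.
w = rng.binomial(1, 1 / (1 + np.exp(-1.5 * x[:, 0])))
print(np.round(normalized_differences(x, w), 2))
```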
••
01 Jan 2015
••
01 Apr 2015
TL;DR: In this article, the authors discuss a second approach to analyzing causal effects when unconfoundedness of the treatment of interest is questionable, one that considers alternative assumptions regarding causal effects.
Abstract: INTRODUCTION In this chapter we discuss a second approach to analyzing causal effects when unconfoundedness of the treatment of interest is questionable. In Chapter 22 we also relaxed the unconfoundedness assumption, but there we did not make any additional assumptions. The resulting sensitivity and bounds analyses led to a range of estimated values for treatment effects, all of which were consistent with the observed data. Instead, in this chapter we consider alternatives to the standard unconfoundedness assumption that still allow us to obtain essentially unbiased point estimates of some treatment effects of interest, although typically not the overall average effect. In the settings we consider, there is, on substantive grounds, reason to believe that units receiving and units not receiving the treatment of interest are systematically different in characteristics associated with the potential outcomes. Such cases may arise if receipt of treatment is partly the result of deliberate choices by units, choices that take into account perceptions or expectations of the causal effects of the treatment based on information that the analyst may not observe. In order to allow for such violations of unconfoundedness, we rely on the presence of additional information and consider alternative assumptions regarding causal effects. More specifically, a key feature of the Instrumental Variables (IV) approach, the topic of the current chapter and the next two, is the presence of a secondary treatment, in the current setting the assignment to treatment instead of the receipt of treatment, where by “secondary” we do not mean temporally secondary but secondary in terms of scientific interest. This secondary treatment is assumed to be unconfounded. In fact, in the randomized experiment setting of the current chapter, the assignment to treatment is unconfounded by design. This implies we can, using the methods from Part II of the book, unbiasedly estimate causal effects of the assignment to treatment. The problem is that these causal effects are not the causal effects of primary interest, which are the effects of the receipt of treatment. Assumptions that allow researchers to link these causal effects are at the core of the instrumental variables approach.
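A minimal sketch of the instrumental-variables logic introduced above for a randomized assignment with one-sided noncompliance: the effect of assignment on the outcome divided by the effect of assignment on receipt gives the effect of receipt for compliers (the Wald ratio). The data and compliance structure are invented for the example, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5000
z = rng.binomial(1, 0.5, n)                      # randomized assignment (the instrument)
complier = rng.binomial(1, 0.6, n)               # units that take treatment only if assigned
d = np.where(complier == 1, z, 0)                # receipt of treatment (one-sided noncompliance)
y = 1.0 + 2.0 * d + rng.normal(size=n)           # receipt effect of 2.0 for compliers

itt_y = y[z == 1].mean() - y[z == 0].mean()      # effect of assignment on the outcome
itt_d = d[z == 1].mean() - d[z == 0].mean()      # effect of assignment on receipt
print(f"ITT on outcome {itt_y:.2f}, compliance rate {itt_d:.2f}, "
      f"IV (Wald) estimate {itt_y / itt_d:.2f}")  # roughly 2.0
```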
•
TL;DR: In this article, the authors propose using rerandomization to ensure balance among a large number of pre-treatment covariates across treatment groups in factorial designs.
Abstract: Factorial designs are widely used in agriculture, engineering, and the social sciences to study the causal effects of several factors simultaneously on a response. The objective of such a design is to estimate all factorial effects of interest, which typically include main effects and interactions among factors. To estimate factorial effects with high precision when a large number of pre-treatment covariates are present, balance among covariates across treatment groups should be ensured. We propose utilizing rerandomization to ensure covariate balance in factorial designs. Although both factorial designs and rerandomization have been discussed before, the combination has not. Here, theoretical properties of rerandomization for factorial designs are established, and empirical results are explored using an application from the New York Department of Education.
••
TL;DR: Rubin's Model for Causal Inference helps us design experiments to measure the effects of possible causes as discussed by the authors. But in the practical world, more complicated than the one evoked in the proposed happiness study, Rubin's Model is even more useful.
Abstract: Rubin’s Model for Causal Inference helps us design experiments to measure the effects of possible causes. In a 2014 CHANCE column, this was illustrated with a hypothetical experiment on how to unravel a causal puzzle of happiness. Is it really this easy? The short answer is, unfortunately, no. But in the practical world, more complicated than the one evoked in the proposed happiness study, Rubin’s Model is even more useful. In this article we shall go deeper into the dimly lit practical world, where participants in our causal experiment drop out for reasons outside of our control. We will show how statistical thinking in general, and Rubin’s Model in particular, can illuminate it. But let us go slowly, and so allow time for our eyes to acclimate to the darkness. Controlled experimental studies are typically referred to as the gold standard for which all investigators should strive, while observational studies, as their polar opposite, are pejoratively described as “some data we found lying on the street.” In practice they are closer to one another than we are often willing to admit. The distinguished statistician Paul Holland, expanding on Robert Burns, observed that,
••
TL;DR: This work clarifies the source of partially post hoc subgroup analyses' invalidity, proposes a randomization-based approach for generating valid posterior predictive p-values, and investigates the approach's operating characteristics in a simple illustrative setting, showing that it can have desirable properties under both null and alternative hypotheses.
Abstract: By 'partially post-hoc' subgroup analyses, we mean analyses that compare existing data from a randomized experiment (from which a subgroup specification is derived) to new, subgroup-only experimental data. We describe a motivating example in which partially post hoc subgroup analyses instigated statistical debate about a medical device's efficacy. We clarify the source of such analyses' invalidity and then propose a randomization-based approach for generating valid posterior predictive p-values for such partially post hoc subgroups. Lastly, we investigate the approach's operating characteristics in a simple illustrative setting through a series of simulations, showing that it can have desirable properties under both null and alternative hypotheses.
••
01 Apr 2015
TL;DR: In this chapter, pairwise randomized experiments are analyzed, in which each stratum contains exactly two units, one randomly assigned to the treatment group and the other to the control group; with only a single unit per treatment arm in each pair, the usual Neyman sampling variance estimator cannot be used, and each pair is weighted equally in estimating the average treatment effect.
Abstract: INTRODUCTION In the previous chapter we analyzed stratified randomized experiments, where a sample of size N was partitioned into J strata, and within each stratum a completely randomized experiment was conducted. In this chapter we consider a special case of the stratified randomized experiment. Each stratum contains exactly two units, with one randomly selected to be assigned to the treatment group, and the other one assigned to the control group. Such a design is known as a pairwise randomized experiment or paired comparison. Although this can be viewed simply as a special case of a stratified randomized experiment, there are two features of this design that warrant special attention. First, the fact that there is only a single unit in each treatment group in each stratum (or pair in this case) implies that the Neyman sampling variance estimator that we discussed in the chapters on completely randomized experiments (Chapter 6) and stratified randomized experiments (Chapter 9) cannot be used; that estimator requires the presence of at least two units assigned to each treatment in each stratum. Second, each stratum has the same proportion of treated units, which allows us to analyze the within-stratum estimates symmetrically; the natural estimator for the average treatment effect weights each stratum equally. As in the case of stratified randomized experiments, the motivation for eliminating some of the possible assignments in pairwise randomized experiments is that a priori those values of the assignment vectors that are eliminated are expected to lead to less informative inferences. This argument relies on the within-pair variation in potential outcomes being small relative to the between-pair variation. Often the assignment to pairs is based on covariates. Units are matched to other units based on their similarity in these covariates, with the expectation that this similarity corresponds to similarity in the potential outcomes under each treatment. Suppose, for example, that the treatment is an expensive surgical procedure for a relatively common medical condition. It may not be financially feasible to apply the treatment to many individuals.
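A small sketch of the paired analysis described above: the average of within-pair treated-minus-control differences estimates the treatment effect, and its sampling variance is estimated from the between-pair variation in those differences (the within-stratum Neyman estimator being unavailable with one unit per arm). The data are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(12)
n_pairs = 50

# Each pair shares a baseline (pairs are formed on similar covariates); within a
# pair, one unit receives treatment (effect 1.5) and the other serves as control.
pair_baseline = rng.normal(0, 3, n_pairs)
y_treated = pair_baseline + 1.5 + rng.normal(0, 1, n_pairs)   # treated unit in each pair
y_control = pair_baseline + rng.normal(0, 1, n_pairs)         # control unit in each pair

diffs = y_treated - y_control                 # within-pair differences
tau_hat = diffs.mean()
var_hat = diffs.var(ddof=1) / n_pairs         # uses between-pair variation only
print(f"estimate {tau_hat:.2f}, standard error {np.sqrt(var_hat):.2f}")
```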