
Showing papers by Donald B. Rubin published in 2007


Journal ArticleDOI
TL;DR: The theoretical perspective underlying this position is presented, followed by a particular application in the context of the US tobacco litigation that uses propensity score methods to create subgroups of treated units and control units who are at least as similar with respect to their distributions of observed background characteristics as if they had been randomized.
Abstract: For estimating causal effects of treatments, randomized experiments are generally considered the gold standard. Nevertheless, they are often infeasible to conduct for a variety of reasons, such as ethical concerns, excessive expense, or timeliness. Consequently, much of our knowledge of causal effects must come from non-randomized observational studies. This article will advocate the position that observational studies can and should be designed to approximate randomized experiments as closely as possible. In particular, observational studies should be designed using only background information to create subgroups of similar treated and control units, where 'similar' here refers to their distributions of background variables. Of great importance, this activity should be conducted without access to any outcome data, thereby assuring the objectivity of the design. In many situations, this objective creation of subgroups of similar treated and control units, which are balanced with respect to covariates, can be accomplished using propensity score methods. The theoretical perspective underlying this position will be presented, followed by a particular application in the context of the US tobacco litigation. This application uses propensity score methods to create subgroups of treated units (male current smokers) and control units (male never smokers) who are at least as similar with respect to their distributions of observed background characteristics as if they had been randomized. The collection of these subgroups then 'approximates' a randomized block experiment with respect to the observed covariates.
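
As a concrete illustration of the subclassification idea described in this abstract, the sketch below estimates propensity scores from covariates only (never the outcome), forms quintile subclasses, and compares within-subclass covariate means. The data, covariates, and model choice are hypothetical stand-ins, not the paper's tobacco-litigation analysis.

```python
# Illustrative sketch (not the paper's actual analysis): subclassify units on an
# estimated propensity score using only covariates, never the outcome, and check
# covariate balance within subclasses.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({"age": rng.normal(50, 10, n),
                  "education": rng.normal(12, 3, n)})
# hypothetical treatment assignment that depends on the covariates
z = rng.binomial(1, 1 / (1 + np.exp(-(0.03 * (X["age"] - 50) + 0.1 * (X["education"] - 12)))))

# estimate propensity scores from covariates only
ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]

# form five subclasses at the quintiles of the estimated score
subclass = pd.qcut(ps, 5, labels=False)

# within-subclass covariate means by treatment group: these should be much
# closer than the unadjusted means if balance has been achieved
balance = X.assign(z=z, subclass=subclass).groupby(["subclass", "z"]).mean()
print(balance)
```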

1,028 citations



Journal ArticleDOI
TL;DR: Finite mixture modeling was used to discern latent structure, with the laboratory-measured endophenotypes of sustained attention deficits and eye-tracking dysfunction as indexes; the results supported the existence of 2 relatively distinct latent classes.
Abstract: Prior research has focused on the latent structure of endophenotypic markers of schizophrenia liability, or schizotypy. The work supports the existence of 2 relatively distinct latent classes and derives largely from the taxometric analysis of psychometric values. The present study used finite mixture modeling as a technique for discerning latent structure and the laboratory-measured endophenotypes of sustained attention deficits and eye-tracking dysfunction as endophenotype indexes. In a large adult community sample (N = 311), finite mixture analysis of the sustained attention index d' and 2 eye-tracking indexes (gain and catch-up saccade rate) revealed evidence for 2 latent components. A putative schizotypy class accounted for 27% of the sample. A supplementary maximum covariance taxometric analysis yielded highly consistent results. Subjects in the schizotypy component displayed higher rates of schizotypal personality features and an increased rate of treated schizophrenia in their 1st-degree biological relatives compared with subjects in the other component. Implications of these results are examined in light of major theories of schizophrenia liability, and methodological advantages of finite mixture modeling for psychopathology research, with particular emphasis on genomic issues, are discussed.
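
The sketch below shows what a two-component finite (Gaussian) mixture fit of this kind might look like; the synthetic data stand in for the d', gain, and catch-up saccade indexes and are not the study's measurements.

```python
# Minimal sketch of a two-component finite (Gaussian) mixture fit, in the spirit
# of the analysis described above; the data are synthetic stand-ins.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# ~73% "non-schizotypy" component, ~27% "schizotypy" component (hypothetical values)
group = rng.binomial(1, 0.27, size=311)
means = np.where(group[:, None] == 1, [-1.0, -0.8, 0.9], [0.0, 0.0, 0.0])
data = means + rng.normal(scale=1.0, size=(311, 3))

gm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(data)
labels = gm.predict(data)                    # most likely component per subject
print("estimated mixing proportions:", gm.weights_)
print("estimated component means:\n", gm.means_)
```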

78 citations


Journal ArticleDOI
TL;DR: A novel design is proposed that obtains and uses information on an additional key variable: a treatment or externally controlled variable that, if set at its "effective" level, could have prevented the death of those who died.
Abstract: We consider studies of cohorts of individuals after a critical event, such as an injury, with the following characteristics. First, the studies are designed to measure "input" variables, which describe the period before the critical event, and to characterize the distribution of the input variables in the cohort. Second, the studies are designed to measure "output" variables, primarily mortality after the critical event, and to characterize the predictive (conditional) distribution of mortality given the input variables in the cohort. Such studies often possess the complication that the input data are missing for those who die shortly after the critical event because the data collection takes place after the event. Standard methods of dealing with the missing inputs, such as imputation or weighting methods based on an assumption of ignorable missingness, are known to be generally invalid when the missingness of inputs is nonignorable, that is, when the distribution of the inputs is different between those who die and those who live. To address this issue, we propose a novel design that obtains and uses information on an additional key variable: a treatment or externally controlled variable that, if set at its "effective" level, could have prevented the death of those who died. We show that the new design can be used to draw valid inferences for the marginal distribution of inputs in the entire cohort, and for the conditional distribution of mortality given the inputs, also in the entire cohort, even under nonignorable missingness. The crucial framework that we use is principal stratification based on the potential outcomes, here mortality under both levels of treatment. We also show, using illustrative preliminary injury data, that our approach can reveal results that are more reasonable than the results of standard methods, in relatively dramatic ways. Thus, our approach suggests that the routine collection of data on variables that could be used as possible treatments in such studies of inputs and mortality should become common.
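
The small simulation below illustrates the problem this design targets, not the design itself: when a hypothetical "input" variable is missing precisely for those who die, and death depends on that input, the complete cases misrepresent the input distribution of the full cohort.

```python
# Illustrative simulation of the problem the proposed design addresses (this is
# not an implementation of the design itself): when input data are missing for
# those who die, and missingness is nonignorable, the complete cases give a
# biased picture of the input distribution in the full cohort.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
severity = rng.normal(0, 1, n)                     # hypothetical "input" variable
p_death = 1 / (1 + np.exp(-(severity - 1.0)))      # death more likely at high severity
died = rng.binomial(1, p_death).astype(bool)

# inputs are collected only after the critical event, so they are missing for deaths
observed_severity = severity[~died]

print("true cohort mean severity:  ", round(severity.mean(), 3))
print("complete-case mean severity:", round(observed_severity.mean(), 3))
print("death rate:                 ", round(died.mean(), 3))
```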

64 citations


Journal ArticleDOI
TL;DR: Writing as the "father" of multiple imputation (MI), the author comments on a collection of contributions on MI, clarifying several points from an applied statistician's perspective and responding chiefly to Nielsen's critique.
Abstract: As the "father" of multiple imputation (MI), it gives me great pleasure to be able to comment on this collection of contributions on MI. The nice review by Paul Zhang serves as an excellent introduction to the more critical attention lavished on MI by Soren Nielsen and the extensive discussion by Xiao-Li Meng and Martin Romero. I have a few comments on this package, which are designed to clarify a few points and supplement other points from my "applied statistician's" perspective. My focus in the following is more on Nielsen's article because the expressed views are less consistent with my own than the contributions of the other authors. Nevertheless, despite differences of emphasis, I want to express my sincere gratitude to Nielsen for bringing his technical adroitness to address the issue of multiple imputation, in particular, and the problem of missing data in general (e.g., Nielsen, 1997, 2000).

61 citations


01 Jan 2007
TL;DR: Using robust matching methods for making causal inferences from survey data, the author demonstrates that there are profound differences between how voters behave in advanced democracies and how they behave in new electoral democracies.
Abstract: Using new robust matching methods for making causal inferences from survey data, I demonstrate that there are profound differences between how voters behave in advanced democracies versus how they behave in new electoral democracies. The problems of voter ignorance and inattentiveness are not as serious in advanced democracies as many analysts have suggested but are of grave concern in new democracies. Citizens in advanced democracies are able to accomplish something that citizens in fledgling democracies are not: inattentive and poorly informed citizens are able to vote like their better informed compatriots and hence need to pay little attention to political events such as election campaigns in order to vote as if they were attentive. The results from the U.S. (which rely on various National Election Studies) and Mexico (2000 Panel Study) are reported in detail. Results from other countries are briefly reported. “The people should have as little to do as may be about the government. They lack information and are constantly liable to be misled.” —Roger Sherman, June 7, 1787 at the Federal Constitutional Convention (Collier 1971) “In a crowd men always tend to the same level, and, on general questions, a vote, recorded by forty academicians is no better than that of forty water-carriers.” —Gustave Le Bon, The Crowd (1896, 200)

45 citations


01 Jan 2007
TL;DR: This chapter focuses on how to design observational studies using matching methods and the related ideas of subclassification and weighting, and presents practical guidance regarding the use of these methods, as well as examples of their use and evidence of their improved performance.
Abstract: Much research in the social sciences attempts to estimate the effect of some intervention or “treatment” such as a school dropout prevention program or television watching. However, particularly in the social sciences, it is generally not possible to randomly assign units to receive the treatment condition or the control condition, and thus the resulting data are observational, where we simply observe that some units received the treatment and others did not. In such cases, there is a need to control for differences in the covariate distributions between the treatment and control groups. Matching methods, such as propensity score matching, effect this control by selecting subsets of the treatment and control groups with similar covariate distributions. The overall theme is of replicating a randomized experiment in two ways: first, by comparing treated and control units who look as if they could have been randomly assigned to treatment or control status; and second, by forming the comparison groups without the use of the outcome, thus preventing intentional or unintentional bias in selecting a particular sample to achieve a desired result. This chapter focuses on how to design observational studies using matching methods and the related ideas of subclassification and weighting. We present practical guidance regarding the use of matching methods, as well as examples of their use and evidence of their improved performance relative to other methods of controlling for bias due to observed covariates.
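
As one possible rendering of the matching workflow this chapter describes, the sketch below performs 1:1 nearest-neighbor matching on an estimated propensity score without ever touching the outcome. The function name match_on_propensity and the assumed DataFrame layout are hypothetical; scikit-learn's logistic regression and nearest-neighbor search are used here only for convenience.

```python
# A minimal 1:1 nearest-neighbor propensity score matching sketch, assuming a
# pandas DataFrame with a binary treatment column and covariate columns; no
# outcome variable is used at any point in forming the matched sample.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def match_on_propensity(df, treat_col, covariates):
    # estimate the propensity score from covariates only
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covariates], df[treat_col])
          .predict_proba(df[covariates])[:, 1])
    df = df.assign(pscore=ps)
    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]

    # for each treated unit, find the control with the closest propensity score
    # (matching with replacement: a control may be reused)
    nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
    _, idx = nn.kneighbors(treated[["pscore"]])
    matched_control = control.iloc[idx.ravel()]

    # return the matched sample; balance should be checked before any outcome analysis
    return pd.concat([treated, matched_control])

# hypothetical usage: matched = match_on_propensity(df, "treated", ["age", "income"])
```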

43 citations


Book ChapterDOI
TL;DR: The authors provide an overview of the approach to the estimation of such causal effects based on the concept of potential outcomes and discuss both randomization-based approaches and the Bayesian posterior predictive approach.
Abstract: A central problem in epidemiology and medical statistics is how to draw inferences about the causal effects of treatments (i.e., interventions) from randomized and nonrandomized data. For example, does the new drug really reduce heart disease, or does exposure to that chemical in drinking water increase cancer rates relative to drinking water without that chemical? This chapter provides an overview of the approach to the estimation of such causal effects based on the concept of potential outcomes. We discuss randomization-based approaches and the Bayesian posterior predictive approach.
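
One of the randomization-based approaches mentioned can be illustrated with a Fisher-style randomization test under the sharp null of no effect for any unit; the data below are hypothetical and the sketch is only a minimal illustration of the idea.

```python
# A small sketch of randomization-based inference under Fisher's sharp null of
# no effect for any unit; the outcomes and assignment vector are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
y = np.array([5.2, 3.1, 4.8, 6.0, 2.9, 4.4, 5.5, 3.8])   # observed outcomes
z = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # observed assignment

observed_diff = y[z == 1].mean() - y[z == 0].mean()

# under the sharp null, every unit's outcome is unchanged by treatment, so we can
# recompute the statistic over re-randomizations of the assignment vector
diffs = []
for _ in range(10_000):
    z_perm = rng.permutation(z)
    diffs.append(y[z_perm == 1].mean() - y[z_perm == 0].mean())

p_value = np.mean(np.abs(diffs) >= abs(observed_diff))
print("observed difference in means:", observed_diff, " randomization p-value:", p_value)
```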

35 citations


Journal ArticleDOI
TL;DR: It is shown that diagnostics for confounding can be devised under reasonable assumptions; these diagnostics are used to demonstrate the similarity of the true PK/PD relationships of adults and children on adjunctive therapy, supporting the approval of oxcarbazepine monotherapy in children by a bridging argument.
Abstract: One type of pharmacokinetic/pharmacodynamic (PK/PD) relationship that is used to characterize the therapeutic action of a drug is the relationship between some univariate summary of the plasma-concentration-versus-time profile and the drug effect on a response outcome. Operationally, such a relationship may be observed in a large clinical trial where randomly sampled patients are randomized to different values of the concentration summary. If, under such conditions, the relationship between concentration and effect does not depend on the dose needed to attain the target concentration, such a relationship will be called a true PK/PD relationship. When the true PK/PD relationship is assessed as an object of estimation in a dose-controlled clinical trial (i.e. when dose is randomized), observed drug concentration is an outcome variable. The estimated PK/PD relationship between observed outcome and observed concentration, which we then refer to as the conventional PK/PD relationship, may be biased for the true PK/PD relationship. Because of this bias, the conventional relationship is called confounded for the true one. We show that diagnostics for confounding can be devised under reasonable assumptions. We then apply these diagnostics to PK/PD assessments of adults and children on oxcarbazepine adjunctive therapy. It was necessary to demonstrate the similarity of the true PK/PD relationships of adults and children on adjunctive therapy in order to support the approval of oxcarbazepine monotherapy in children by a bridging argument.
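
The simulation below is not the paper's diagnostic procedure; it only illustrates how a conventional concentration-response regression can be confounded for the true PK/PD relationship when a patient-level factor (here a hypothetical clearance term) affects both the attained concentration and the response, even though dose is randomized.

```python
# Illustrative simulation (not the paper's diagnostics): dose is randomized, but
# an individual factor (clearance) affects both the attained concentration and
# the response, so the conventional regression of response on observed
# concentration is biased for the true PK/PD relationship.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
dose = rng.choice([100.0, 200.0, 300.0], size=n)        # randomized dose
clearance = rng.lognormal(mean=0.0, sigma=0.3, size=n)  # hypothetical patient factor
conc = dose / clearance                                 # attained concentration
true_slope = 0.01
response = true_slope * conc - 0.5 * clearance + rng.normal(0, 1, n)

# conventional PK/PD: regress response on observed concentration alone
conventional_slope = np.polyfit(conc, response, 1)[0]
print("true concentration effect:      ", true_slope)
print("conventional (confounded) slope:", round(conventional_slope, 4))
```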

26 citations


Book ChapterDOI
TL;DR: The focus is on MI, which is a statistically valid strategy for handling missing data, although other, less sound methods are reviewed, as are direct maximum likelihood and Bayesian methods for estimating parameters, which are also valid approaches.
Abstract: Missing data are a common problem in most epidemiological and medical studies, including surveys and clinical trials. Imputation, or filling in the missing values, is an intuitive and flexible way to handle the incomplete data sets that arise because of such missing data. Here, in addition to imputation, including multiple imputation (MI), we discuss several other strategies and their theoretical background, as well as present some examples and advice on computation. Our focus is on MI, which is a statistically valid strategy for handling missing data, although we review other less sound methods, as well as direct maximum likelihood and Bayesian methods for estimating parameters, which are also valid approaches. The analysis of a multiply-imputed data set is now relatively standard using readily available statistical software. The creation of multiply-imputed data sets is more challenging than their analysis but still straightforward relative to other valid methods of handling missing data, and we discuss available software for doing so. Ad hoc methods, including using singly-imputed data sets, almost always lead to invalid inferences and should be eschewed, especially when the focus is on valid interval estimation or testing hypotheses.
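
A minimal sketch of the MI workflow described here appears below: create m completed data sets, analyze each, and combine the results with Rubin's rules. The choice of scikit-learn's IterativeImputer as the imputation engine, and the missing-at-random toy data, are assumptions made for the illustration, not recommendations from the chapter.

```python
# Minimal multiple-imputation sketch: create m completed data sets, estimate the
# quantity of interest in each, and combine with Rubin's rules.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(5)
n, m = 500, 5
x = rng.normal(0, 1, n)
y = 2.0 + 1.5 * x + rng.normal(0, 1, n)
y[rng.random(n) < 0.3] = np.nan                      # 30% of y missing (at random here)
data = np.column_stack([x, y])

estimates, variances = [], []
for i in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    completed = imputer.fit_transform(data)
    y_i = completed[:, 1]
    estimates.append(y_i.mean())                     # estimand: the mean of y
    variances.append(y_i.var(ddof=1) / n)            # its within-imputation variance

# Rubin's combining rules: total variance = within + (1 + 1/m) * between
q_bar = np.mean(estimates)
within = np.mean(variances)
between = np.var(estimates, ddof=1)
total_var = within + (1 + 1 / m) * between
print("MI estimate:", round(q_bar, 3), " MI standard error:", round(np.sqrt(total_var), 3))
```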

18 citations



01 Jan 2007
TL;DR: The authors used the concept of principal stratification to distinguish direct and indirect causal effects in a randomized experiment with only two treatment conditions, and showed that the resulting structure can be confusing even to great statisticians such as R.A. Fisher.
Abstract: Multivariate outcomes are common in studies of causal effects, but they are often mis-analyzed. The critical reason leading to their misuse is that even in a randomized experiment with only two treatment conditions, there are two potential outcomes associated with each measured outcome, but these can never be jointly observed. When there are two measured outcomes, one primary and one that is intermediate in some sense, the resulting structure can therefore be very confusing, even to great statisticians such as R.A. Fisher. Two specific examples will be used to illustrate the issues using the concept of "principal stratification" (Frangakis and Rubin, 2002, Biometrics): the first on estimating dose-response when there is noncompliance, and the second on separating "direct" and "indirect" causal effects.
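
For the noncompliance example, the sketch below shows the standard moment estimator of the complier average causal effect under randomization, one-sided noncompliance, and the exclusion restriction; the data and stratum proportions are hypothetical, and this is an illustration of the principal stratification idea rather than the chapter's full analysis.

```python
# Sketch of principal stratification in a noncompliance setting: under
# randomization, one-sided noncompliance, and the exclusion restriction, the
# complier average causal effect can be estimated by a simple moment estimator.
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
z = rng.binomial(1, 0.5, n)                           # randomized assignment
complier = rng.binomial(1, 0.6, n)                    # latent principal stratum
d = z * complier                                      # treatment received (one-sided noncompliance)
y = 1.0 * d + 0.5 * complier + rng.normal(0, 1, n)    # effect operates only through d

itt_y = y[z == 1].mean() - y[z == 0].mean()           # intention-to-treat effect on the outcome
itt_d = d[z == 1].mean() - d[z == 0].mean()           # effect of assignment on treatment received
cace = itt_y / itt_d                                  # complier average causal effect
print("ITT:", round(itt_y, 3), " estimated CACE:", round(cace, 3), " (truth: 1.0)")
```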