Author

Donald B. Rubin

Other affiliations: University of Chicago, Harvard University, Princeton University
Bio: Donald B. Rubin is an academic researcher from Tsinghua University. The author has contributed to research in topics: Causal inference & Missing data. The author has an h-index of 132 and has co-authored 515 publications receiving 262,632 citations. Previous affiliations of Donald B. Rubin include University of Chicago & Harvard University.


Papers
Journal ArticleDOI
TL;DR: A novel design is proposed that obtains and uses information on an additional key variable: a treatment or externally controlled variable which, if set at its "effective" level, could have prevented the death of those who died.
Abstract: We consider studies of cohorts of individuals after a critical event, such as an injury, with the following characteristics. First, the studies are designed to measure "input" variables, which describe the period before the critical event, and to characterize the distribution of the input variables in the cohort. Second, the studies are designed to measure "output" variables, primarily mortality after the critical event, and to characterize the predictive (conditional) distribution of mortality given the input variables in the cohort. Such studies often possess the complication that the input data are missing for those who die shortly after the critical event because the data collection takes place after the event. Standard methods of dealing with the missing inputs, such as imputation or weighting methods based on an assumption of ignorable missingness, are known to be generally invalid when the missingness of inputs is nonignorable, that is, when the distribution of the inputs is different between those who die and those who live. To address this issue, we propose a novel design that obtains and uses information on an additional key variable: a treatment or externally controlled variable which, if set at its "effective" level, could have prevented the death of those who died. We show that the new design can be used to draw valid inferences for the marginal distribution of inputs in the entire cohort, and for the conditional distribution of mortality given the inputs, also in the entire cohort, even under nonignorable missingness. The crucial framework that we use is principal stratification based on the potential outcomes, here mortality under both levels of treatment. We also show, using illustrative preliminary injury data, that our approach can reveal results that are more reasonable than the results of standard methods, in relatively dramatic ways. Thus, our approach suggests that the routine collection of data on variables that could be used as possible treatments in such studies of inputs and mortality should become common.

64 citations
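
The bias that motivates this design can be seen in a small illustrative simulation, not taken from the paper: when mortality depends on a pre-event input and the input is recorded only for survivors, the complete-case distribution of the input is distorted. The variable names, cutoff, and risk model below are hypothetical.

```python
import numpy as np

# Illustrative simulation (not from the paper): why nonignorable missingness
# of "input" variables biases their estimated distribution in the cohort.
rng = np.random.default_rng(0)
n = 100_000

# Hypothetical pre-event input, e.g. a severity-related covariate.
x = rng.normal(loc=0.0, scale=1.0, size=n)

# Mortality risk increases with the input, so missingness is nonignorable:
# the input is unobserved precisely when the subject dies before data collection.
p_death = 1.0 / (1.0 + np.exp(-(x - 1.0)))
died = rng.random(n) < p_death

x_observed = x[~died]          # inputs recorded only for survivors

print(f"true cohort mean of input:      {x.mean():+.3f}")
print(f"complete-case (survivor) mean:  {x_observed.mean():+.3f}")
# The complete-case mean is shifted toward low-risk values; corrections that
# assume ignorable missingness cannot recover the truth here, which is the gap
# the paper's design (an additional treatment variable plus principal
# stratification) is meant to close.
```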

Journal ArticleDOI
TL;DR: It is argued that the only reason a first successful randomized clinical trial (RCT) must be replicated under the 2-trial paradigm is to rule out nonreplicable causes of the success of the first RCT; the commentary aims to stimulate discussion of what the authors believe to be one of the most important issues in drug regulation, raised anew by section 115a, namely the evidentiary standards appropriate for concluding efficacy.
Abstract: In 1997, President Clinton signed the Food and Drug Administration Modernization Act of 1997 (FDAMA). Among its many provisions, section 115a amended the Federal Food, Drug, and Cosmetic Act to permit determination of substantial evidence of effectiveness, as required for approval of a new drug, to be based on "data from one adequate and well-controlled investigation and confirmatory evidence."* This language contrasts with the statute's previous wording, introduced in the 1962 amendment, which required "adequate and well controlled investigations" (note the plural) and was interpreted by the US Food and Drug Administration to require at least 2 such trials. Exactly how the new mandate of FDAMA 1997 should be interpreted has since been a matter of as yet unresolved debate (see, for example, Peck and Wechsler). The purposes of this commentary are to review the historical basis for the new amendment, present a logical framework for it, and propose a direction its implementation could take. We do this not so much to propose definitive policy as to stimulate discussion on what we believe to be one of the most important issues in drug regulation, raised anew by section 115a, namely, the evidentiary standards appropriate for concluding efficacy. We argue that the only reason a first successful randomized clinical trial (RCT) must be replicated under the 2-trial paradigm is to rule out nonreplicable causes of the success of the first RCT.

*Specifically, paragraph 505(d) of the Food, Drug, and Cosmetic Act was modified as follows: "The term 'substantial evidence' means evidence consisting of adequate and well-controlled investigations, including clinical investigations...on the basis of which it could fairly and responsibly be concluded...that the drug will have the effect it purports or is represented to have... if the secretary determines, based on relevant science, that data from one adequate and well-controlled clinical investigation and confirmatory evidence (obtained prior to or after such investigation) are sufficient to establish effectiveness, the secretary may consider such data and evidence to constitute substantial evidence. . . ."

63 citations
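
One facet of the evidentiary-standard question can be put in numbers, though this illustration is not part of the commentary's argument, which concerns nonreplicable causes more broadly: under a conventional one-sided significance level of 0.025, requiring two independent successful trials sharply reduces the chance that random error alone produces "substantial evidence."

```python
# Back-of-the-envelope illustration (not from the commentary): false-positive
# rates under a single-trial vs. a two-trial evidentiary standard, assuming each
# trial is judged at a one-sided alpha of 0.025 and the trials are independent.
alpha = 0.025
print(f"one 'successful' trial by chance alone:  {alpha:.4f}")
print(f"two independent 'successful' trials:     {alpha**2:.6f}")
# The two-trial paradigm makes a purely random false positive far less likely;
# the commentary's point is that replication also guards against other
# nonreplicable causes of a first trial's success.
```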

Journal ArticleDOI
TL;DR: The author, as the "father" of multiple imputation (MI), comments on a collection of contributions on MI, focusing on Nielsen's more critical article and offering clarifications and supplements from an applied statistician's perspective.
Abstract: As the "father" of multiple imputation (MI), it gives me great pleasure to be able to comment on this collection of contributions on MI. The nice review by Paul Zhang serves as an excellent introduction to the more critical attention lavished on MI by Soren Nielsen and the extensive discussion by Xiao-Li Meng and Martin Romero. I have a few comments on this package, which are designed to clarify a few points and supplement other points from my "applied statistician's" perspective. My focus in the following is more on Nielsen's article because the expressed views are less consistent with my own than the contributions of the other authors. Nevertheless, despite differences of emphasis, I want to express my sincere gratitude to Nielsen for bringing his technical adroitness to address the issue of multiple imputation, in particular, and the problem of missing data in general (e.g., Nielsen, 1997, 2000).

61 citations
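
The discussion centers on multiple imputation, whose standard workflow ends with Rubin's combining rules for pooling results across the m completed datasets. A minimal sketch of those rules follows; the per-imputation estimates and variances are hypothetical, standing in for the output of fitting the same analysis model to each imputed dataset.

```python
import numpy as np

# Minimal sketch of Rubin's rules for combining multiple-imputation results.
# The per-imputation estimates and variances below are hypothetical.
estimates = np.array([2.10, 1.95, 2.30, 2.05, 2.20])   # point estimates, one per imputation
variances = np.array([0.40, 0.38, 0.45, 0.41, 0.39])   # squared standard errors

m = len(estimates)
q_bar = estimates.mean()                 # pooled point estimate
u_bar = variances.mean()                 # within-imputation variance
b = estimates.var(ddof=1)                # between-imputation variance
t = u_bar + (1 + 1 / m) * b              # total variance (Rubin's combining rule)

print(f"pooled estimate: {q_bar:.3f}, pooled SE: {np.sqrt(t):.3f}")
```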


Cited by
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms, and the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

50,607 citations
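
The abstract describes specifying a mixed model through a formula with fixed- and random-effects terms and then optimizing a profiled criterion. Since lmer is an R function, the sketch below is only a rough Python analogue using statsmodels' MixedLM formula interface on a hypothetical longitudinal dataset; it is not the lme4 implementation itself.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Rough Python analogue of an lmer-style fit (lme4 itself is an R package).
# Hypothetical longitudinal data: repeated measurements y on subjects over time.
rng = np.random.default_rng(1)
n_subjects, n_obs = 30, 8
subject = np.repeat(np.arange(n_subjects), n_obs)
time = np.tile(np.arange(n_obs), n_subjects)
subject_intercept = rng.normal(0, 2, n_subjects)[subject]   # random intercepts
y = 1.0 + 0.5 * time + subject_intercept + rng.normal(0, 1, n_subjects * n_obs)

data = pd.DataFrame({"y": y, "time": time, "subject": subject})

# Fixed effect of time, random intercept per subject, fit by REML;
# loosely analogous to lmer(y ~ time + (1 | subject)) in lme4.
model = smf.mixedlm("y ~ time", data, groups=data["subject"])
result = model.fit(reml=True)
print(result.summary())
```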

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations
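
The abstract describes deep feedforward networks that build complicated concepts out of simpler ones, layer by layer. The short numpy sketch below shows only the forward pass of such a network, with hypothetical layer sizes and random weights, to make the "many layers deep" hierarchy concrete.

```python
import numpy as np

# Minimal sketch of a deep feedforward network's forward pass: each layer
# transforms the previous layer's representation, so later layers can express
# "concepts" built out of simpler ones. Layer sizes and weights are hypothetical.
rng = np.random.default_rng(42)
layer_sizes = [8, 16, 16, 4]             # input -> two hidden layers -> output

weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)   # ReLU hidden layers
    return h @ weights[-1] + biases[-1]  # linear output layer

x = rng.normal(size=(5, layer_sizes[0]))  # a batch of 5 hypothetical inputs
print(forward(x).shape)                   # -> (5, 4)
```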

Journal ArticleDOI
TL;DR: This paper examines eight published reviews, each reporting results from several related trials that evaluate the efficacy of a certain treatment for a specified medical condition, and suggests a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.

33,234 citations
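
The TL;DR mentions a simple noniterative procedure for characterizing the distribution of treatment effects across studies. Assuming this refers to the familiar DerSimonian-Laird moment estimator of the between-study variance, a small sketch with hypothetical study data follows.

```python
import numpy as np

# Sketch of a noniterative random-effects meta-analysis in the spirit of the
# DerSimonian-Laird moment estimator (assuming that is the procedure meant).
# Effect sizes and within-study variances below are hypothetical.
effects = np.array([0.30, 0.10, 0.45, 0.20, 0.05])
variances = np.array([0.04, 0.02, 0.09, 0.03, 0.05])

w = 1.0 / variances                                   # fixed-effect weights
fixed_mean = np.sum(w * effects) / np.sum(w)
q = np.sum(w * (effects - fixed_mean) ** 2)           # Cochran's Q
k = len(effects)
tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1.0 / (variances + tau2)                       # random-effects weights
re_mean = np.sum(w_re * effects) / np.sum(w_re)
re_se = np.sqrt(1.0 / np.sum(w_re))
print(f"tau^2 = {tau2:.4f}, pooled effect = {re_mean:.3f} (SE {re_se:.3f})")
```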

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations
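
The abstract describes fitting LDA with approximate variational inference. The paper's own implementation is not reproduced here; as a workflow sketch under that caveat, scikit-learn's variational LDA can be run on a few toy documents.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Small sketch of fitting LDA with variational inference (scikit-learn's
# implementation, not the paper's original code). Documents are toy examples.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell as markets reacted to rates",
    "investors traded stocks and bonds",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)               # document-term counts

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)                # per-document topic mixtures

terms = vectorizer.get_feature_names_out()
for k, component in enumerate(lda.components_):
    top = [terms[i] for i in component.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
print(doc_topics.round(2))
```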