Author

Donald B. Rubin

Other affiliations: University of Chicago, Harvard University, Princeton University
Bio: Donald B. Rubin is an academic researcher from Tsinghua University. The author has contributed to research in topics: Causal inference & Missing data. The author has an h-index of 132 and has co-authored 515 publications receiving 262,632 citations. Previous affiliations of Donald B. Rubin include University of Chicago & Harvard University.


Papers
Journal ArticleDOI
TL;DR: Concerns with implementation should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
Abstract: Appropriate models in biostatistics are often quite complicated. Such models are typically most easily fit using Bayesian methods, which can often be implemented using simulation techniques. Markov chain Monte Carlo (MCMC) methods are an important set of tools for such simulations. We give an overview and references of this rapidly emerging technology along with a relatively simple example. MCMC techniques can be viewed as extensions of iterative maximization techniques, but with random jumps rather than maximizations at each step. Special care is needed when implementing iterative maximization procedures rather than closed-form methods, and even more care is needed with iterative simulation procedures: it is substantially more difficult to monitor convergence to a distribution than to a point. The most reliable implementations of MCMC build upon results from simpler models fit using combinations of maximization algorithms and noniterative simulations, so that the user has a rough idea of the location and scale of the posterior distribution of the quantities of interest under the more complicated model. These concerns with implementation, however, should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
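As a rough illustration of the kind of iterative simulation and convergence monitoring described above (not the paper's own example), the sketch below runs a random-walk Metropolis sampler on a toy standard-normal "posterior" from several dispersed starting points and computes a Gelman-Rubin-style R-hat; every name, target, and tuning value here is an illustrative assumption.

```python
import numpy as np

def log_post(theta):
    # Toy log-posterior (standard normal), standing in for a complicated
    # biostatistical model whose posterior cannot be sampled directly.
    return -0.5 * theta**2

def metropolis(n_iter, start, step=1.0, rng=None):
    """Random-walk Metropolis: propose a random jump, then accept or reject."""
    if rng is None:
        rng = np.random.default_rng()
    draws = np.empty(n_iter)
    theta = start
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop          # accept the proposed jump
        draws[i] = theta          # otherwise keep the current value
    return draws

# Several chains from deliberately dispersed starting points, discarding the
# first half of each chain as burn-in.
chains = np.array([metropolis(5000, start=s)[2500:] for s in (-10.0, 0.0, 10.0)])
n = chains.shape[1]
within = chains.var(axis=1, ddof=1).mean()          # W: mean within-chain variance
between = n * chains.mean(axis=1).var(ddof=1)       # B: n times variance of chain means
r_hat = np.sqrt(((1 - 1 / n) * within + between / n) / within)
print(f"R-hat = {r_hat:.3f}")
```

Values of R-hat close to 1 for every quantity of interest are the usual informal signal that the chains have converged to a common distribution, which is exactly the "monitoring convergence to a distribution rather than a point" concern raised in the abstract.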

208 citations

Journal ArticleDOI
TL;DR: In this paper, the authors use potential outcomes to define causal effects, followed by principal stratification on the intermediate outcomes (e.g., survival), and conclude that causal inference is best understood using potential outcomes.
Abstract: Causal inference is best understood using potential outcomes. This use is particularly important in more complex settings, that is, observational studies or randomized experiments with complications such as noncompliance. The topic of this lecture, the issue of estimating the causal effect of a treatment on a primary outcome that is "censored" by death, is another such complication. For example, suppose that we wish to estimate the effect of a new drug on Quality of Life (QOL) in a randomized experiment, where some of the patients die before the time designated for their QOL to be assessed. Another example with the same structure occurs with the evaluation of an educational program designed to increase final test scores, which are not defined for those who drop out of school before taking the test. A further application is to studies of the effect of job-training programs on wages, where wages are only defined for those who are employed. The analysis of examples like these is greatly clarified using potential outcomes to define causal effects, followed by principal stratification on the intermediate outcomes (e.g., survival).
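The toy simulation below is a hedged illustration of principal stratification (not the lecture's own analysis): it defines potential survival indicators and potential QOL outcomes, forms the "always-survivor" stratum, and contrasts the survivor average causal effect (SACE) with a naive comparison of observed survivors. All numbers are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential survival indicators under control (S0) and treatment (S1), and
# potential QOL outcomes; QOL is only meaningful for units that survive.
S0 = rng.binomial(1, 0.6, n)
S1 = np.maximum(S0, rng.binomial(1, 0.3, n))   # treatment never harms survival here
Y0 = rng.normal(40 + 15 * S0, 10)              # frailer units (S0 == 0) have lower QOL
Y1 = Y0 + 5                                    # true causal effect of 5 QOL points

# Principal stratum of interest: units that would survive under either arm.
always_survivor = (S0 == 1) & (S1 == 1)
sace = (Y1[always_survivor] - Y0[always_survivor]).mean()

# Naive observed-data comparison under randomization: treated survivors mix the
# always-survivor and treatment-protected strata, so it need not equal the SACE.
Z = rng.binomial(1, 0.5, n)
S_obs = np.where(Z == 1, S1, S0)
Y_obs = np.where(Z == 1, Y1, Y0)
naive = (Y_obs[(Z == 1) & (S_obs == 1)].mean()
         - Y_obs[(Z == 0) & (S_obs == 1)].mean())

print(f"SACE among always-survivors: {sace:.2f}, naive survivor comparison: {naive:.2f}")
```

In this setup the naive comparison is noticeably smaller than the SACE, which is the point of conditioning on the principal stratum rather than on the observed intermediate outcome.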

198 citations

Journal ArticleDOI
TL;DR: This paper showed that matching on estimated rather than population propensity scores can lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible.
Abstract: Matched sampling is a standard technique for controlling bias in observational studies due to specific covariates. Since Rosenbaum & Rubin (1983), multivariate matching methods based on estimated propensity scores have been used with increasing frequency in medical, educational, and sociological applications. We obtain analytic expressions for the effect of matching using linear propensity score methods with normal distributions. These expressions cover cases where the propensity score is either known, or estimated using either discriminant analysis or logistic regression, as is typically done in current practice. The results show that matching using estimated propensity scores not only reduces bias along the population propensity score, but also controls variation of components orthogonal to it. Matching on estimated rather than population propensity scores can therefore lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible. Approximations are given for the magnitude of this variance reduction, which can be computed using estimates obtained from the matching pools. Related expressions for bias reduction are also presented which suggest that, in difficult matching situations, the use of population scores leads to greater bias reduction than the use of estimated scores.
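A minimal sketch of matching on an estimated linear propensity score follows, assuming a logistic-regression score and nearest-neighbor matching with replacement; it does not reproduce the paper's analytic normal-theory results, and all data and settings are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                                              # covariates
z = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))   # treatment

# Propensity score estimated by logistic regression, as in current practice;
# matching is then done on the linear (logit) score.
ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]
logit = np.log(ps / (1 - ps))

treated, control = np.where(z == 1)[0], np.where(z == 0)[0]
# Nearest-neighbor matching with replacement on the estimated linear score.
matches = control[np.abs(logit[treated][:, None] - logit[control][None, :]).argmin(axis=1)]

def smd(a, b):
    """Standardized mean differences, a simple covariate-balance diagnostic."""
    return (a.mean(0) - b.mean(0)) / np.sqrt(0.5 * (a.var(0) + b.var(0)))

print("SMD before matching:", np.round(smd(X[treated], X[control]), 3))
print("SMD after matching: ", np.round(smd(X[treated], X[matches]), 3))
```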

197 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems, and show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project.
Abstract: We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well—hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...
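One common way to read "flattening constants" is as lightly weighted pseudo-observations added to sparse cells so that logistic-regression estimates exist and are pulled away from 0 and 1. The sketch below is a toy version of that idea only; the flattening value of 0.5 and the data are invented and are not taken from the census project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sparse table: one covariate pattern has only successes, so the ordinary MLE
# does not exist (the fitted coefficient would diverge).
X = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
y = np.array([0, 1, 0, 1, 1, 1])

# "Flattening": append lightly weighted pseudo-observations of both outcomes at
# each observed covariate pattern. The weight 0.5 is a hypothetical choice.
flat = 0.5
X_aug = np.vstack([X, [[0.0], [0.0], [1.0], [1.0]]])
y_aug = np.concatenate([y, [0, 1, 0, 1]])
w = np.concatenate([np.ones(len(y)), np.full(4, flat)])

# C is set very large so the L2 penalty is effectively negligible; the
# pseudo-observations, not the penalty, keep the estimates finite.
model = LogisticRegression(penalty="l2", C=1e6).fit(X_aug, y_aug, sample_weight=w)
print("coef:", model.coef_.ravel(), "intercept:", model.intercept_)
```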

197 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms; the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
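lmer itself is an R function in the lme4 package. Since the other sketches in this listing use Python, the closest analogue shown here is statsmodels' formula-based mixed-model interface fit by REML; this is only an analogous illustration of a formula-specified mixed model, not lme4, and the data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated grouped data: a fixed slope for x plus a random intercept per group.
rng = np.random.default_rng(2)
groups = np.repeat(np.arange(30), 10)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.8, size=30)[groups] + rng.normal(size=300)
df = pd.DataFrame({"y": y, "x": x, "g": groups})

# Fixed effects are given by the formula; the random intercept is specified via
# `groups` (roughly comparable to lmer's  y ~ x + (1 | g)  in R).
model = smf.mixedlm("y ~ x", df, groups=df["g"])
fit = model.fit(reml=True)   # REML criterion, as in lmer's default
print(fit.summary())
```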

50,607 citations

Book
18 Nov 2016
TL;DR: Deep learning, as described in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
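As a tiny illustration of the "many layers deep" idea (not code from the book), the NumPy sketch below runs a forward pass through a small feedforward network; the layer sizes and weights are arbitrary.

```python
import numpy as np

# A minimal deep feedforward network: each layer builds features from the
# output of the previous one, giving a hierarchy of learned concepts.
rng = np.random.default_rng(3)
layer_sizes = [8, 16, 16, 4]          # input -> two hidden layers -> output
weights = [rng.normal(scale=0.1, size=(m, k))
           for m, k in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(k) for k in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)    # ReLU hidden layers
    return h @ weights[-1] + biases[-1]   # linear output layer

print(forward(rng.normal(size=8)).shape)  # -> (4,)
```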

38,208 citations

Journal ArticleDOI
TL;DR: This paper examines eight published reviews, each reporting results from several related trials that evaluate the efficacy of a certain treatment for a specified medical condition, and suggests a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.
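A simple noniterative random-effects calculation in that spirit is the method-of-moments estimator sketched below; the per-study effects and variances are invented for illustration and are not data from the eight reviews.

```python
import numpy as np

# Hypothetical per-study treatment effects and their sampling variances.
y = np.array([0.30, 0.10, -0.05, 0.42, 0.18])
v = np.array([0.04, 0.02, 0.05, 0.06, 0.03])

w = 1 / v                                            # fixed-effect weights
q = np.sum(w * (y - np.sum(w * y) / w.sum())**2)     # heterogeneity statistic Q
# Noniterative (method-of-moments) estimate of the between-study variance.
tau2 = max(0.0, (q - (len(y) - 1)) / (w.sum() - np.sum(w**2) / w.sum()))

w_star = 1 / (v + tau2)                              # random-effects weights
mu = np.sum(w_star * y) / w_star.sum()               # pooled effect
se = np.sqrt(1 / w_star.sum())
print(f"tau^2 = {tau2:.3f}, pooled effect = {mu:.3f} +/- {1.96 * se:.3f}")
```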

33,234 citations

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
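A quick way to see the three-level model in action is scikit-learn's variational-Bayes LDA implementation (not the authors' original code); the toy corpus, topic count, and hyperparameters below are arbitrary illustrative choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny made-up corpus with two rough themes (biology vs. sports).
docs = [
    "genes dna protein expression cell",
    "match score team season player",
    "dna sequencing genome cell biology",
    "player coach league score game",
]
vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)                 # document-term count matrix

# Variational-Bayes fit of a two-topic model; each document becomes a mixture
# over topics and each topic a distribution over words.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top_words}")
```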

30,570 citations