Author

Donald B. Rubin

Other affiliations: University of Chicago, Harvard University, Princeton University
Bio: Donald B. Rubin is an academic researcher from Tsinghua University. The author has contributed to research in topics: Causal inference & Missing data. The author has an h-index of 132 and has co-authored 515 publications receiving 262,632 citations. Previous affiliations of Donald B. Rubin include University of Chicago & Harvard University.


Papers
Journal ArticleDOI
TL;DR: Concerns with implementation should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
Abstract: Appropriate models in biostatistics are often quite complicated. Such models are typically most easily fit using Bayesian methods, which can often be implemented using simulation techniques. Markov chain Monte Carlo (MCMC) methods are an important set of tools for such simulations. We give an overview and references of this rapidly emerging technology along with a relatively simple example. MCMC techniques can be viewed as extensions of iterative maximization techniques, but with random jumps rather than maximizations at each step. Special care is needed when implementing iterative maximization procedures rather than closed-form methods, and even more care is needed with iterative simulation procedures: it is substantially more difficult to monitor convergence to a distribution than to a point. The most reliable implementations of MCMC build upon results from simpler models fit using combinations of maximization algorithms and noniterative simulations, so that the user has a rough idea of the location and scale of the posterior distribution of the quantities of interest under the more complicated model. These concerns with implementation, however, should not deter the biostatistician from using MCMC methods, but rather help to ensure wise use of these powerful techniques.
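As a rough illustration of the kind of iterative simulation and convergence monitoring described above (not the paper's own example), the sketch below runs a random-walk Metropolis sampler on a toy standard-normal "posterior" from several dispersed starting points and computes a Gelman-Rubin-style R-hat; every name, target, and tuning value here is an illustrative assumption.

```python
import numpy as np

def log_post(theta):
    # Toy log-posterior (standard normal), standing in for a complicated
    # biostatistical model whose posterior cannot be sampled directly.
    return -0.5 * theta**2

def metropolis(n_iter, start, step=1.0, rng=None):
    """Random-walk Metropolis: propose a random jump, then accept or reject."""
    if rng is None:
        rng = np.random.default_rng()
    draws = np.empty(n_iter)
    theta = start
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop          # accept the proposed jump
        draws[i] = theta          # otherwise keep the current value
    return draws

# Several chains from deliberately dispersed starting points, discarding the
# first half of each chain as burn-in.
chains = np.array([metropolis(5000, start=s)[2500:] for s in (-10.0, 0.0, 10.0)])
n = chains.shape[1]
within = chains.var(axis=1, ddof=1).mean()          # W: mean within-chain variance
between = n * chains.mean(axis=1).var(ddof=1)       # B: n times variance of chain means
r_hat = np.sqrt(((1 - 1 / n) * within + between / n) / within)
print(f"R-hat = {r_hat:.3f}")
```

Values of R-hat close to 1 for every quantity of interest are the usual informal signal that the chains have converged to a common distribution, which is exactly the "monitoring convergence to a distribution rather than a point" concern raised in the abstract.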

208 citations

Journal ArticleDOI
TL;DR: In this paper, the authors use potential outcomes to define causal effects, followed by principal stratification on the intermediate outcomes (e.g., survival), and conclude that causal inference is best understood using potential outcomes.
Abstract: Causal inference is best understood using potential outcomes. This use is particularly important in more complex settings, that is, observational studies or randomized experiments with complications such as noncompliance. The topic of this lecture, the issue of estimating the causal effect of a treatment on a primary outcome that is "censored" by death, is another such complication. For example, suppose that we wish to estimate the effect of a new drug on Quality of Life (QOL) in a randomized experiment, where some of the patients die before the time designated for their QOL to be assessed. Another example with the same structure occurs with the evaluation of an educational program designed to increase final test scores, which are not defined for those who drop out of school before taking the test. A further application is to studies of the effect of job-training programs on wages, where wages are only defined for those who are employed. The analysis of examples like these is greatly clarified using potential outcomes to define causal effects, followed by principal stratification on the intermediate outcomes (e.g., survival).
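The toy simulation below is a hedged illustration of principal stratification (not the lecture's own analysis): it defines potential survival indicators and potential QOL outcomes, forms the "always-survivor" stratum, and contrasts the survivor average causal effect (SACE) with a naive comparison of observed survivors. All numbers are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential survival indicators under control (S0) and treatment (S1), and
# potential QOL outcomes; QOL is only meaningful for units that survive.
S0 = rng.binomial(1, 0.6, n)
S1 = np.maximum(S0, rng.binomial(1, 0.3, n))   # treatment never harms survival here
Y0 = rng.normal(40 + 15 * S0, 10)              # frailer units (S0 == 0) have lower QOL
Y1 = Y0 + 5                                    # true causal effect of 5 QOL points

# Principal stratum of interest: units that would survive under either arm.
always_survivor = (S0 == 1) & (S1 == 1)
sace = (Y1[always_survivor] - Y0[always_survivor]).mean()

# Naive observed-data comparison under randomization: treated survivors mix the
# always-survivor and treatment-protected strata, so it need not equal the SACE.
Z = rng.binomial(1, 0.5, n)
S_obs = np.where(Z == 1, S1, S0)
Y_obs = np.where(Z == 1, Y1, Y0)
naive = (Y_obs[(Z == 1) & (S_obs == 1)].mean()
         - Y_obs[(Z == 0) & (S_obs == 1)].mean())

print(f"SACE among always-survivors: {sace:.2f}, naive survivor comparison: {naive:.2f}")
```

In this setup the naive comparison is noticeably smaller than the SACE, which is the point of conditioning on the principal stratum rather than on the observed intermediate outcome.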

198 citations

Journal ArticleDOI
TL;DR: This paper showed that matching on estimated rather than population propensity scores can lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible.
Abstract: Matched sampling is a standard technique for controlling bias in observational studies due to specific covariates. Since Rosenbaum & Rubin (1983), multivariate matching methods based on estimated propensity scores have been used with increasing frequency in medical, educational, and sociological applications. We obtain analytic expressions for the effect of matching using linear propensity score methods with normal distributions. These expressions cover cases where the propensity score is either known, or estimated using either discriminant analysis or logistic regression, as is typically done in current practice. The results show that matching using estimated propensity scores not only reduces bias along the population propensity score, but also controls variation of components orthogonal to it. Matching on estimated rather than population propensity scores can therefore lead to relatively large variance reduction, as much as a factor of two in common matching settings where close matches are possible. Approximations are given for the magnitude of this variance reduction, which can be computed using estimates obtained from the matching pools. Related expressions for bias reduction are also presented which suggest that, in difficult matching situations, the use of population scores leads to greater bias reduction than the use of estimated scores.
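A minimal sketch of matching on an estimated linear propensity score follows, assuming a logistic-regression score and nearest-neighbor matching with replacement; it does not reproduce the paper's analytic normal-theory results, and all data and settings are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                                              # covariates
z = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))   # treatment

# Propensity score estimated by logistic regression, as in current practice;
# matching is then done on the linear (logit) score.
ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]
logit = np.log(ps / (1 - ps))

treated, control = np.where(z == 1)[0], np.where(z == 0)[0]
# Nearest-neighbor matching with replacement on the estimated linear score.
matches = control[np.abs(logit[treated][:, None] - logit[control][None, :]).argmin(axis=1)]

def smd(a, b):
    """Standardized mean differences, a simple covariate-balance diagnostic."""
    return (a.mean(0) - b.mean(0)) / np.sqrt(0.5 * (a.var(0) + b.var(0)))

print("SMD before matching:", np.round(smd(X[treated], X[control]), 3))
print("SMD after matching: ", np.round(smd(X[treated], X[matches]), 3))
```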

197 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems, and show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project.
Abstract: We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well—hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...
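One common way to read "flattening constants" is as lightly weighted pseudo-observations added to sparse cells so that logistic-regression estimates exist and are pulled away from 0 and 1. The sketch below is a toy version of that idea only; the flattening value of 0.5 and the data are invented and are not taken from the census project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sparse table: one covariate pattern has only successes, so the ordinary MLE
# does not exist (the fitted coefficient would diverge).
X = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
y = np.array([0, 1, 0, 1, 1, 1])

# "Flattening": append lightly weighted pseudo-observations of both outcomes at
# each observed covariate pattern. The weight 0.5 is a hypothetical choice.
flat = 0.5
X_aug = np.vstack([X, [[0.0], [0.0], [1.0], [1.0]]])
y_aug = np.concatenate([y, [0, 1, 0, 1]])
w = np.concatenate([np.ones(len(y)), np.full(4, flat)])

# C is set very large so the L2 penalty is effectively negligible; the
# pseudo-observations, not the penalty, keep the estimates finite.
model = LogisticRegression(penalty="l2", C=1e6).fit(X_aug, y_aug, sample_weight=w)
print("coef:", model.coef_.ravel(), "intercept:", model.intercept_)
```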

197 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms; the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
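lmer itself is an R function in the lme4 package. Since the other sketches in this listing use Python, the closest analogue shown here is statsmodels' formula-based mixed-model interface fit by REML; this is only an analogous illustration of a formula-specified mixed model, not lme4, and the data are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated grouped data: a fixed slope for x plus a random intercept per group.
rng = np.random.default_rng(2)
groups = np.repeat(np.arange(30), 10)
x = rng.normal(size=300)
y = 1.0 + 2.0 * x + rng.normal(scale=0.8, size=30)[groups] + rng.normal(size=300)
df = pd.DataFrame({"y": y, "x": x, "g": groups})

# Fixed effects are given by the formula; the random intercept is specified via
# `groups` (roughly comparable to lmer's  y ~ x + (1 | g)  in R).
model = smf.mixedlm("y ~ x", df, groups=df["g"])
fit = model.fit(reml=True)   # REML criterion, as in lmer's default
print(fit.summary())
```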

50,607 citations

Book
18 Nov 2016
TL;DR: Deep learning, as described in this book, is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
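As a tiny illustration of the "many layers deep" idea (not code from the book), the NumPy sketch below runs a forward pass through a small feedforward network; the layer sizes and weights are arbitrary.

```python
import numpy as np

# A minimal deep feedforward network: each layer builds features from the
# output of the previous one, giving a hierarchy of learned concepts.
rng = np.random.default_rng(3)
layer_sizes = [8, 16, 16, 4]          # input -> two hidden layers -> output
weights = [rng.normal(scale=0.1, size=(m, k))
           for m, k in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(k) for k in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)    # ReLU hidden layers
    return h @ weights[-1] + biases[-1]   # linear output layer

print(forward(rng.normal(size=8)).shape)  # -> (4,)
```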

38,208 citations

Journal ArticleDOI
TL;DR: This paper examines eight published reviews, each reporting results from several related trials that evaluate the efficacy of a certain treatment for a specified medical condition, and suggests a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.
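A simple noniterative random-effects calculation in that spirit is the method-of-moments estimator sketched below; the per-study effects and variances are invented for illustration and are not data from the eight reviews.

```python
import numpy as np

# Hypothetical per-study treatment effects and their sampling variances.
y = np.array([0.30, 0.10, -0.05, 0.42, 0.18])
v = np.array([0.04, 0.02, 0.05, 0.06, 0.03])

w = 1 / v                                            # fixed-effect weights
q = np.sum(w * (y - np.sum(w * y) / w.sum())**2)     # heterogeneity statistic Q
# Noniterative (method-of-moments) estimate of the between-study variance.
tau2 = max(0.0, (q - (len(y) - 1)) / (w.sum() - np.sum(w**2) / w.sum()))

w_star = 1 / (v + tau2)                              # random-effects weights
mu = np.sum(w_star * y) / w_star.sum()               # pooled effect
se = np.sqrt(1 / w_star.sum())
print(f"tau^2 = {tau2:.3f}, pooled effect = {mu:.3f} +/- {1.96 * se:.3f}")
```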

33,234 citations

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
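A quick way to see the three-level model in action is scikit-learn's variational-Bayes LDA implementation (not the authors' original code); the toy corpus, topic count, and hyperparameters below are arbitrary illustrative choices.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny made-up corpus with two rough themes (biology vs. sports).
docs = [
    "genes dna protein expression cell",
    "match score team season player",
    "dna sequencing genome cell biology",
    "player coach league score game",
]
vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)                 # document-term count matrix

# Variational-Bayes fit of a two-topic model; each document becomes a mixture
# over topics and each topic a distribution over words.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {top_words}")
```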

30,570 citations