scispace - formally typeset
Author

Donald B. Rubin

Other affiliations: University of Chicago, Harvard University, Princeton University
Bio: Donald B. Rubin is an academic researcher from Tsinghua University. The author has contributed to research in the topics of causal inference and missing data, has an h-index of 132, and has co-authored 515 publications receiving 262,632 citations. Previous affiliations of Donald B. Rubin include the University of Chicago and Harvard University.


Papers
Journal ArticleDOI
TL;DR: Little evidence of an overall effect of ozone on the DNA methylome is found, but some suggestive changes in PLSCR1, HCAR1, and LINC00336 DNA methylation after ozone exposure relative to clean air are observed.
Abstract: We used a randomized crossover experiment to estimate the effects of ozone (vs. clean air) exposure on genome-wide DNA methylation of target bronchial epithelial cells, using 17 volunteers, each randomly exposed on two separated occasions to clean air or 0.3-ppm ozone for two hours. Twenty-four hours after exposure, participants underwent bronchoscopy to collect epithelial cells whose DNA methylation was measured using the Illumina 450 K platform. We performed global and regional tests examining the ozone versus clean air effect on the DNA methylome and calculated Fisher-exact p-values for a series of univariate tests. We found little evidence of an overall effect of ozone on the DNA methylome but some suggestive changes in PLSCR1, HCAR1, and LINC00336 DNA methylation after ozone exposure relative to clean air. We observed some participant-to-participant heterogeneity in ozone responses.
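The per-site univariate tests mentioned above can be illustrated with a self-contained two-sided Fisher exact test. This is only a sketch of the style of test; the 2x2 table counts below are hypothetical, not the study's data.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]],
    summing hypergeometric probabilities no larger than the observed one."""
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def hyper(x):  # hypergeometric probability of cell count x
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = hyper(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hyper(x) for x in range(lo, hi + 1)]
    return sum(q for q in probs if q <= p_obs + 1e-12)

# Hypothetical table for one CpG site: rows = ozone vs. clean air,
# columns = methylated vs. unmethylated calls (illustrative counts only).
p = fisher_exact_2x2(12, 5, 6, 11)
print(round(p, 4))
```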

10 citations

Journal ArticleDOI
TL;DR: An approach where the joint distribution of observed data and missing data are specified in a nonstandard way, and Tukey’s representation for exponential-family models is developed, a computationally tractable approach to inference in this class of models is proposed, and some general theoretical comments are offered.
Abstract: Data analyses typically rely upon assumptions about the missingness mechanisms that lead to observed versus missing data, assumptions that are typically unassessable. We explore an approach where the joint distribution of observed data and missing data are specified in a nonstandard way. In this formulation, which traces back to a representation of the joint distribution of the data and missingness mechanism, apparently first proposed by J. W. Tukey, the modeling assumptions about the distributions are either assessable or are designed to allow relatively easy incorporation of substantive knowledge about the problem at hand, thereby offering a possibly realistic portrayal of the data, both observed and missing. We develop Tukey's representation for exponential-family models, propose a computationally tractable approach to inference in this class of models, and offer some general theoretical comments. We then illustrate the utility of this approach with an example in systems biology.
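Tukey's representation can be sketched in generic notation (not necessarily the paper's) via Bayes' rule, with $R$ the response indicator ($R=1$ observed, $R=0$ missing):

\[
p(y \mid R{=}0) \;=\; p(y \mid R{=}1)\,
\frac{P(R{=}0 \mid y)}{P(R{=}1 \mid y)}\,
\frac{P(R{=}1)}{P(R{=}0)} .
\]

That is, the missing-data distribution is the observed-data distribution reweighted by the odds of nonresponse given $y$, which is the assessable or substantive-knowledge component of the model.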

10 citations

Journal ArticleDOI
TL;DR: An experimental design, randomization to randomization probabilities (R2R), is proposed, which significantly improves estimates of treatment effects under actual conditions of use by manipulating participant expectations about receiving treatment.
Abstract: Blinded randomized controlled trials (RCT) require participants to be uncertain if they are receiving a treatment or placebo. Although uncertainty is ideal for isolating the treatment effect from all other potential effects, it is poorly suited for estimating the treatment effect under actual conditions of intended use: when individuals are certain that they are receiving a treatment. We propose an experimental design, randomization to randomization probabilities (R2R), which significantly improves estimates of treatment effects under actual conditions of use by manipulating participant expectations about receiving treatment. In the R2R design, participants are first randomized to a value, π, denoting their probability of receiving treatment (vs. placebo). Subjects are then told their value of π and randomized to either treatment or placebo with probabilities π and 1-π, respectively. Analysis of the treatment effect includes statistical controls for π (necessary for causal inference) and typically a π-by-treatment interaction. Random assignment of subjects to π and disclosure of its value to subjects manipulates subject expectations about receiving the treatment without deception. This method offers a better treatment effect estimate under actual conditions of use than does a conventional RCT. Design properties, guidelines for power analyses, and limitations of the approach are discussed. We illustrate the design by implementing an RCT of caffeine effects on mood and vigilance and show that some of the actual effects of caffeine differ by the expectation that one is receiving the active drug.
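The two-stage randomization can be sketched in a small simulation. The outcome model here is made up for illustration (a treatment effect that grows with the disclosed probability π); the within-stratum difference of means then exposes the π-by-treatment interaction the design is meant to estimate.

```python
import random

random.seed(0)

def simulate(n=30_000):
    """Hypothetical R2R trial: stage 1 randomizes and discloses pi,
    stage 2 randomizes to treatment with probability pi."""
    data = []
    for _ in range(n):
        pi = random.choice([0.2, 0.5, 0.8])       # stage 1: disclosed probability
        t = 1 if random.random() < pi else 0      # stage 2: treatment vs. placebo
        # Illustrative outcome: treatment effect 0.5 + 0.6*pi, plus a
        # direct expectation effect 0.8*pi and noise.
        y = 1.0 + (0.5 + 0.6 * pi) * t + 0.8 * pi + random.gauss(0, 0.1)
        data.append((pi, t, y))
    return data

def effect_by_pi(data):
    """Treated-minus-placebo mean outcome within each pi stratum; the spread
    across strata reflects the pi-by-treatment interaction."""
    out = {}
    for pi in (0.2, 0.5, 0.8):
        treated = [y for p, t, y in data if p == pi and t == 1]
        control = [y for p, t, y in data if p == pi and t == 0]
        out[pi] = sum(treated) / len(treated) - sum(control) / len(control)
    return out

effects = effect_by_pi(simulate())
print({pi: round(e, 2) for pi, e in effects.items()})
```

With these simulated parameters the stratum-specific effects should track 0.5 + 0.6π, illustrating why the analysis must control for π and its interaction with treatment.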

10 citations

Journal ArticleDOI
TL;DR: The convergence of MCMC samples is discussed in two main articles, followed by a complete list of the commentaries on those articles.
Abstract: This material is from a volume of Statistical Science (1992; 7(4)) on the convergence of MCMC samples. A complete list of the commentaries on the two main articles is shown on the next page.

10 citations

Proceedings ArticleDOI
01 May 1995
TL;DR: The prototype superconducting cavity system for CESR-Phase III was tested in CESR in August 1994; a maximum of 155 kW of rf power was transferred to a 120 mA beam.
Abstract: The prototype superconducting cavity system for CESR-Phase III was tested in CESR in August 1994. The performance of the system was very gratifying. The cavity operated at gradients of 4.5-6 MV/m and accelerated beam currents up to 220 mA. This current is a factor of 3 above the previous world record of 67 mA for SRF [1]. The high circulating beam current did not increase the heat load or present any danger to the cavity. No instability attributable to the SRF cavity was encountered. A maximum of 155 kW of rf power was transferred to a 120 mA beam. The window was subjected to 125 kW reflected power and processed easily. In the travelling wave mode, vacuum bursts and arc trips prevented us from going above 165 kW. The maximum HOM power extracted was 2 kW. Beam stability studies were conducted for a variety of bunch configurations. In other tests a 120 mA beam was bumped horizontally and vertically by ±10 mm. While supporting a 100 mA beam, the cavity was axially deformed with the tuner by 0.4 mm to sweep the HOM frequencies across dangerous revolution harmonics. In all such tests, no resonant excitation of HOMs or beam instabilities were observed, which confirms that the potentially dangerous modes were damped strongly enough to be rendered harmless.

10 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms; the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.
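The model structure described above can be sketched in the standard lme4 formulation (generic notation, not a substitute for the paper's full development):

\[
y = X\beta + Zb + \varepsilon, \qquad
b \sim \mathcal{N}\!\left(0,\ \sigma^{2}\Lambda_{\theta}\Lambda_{\theta}^{\top}\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ \sigma^{2} I\right),
\]

where $X$ and $Z$ are the fixed- and random-effects model matrices determined by the formula and data, and $\Lambda_{\theta}$ is the relative covariance factor. Profiling out $\beta$ and $\sigma^{2}$ leaves the deviance (or REML criterion) as a function of the covariance parameters $\theta$ alone, which is what the optimizer minimizes.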

50,607 citations

Book
18 Nov 2016
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Journal ArticleDOI
TL;DR: This paper examines eight published reviews, each reporting results from several related trials, in order to evaluate the efficacy of a certain treatment for a specified medical condition, and suggests a simple noniterative procedure for characterizing the distribution of treatment effects in a series of studies.
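A noniterative random-effects pooling procedure in the spirit of the one described above can be sketched as follows. This is an illustrative implementation with made-up inputs, not the paper's code or data: each study contributes an effect estimate and a within-study variance, a moment-based between-study variance is computed from Cochran's Q, and the studies are re-pooled with the inflated weights.

```python
def dersimonian_laird(y, v):
    """Noniterative random-effects meta-analysis sketch.
    y: per-study effect estimates; v: their within-study variances.
    Returns (pooled estimate, its standard error, between-study variance)."""
    w = [1.0 / vi for vi in v]                    # fixed-effect weights
    sw = sum(w)
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    k = len(y)
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)            # moment estimate, floored at 0
    w_re = [1.0 / (vi + tau2) for vi in v]        # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = (1.0 / sum(w_re)) ** 0.5
    return mu, se, tau2

# Hypothetical effects and variances from four studies.
mu, se, tau2 = dersimonian_laird([0.1, 0.3, 0.5, 0.2], [0.01] * 4)
print(round(mu, 3), round(tau2, 4))
```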

33,234 citations

Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
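The three-level generative process described above can be sketched in a short simulation. All sizes and hyperparameters here are illustrative choices, and the Dirichlet sampler is built from normalized Gamma draws to stay dependency-free:

```python
import random

random.seed(2)

V, K, N_DOCS, DOC_LEN = 50, 3, 4, 20   # vocab size, topics, corpus size (illustrative)

def dirichlet(alphas):
    """Sample from a Dirichlet distribution via normalized Gamma draws."""
    xs = [random.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

# K topic-word distributions, each a distribution over the vocabulary.
topics = [dirichlet([0.1] * V) for _ in range(K)]

docs = []
for _ in range(N_DOCS):
    theta = dirichlet([0.5] * K)                           # per-document topic mixture
    doc = []
    for _ in range(DOC_LEN):
        z = random.choices(range(K), weights=theta)[0]     # draw a topic for this word
        w = random.choices(range(V), weights=topics[z])[0] # draw a word from that topic
        doc.append(w)
    docs.append(doc)

print(len(docs), len(docs[0]))
```

Inference in LDA runs this process in reverse: given only the word ids in `docs`, recover the topic mixtures and topic-word distributions, which is what the variational EM procedure in the paper approximates.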

30,570 citations