
Showing papers by "Donald B. Rubin published in 1991"


Journal ArticleDOI
TL;DR: This paper provides an overview of methods for creating and analysing multiply-imputed data sets, and illustrates the dramatic improvements possible when using multiple rather than single imputation.
Abstract: Multiple imputation for non-response replaces each missing value by two or more plausible values. The values can be chosen to represent both uncertainty about the reasons for non-response and uncertainty about which values to impute assuming the reasons for non-response are known. This paper provides an overview of methods for creating and analysing multiply-imputed data sets, and illustrates the dramatic improvements possible when using multiple rather than single imputation. A major application of multiple imputation to public-use files from the 1970 census is discussed, and several exploratory studies related to health care that have used multiple imputation are described.

1,273 citations
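As a rough illustration of the repeated-imputation analysis the overview describes, the sketch below pools m completed-data estimates with the usual combining rules (pooled point estimate, total variance, degrees of freedom, interval). The numbers and the function name are illustrative, not taken from the paper.

import numpy as np
from scipy import stats

def pool_scalar(estimates, variances):
    """Pool m completed-data estimates and variances with the usual combining rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                              # pooled point estimate
    ubar = u.mean()                              # average within-imputation variance
    b = q.var(ddof=1)                            # between-imputation variance
    t = ubar + (1 + 1 / m) * b                   # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2
    half = stats.t.ppf(0.975, df) * np.sqrt(t)
    return qbar, t, df, (qbar - half, qbar + half)

# Hypothetical results from m = 5 completed data sets
print(pool_scalar([10.2, 9.8, 10.5, 10.1, 9.9], [0.25, 0.27, 0.24, 0.26, 0.25]))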


Journal ArticleDOI
TL;DR: In this article, the authors present a general statistical model for data coarsening, which includes as special cases rounded, heaped, censored, partially categorized and missing data, and establish simple conditions under which the possible stochastic nature of the coarsening mechanism can be ignored when drawing Bayesian and likelihood inferences and thus the data can be validly treated as grouped data.
Abstract: We present a general statistical model for data coarsening, which includes as special cases rounded, heaped, censored, partially categorized and missing data. Formally, with coarse data, observations are made not in the sample space of the random variable of interest, but rather in its power set. Grouping is a special case in which the degree of coarsening is known and nonstochastic. We establish simple conditions under which the possible stochastic nature of the coarsening mechanism can be ignored when drawing Bayesian and likelihood inferences and thus the data can be validly treated as grouped data. The conditions are that the data be coarsened at random, a generalization of the condition missing at random, and that the parameters of the data and the coarsening process be distinct. Applications of the general model and the ignorability condition are illustrated in a numerical example and described briefly in a variety of special cases.

590 citations
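A minimal sketch of what "treating coarse data as grouped data" can look like in practice, assuming the coarsening is ignorable in the sense described above: each rounded observation contributes the probability of its interval to a normal likelihood. The example data and function name are made up for illustration.

import numpy as np
from scipy import stats, optimize

def grouped_nll(params, lower, upper):
    """Negative log-likelihood for normal data observed only as intervals
    (rounded values), treated as grouped data."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    probs = stats.norm.cdf(upper, mu, sigma) - stats.norm.cdf(lower, mu, sigma)
    return -np.sum(np.log(np.clip(probs, 1e-300, None)))

# Values rounded to the nearest integer -> each y stands for the interval [y-0.5, y+0.5)
rng = np.random.default_rng(0)
y = np.round(rng.normal(5.3, 2.0, size=500))
lower, upper = y - 0.5, y + 0.5
res = optimize.minimize(grouped_nll, x0=[0.0, 0.0], args=(lower, upper))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)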


Journal ArticleDOI
TL;DR: This article defines and illustrates a procedure that obtains numerically stable asymptotic variance–covariance matrices using only the code for computing the complete-data variance–covariance matrix, the code for the expectation maximization algorithm, and code for standard matrix operations.
Abstract: The expectation maximization (EM) algorithm is a popular, and often remarkably simple, method for maximum likelihood estimation in incomplete-data problems. One criticism of EM in practice is that asymptotic variance–covariance matrices for parameters (e.g., standard errors) are not automatic byproducts, as they are when using some other methods, such as Newton–Raphson. In this article we define and illustrate a procedure that obtains numerically stable asymptotic variance–covariance matrices using only the code for computing the complete-data variance–covariance matrix, the code for EM itself, and code for standard matrix operations. The basic idea is to use the fact that the rate of convergence of EM is governed by the fractions of missing information to find the increased variability due to missing information to add to the complete-data variance–covariance matrix. We call this supplemented EM algorithm the SEM algorithm. Theory and particular examples reinforce the conclusion that the SEM alg...

570 citations
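A scalar sketch of the SEM idea, using the classic multinomial linkage example of Dempster, Laird and Rubin (1977): run EM, estimate its rate of convergence r numerically from the EM map, and inflate the complete-data variance to V_com / (1 - r). This is a one-parameter illustration with coding choices of its own, not the paper's general matrix implementation; in higher dimensions the same recipe yields V_com (I - DM)^(-1).

# Cell probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4); observed counts:
y1, y2, y3, y4 = 125.0, 18.0, 20.0, 34.0

def em_step(t):
    """One EM iteration for the linkage parameter t."""
    y12 = y1 * (t / 4) / (0.5 + t / 4)           # E-step: expected split of the first cell
    return (y12 + y4) / (y12 + y2 + y3 + y4)     # M-step

# Run EM to (near) convergence
t_star = 0.5
for _ in range(200):
    t_star = em_step(t_star)

# Complete-data observed information and variance at the MLE
y12_star = y1 * (t_star / 4) / (0.5 + t_star / 4)
i_com = (y12_star + y4) / t_star**2 + (y2 + y3) / (1 - t_star) ** 2
v_com = 1.0 / i_com

# SEM: estimate the EM rate of convergence r along the EM path, then inflate v_com
t = 0.5
for _ in range(8):
    t_next = em_step(t)
    r = (t_next - t_star) / (t - t_star)         # scalar analogue of the DM matrix
    t = t_next

v_obs = v_com / (1 - r)                          # supplemented (observed-data) variance
print(t_star, v_com, v_obs, v_obs ** 0.5)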


Journal ArticleDOI
TL;DR: In this article, the authors present a procedure for computing significance levels from data sets whose missing values have been multiply imputed, using moment-based statistics, m ≤ 3 repeated imputations, and an F reference distribution.

Abstract: We present a procedure for computing significance levels from data sets whose missing values have been multiply imputed. This procedure uses moment-based statistics, m ≤ 3 repeated imputations, and an F reference distribution. When m = ∞, we show first that our procedure is essentially the same as the ideal procedure in cases of practical importance and, second, that its deviations from the ideal are basically a function of the coefficient of variation of the canonical ratios of complete to observed information. For small m our procedure's performance is largely governed by this coefficient of variation and the mean of these ratios. Using simulation techniques with small m, we compare our procedure's actual and nominal large-sample significance levels and conclude that it is essentially calibrated and thus represents a definite improvement over previously available procedures. Furthermore, we compare the large-sample power of the procedure as a function of m and other factors, such as the di...

297 citations
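The sketch below implements a moment-based combining rule of the kind the paper studies: average the completed-data Wald ingredients, measure between-imputation variability through a single ratio r, and refer the scaled statistic to an F distribution. The degrees-of-freedom formulas follow the commonly implemented version of this procedure and should be checked against the paper before serious use; the example numbers are made up.

import numpy as np
from scipy import stats

def pool_wald(q_list, u_list, q0=None):
    """Moment-based Wald test from m multiply-imputed analyses.

    q_list: m point-estimate vectors (length k); u_list: m k x k covariance
    matrices. Returns the combined statistic, its F reference dfs and p-value.
    """
    q = np.asarray(q_list, dtype=float)          # shape (m, k)
    u = np.asarray(u_list, dtype=float)          # shape (m, k, k)
    m, k = q.shape
    q0 = np.zeros(k) if q0 is None else np.asarray(q0, dtype=float)
    qbar = q.mean(axis=0)
    ubar = u.mean(axis=0)                        # within-imputation covariance
    b = np.cov(q.T, ddof=1).reshape(k, k)        # between-imputation covariance
    r = (1 + 1 / m) * np.trace(b @ np.linalg.inv(ubar)) / k
    d = (qbar - q0) @ np.linalg.solve(ubar, qbar - q0) / (k * (1 + r))
    t = k * (m - 1)
    if t > 4:
        df2 = 4 + (t - 4) * (1 + (1 - 2 / t) / r) ** 2
    else:
        df2 = t * (1 + 1 / k) * (1 + 1 / r) ** 2 / 2
    return d, k, df2, stats.f.sf(d, k, df2)

# Example with k = 2 parameters and m = 3 imputations (made-up numbers)
q_list = [[0.9, 1.6], [1.1, 1.4], [1.0, 1.5]]
u_list = [np.diag([0.04, 0.09])] * 3
print(pool_wald(q_list, u_list))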


Book ChapterDOI
TL;DR: A fundamental conclusion is that in nonrandomized studies, sensitivity of inference to the assignment mechanism is the dominant issue, and it cannot be avoided by changing modes of inference, for instance, by changing from randomization-based to Bayesian methods.
Abstract: Causal inference is an important topic and one that is now attracting the serious attention of statisticians. Although there exist recent discussions concerning the general definition of causal effects and a substantial literature on specific techniques for the analysis of data in randomized and nonrandomized studies, there has been relatively little discussion of modes of statistical inference for causal effects. This presentation briefly describes and contrasts four basic modes of statistical inference for causal effects, emphasizes the common underlying causal framework with a posited assignment mechanism, and describes practical implications in the context of an example involving the effects of switching from a name-brand to a generic drug. A fundamental conclusion is that in such nonrandomized studies, sensitivity of inference to the assignment mechanism is the dominant issue, and it cannot be avoided by changing modes of inference, for instance, by changing from randomization-based to Bayesian methods. INTRODUCTION Causal Inference Causal inference is a topic that statisticians are addressing more vigorously and rigorously in recent years. This is a desirable development for statistics, as supported by Cox's (1986) comment on Holland (1986b) that “the issues explicitly and implicitly raised by the article seem to me more important for the foundations of our subject than the discussion of the nature of probability”.

250 citations
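One of the modes contrasted in this chapter is randomization-based inference. As a minimal, self-contained sketch (with invented data loosely echoing the name-brand versus generic drug example, not the chapter's actual analysis), a Fisher randomization test of the sharp null hypothesis of no treatment effect looks like this:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcomes under the realized assignment z (1 = generic, 0 = name-brand)
y = np.array([7.1, 6.8, 7.4, 6.5, 7.0, 6.9, 7.3, 6.6])
z = np.array([1, 0, 1, 0, 1, 0, 1, 0])

obs_stat = y[z == 1].mean() - y[z == 0].mean()

# Under the sharp null, every re-randomization leaves y unchanged, so the null
# distribution of the statistic comes from the assignment mechanism alone.
null_stats = []
for _ in range(10_000):
    z_perm = rng.permutation(z)
    null_stats.append(y[z_perm == 1].mean() - y[z_perm == 0].mean())

p_value = np.mean(np.abs(null_stats) >= abs(obs_stat))
print(obs_stat, p_value)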



Journal ArticleDOI
TL;DR: In this article, the authors describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems, and show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project.
Abstract: We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well—hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...

197 citations
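The abstract notes that the logistic regressions were estimated with flattening constants because the data were too sparse for conventional maximum likelihood. One simple way to mimic that idea (not necessarily the paper's exact scheme; the constant c and helper names below are illustrative) is to add small fractional pseudo-observations of both outcomes at each design point, which keeps the estimates finite even under complete separation:

import numpy as np

def logistic_irls(X, y, w, n_iter=50):
    """Weighted logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = w * p * (1 - p)
        # Newton step: (X'WX)^{-1} X' w (y - p)
        beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (w * (y - p)))
    return beta

def fit_with_flattening(X, y, c=0.5):
    """Augment the data with fractional pseudo-observations (total weight c at
    each design point, split evenly between y = 0 and y = 1) before fitting."""
    n = X.shape[0]
    X_aug = np.vstack([X, X, X])
    y_aug = np.concatenate([y, np.ones(n), np.zeros(n)])
    w_aug = np.concatenate([np.ones(n), np.full(n, c / 2), np.full(n, c / 2)])
    return logistic_irls(X_aug, y_aug, w_aug)

# Example with complete separation, where the ordinary MLE does not exist
X = np.column_stack([np.ones(6), np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_with_flattening(X, y, c=0.5))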


Journal ArticleDOI
TL;DR: The basic theme of the EM algorithm, to repeatedly use complete-data methods to solve incomplete data problems, is also a theme of several more recent statistical techniques that combine simulation techniques with complete-data methods to attack problems that are difficult or impossible for EM.
Abstract: The basic theme of the EM algorithm, to repeatedly use complete-data methods to solve incomplete data problems, is also a theme of several more recent statistical techniques. These techniques—multiple imputation, data augmentation, stochastic relaxation, and sampling importance resampling—combine simulation techniques with complete-data methods to attack problems that are difficult or impossible for EM.

98 citations
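Of the techniques listed, sampling importance resampling (SIR) is easy to sketch: draw from a crude approximation, weight each draw by the ratio of target to approximation, and resample proportionally to the weights. The target and approximating densities below are toy choices for illustration only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def log_target(theta):
    """Unnormalized log density to sample from (toy example: a skewed density)."""
    return stats.gamma.logpdf(theta, a=3.0, scale=1.5)

# 1. Draw from a crude approximation (a normal roughly matching the target)
m_draws = 20_000
approx = stats.norm(loc=4.5, scale=2.5)
draws = approx.rvs(size=m_draws, random_state=rng)

# 2. Importance weights = target / approximation, computed on the log scale
log_w = log_target(draws) - approx.logpdf(draws)
log_w[np.isnan(log_w)] = -np.inf
w = np.exp(log_w - log_w.max())
w /= w.sum()

# 3. Resample with probability proportional to the weights
resampled = rng.choice(draws, size=2_000, replace=True, p=w)
print(resampled.mean(), resampled.std())   # should approximate the gamma(3, 1.5) mean and sd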



01 May 1991
TL;DR: In this paper, a new cell shape is proposed that drastically reduces the impedances of the dangerous higher order modes (HOMs), all of which propagate out of the cavity via a specially shaped beam pipe.
Abstract: To achieve luminosities of 30-100 times CESR, 1-2 A of current must be stored. A CESR B-factory parameter list calls for 50 MV for two rings, to be supplied by 16 cells operating at 10 MV/m gradient. With a new cell shape, the impedances of the dangerous higher order modes (HOM) are drastically reduced. All HOMs propagate out of the cavity via the beam pipe, which is specially shaped. This allows HOM power couplers to be placed completely outside the cryostat. A ferrite absorber on the beam pipe lowers all Qs to approximately 100, which is sufficient to avoid multibunch instabilities without feedback systems. A waveguide input coupler on the beam pipe provides Q_ext as low as 5×10^4, with a C-slot-shaped iris that has a negligible effect on the cavity loss parameter.

30 citations



Journal Article
TL;DR: A brief review of modes of statistical inference for causal effects can be found in this paper, written for a volume honoring I.J. Good's extensive and creative contributions to statistics.
Abstract: Causation is a topic that many statisticians address indirectly in their applied work when conducting randomized experiments or observational studies for treatments. Remarkably few statisticians, however, have addressed the topic in their own writing or theoretical research. I.J. Good is one of these few (e.g., Good, 1961, 1971, 1972, 1980a, 1980b, 1983, 1988), and consequently this brief review of modes of statistical inference for causal effects seems appropriate for a volume honoring Jack’s extensive and creative contributions to statistics.


Journal ArticleDOI
TL;DR: In this article, the authors discuss a model underlying the use of one-sample effect size indicators that permits the comparison of effect sizes obtained from different multiple-choice studies by indexing all studies to the results that would have been obtained if there had been only two choices.
Abstract: This article discusses a model underlying the use of 1-sample effect size indicators that permit the comparison of effect sizes obtained from different multiple-choice studies by indexing all studies to the results that would have been obtained if there had been only 2 choices. The effect size indicator, the proportion index (Π), is based on a model implying that the probability of knowing a correct response will decrease as more incorrect choices are offered. For specific applications other models may be more appropriate but, for most applications they have encountered, the authors prefer and recommend their model.
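For concreteness, the proportion index converts a hit rate observed with k response alternatives into its two-choice equivalent, so that chance performance always maps to .5. The sketch below uses the conversion Π = P(k − 1)/(1 + P(k − 2)), the form in which this index is usually presented; it should be checked against the article.

def proportion_index(p, k):
    """Two-choice hit rate equivalent to a proportion correct p on items with
    k alternatives (chance performance maps to 0.5, perfect performance to 1)."""
    if not 0 <= p <= 1 or k < 2:
        raise ValueError("need 0 <= p <= 1 and k >= 2")
    return p * (k - 1) / (1 + p * (k - 2))

# A hit rate of .50 on 4-alternative items corresponds to .75 on 2-alternative
# items; chance performance (.25) maps to .50.
print(proportion_index(0.50, 4), proportion_index(0.25, 4))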

Journal ArticleDOI
TL;DR: In this article, the first three moments of the observed score distribution of a multiple-choice test are used to summarize mean ability in terms of the probability of obtaining a correct response, a guessing threshold, and a reliability relative to binomial variability.
Abstract: An observed score distribution for a multiple-choice test can often be cogently summarized using 3 statistics that are easily calculated functions of the first 3 moments of the distribution. These statistics can be interpreted as providing a mean ability in terms of probability of obtaining a correct response, a guessing threshold, and a reliability relative to binomial variability.

Journal ArticleDOI
TL;DR: In this article, differences are revealed in the age at which boys and girls attain developmental milestones during the first year of life and in the intervals between milestone attainments, based on the Copenhagen Consecutive Perinatal Cohort.
Abstract: Differences were revealed in the age at which boys and girls attain developmental milestones during the first year of life and in the intervals between milestone attainments. Three of ten milestones were reached significantly earlier in boys than girls, while none of the milestones appeared earlier in girls than boys. Of the 45 intervals between milestones, seven were longer in boys and twelve were longer in girls. Data were recorded by the mothers of 4653 infants participating in the Copenhagen Consecutive Perinatal Cohort. Full-term gestation (38-41 weeks) and survival through the first year were the criteria for inclusion in the study. Nine potential confounding variables, including SES, birth weight and complications of pregnancy, were not responsible for the behavioral sex differences. This sexually dimorphic pattern of milestone achievement is discussed as supportive of hypothesized biologically-based sex differences in the ontogeny of social responsiveness and is consistent with sex differences ident...



01 Jan 1991
TL;DR: In this paper, the effects and benefits of high peak power RF processing (HPP) as a means of reducing field emission loading in 3-GHz niobium accelerator cavities are investigated; a nine-cell cavity has been successfully tested and, through HPP, reached E_acc = 15 MV/m with Q_0 = 6.0×10^9.
Abstract: The effects and benefits of high peak power RF processing as a means of reducing field emission loading in 3-GHz niobium accelerator cavities are being investigated. The test apparatus includes a 3-GHz klystron capable of delivering RF pulses of up to 200-kW peak power with pulse lengths up to 2.5 ms at a repetition rate of approximately 1 Hz. The test apparatus has variable coupling such that the input external Q varies between 10^5 and 10^10 without breaking the cavity vacuum. Low-power, continuous-wave (CW) tests before and after HPP show that HPP is effective in removing emissions which are unaffected by low-power RF processing. CW measurements show that field emission reduction is dependent on the maximum field reached during HPP. HPP fields of E_peak = 70-72 MV/m have been attained. These tests showed FE elimination to E_peak = 40 MV/m, and maximum fields of E_peak = 50-55 MV/m. Temperature mapping is now available. A cavity which showed strong FE loading and had extensive temperature mapping is now being investigated in a scanning electron microscope. A nine-cell cavity has been successfully tested, and, through HPP, reached E_acc = 15 MV/m, with Q_0 = 6.0×10^9.

Proceedings ArticleDOI
06 May 1991
TL;DR: In this article, it was shown that the improvement in the performance of single-cell Nb cavities achieved during the last 5 years by advanced dust-free mounting and firing techniques can be successfully transferred to multicell structures.
Abstract: It was shown that the improvement in the performance of single-cell Nb cavities made during the last 5 years by advanced dust-free mounting and firing techniques can be successfully transferred to multicell structures. Acceleration gradients of 22 MV/m in the best 5-cell and 16 MV/m in the first 9-cell cavity of improved cell shape have been achieved. The increase of E_acc beyond 25 MV/m needs additional investigation on single-cell cavities. Nevertheless, superconducting accelerators are a promising option for building a linear collider for electrons with beam energies in excess of 300 GeV.

Book ChapterDOI
01 Jan 1991
TL;DR: In this article, a 6-cell cavity has been constructed in an effort to extend the achievements from single-cell test cavities toward the accelerating structures planned for TESLA (a TeV e−e+ linear collider).
Abstract: Present-day superconducting (SC) radio-frequency (rf) cavity structures used in particle accelerators provide accelerating fields (Eacc) up to 10 MV/m. Field emission is the most serious obstacle to reaching the higher fields called for in future applications. We have used heat treatment (up to 1500°C), along with high-power processing of cavities and temperature mapping, to suppress field emission and analyze emitter properties. In 27 fired cavities, we have raised the average Eacc to 26 MV/m from the 14 MV/m obtained with chemical treatment (CT) alone; the highest Eacc reached is 30 MV/m. Non-accelerating cavities have also been made to investigate the highest rf field SC Nb can support; 145 MV/m has been reached. A 6-cell cavity has been constructed in an effort to extend our achievements from single-cell test cavities toward the accelerating structures planned for TESLA (a TeV e−e+ linear collider); preliminary measurements with CT only reached Eacc = 17 MV/m. The conceptual design of a B-factory cavity is also briefly discussed.