
Showing papers by "Donald B. Rubin published in 1991"


Journal ArticleDOI
TL;DR: This paper provides an overview of methods for creating and analysing multiply-imputed data sets, and illustrates the dramatic improvements possible when using multiple rather than single imputation.
Abstract: Multiple imputation for non-response replaces each missing value by two or more plausible values. The values can be chosen to represent both uncertainty about the reasons for non-response and uncertainty about which values to impute assuming the reasons for non-response are known. This paper provides an overview of methods for creating and analysing multiply-imputed data sets, and illustrates the dramatic improvements possible when using multiple rather than single imputation. A major application of multiple imputation to public-use files from the 1970 census is discussed, and several exploratory studies related to health care that have used multiple imputation are described.

1,273 citations
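As a rough illustration of the repeated-imputation analysis the overview describes, the sketch below pools m completed-data estimates with the usual combining rules (pooled point estimate, total variance, degrees of freedom, interval). The numbers and the function name are illustrative, not taken from the paper.

import numpy as np
from scipy import stats

def pool_scalar(estimates, variances):
    """Pool m completed-data estimates and variances with the usual combining rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                              # pooled point estimate
    ubar = u.mean()                              # average within-imputation variance
    b = q.var(ddof=1)                            # between-imputation variance
    t = ubar + (1 + 1 / m) * b                   # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2
    half = stats.t.ppf(0.975, df) * np.sqrt(t)
    return qbar, t, df, (qbar - half, qbar + half)

# Hypothetical results from m = 5 completed data sets
print(pool_scalar([10.2, 9.8, 10.5, 10.1, 9.9], [0.25, 0.27, 0.24, 0.26, 0.25]))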


Journal ArticleDOI
TL;DR: In this article, the authors present a general statistical model for data coarsening, which includes as special cases rounded, heaped, censored, partially categorized and missing data, and establish simple conditions under which the possible stochastic nature of the coarsening mechanism can be ignored when drawing Bayesian and likelihood inferences and thus the data can be validly treated as grouped data.
Abstract: We present a general statistical model for data coarsening, which includes as special cases rounded, heaped, censored, partially categorized and missing data. Formally, with coarse data, observations are made not in the sample space of the random variable of interest, but rather in its power set. Grouping is a special case in which the degree of coarsening is known and nonstochastic. We establish simple conditions under which the possible stochastic nature of the coarsening mechanism can be ignored when drawing Bayesian and likelihood inferences and thus the data can be validly treated as grouped data. The conditions are that the data be coarsened at random, a generalization of the condition missing at random, and that the parameters of the data and the coarsening process be distinct. Applications of the general model and the ignorability condition are illustrated in a numerical example and described briefly in a variety of special cases.

590 citations
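A minimal sketch of what "treating coarse data as grouped data" can look like in practice, assuming the coarsening is ignorable in the sense described above: each rounded observation contributes the probability of its interval to a normal likelihood. The example data and function name are made up for illustration.

import numpy as np
from scipy import stats, optimize

def grouped_nll(params, lower, upper):
    """Negative log-likelihood for normal data observed only as intervals
    (rounded values), treated as grouped data."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    probs = stats.norm.cdf(upper, mu, sigma) - stats.norm.cdf(lower, mu, sigma)
    return -np.sum(np.log(np.clip(probs, 1e-300, None)))

# Values rounded to the nearest integer -> each y stands for the interval [y-0.5, y+0.5)
rng = np.random.default_rng(0)
y = np.round(rng.normal(5.3, 2.0, size=500))
lower, upper = y - 0.5, y + 0.5
res = optimize.minimize(grouped_nll, x0=[0.0, 0.0], args=(lower, upper))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)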


Journal ArticleDOI
TL;DR: This article defines and illustrates a procedure that obtains numerically stable asymptotic variance–covariance matrices using only the code for computing the complete-data variance–covariance matrix, the code for the expectation maximization algorithm, and code for standard matrix operations.
Abstract: The expectation maximization (EM) algorithm is a popular, and often remarkably simple, method for maximum likelihood estimation in incomplete-data problems. One criticism of EM in practice is that asymptotic variance–covariance matrices for parameters (e.g., standard errors) are not automatic byproducts, as they are when using some other methods, such as Newton–Raphson. In this article we define and illustrate a procedure that obtains numerically stable asymptotic variance–covariance matrices using only the code for computing the complete-data variance–covariance matrix, the code for EM itself, and code for standard matrix operations. The basic idea is to use the fact that the rate of convergence of EM is governed by the fractions of missing information to find the increased variability due to missing information to add to the complete-data variance–covariance matrix. We call this supplemented EM algorithm the SEM algorithm. Theory and particular examples reinforce the conclusion that the SEM alg...

570 citations
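A scalar sketch of the SEM idea, using the classic multinomial linkage example of Dempster, Laird and Rubin (1977): run EM, estimate its rate of convergence r numerically from the EM map, and inflate the complete-data variance to V_com / (1 - r). This is a one-parameter illustration with coding choices of its own, not the paper's general matrix implementation; in higher dimensions the same recipe yields V_com (I - DM)^(-1).

# Cell probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4); observed counts:
y1, y2, y3, y4 = 125.0, 18.0, 20.0, 34.0

def em_step(t):
    """One EM iteration for the linkage parameter t."""
    y12 = y1 * (t / 4) / (0.5 + t / 4)           # E-step: expected split of the first cell
    return (y12 + y4) / (y12 + y2 + y3 + y4)     # M-step

# Run EM to (near) convergence
t_star = 0.5
for _ in range(200):
    t_star = em_step(t_star)

# Complete-data observed information and variance at the MLE
y12_star = y1 * (t_star / 4) / (0.5 + t_star / 4)
i_com = (y12_star + y4) / t_star**2 + (y2 + y3) / (1 - t_star) ** 2
v_com = 1.0 / i_com

# SEM: estimate the EM rate of convergence r along the EM path, then inflate v_com
t = 0.5
for _ in range(8):
    t_next = em_step(t)
    r = (t_next - t_star) / (t - t_star)         # scalar analogue of the DM matrix
    t = t_next

v_obs = v_com / (1 - r)                          # supplemented (observed-data) variance
print(t_star, v_com, v_obs, v_obs ** 0.5)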


Journal ArticleDOI
TL;DR: In this article, the authors present a procedure for computing significance levels from data sets whose missing values have been multiply imputed, using moment-based statistics, m ≤ 3 repeated imputations, and an F reference distribution.

Abstract: We present a procedure for computing significance levels from data sets whose missing values have been multiply imputed. This procedure uses moment-based statistics, m ≤ 3 repeated imputations, and an F reference distribution. When m = ∞, we show first that our procedure is essentially the same as the ideal procedure in cases of practical importance and, second, that its deviations from the ideal are basically a function of the coefficient of variation of the canonical ratios of complete to observed information. For small m our procedure's performance is largely governed by this coefficient of variation and the mean of these ratios. Using simulation techniques with small m, we compare our procedure's actual and nominal large-sample significance levels and conclude that it is essentially calibrated and thus represents a definite improvement over previously available procedures. Furthermore, we compare the large-sample power of the procedure as a function of m and other factors, such as the di...

297 citations
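The sketch below implements a moment-based combining rule of the kind the paper studies: average the completed-data Wald ingredients, measure between-imputation variability through a single ratio r, and refer the scaled statistic to an F distribution. The degrees-of-freedom formulas follow the commonly implemented version of this procedure and should be checked against the paper before serious use; the example numbers are made up.

import numpy as np
from scipy import stats

def pool_wald(q_list, u_list, q0=None):
    """Moment-based Wald test from m multiply-imputed analyses.

    q_list: m point-estimate vectors (length k); u_list: m k x k covariance
    matrices. Returns the combined statistic, its F reference dfs and p-value.
    """
    q = np.asarray(q_list, dtype=float)          # shape (m, k)
    u = np.asarray(u_list, dtype=float)          # shape (m, k, k)
    m, k = q.shape
    q0 = np.zeros(k) if q0 is None else np.asarray(q0, dtype=float)
    qbar = q.mean(axis=0)
    ubar = u.mean(axis=0)                        # within-imputation covariance
    b = np.cov(q.T, ddof=1).reshape(k, k)        # between-imputation covariance
    r = (1 + 1 / m) * np.trace(b @ np.linalg.inv(ubar)) / k
    d = (qbar - q0) @ np.linalg.solve(ubar, qbar - q0) / (k * (1 + r))
    t = k * (m - 1)
    if t > 4:
        df2 = 4 + (t - 4) * (1 + (1 - 2 / t) / r) ** 2
    else:
        df2 = t * (1 + 1 / k) * (1 + 1 / r) ** 2 / 2
    return d, k, df2, stats.f.sf(d, k, df2)

# Example with k = 2 parameters and m = 3 imputations (made-up numbers)
q_list = [[0.9, 1.6], [1.1, 1.4], [1.0, 1.5]]
u_list = [np.diag([0.04, 0.09])] * 3
print(pool_wald(q_list, u_list))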


Book ChapterDOI
TL;DR: A fundamental conclusion is that in nonrandomized studies, sensitivity of inference to the assignment mechanism is the dominant issue, and it cannot be avoided by changing modes of inference, for instance, by changing from randomization-based to Bayesian methods.
Abstract: Causal inference is an important topic and one that is now attracting the serious attention of statisticians. Although there exist recent discussions concerning the general definition of causal effects and a substantial literature on specific techniques for the analysis of data in randomized and nonrandomized studies, there has been relatively little discussion of modes of statistical inference for causal effects. This presentation briefly describes and contrasts four basic modes of statistical inference for causal effects, emphasizes the common underlying causal framework with a posited assignment mechanism, and describes practical implications in the context of an example involving the effects of switching from a name-brand to a generic drug. A fundamental conclusion is that in such nonrandomized studies, sensitivity of inference to the assignment mechanism is the dominant issue, and it cannot be avoided by changing modes of inference, for instance, by changing from randomization-based to Bayesian methods. INTRODUCTION Causal Inference Causal inference is a topic that statisticians are addressing more vigorously and rigorously in recent years. This is a desirable development for statistics, as supported by Cox's (1986) comment on Holland (1986b) that “the issues explicitly and implicitly raised by the article seem to me more important for the foundations of our subject than the discussion of the nature of probability”.

250 citations
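One of the modes contrasted in this chapter is randomization-based inference. As a minimal, self-contained sketch (with invented data loosely echoing the name-brand versus generic drug example, not the chapter's actual analysis), a Fisher randomization test of the sharp null hypothesis of no treatment effect looks like this:

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcomes under the realized assignment z (1 = generic, 0 = name-brand)
y = np.array([7.1, 6.8, 7.4, 6.5, 7.0, 6.9, 7.3, 6.6])
z = np.array([1, 0, 1, 0, 1, 0, 1, 0])

obs_stat = y[z == 1].mean() - y[z == 0].mean()

# Under the sharp null, every re-randomization leaves y unchanged, so the null
# distribution of the statistic comes from the assignment mechanism alone.
null_stats = []
for _ in range(10_000):
    z_perm = rng.permutation(z)
    null_stats.append(y[z_perm == 1].mean() - y[z_perm == 0].mean())

p_value = np.mean(np.abs(null_stats) >= abs(obs_stat))
print(obs_stat, p_value)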



Journal ArticleDOI
TL;DR: In this article, the authors describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems, and show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project.
Abstract: We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well—hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...

197 citations
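The abstract notes that the logistic regressions were estimated with flattening constants because the data were too sparse for conventional maximum likelihood. One simple way to mimic that idea (not necessarily the paper's exact scheme; the constant c and helper names below are illustrative) is to add small fractional pseudo-observations of both outcomes at each design point, which keeps the estimates finite even under complete separation:

import numpy as np

def logistic_irls(X, y, w, n_iter=50):
    """Weighted logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        W = w * p * (1 - p)
        # Newton step: (X'WX)^{-1} X' w (y - p)
        beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (w * (y - p)))
    return beta

def fit_with_flattening(X, y, c=0.5):
    """Augment the data with fractional pseudo-observations (total weight c at
    each design point, split evenly between y = 0 and y = 1) before fitting."""
    n = X.shape[0]
    X_aug = np.vstack([X, X, X])
    y_aug = np.concatenate([y, np.ones(n), np.zeros(n)])
    w_aug = np.concatenate([np.ones(n), np.full(n, c / 2), np.full(n, c / 2)])
    return logistic_irls(X_aug, y_aug, w_aug)

# Example with complete separation, where the ordinary MLE does not exist
X = np.column_stack([np.ones(6), np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_with_flattening(X, y, c=0.5))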


Journal ArticleDOI
TL;DR: The basic theme of the EM algorithm, to repeatedly use complete-data methods to solve incomplete data problems, is also a theme of several more recent statistical techniques that combine simulation techniques with complete-data methods to attack problems that are difficult or impossible for EM.
Abstract: The basic theme of the EM algorithm, to repeatedly use complete-data methods to solve incomplete data problems, is also a theme of several more recent statistical techniques. These techniques—multiple imputation, data augmentation, stochastic relaxation, and sampling importance resampling—combine simulation techniques with complete-data methods to attack problems that are difficult or impossible for EM.

98 citations
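Of the techniques listed, sampling importance resampling (SIR) is easy to sketch: draw from a crude approximation, weight each draw by the ratio of target to approximation, and resample proportionally to the weights. The target and approximating densities below are toy choices for illustration only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def log_target(theta):
    """Unnormalized log density to sample from (toy example: a skewed density)."""
    return stats.gamma.logpdf(theta, a=3.0, scale=1.5)

# 1. Draw from a crude approximation (a normal roughly matching the target)
m_draws = 20_000
approx = stats.norm(loc=4.5, scale=2.5)
draws = approx.rvs(size=m_draws, random_state=rng)

# 2. Importance weights = target / approximation, computed on the log scale
log_w = log_target(draws) - approx.logpdf(draws)
log_w[np.isnan(log_w)] = -np.inf
w = np.exp(log_w - log_w.max())
w /= w.sum()

# 3. Resample with probability proportional to the weights
resampled = rng.choice(draws, size=2_000, replace=True, p=w)
print(resampled.mean(), resampled.std())   # should approximate the gamma(3, 1.5) mean and sd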



01 May 1991
TL;DR: In this paper, a new cell shape is proposed that drastically reduces the impedances of the dangerous higher order modes (HOMs), all of which propagate out of the cavity via a specially shaped beam pipe.
Abstract: To achieve luminosities of 30-100 times CESR, 1-2 A of current must be stored. A CESR B-factory parameter list calls for 50 MV for two rings, to be supplied by 16 cells operating at 10 MV/m gradient. With a new cell shape, the impedances of the dangerous higher order modes (HOM) are drastically reduced. All HOMs propagate out of the cavity via the beam pipe, which is specially shaped. This allows HOM power couplers to be placed completely outside the cryostat. A ferrite absorber on the beam pipe lowers all Qs to approximately 100, which is sufficient to avoid multibunch instabilities without feedback systems. A waveguide input coupler on the beam pipe provides Q_ext as low as 5×10^4, with a C-slot-shaped iris that has a negligible effect on the cavity loss parameter.

30 citations



Journal Article
TL;DR: A brief review of modes of statistical inference for causal effects can be found in this paper, written for a volume honoring I.J. Good's extensive and creative contributions to statistics.
Abstract: Causation is a topic that many statisticians address indirectly in their applied work when conducting randomized experiments or observational studies for treatments. Remarkably few statisticians, however, have addressed the topic in their own writing or theoretical research. I.J. Good is one of these few (e.g., Good, 1961, 1971, 1972, 1980a, 1980b, 1983, 1988), and consequently this brief review of modes of statistical inference for causal effects seems appropriate for a volume honoring Jack’s extensive and creative contributions to statistics.


Journal ArticleDOI
TL;DR: In this article, the authors discuss a model underlying the use of one-sample effect size indicators that permits the comparison of effect sizes obtained from different multiple-choice studies by indexing all studies to the results that would have been obtained if there had been only two choices.
Abstract: This article discusses a model underlying the use of 1-sample effect size indicators that permit the comparison of effect sizes obtained from different multiple-choice studies by indexing all studies to the results that would have been obtained if there had been only 2 choices. The effect size indicator, the proportion index (Π), is based on a model implying that the probability of knowing a correct response will decrease as more incorrect choices are offered. For specific applications other models may be more appropriate but, for most applications they have encountered, the authors prefer and recommend their model.
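For concreteness, the proportion index converts a hit rate observed with k response alternatives into its two-choice equivalent, so that chance performance always maps to .5. The sketch below uses the conversion Π = P(k − 1)/(1 + P(k − 2)), the form in which this index is usually presented; it should be checked against the article.

def proportion_index(p, k):
    """Two-choice hit rate equivalent to a proportion correct p on items with
    k alternatives (chance performance maps to 0.5, perfect performance to 1)."""
    if not 0 <= p <= 1 or k < 2:
        raise ValueError("need 0 <= p <= 1 and k >= 2")
    return p * (k - 1) / (1 + p * (k - 2))

# A hit rate of .50 on 4-alternative items corresponds to .75 on 2-alternative
# items; chance performance (.25) maps to .50.
print(proportion_index(0.50, 4), proportion_index(0.25, 4))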

Journal ArticleDOI
TL;DR: In this article, the first three moments of the observed score distribution of a multiple-choice test are used to summarize mean ability in terms of the probability of obtaining a correct response, a guessing threshold, and a reliability relative to binomial variability.
Abstract: An observed score distribution for a multiple-choice test can often be cogently summarized using 3 statistics that are easily calculated functions of the first 3 moments of the distribution. These statistics can be interpreted as providing a mean ability in terms of probability of obtaining a correct response, a guessing threshold, and a reliability relative to binomial variability.

Journal ArticleDOI
TL;DR: In this article, differences are revealed in the age at which boys and girls attain developmental milestones during the first year of life and in the intervals between milestone attainments, based on the Copenhagen Consecutive Perinatal Cohort.
Abstract: Differences were revealed in the age at which boys and girls attain developmental milestones during the first year of life and in the intervals between milestone attainments. Three of ten milestones were reached significantly earlier in boys than girls, while none of the milestones appeared earlier in girls than boys. Of the 45 intervals between milestones, seven were longer in boys and twelve were longer in girls. Data were recorded by the mothers of 4653 infants participating in the Copenhagen Consecutive Perinatal Cohort. Full-term gestation (38-41 weeks) and survival through the first year were the criteria for inclusion in the study. Nine potential confounding variables, including SES, birth weight and complications of pregnancy, were not responsible for the behavioral sex differences. This sexually dimorphic pattern of milestone achievement is discussed as supportive of hypothesized biologically-based sex differences in the ontogeny of social responsiveness and is consistent with sex differences ident...



01 Jan 1991
TL;DR: In this paper, the effects and benefits of high peak power RF processing (HPP) as a means of reducing field emission loading in 3-GHz niobium accelerator cavities are investigated; a nine-cell cavity has been successfully tested and, through HPP, reached E_acc = 15 MV/m with Q_0 = 6.0×10^9.
Abstract: The effects and benefits of high peak power RF processing as a means of reducing field emission loading in 3-GHz niobium accelerator cavities are being investigated. The test apparatus includes a 3-GHz klystron capable of delivering RF pulses of up to 200-kW peak power with pulse lengths up to 2.5 ms at a repetition rate of approximately 1 Hz. The test apparatus has variable coupling such that the input external Q varies between 10^5 and 10^10 without breaking the cavity vacuum. Low-power, continuous-wave (CW) tests before and after HPP show that HPP is effective in removing emissions which are unaffected by low-power RF processing. CW measurements show that field emission reduction is dependent on the maximum field reached during HPP. HPP fields of E_peak = 70-72 MV/m have been attained. These tests showed FE elimination to E_peak = 40 MV/m, and maximum fields of E_peak = 50-55 MV/m. Temperature mapping is now available. A cavity which showed strong FE loading and had extensive temperature mapping is now being investigated in a scanning electron microscope. A nine-cell cavity has been successfully tested, and, through HPP, reached E_acc = 15 MV/m, with Q_0 = 6.0×10^9.

Proceedings ArticleDOI
06 May 1991
TL;DR: In this article, it was shown that the improvement in the performance of single-cell Nb cavities achieved during the last 5 years by advanced dust-free mounting and firing techniques can be successfully transferred to multicell structures.
Abstract: It was shown that the improvement in the performance of single-cell Nb cavities made during the last 5 years by advanced dust-free mounting and firing techniques can be successfully transferred to multicell structures. Acceleration gradients of 22 MV/m in the best 5-cell and 16 MV/m in the first 9-cell cavity of improved cell shape have been achieved. The increase of E_acc beyond 25 MV/m needs additional investigation on single-cell cavities. Nevertheless, superconducting accelerators are a promising option for building a linear collider for electrons with beam energies in excess of 300 GeV.

Book ChapterDOI
01 Jan 1991
TL;DR: In this article, a 6-cell cavity has been constructed in an effort to extend the achievements from single-cell test cavities toward the accelerating structures planned for TESLA (a TeV e−e+ linear collider).
Abstract: Present-day superconducting (SC) radio-frequency (rf) cavity structures used in particle accelerators provide accelerating fields (Eacc) up to 10 MV/m. Field emission is the most serious obstacle to reaching the higher fields called for in future applications. We have used heat treatment (up to 1500°C), along with high-power processing of cavities and temperature mapping, to suppress field emission and analyze emitter properties. In 27 fired cavities, we have raised the average Eacc to 26 MV/m from the 14 MV/m obtained with chemical treatment (CT) alone; the highest Eacc reached is 30 MV/m. Non-accelerating cavities have also been made to investigate the highest rf field SC Nb can support; 145 MV/m has been reached. A 6-cell cavity has been constructed in an effort to extend our achievements from single-cell test cavities toward the accelerating structures planned for TESLA (a TeV e−e+ linear collider); preliminary measurements with CT only reached Eacc = 17 MV/m. The conceptual design of a B-factory cavity is also briefly discussed.