
Showing papers by "Donald B. Rubin published in 2014"


Book Chapter
28 Aug 2014

83 citations


Other
29 Sep 2014

55 citations


Journal Article
TL;DR: This article proposes graphical displays that help formalize and visualize the results of sensitivity analyses, building upon the idea of "tipping-point" analysis for randomized experiments with a binary outcome and a dichotomous treatment.
Abstract: Although recent guidelines for dealing with missing data emphasize the need for sensitivity analyses, and such analyses have a long history in statistics, universal recommendations for conducting and displaying these analyses are scarce. We propose graphical displays that help formalize and visualize the results of sensitivity analyses, building upon the idea of 'tipping-point' analysis for randomized experiments with a binary outcome and a dichotomous treatment. The resulting 'enhanced tipping-point displays' are convenient summaries of conclusions obtained from making different modeling assumptions about missingness mechanisms. The primary goal of the displays is to make formal sensitivity analyses more comprehensible to practitioners, thereby helping them assess the robustness of the experiment's conclusions to plausible missingness mechanisms. We also present a recent example of these enhanced displays in a medical device clinical trial that helped lead to FDA approval.
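A minimal sketch of the underlying tipping-point idea (not the enhanced displays themselves): for every possible way the missing binary outcomes could turn out in each arm, re-test the treatment effect and record whether the conclusion survives. All counts below and the choice of prop.test are illustrative assumptions, not data from the paper.

# Hypothetical observed data: successes / observed n / number missing per arm
obs_succ_t <- 40; obs_n_t <- 70; miss_t <- 10   # treatment arm
obs_succ_c <- 25; obs_n_c <- 72; miss_c <- 8    # control arm

# For every possible completion of the missing outcomes, record the p-value
# of a simple two-proportion test on the completed data.
grid <- expand.grid(imp_succ_t = 0:miss_t, imp_succ_c = 0:miss_c)
grid$p_value <- apply(grid, 1, function(g) {
  succ <- c(obs_succ_t + g["imp_succ_t"], obs_succ_c + g["imp_succ_c"])
  n    <- c(obs_n_t + miss_t, obs_n_c + miss_c)
  prop.test(succ, n)$p.value
})
grid$significant <- grid$p_value < 0.05

# Crude tipping-point display: 1 where the completed data keep the effect
# significant at the 5% level, 0 where the conclusion tips.
xtabs(significant ~ imp_succ_t + imp_succ_c, data = grid)

Each cell of the table corresponds to one assumption about the missing outcomes; an enhanced display would additionally convey, for example, which cells are plausible under different missingness mechanisms.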

48 citations


Book Chapter
28 Aug 2014

37 citations


Journal Article
TL;DR: In this article, the authors propose the "multiple imputation by ordered monotone blocks" approach, in which each variable with missing values is imputed sequentially from a conditional distribution given the variables with fewer or the same number of missing values.
Abstract: Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” …
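A minimal sketch of the monotone sequential idea described above (not the paper's ordered-monotone-blocks procedure, and a single imputation rather than full MI), using synthetic data: order the variables by their number of missing values and impute each from a regression on the more complete variables.

set.seed(1)
n  <- 200
x1 <- rnorm(n)                        # fully observed
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 + 0.4 * x2 + rnorm(n)
dat <- data.frame(x1, x2, x3)
dat$x3[101:200] <- NA                 # 100 missing
dat$x2[151:200] <- NA                 # 50 missing; every unit missing x2 also misses x3 (monotone)

vars <- names(sort(colSums(is.na(dat))))    # order by amount of missingness
imp  <- dat
for (v in vars) {
  miss <- is.na(imp[[v]])
  if (!any(miss)) next
  preds <- vars[seq_len(match(v, vars) - 1)]          # the more complete variables
  fit   <- lm(reformulate(preds, v), data = imp[!miss, ])
  # draw from the fitted conditional; proper multiple imputation would also
  # draw the regression parameters and repeat the whole pass M times
  imp[[v]][miss] <- predict(fit, newdata = imp[miss, , drop = FALSE]) +
    rnorm(sum(miss), sd = summary(fit)$sigma)
}

Because the pattern is monotone, each regression is fit on cases where all of its predictors are observed, which is what makes the sequential draws theoretically valid.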

33 citations


Journal Article
TL;DR: The authors argue that the Neyman-Fisher controversy had a deleterious impact on the development of statistics, a major consequence being that potential outcomes were ignored in favor of linear models and classical statistical procedures that are imprecise without applied contexts.
Abstract: The Neyman-Fisher controversy considered here originated with the 1935 presentation of Jerzy Neyman's Statistical Problems in Agricultural Experimentation to the Royal Statistical Society. Neyman asserted that the standard ANOVA F-test for randomized complete block designs is valid, whereas the analogous test for Latin squares is invalid in the sense of detecting differentiation among the treatments, when none existed on average, more often than desired (i.e., having a higher Type I error than advertised). However, Neyman's expressions for the expected mean residual sum of squares, for both designs, are generally incorrect. Furthermore, Neyman's belief that the Type I error (when testing the null hypothesis of zero average treatment effects) is higher than desired, whenever the expected mean treatment sum of squares is greater than the expected mean residual sum of squares, is generally incorrect. Simple examples show that, without further assumptions on the potential outcomes, one cannot determine the Type I error of the F-test from expected sums of squares. Ultimately, we believe that the Neyman-Fisher controversy had a deleterious impact on the development of statistics, with a major consequence being that potential outcomes were ignored in favor of linear models and classical statistical procedures that are imprecise without applied contexts.
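A hedged illustration of the paper's point (not taken from it): with the potential outcomes held fixed and the average treatment effects exactly zero, the actual Type I error of the ANOVA F-test in a randomized complete block design is a property of the randomization distribution for those particular potential outcomes, so it has to be computed from them rather than read off expected mean squares. All sizes and distributions below are arbitrary choices.

set.seed(2)
B <- 8; Tn <- 4                             # blocks and treatments (arbitrary)
# Potential outcomes Y[b, u, t]: block effects + unit effects + unit-by-treatment
# terms centred so that every treatment has the same average potential outcome,
# i.e. Neyman's null of zero average treatment effects holds exactly.
block_eff <- rnorm(B, sd = 2)
unit_eff  <- matrix(rnorm(B * Tn), B, Tn)
inter     <- array(rnorm(B * Tn * Tn, sd = 2), dim = c(B, Tn, Tn))
for (t in seq_len(Tn)) inter[, , t] <- inter[, , t] - mean(inter[, , t])
Y <- array(block_eff, dim = c(B, Tn, Tn)) +
     array(unit_eff,  dim = c(B, Tn, Tn)) + inter

one_randomization <- function() {
  treat <- t(replicate(B, sample(Tn)))      # treat[b, u]: treatment given to unit u of block b
  idx   <- cbind(rep(1:B, times = Tn), rep(1:Tn, each = B))
  tr    <- treat[idx]
  y     <- Y[cbind(idx, tr)]
  anova(lm(y ~ factor(idx[, 1]) + factor(tr)))["factor(tr)", "Pr(>F)"]
}
pvals <- replicate(2000, one_randomization())
mean(pvals < 0.05)   # realized Type I error for these particular potential outcomes

Changing the unit-by-treatment terms (while keeping the average effects at zero) can change this rejection rate, which is the sense in which expected sums of squares alone cannot pin down the Type I error.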

18 citations



Book Chapter
28 Aug 2014
TL;DR: This exercise asks you to generate missing data yourself under three mechanisms (MCAR, MAR, NMAR), apply two missing-data methods (complete case analysis and variable mean imputation), and compute means, variances, the correlation matrix, and the regression of salary on profits for comparison with the complete data.
Abstract: Application of simple methods: complete case analysis and variable mean imputation. Download the dataset ceo.dat from the website and read it into R. The dataset contains information on Fortune 500 companies in 1999 and has 447 cases. Originally, the Fortune 500 consists of 500 cases, but 53 cases were deleted due to missing values. The variables are:
• salary: 1999 CEO salary plus bonuses (thousand $)
• totcomp: 1999 CEO total compensation (thousand $)
• tenure: number of years as CEO (0 if less than 6 months)
• age: age of CEO in years
• sales: total 1998 sales revenue of firm i (million $)
• profits: 1998 profits of firm i (million $)
• assets: total assets of firm i in 1998 (million $)
In this problem you will generate missing data yourself using three mechanisms (MCAR, MAR, NMAR). Next, you will apply two different missing-data methods (complete case analysis and variable mean imputation) and compute means, variances, and the correlation matrix. We are also interested in salary as a function of profits (salary = a + b · profits). Since the dataset is complete and you generate the missing data yourself, you will be able to compare the results from the two missing-data methods to the population values (obtained from the complete data). We will assume that missing data occurs only in the variables salary, totcomp, and age; the remaining variables will not have missing values. Do the following:
1. Simulate 25% nonresponse in the three variables described; you will obtain three data sets, one for each scenario. Think about how you can get MCAR, MAR, and NMAR.
2. For each incomplete data set, compute the means, variances, the correlation matrix, and the regression described above.
3. Compare the results with the complete-data means, variances, correlation matrix, and regression.
4. Write a report (2 pages max) in which you describe the findings. Also describe how you generated MCAR, MAR, and NMAR, and why your method of generating nonresponse resulted in that particular mechanism.
5. Include the R programs you wrote and the R commands you used on a separate page.
Use the materials from chapter 2 and the handout to find the least squares estimates for the missing observations in the data set called carsmiss. Download it from the website and read it into R using the command carsmiss <- read.table("carsmiss.txt", header = TRUE, sep = ","). The data set carsmiss has four …
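A minimal R sketch of the exercise, assuming ceo.dat reads in with read.table; the particular 25% MCAR, MAR, and NMAR rules below are one illustrative choice among many, not the required answer.

ceo  <- read.table("ceo.dat", header = TRUE)
n    <- nrow(ceo); p <- 0.25
vars <- c("salary", "totcomp", "age")

make_missing <- function(data, mechanism) {
  out <- data
  for (v in vars) {
    miss <- switch(mechanism,
      MCAR = runif(n) < p,                        # unrelated to any variable
      MAR  = rank(data$profits) > (1 - p) * n,    # depends only on fully observed profits
      NMAR = rank(data[[v]])   > (1 - p) * n)     # depends on the value that goes missing
    out[[v]][miss] <- NA
  }
  out
}
sets <- lapply(c(MCAR = "MCAR", MAR = "MAR", NMAR = "NMAR"), make_missing, data = ceo)

analyse <- function(d) {
  cc <- d[complete.cases(d), ]                    # complete case analysis
  mi <- d                                         # variable mean imputation
  for (v in vars) mi[[v]][is.na(mi[[v]])] <- mean(mi[[v]], na.rm = TRUE)
  list(cc_means = colMeans(cc[vars]),  mi_means = colMeans(mi[vars]),
       cc_cor   = cor(cc[vars]),       mi_cor   = cor(mi[vars]),
       cc_reg   = coef(lm(salary ~ profits, data = cc)),
       mi_reg   = coef(lm(salary ~ profits, data = mi)))
}
results <- lapply(sets, analyse)

Variances can be added analogously (e.g. sapply(cc[vars], var)), and the same summaries computed on the full ceo data give the population benchmarks needed for step 3.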

13 citations


Book Chapter
28 Aug 2014

9 citations


Journal Article
TL;DR: The utility of this approach is demonstrated by applying these models to a candidate endophenotype for schizophrenia, but the same methods are applicable to other types of data characterized by zero inflation and non‐independence.
Abstract: A number of mixture modeling approaches assume both normality and independent observations. However, these two assumptions are at odds with the reality of many data sets, which are often characterized by an abundance of zero-valued or highly skewed observations as well as observations from biologically related (i.e., non-independent) subjects. We present here a finite mixture model with a zero-inflated Poisson regression component that may be applied to both types of data. This flexible approach allows the use of covariates to model both the Poisson mean and rate of zero inflation and can incorporate random effects to accommodate non-independent observations. We demonstrate the utility of this approach by applying these models to a candidate endophenotype for schizophrenia, but the same methods are applicable to other types of data characterized by zero inflation and non-independence.
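Not the authors' finite-mixture implementation, but a small sketch of the same ingredients (a zero-inflated Poisson regression with covariates on both the mean and the zero-inflation probability, plus a random effect for biologically related subjects), fit here with the glmmTMB package as a stand-in; the data frame pheno_data and its columns count, covariate, and family_id are hypothetical.

library(glmmTMB)

fit <- glmmTMB(
  count ~ covariate + (1 | family_id),   # Poisson mean: covariate plus family random intercept
  ziformula = ~ covariate,               # zero-inflation probability also modelled with the covariate
  family    = poisson,
  data      = pheno_data                 # hypothetical data frame of related subjects
)
summary(fit)

The random intercept for family_id is one way to accommodate non-independent observations; the paper's approach embeds a zero-inflated Poisson component of this kind within a finite mixture over latent classes.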