scispace - formally typeset
Author

Joseph L. Schafer

Bio: Joseph L. Schafer is an academic researcher from Pennsylvania State University. He has contributed to research on missing data and imputation (statistics), has an h-index of 20, and has co-authored 29 publications receiving 15,403 citations.

Papers
Journal ArticleDOI
TL;DR: Presents two highly recommended general approaches to missing data, maximum likelihood (ML) and Bayesian multiple imputation (MI), and discusses newer developments that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations
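The MI workflow the abstract recommends — impute several times, analyze each completed data set, then pool — can be sketched in a few lines. A toy numpy simulation (all data and parameters are invented for illustration; drawing imputations from a single fitted model, as here, is "improper" MI, whereas real MI software also draws the imputation-model parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 500, 20

# Toy data: x fully observed, y missing at random (MAR) given x.
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)          # true E[y] = 2
missing = rng.random(n) < 1 / (1 + np.exp(-x))  # P(missing) rises with x
obs = ~missing

# Imputation model fitted on complete cases (valid under MAR given x).
slope, intercept = np.polyfit(x[obs], y[obs], 1)
resid_sd = np.std(y[obs] - (intercept + slope * x[obs]), ddof=2)

estimates, variances = [], []
for _ in range(m):
    y_imp = y.copy()
    # Stochastic regression imputation: prediction plus random noise.
    y_imp[missing] = (intercept + slope * x[missing]
                      + rng.normal(0, resid_sd, missing.sum()))
    estimates.append(y_imp.mean())
    variances.append(y_imp.var(ddof=1) / n)

# Rubin's rules: pool the m point estimates and their variances.
qbar = np.mean(estimates)              # pooled point estimate
ubar = np.mean(variances)              # within-imputation variance
b = np.var(estimates, ddof=1)          # between-imputation variance
t = ubar + (1 + 1 / m) * b             # total variance
print(qbar, np.sqrt(t))
```

The total variance exceeds the within-imputation variance by a between-imputation term, which is how MI propagates the uncertainty that single imputation ignores.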

Journal ArticleDOI
TL;DR: A simulation assesses the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of such variables; it shows that the inclusive strategy is to be greatly preferred.
Abstract: Two classes of modern missing data procedures, maximum likelihood (ML) and multiple imputation (MI), tend to yield similar results when implemented in comparable ways. In either approach, it is possible to include auxiliary variables solely for the purpose of improving the missing data procedure. A simulation was presented to assess the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of such variables. The simulation showed that the inclusive strategy is to be greatly preferred. With an inclusive strategy not only is there a reduced chance of inadvertently omitting an important cause of missingness, there is also the possibility of noticeable gains in terms of increased efficiency and reduced bias, with only minor costs. As implemented in currently available software, the ML approach tends to encourage the use of a restrictive strategy, whereas the MI approach makes it relatively simple to use an inclusive strategy.

2,153 citations
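The restrictive-versus-inclusive contrast can be illustrated with a toy simulation (hypothetical data, not the authors' simulation design): an auxiliary variable drives missingness and correlates with the outcome, so ignoring it biases the estimate, while bringing it into the imputation model removes the bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Auxiliary variable a is correlated with y and drives missingness,
# but is not part of the analysis model (whose target is just E[y] = 0).
a = rng.normal(size=n)
y = a + rng.normal(size=n)
missing = rng.random(n) < 1 / (1 + np.exp(-2 * a))  # MAR given a
obs = ~missing

# Restrictive strategy: ignore a; mean-impute from observed y. Biased,
# because high-a cases, which tend to have high y, are missing more often.
restrictive_est = np.where(obs, y, y[obs].mean()).mean()

# Inclusive strategy: bring a into the imputation model via y ~ a.
slope, intercept = np.polyfit(a[obs], y[obs], 1)
inclusive_est = np.where(obs, y, intercept + slope * a).mean()

print(restrictive_est, inclusive_est)   # true value is 0
```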

Journal ArticleDOI
Abstract: Latent class analysis (LCA) is a statistical method used to identify a set of discrete, mutually exclusive latent classes of individuals based on their responses to a set of observed categorical variables. In multiple-group LCA, both the measurement part and structural part of the model can vary across groups, and measurement invariance across groups can be empirically tested. LCA with covariates extends the model to include predictors of class membership. In this article, we introduce PROC LCA, a new SAS procedure for conducting LCA, multiple-group LCA, and LCA with covariates. The procedure is demonstrated using data on alcohol use behavior in a national sample of high school seniors.

888 citations
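The measurement model behind LCA — class prevalences plus class-conditional item-response probabilities, typically fitted by EM — can be sketched in numpy. This is an illustrative toy for two classes and binary items, not PROC LCA; all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 2000, 4

# Simulate two latent classes with distinct item-response profiles.
true_pi = np.array([0.6, 0.4])                 # class prevalences
true_rho = np.array([[0.9, 0.8, 0.9, 0.8],     # P(item = 1 | class 0)
                     [0.2, 0.1, 0.2, 0.1]])    # P(item = 1 | class 1)
z = (rng.random(n) < true_pi[1]).astype(int)   # latent class labels
x = (rng.random((n, k)) < true_rho[z]).astype(float)

# EM for a 2-class latent class model with binary indicators.
pi = np.array([0.5, 0.5])
rho = rng.uniform(0.3, 0.7, size=(2, k))
for _ in range(200):
    # E-step: posterior class membership given the observed responses.
    log_lik = (x @ np.log(rho).T + (1 - x) @ np.log(1 - rho).T
               + np.log(pi))
    post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update prevalences and item-response probabilities.
    pi = post.mean(axis=0)
    rho = (post.T @ x) / post.sum(axis=0)[:, None]

print(pi)
print(rho)
```

Class labels are only identified up to permutation, so any check of the recovered parameters has to match classes by prevalence or profile first.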

Journal ArticleDOI
TL;DR: Doubly robust (DR) procedures apply an outcome model and a missingness model simultaneously and produce a consistent estimate of the parameter if either of the two models has been correctly specified; a simulation nevertheless demonstrates that, in at least some settings, two wrong models are not better than one.
Abstract: When outcomes are missing for reasons beyond an investigator's control, there are two different ways to adjust a parameter estimate for covariates that may be related both to the outcome and to missingness. One approach is to model the relationships between the covariates and the outcome and use those relationships to predict the missing values. Another is to model the probabilities of missingness given the covariates and incorporate them into a weighted or stratified estimate. Doubly robust (DR) procedures apply both types of model simultaneously and produce a consistent estimate of the parameter if either of the two models has been correctly specified. In this article, we show that DR estimates can be constructed in many ways. We compare the performance of various DR and non-DR estimates of a population mean in a simulated example where both models are incorrect but neither is grossly misspecified. Methods that use inverse-probabilities as weights, whether they are DR or not, are sensitive to misspecification of the propensity model when some estimated propensities are small. Many DR methods perform better than simple inverse-probability weighting. None of the DR methods we tried, however, improved upon the performance of simple regression-based prediction of the missing values. This study does not represent every missing-data problem that will arise in practice. But it does demonstrate that, in at least some settings, two wrong models are not better than one.

529 citations
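The DR construction described here can be sketched as an augmented inverse-probability-weighted (AIPW) estimate of a mean with missing outcomes, alongside the two single-model estimates it combines. A toy numpy example (invented data; the true response probabilities are plugged in for brevity, where practice would estimate them):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)             # true E[y] = 1
e_true = 1 / (1 + np.exp(-x))                # P(y observed | x)
r = (rng.random(n) < e_true).astype(float)   # 1 = outcome observed

# Outcome model: regress y on x among responders.
slope, intercept = np.polyfit(x[r == 1], y[r == 1], 1)
m_hat = intercept + slope * x                # prediction for everyone

# Propensity model: true response probabilities plugged in here;
# in practice they would be estimated, e.g. by logistic regression.
e_hat = e_true

# Doubly robust (AIPW) estimate of E[y]: inverse-probability-weighted
# observed outcomes, augmented by the outcome-model predictions.
mu_dr = np.mean(r * y / e_hat - (r - e_hat) / e_hat * m_hat)

mu_cc = y[r == 1].mean()                     # complete cases (biased)
mu_reg = m_hat.mean()                        # regression prediction alone
print(mu_cc, mu_reg, mu_dr)
```

With both working models correct, the DR and regression estimates land near the truth while the complete-case mean is biased upward, since responders tend to have high x and hence high y.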

Journal ArticleDOI
TL;DR: The authors review Rubin's definition of an average causal effect (ACE) as the average difference between potential outcomes under different treatments, and describe nine strategies for estimating ACEs based on regression, propensity scores, and doubly robust methods.
Abstract: In a well-designed experiment, random assignment of participants to treatments makes causal inference straightforward. However, if participants are not randomized (as in observational study, quasi-experiment, or nonequivalent control-group designs), group comparisons may be biased by confounders that influence both the outcome and the alleged cause. Traditional analysis of covariance, which includes confounders as predictors in a regression model, often fails to eliminate this bias. In this article, the authors review Rubin's definition of an average causal effect (ACE) as the average difference between potential outcomes under different treatments. The authors distinguish an ACE and a regression coefficient. The authors review 9 strategies for estimating ACEs on the basis of regression, propensity scores, and doubly robust methods, providing formulas for standard errors not given elsewhere. To illustrate the methods, the authors simulate an observational study to assess the effects of dieting on emotional distress. Drawing repeated samples from a simulated population of adolescent girls, the authors assess each method in terms of bias, efficiency, and interval coverage. Throughout the article, the authors offer insights and practical guidance for researchers who attempt causal inference with observational data.

461 citations
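One of the regression-based strategies reviewed — predict each person's potential outcomes under both treatments from a fitted outcome model, then average the difference — can be sketched as follows (toy confounded data with invented parameters, not the authors' dieting simulation):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Confounder c affects both treatment uptake and the outcome.
c = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-c))).astype(float)
y = 2.0 * t + 1.5 * c + rng.normal(size=n)       # true ACE = 2

# Naive group difference is confounded by c.
naive = y[t == 1].mean() - y[t == 0].mean()

# Regression estimation of the ACE: fit an outcome model with treatment
# and confounder, then contrast each person's predicted potential
# outcomes under treatment and under control, averaged over the sample.
X = np.column_stack([np.ones(n), t, c])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y1 = beta[0] + beta[1] * 1 + beta[2] * c         # predicted Y(1)
y0 = beta[0] + beta[1] * 0 + beta[2] * c         # predicted Y(0)
ace = (y1 - y0).mean()

print(naive, ace)
```

In this linear model the averaged contrast equals the treatment coefficient; the potential-outcome formulation matters once the outcome model includes interactions or nonlinear terms.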


Cited by
Journal ArticleDOI
TL;DR: Presents two highly recommended general approaches to missing data, maximum likelihood (ML) and Bayesian multiple imputation (MI), and discusses newer developments that may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations

Journal ArticleDOI
TL;DR: The R package mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing of imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations
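The chained-equations idea that mice implements — cycle through the incomplete variables, regressing each on the others and imputing with noise — can be sketched for two variables in numpy. This toy is not the mice package (which is R) and omits its predictor selection, pooling, and diagnostics:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2_000

# Correlated pair with missingness scattered across both variables.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)   # corr(x, y) = 0.8
x_mis = rng.random(n) < 0.2
y_mis = rng.random(n) < 0.2

# Start from mean imputation, then cycle: impute x from y, then y from x,
# refitting each model on the originally observed cases and adding noise.
xi = np.where(x_mis, x[~x_mis].mean(), x)
yi = np.where(y_mis, y[~y_mis].mean(), y)
for _ in range(10):
    for target, mis, other in ((xi, x_mis, yi), (yi, y_mis, xi)):
        obs = ~mis
        slope, intercept = np.polyfit(other[obs], target[obs], 1)
        resid_sd = np.std(target[obs] - (intercept + slope * other[obs]),
                          ddof=2)
        target[mis] = (intercept + slope * other[mis]
                       + rng.normal(0, resid_sd, mis.sum()))

# Adding residual noise (unlike plain mean imputation) keeps the
# correlation between the completed variables near its true value.
r_imp = np.corrcoef(xi, yi)[0, 1]
print(r_imp)
```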

Journal ArticleDOI
TL;DR: The propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates is similar between treated and untreated subjects. The article describes four propensity score methods, balance diagnostics, and different causal average treatment effects and their relationship with propensity score analyses.
Abstract: The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.

7,895 citations
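Of the four methods listed, inverse probability of treatment weighting is the quickest to sketch, together with the balance diagnostic the abstract emphasizes. A toy numpy example (invented data; the true propensity score is used, where practice would estimate it, e.g. by logistic regression):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# Confounded treatment assignment, as in an observational study.
c = rng.normal(size=n)
e = 1 / (1 + np.exp(-c))                     # true propensity P(T=1 | c)
t = (rng.random(n) < e).astype(float)
y = 1.0 * t + 2.0 * c + rng.normal(size=n)   # true treatment effect = 1

# IPTW: weight each subject by 1 / P(received own treatment | covariates),
# which reweights both arms to resemble the full sample.
w = t / e + (1 - t) / (1 - e)
ate_iptw = (np.sum(w * t * y) / np.sum(w * t)
            - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))

# Balance diagnostic: after weighting, the confounder should have
# similar means in the treated and untreated groups.
c_treated = np.sum(w * t * c) / np.sum(w * t)
c_control = np.sum(w * (1 - t) * c) / np.sum(w * (1 - t))

naive = y[t == 1].mean() - y[t == 0].mean()
print(naive, ate_iptw, c_treated, c_control)
```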

Book
01 Jan 2006
TL;DR: Detailed, worked-through examples drawn from psychology, management, and sociology studies illustrate the procedures, pitfalls, and extensions of CFA methodology.
Abstract: With its emphasis on practical and conceptual aspects, rather than mathematics or formulas, this accessible book has established itself as the go-to resource on confirmatory factor analysis (CFA). Detailed, worked-through examples drawn from psychology, management, and sociology studies illustrate the procedures, pitfalls, and extensions of CFA methodology. The text shows how to formulate, program, and interpret CFA models using popular latent variable software packages (LISREL, Mplus, EQS, SAS/CALIS); understand the similarities and differences between CFA and exploratory factor analysis (EFA); and report results from a CFA study. It is filled with useful advice and tables that outline the procedures. The companion website offers data and program syntax files for most of the research examples, as well as links to CFA-related resources. New to This Edition: *Updated throughout to incorporate important developments in latent variable modeling. *Chapter on Bayesian CFA and multilevel measurement models. *Addresses new topics (with examples): exploratory structural equation modeling, bifactor analysis, measurement invariance evaluation with categorical indicators, and a new method for scaling latent variables. *Utilizes the latest versions of major latent variable software packages.

7,620 citations

Journal ArticleDOI
TL;DR: Describes the principles of multiple imputation by chained equations, shows how to impute categorical and quantitative variables (including skewed variables), and covers the practical analysis of multiply imputed data, including model building and model checking.
Abstract: Multiple imputation by chained equations is a flexible and practical approach to handling missing data. We describe the principles of the method and show how to impute categorical and quantitative variables, including skewed variables. We give guidance on how to specify the imputation model and how many imputations are needed. We describe the practical analysis of multiply imputed data, including model building and model checking. We stress the limitations of the method and discuss the possible pitfalls. We illustrate the ideas using a data set in mental health, giving Stata code fragments. Copyright © 2010 John Wiley & Sons, Ltd.

6,349 citations
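For skewed variables of the kind discussed here, a common device is predictive mean matching (PMM), which imputes only actually-observed values and so preserves the variable's shape. A minimal single-donor sketch in numpy (real implementations typically draw at random among several nearest donors):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3_000

x = rng.normal(size=n)
y = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))  # right-skewed, y > 0
mis = rng.random(n) < 0.4                            # missing completely at random
obs = ~mis
y_obs = np.where(mis, np.nan, y)

# PMM: fit y ~ x on observed cases, then for each missing case copy the
# observed y whose predicted value is nearest. Imputed values are always
# real observed values, so the skewed, strictly positive shape of y is
# preserved (normal-theory regression imputation could go negative).
slope, intercept = np.polyfit(x[obs], y_obs[obs], 1)
pred = intercept + slope * x
donors = np.where(obs)[0]
y_imp = y_obs.copy()
for i in np.where(mis)[0]:
    j = donors[np.argmin(np.abs(pred[donors] - pred[i]))]
    y_imp[i] = y_obs[j]

print(y_imp.min(), y_imp.mean())
```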