Author

Kosuke Imai

Other affiliations: Princeton University
Bio: Kosuke Imai is an academic researcher at Harvard University. The author has contributed to research on the topics Causal inference and Randomized experiment. The author has an h-index of 49 and has co-authored 143 publications receiving 21,983 citations. Previous affiliations of Kosuke Imai include Princeton University.


Papers
Journal ArticleDOI
TL;DR: A unified approach is proposed in which researchers preprocess data with matching and then apply the best parametric techniques they would have used anyway; this procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
Abstract: Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.

3,601 citations

Journal ArticleDOI
TL;DR: MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions.
Abstract: MatchIt implements the suggestions of Ho, Imai, King, and Stuart (2007) for improving parametric statistical models by preprocessing data with nonparametric matching methods. MatchIt implements a wide range of sophisticated matching methods, making it possible to greatly reduce the dependence of causal inferences on hard-to-justify, but commonly made, statistical modeling assumptions. The software also easily fits into existing research practices since, after preprocessing data with MatchIt, researchers can use whatever parametric model they would have used without MatchIt, but produce inferences with substantially more robustness and less sensitivity to modeling assumptions. MatchIt is an R program, and also works seamlessly with Zelig.

3,012 citations

Journal ArticleDOI
TL;DR: The mediation package implements a comprehensive suite of statistical tools for conducting causal mediation analysis in applied empirical research and implements a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice.
Abstract: In this paper, we describe the R package mediation for conducting causal mediation analysis in applied empirical research. In many scientific disciplines, the goal of researchers is not only estimating causal effects of a treatment but also understanding the process in which the treatment causally affects the outcome. Causal mediation analysis is frequently used to assess potential causal mechanisms. The mediation package implements a comprehensive suite of statistical tools for conducting such an analysis. The package is organized into two distinct approaches. Using the model-based approach, researchers can estimate causal mediation effects and conduct sensitivity analysis under the standard research design. Furthermore, the design-based approach provides several analysis tools that are applicable under different experimental designs. This approach requires weaker assumptions than the model-based approach. We also implement a statistical method for dealing with multiple (causally dependent) mediators, which are often encountered in practice. Finally, the package also offers a methodology for assessing causal mediation in the presence of treatment noncompliance, a common problem in randomized trials.

2,417 citations

Journal ArticleDOI
TL;DR: The approach is general because it offers the definition, identification, estimation, and sensitivity analysis of causal mediation effects without reference to any specific statistical model and can accommodate linear and nonlinear relationships, parametric and nonparametric models, continuous and discrete mediators, and various types of outcome variables.
Abstract: Traditionally in the social sciences, causal mediation analysis has been formulated, understood, and implemented within the framework of linear structural equation models. We argue and demonstrate that this is problematic for 3 reasons: the lack of a general definition of causal mediation effects independent of a particular statistical model, the inability to specify the key identification assumption, and the difficulty of extending the framework to nonlinear models. In this article, we propose an alternative approach that overcomes these limitations. Our approach is general because it offers the definition, identification, estimation, and sensitivity analysis of causal mediation effects without reference to any specific statistical model. Further, our approach explicitly links these 4 elements closely together within a single framework. As a result, the proposed framework can accommodate linear and nonlinear relationships, parametric and nonparametric models, continuous and discrete mediators, and various types of outcome variables. The general definition and identification result also allow us to develop sensitivity analysis in the context of commonly used models, which enables applied researchers to formally assess the robustness of their empirical conclusions to violations of the key assumption. We illustrate our approach by applying it to the Job Search Intervention Study. We also offer easy-to-use software that implements all our proposed methods.

2,393 citations

Journal ArticleDOI
Daniel J. Benjamin1, James O. Berger2, Magnus Johannesson3, Magnus Johannesson1, Brian A. Nosek4, Brian A. Nosek5, Eric-Jan Wagenmakers6, Richard A. Berk7, Kenneth A. Bollen8, Björn Brembs9, Lawrence D. Brown7, Colin F. Camerer10, David Cesarini11, David Cesarini12, Christopher D. Chambers13, Merlise A. Clyde2, Thomas D. Cook14, Thomas D. Cook15, Paul De Boeck16, Zoltan Dienes17, Anna Dreber3, Kenny Easwaran18, Charles Efferson19, Ernst Fehr20, Fiona Fidler21, Andy P. Field17, Malcolm R. Forster22, Edward I. George7, Richard Gonzalez23, Steven N. Goodman24, Edwin J. Green25, Donald P. Green26, Anthony G. Greenwald27, Jarrod D. Hadfield28, Larry V. Hedges15, Leonhard Held20, Teck-Hua Ho29, Herbert Hoijtink30, Daniel J. Hruschka31, Kosuke Imai32, Guido W. Imbens24, John P. A. Ioannidis24, Minjeong Jeon33, James Holland Jones34, Michael Kirchler35, David Laibson36, John A. List37, Roderick J. A. Little23, Arthur Lupia23, Edouard Machery38, Scott E. Maxwell39, Michael A. McCarthy21, Don A. Moore40, Stephen L. Morgan41, Marcus R. Munafò42, Shinichi Nakagawa43, Brendan Nyhan44, Timothy H. Parker45, Luis R. Pericchi46, Marco Perugini47, Jeffrey N. Rouder48, Judith Rousseau49, Victoria Savalei50, Felix D. Schönbrodt51, Thomas Sellke52, Betsy Sinclair53, Dustin Tingley36, Trisha Van Zandt16, Simine Vazire54, Duncan J. Watts55, Christopher Winship36, Robert L. Wolpert2, Yu Xie32, Cristobal Young24, Jonathan Zinman44, Valen E. Johnson18, Valen E. Johnson1 
University of Southern California1, Duke University2, Stockholm School of Economics3, Center for Open Science4, University of Virginia5, University of Amsterdam6, University of Pennsylvania7, University of North Carolina at Chapel Hill8, University of Regensburg9, California Institute of Technology10, Research Institute of Industrial Economics11, New York University12, Cardiff University13, Mathematica Policy Research14, Northwestern University15, Ohio State University16, University of Sussex17, Texas A&M University18, Royal Holloway, University of London19, University of Zurich20, University of Melbourne21, University of Wisconsin-Madison22, University of Michigan23, Stanford University24, Rutgers University25, Columbia University26, University of Washington27, University of Edinburgh28, National University of Singapore29, Utrecht University30, Arizona State University31, Princeton University32, University of California, Los Angeles33, Imperial College London34, University of Innsbruck35, Harvard University36, University of Chicago37, University of Pittsburgh38, University of Notre Dame39, University of California, Berkeley40, Johns Hopkins University41, University of Bristol42, University of New South Wales43, Dartmouth College44, Whitman College45, University of Puerto Rico46, University of Milan47, University of California, Irvine48, Paris Dauphine University49, University of British Columbia50, Ludwig Maximilian University of Munich51, Purdue University52, Washington University in St. Louis53, University of California, Davis54, Microsoft55
TL;DR: The authors propose changing the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.
Abstract: We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.

1,586 citations
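
To make the proposed threshold change in the entry above concrete, here is a minimal Python sketch. It assumes a two-sided test with a standard normal test statistic; the function names and the value z = 2.5 are hypothetical and chosen only for illustration, not taken from the paper.

```python
import math

OLD_ALPHA = 0.05   # conventional default threshold
NEW_ALPHA = 0.005  # threshold proposed for claims of new discoveries


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z.

    Uses P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(p: float, alpha: float) -> bool:
    """Return True when the p-value falls below the chosen threshold."""
    return p < alpha


# Hypothetical test statistic, for illustration only.
z = 2.5
p = two_sided_p_value(z)

print(f"p = {p:.4f}")
print("significant at 0.05: ", is_significant(p, OLD_ALPHA))
print("significant at 0.005:", is_significant(p, NEW_ALPHA))
```

Under these assumptions, a result with z = 2.5 passes the 0.05 threshold but not the 0.005 threshold, which is the kind of finding the proposal would reclassify.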


Cited by
Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.
Abstract: The R package mice imputes incomplete multivariate data by chained equations. The software mice 1.0 appeared in the year 2000 as an S-PLUS library, and in 2001 as an R package. mice 1.0 introduced predictor selection, passive imputation and automatic pooling. This article documents mice, which extends the functionality of mice 1.0 in several ways. In mice, the analysis of imputed data is made completely general, whereas the range of models under which pooling works is substantially extended. mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction. Special attention is paid to transformations, sum scores, indices and interactions using passive imputation, and to the proper setup of the predictor matrix. mice can be downloaded from the Comprehensive R Archive Network. This article provides a hands-on, stepwise approach to solve applied incomplete data problems.

10,234 citations

Book
01 Jan 2009

8,216 citations

Journal ArticleDOI
TL;DR: The propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects, and different causal average treatment effects and their relationship with propensity score analyses are described.
Abstract: The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the propensity score is a balancing score: conditional on the propensity score, the distribution of observed baseline covariates will be similar between treated and untreated subjects. I describe 4 different propensity score methods: matching on the propensity score, stratification on the propensity score, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. I describe balance diagnostics for examining whether the propensity score model has been adequately specified. Furthermore, I discuss differences between regression-based methods and propensity score-based methods for the analysis of observational data. I describe different causal average treatment effects and their relationship with propensity score analyses.

7,895 citations
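
The propensity score definition and the inverse probability of treatment weighting (IPTW) method named in the entry above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: a single discrete covariate, a propensity score estimated as the treated fraction within each covariate level, and a small invented data set. All names and numbers are hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical units: (covariate x, treatment indicator t, outcome y).
# Outcomes were constructed as y = 5 * x + 2 * t, so the true effect is 2.
UNITS = [
    (0, 1, 2.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 7.0), (1, 1, 7.0), (1, 1, 7.0), (1, 0, 5.0),
]


def estimate_propensity(units):
    """Estimate P(t = 1 | x) as the treated fraction at each level of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
    for x, t, _ in units:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


def naive_difference(units):
    """Unadjusted difference in mean outcomes, treated minus untreated."""
    treated = [y for _, t, y in units if t == 1]
    control = [y for _, t, y in units if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_ate(units):
    """IPTW estimate of the average treatment effect.

    Each treated outcome is weighted by 1 / e(x) and each untreated outcome
    by 1 / (1 - e(x)); this requires 0 < e(x) < 1 at every level of x.
    """
    e = estimate_propensity(units)
    total = 0.0
    for x, t, y in units:
        if t == 1:
            total += y / e[x]
        else:
            total -= y / (1.0 - e[x])
    return total / len(units)


print("propensity scores:", estimate_propensity(UNITS))
print("naive difference: ", naive_difference(UNITS))
print("IPTW estimate:    ", iptw_ate(UNITS))
```

In this invented data set treatment is more likely when x = 1, and x also raises the outcome, so the unadjusted comparison overstates the effect while the weighted estimate recovers the value built into the data.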

Journal ArticleDOI
TL;DR: A structure for thinking about matching methods and guidance on their use is provided, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
Abstract: When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how to best choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine, and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods (or developing methods related to matching) do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.

3,952 citations
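
The idea of choosing well-matched treated and control samples, described in the entry above, can be illustrated with a small Python sketch. It assumes greedy 1:1 nearest-neighbor matching without replacement on a single scalar covariate, followed by a simple difference in means over matched pairs. This is one of many possible matching schemes and is not the API of any package mentioned on this page; the data and function names are hypothetical.

```python
# Hypothetical units as (covariate x, outcome y).
# Outcomes were constructed so that untreated y equals x and treatment adds 1.
TREATED = [(5.0, 6.0), (6.0, 7.0)]
CONTROLS = [(1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0), (9.0, 9.0)]


def mean_outcome(units):
    """Average outcome of a list of (x, y) units."""
    return sum(y for _, y in units) / len(units)


def nearest_neighbor_match(treated, controls):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Each treated unit, in the order given, is paired with the closest
    remaining control unit on the covariate x.
    """
    available = list(controls)
    pairs = []
    for t in treated:
        if not available:
            break  # no controls left to match
        best = min(available, key=lambda c: abs(c[0] - t[0]))
        available.remove(best)
        pairs.append((t, best))
    return pairs


def att_from_pairs(pairs):
    """Average treated-minus-control outcome difference over matched pairs."""
    return sum(t[1] - c[1] for t, c in pairs) / len(pairs)


pairs = nearest_neighbor_match(TREATED, CONTROLS)
print("unmatched difference:", mean_outcome(TREATED) - mean_outcome(CONTROLS))
print("matched pairs:       ", pairs)
print("matched estimate:    ", att_from_pairs(pairs))
```

In this invented example the unmatched controls include units with very different covariate values, so the raw comparison is far from the effect built into the data, while the matched comparison uses only controls that resemble the treated units.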