Author

Lue Ping Zhao

Bio: Lue Ping Zhao is an academic researcher from Fred Hutchinson Cancer Research Center. The author has contributed to research in topics: Medicine & Missing data. The author has an h-index of 6 and has co-authored 6 publications receiving 3,765 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, a new class of semiparametric estimators, based on inverse probability weighted estimating equations, is proposed for the parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled.
Abstract: In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model by first showing that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0, and then deriving a representation for the efficient score...
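As a concrete illustration of the simplest member of such a class, the sketch below fits a linear conditional mean from complete cases only, each weighted by the inverse of an estimated probability of being fully observed. The logistic missingness model, the function name ipw_linear_mean, and all variable names are assumptions made for illustration, not the paper's own implementation.

# Minimal sketch of an inverse probability weighted estimating equation
# for a linear conditional mean E[Y|X] = X'alpha with covariates missing
# at random. Assumes (illustratively) that the probability of observing
# complete data follows a logistic model in always-observed variables W.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_linear_mean(X, y, observed, W):
    # Step 1: estimate P(observed = 1 | W) with a parametric model.
    pi = LogisticRegression().fit(W, observed).predict_proba(W)[:, 1]
    # Step 2: solve the complete-case normal equations, weighting each
    # complete case i by 1 / pi_i so that complete cases stand in for
    # similar subjects whose covariates are missing.
    cc = observed == 1
    w = 1.0 / pi[cc]
    Xc, yc = X[cc], y[cc]
    A = (Xc * w[:, None]).T @ Xc
    b = (Xc * w[:, None]).T @ yc
    return np.linalg.solve(A, b)   # alpha-hat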

2,638 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on the vector of explanatory variables in the presence of missing response data.
Abstract: We propose a class of inverse probability of censoring weighted estimators for the parameters of models for the dependence of the mean of a vector of correlated response variables on a vector of explanatory variables in the presence of missing response data. The proposed estimators do not require full specification of the likelihood. They can be viewed as an extension of generalized estimating equations estimators that allow for the data to be missing at random but not missing completely at random. These estimators can be used to correct for dependent censoring and nonrandom noncompliance in randomized clinical trials studying the effect of a treatment on the evolution over time of the mean of a response variable. The likelihood-based parametric G-computation algorithm estimator may also be used to attempt to correct for dependent censoring and nonrandom noncompliance. But because of possible model misspecification, the parametric G-computation algorithm estimator, in contrast with the proposed w...
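The weighting idea carries over to longitudinal data with dropout: a response observed at visit t is weighted by the inverse of the estimated probability of having remained uncensored through t. The sketch below computes such cumulative weights under monotone dropout; the array layout and names are illustrative assumptions, not the paper's notation.

# Minimal sketch of inverse-probability-of-censoring weights under
# monotone dropout, assuming dropout hazards have been estimated
# (e.g., by logistic regression on the observed past).
import numpy as np

def ipcw_weights(dropout_hazard):
    # dropout_hazard[i, t] = estimated P(subject i drops out at visit t,
    # given still in study), with NaN after the subject has dropped out.
    stay = 1.0 - dropout_hazard              # P(remain at each visit)
    cum_stay = np.nancumprod(stay, axis=1)   # P(still observed through t)
    return 1.0 / cum_stay                    # weight for a response at t

# Each observed response then enters a GEE-style estimating equation with
# its weight, correcting for dropout that depends on the observed past.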

1,510 citations

Journal ArticleDOI
TL;DR: In this article, the authors propose a weighted estimating equation that is almost identical to the maximum likelihood estimating equations; although the weighted estimating equations are a special case of those proposed earlier by Robins et al., the EM-type algorithm used to solve them is new.
Abstract: In regression analysis, missing covariate data occur often. A recent approach to analyzing such data is weighted estimating equations. With weighted estimating equations, the contribution to the estimating equation from a complete observation is weighted by the inverse probability of being observed. In this article we propose a weighted estimating equation that is almost identical to the maximum likelihood estimating equations. As such, we propose an EM-type algorithm to solve these weighted estimating equations. Although the weighted estimating equations are a special case of those proposed earlier by Robins et al., our EM-type algorithm to solve them is new. Similar to Robins and Ritov, we give the result that to obtain a consistent estimate of the regression parameters, either the missing-data mechanism or the distribution of the missing data given the observed data must be correctly specified. We compare the weighted estimating equations to maximum likelihood via two examples, a simulation a...
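A minimal sketch of the weighted-score idea for one concrete case, logistic regression of y on a sometimes-missing covariate x: each complete case's maximum likelihood score contribution is weighted by the inverse probability of x being observed, and the weighted score equation is solved by Newton iterations. The Newton solver is a simple stand-in; the paper's EM-type algorithm is not reproduced here.

# Minimal sketch: inverse probability weighted ML score for logistic
# regression with a sometimes-missing covariate, solved by Newton steps.
import numpy as np

def weighted_score_logit(x, y, observed, pi, iters=25):
    # Complete cases only; pi[i] = P(x_i observed), assumed already estimated.
    cc = observed == 1
    X = np.column_stack([np.ones(cc.sum()), x[cc]])   # intercept + covariate
    w = 1.0 / pi[cc]                                  # IPW weights
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y[cc] - p))                # weighted ML score
        info = (X * (w * p * (1 - p))[:, None]).T @ X  # weighted information
        beta += np.linalg.solve(info, score)           # Newton update
    return beta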

113 citations

Journal ArticleDOI
TL;DR: In this article, the authors describe how to use multiple imputation semiparametrically to obtain estimates of parameters and their standard errors when some individuals have missing data; the method requires the investigator to know or be able to estimate the process generating the missing data but requires no full distributional form for the data.
Abstract: In this paper, we describe how to use multiple imputation semiparametrically to obtain estimates of parameters and their standard errors when some individuals have missing data. The methods given require the investigator to know or be able to estimate the process generating the missing data but require no full distributional form for the data. The method is especially useful for non-standard problems, such as estimating the median when data are missing.
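To convey the flavor for the median example, here is a sketch that multiply imputes missing responses without a full distributional model, by resampling observed values with weights derived from an assumed-known observation probability. The resampling scheme and all names are illustrative assumptions, not the paper's procedure.

# Minimal sketch: semiparametric multiple imputation for the median,
# assuming pi[i] = P(y_i observed) is known or estimated elsewhere.
import numpy as np

def mi_medians(y, observed, pi, m=20, seed=0):
    # y: responses with np.nan where missing.
    rng = np.random.default_rng(seed)
    donors = y[observed == 1]
    donor_w = 1.0 / pi[observed == 1]   # hard-to-observe donors count more
    donor_w = donor_w / donor_w.sum()
    k = int((observed == 0).sum())
    meds = []
    for _ in range(m):
        filled = y.copy()
        filled[observed == 0] = rng.choice(donors, size=k, p=donor_w)
        meds.append(np.median(filled))
    return np.array(meds)   # combine across imputations in the usual way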

28 citations

Journal ArticleDOI
TL;DR: This work proposes a further degrees-of-freedom approximation which is a function of the within and between imputation variance, the number of multiple imputations, and the number of observations in the sample.
Abstract: When using multiple imputation to form confidence intervals with missing data, Rubin and Schenker (1986) proposed using a t-distribution with approximate degrees-of-freedom which is a function of the number of multiple imputations and the within and between imputation variance. In this t-approximation, Rubin and Schenker assume there are a finite number of multiple imputations, but an infinite number of observations in the sample. We propose a further degrees-of-freedom approximation which is a function of the within and between imputation variance, the number of multiple imputations, and the number of observations in the sample. When the number of observations in the sample is small, our approximate degrees-of-freedom may be more appropriate, as seen in our simulations.
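For reference, the Rubin and Schenker (1986) quantities the abstract refers to can be written in standard notation (a background sketch only; the paper's small-sample refinement, which also involves the number of observations, is not reproduced here). With m imputations giving point estimates \hat{Q}_j and variance estimates \hat{U}_j:

\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j, \qquad
\bar{W} = \frac{1}{m}\sum_{j=1}^{m}\hat{U}_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j - \bar{Q}\bigr)^{2},

T = \bar{W} + \Bigl(1 + \frac{1}{m}\Bigr)B, \qquad
\nu = (m-1)\Bigl(1 + \frac{\bar{W}}{(1 + 1/m)B}\Bigr)^{2},

and the confidence interval is based on (\bar{Q} - Q)/\sqrt{T} \sim t_{\nu}.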

20 citations


Cited by
Book
01 Jan 2001
TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).
Abstract: The second edition of this acclaimed graduate text provides a unified treatment of two methods used in contemporary econometric research, cross section and panel data methods. By focusing on assumptions that can be given behavioral content, the book maintains an appropriate level of rigor while emphasizing intuitive thinking. The analysis covers both linear and nonlinear models, including models with dynamics and/or individual heterogeneity. In addition to general estimation frameworks (particularly methods of moments and maximum likelihood), specific linear and nonlinear methods are covered in detail, including probit and logit models and their multivariate extensions, Tobit models, models for count data, censored and missing data schemes, causal (or treatment) effects, and duration analysis. Econometric Analysis of Cross Section and Panel Data was the first graduate econometrics text to focus on microeconomic data structures, allowing assumptions to be separated into population and sampling assumptions. This second edition has been substantially updated and revised. Improvements include a broader class of models for missing data problems; more detailed treatment of cluster problems, an important topic for empirical researchers; expanded discussion of "generalized instrumental variables" (GIV) estimation; new coverage (based on the author's own recent research) of inverse probability weighting; a more complete framework for estimating treatment effects with panel data; and a firmly established link between econometric approaches to nonlinear panel data and the "generalized estimating equation" literature popular in statistics and other fields. New attention is given to explaining when particular econometric methods can be applied; the goal is not only to tell readers what does work, but why certain "obvious" procedures do not. The numerous included exercises, both theoretical and computer-based, allow the reader to extend methods covered in the text and discover new insights.

28,298 citations

Journal ArticleDOI
TL;DR: Two general approaches that come highly recommended, maximum likelihood (ML) and Bayesian multiple imputation (MI), are presented; newer procedures, though not yet mainstream, may eventually extend the ML and MI methods that currently represent the state of the art.
Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.

10,568 citations

Journal ArticleDOI
TL;DR: Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.
Abstract: In recent years, multiple imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.

3,387 citations

Journal ArticleDOI
TL;DR: In the last two decades, much research has been done on the econometric and statistical analysis of the causal effects of programs or policies; this literature has reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization, and other areas of empirical microeconomics.
Abstract: Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades, much research has been done on the econometric and statistical analysis of such causal effects. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization, and other areas of empirical microeconomics. In this review, we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research.

3,175 citations

Journal ArticleDOI

3,152 citations