Book Chapter

Inference Progress in Missing Data Analysis from Independent to Longitudinal Setup

01 Jan 2013-pp 95-116
TL;DR: The purpose of this paper is to outline perspectives in a comprehensive manner so that real progress and challenges are understood in order to develop proper inference techniques.
Abstract: In the independent setup with multivariate responses, the data become incomplete when only partial responses, such as responses on some variables as opposed to all variables, are available from some individuals. The main challenge here is obtaining valid inferences, such as unbiased and consistent estimates of the mean parameters of all response variables, by using the available responses. Typically, unbalanced correlation matrices are formed, and moments or likelihood analyses based on the available responses are employed for such inferences. Various imputation techniques have also been used. In the longitudinal setup, when a univariate response is repeatedly collected from an individual, these repeated responses become correlated and the responses form a multivariate distribution. In this setup, it may happen that a portion of the responses is not available from some individuals under study. These non-responses may be monotonic or intermittent. Also, the response may be missing following a mechanism such as missing completely at random (MCAR), missing at random (MAR), or missing non-ignorably. In a longitudinal regression setup, the covariates may also be missing, but typically they are known for all time periods. Obtaining unbiased and consistent regression estimates, especially when longitudinal responses are missing following MAR or ignorable mechanism, becomes a challenge. This is because one must accommodate both the longitudinal correlations and the missing mechanism to develop a proper inference tool. Over the last three decades some progress has been made toward this, mainly by taking partial care of the missing mechanism in developing estimation techniques. But overall, these techniques fall short and may still produce biased and hence inconsistent estimates. The purpose of this paper is to outline these perspectives in a comprehensive manner so that real progress and challenges are understood in order to develop proper inference techniques.
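The difference between the MCAR and MAR mechanisms named in the abstract can be illustrated with a small simulation. The following is only a sketch under assumed settings that do not come from the chapter: two correlated standard-normal responses, a constant observation probability for MCAR, and a logistic observation probability driven by the first response for MAR. The function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_available_case_means(n=200_000, rho=0.7, seed=0):
    """Compare available-case means of y2 under MCAR and MAR missingness."""
    rng = np.random.default_rng(seed)
    # Two correlated standard-normal responses per individual (true means are 0).
    y1 = rng.standard_normal(n)
    y2 = rho * y1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    # MCAR (assumed form): y2 is observed with a constant probability.
    observed_mcar = rng.random(n) < 0.6

    # MAR (assumed form): the chance of observing y2 depends only on y1,
    # which is always observed.
    p_obs = 1.0 / (1.0 + np.exp(-1.5 * y1))
    observed_mar = rng.random(n) < p_obs

    return {
        "true_mean": 0.0,
        "mcar_mean": float(y2[observed_mcar].mean()),
        "mar_mean": float(y2[observed_mar].mean()),
    }

result = simulate_available_case_means()
print(result)
```

In this toy setting the available-case mean of the second response stays close to the true value under MCAR but drifts away from it under MAR, which is the kind of bias the abstract warns about when the missing mechanism is not accommodated.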
Citations
Book Chapter
01 Jan 2013
TL;DR: Under a longitudinal binary model, this chapter empirically compares the relative performance of the existing weighted GEE (WGEE, weighted by inverse probability weights for the missing indicator), a fully standardized GQL (FSGQL), and a conditional GQL (CGQL) approach.
Abstract: It is well known that in the complete longitudinal setup, the so-called working correlation-based generalized estimating equations (GEE) approach may yield less efficient regression estimates as compared to the independence assumption-based method of moments and quasi-likelihood (QL) estimates. In the incomplete longitudinal setup, there exist some studies indicating that the use of the same “working” correlation-based GEE approach may provide inconsistent regression estimates especially when the longitudinal responses are at risk of being missing at random (MAR). In this paper, we revisit this inconsistency issue under a longitudinal binary model and empirically examine the relative performance of the existing weighted (by inverse probability weights for the missing indicator) GEE (WGEE), a fully standardized GQL (FSGQL) and conditional GQL (CGQL) approaches. In the comparative study, we consider both stationary and non-stationary covariates, as well as various degrees of missingness and longitudinal correlation in the data.
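The inverse probability weighting idea behind WGEE can be sketched for the simplest possible target, the mean of a binary response at a second time point. This is not the WGEE, FSGQL, or CGQL estimator from the chapter; it is a hedged illustration that assumes the response probabilities are known (in practice they would have to be modelled), and every name and numeric value below is hypothetical.

```python
import numpy as np

def ipw_demo(n=200_000, seed=1):
    """Naive vs. inverse-probability-weighted mean of a binary y2 under MAR."""
    rng = np.random.default_rng(seed)
    # Binary response at time 1, always observed.
    y1 = rng.random(n) < 0.5
    # P(y2 = 1 | y1) is 0.8 when y1 = 1 and 0.3 otherwise, so E[y2] = 0.55.
    y2 = rng.random(n) < np.where(y1, 0.8, 0.3)

    # MAR (assumed form): y2 is observed with probability 0.9 when y1 = 1
    # and 0.4 when y1 = 0.
    p_obs = np.where(y1, 0.9, 0.4)
    r = rng.random(n) < p_obs  # missing indicator: True when y2 is observed

    naive = y2[r].mean()                          # ignores the mechanism
    weights = r / p_obs                           # zero for unobserved cases
    ipw = np.sum(weights * y2) / np.sum(weights)  # weighted (Hajek-type) mean
    return {"true_mean": 0.55, "naive": float(naive), "ipw": float(ipw)}

estimates = ipw_demo()
print(estimates)
```

Here the weighted mean recovers the true value while the unweighted available-case mean does not. The chapter's concern is the harder regression setting, where longitudinal correlations must be handled as well as the weights.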

1 citations

Journal Article
01 Nov 2021
TL;DR: This paper proposes a dynamic model for unevenly spaced longitudinal Poisson counts and demonstrates the computation of correlations among such count responses through an example with T = 4 time intervals (such as 4 weeks) as the duration of the longitudinal study.
Abstract: In a longitudinal setup, as opposed to equi-spaced count responses, there are situations where an individual patient may provide successive count responses at unevenly spaced time intervals. These unevenly spaced count responses are in general accompanied by covariate information collected at the time points at which the responses occur. Here, the responses and covariates are complete, as opposed to longitudinal data that are subject to non-response or missingness. The regression analysis of this type of unevenly spaced longitudinal count data is not adequately discussed in the literature. In this paper we propose a dynamic model for unevenly spaced longitudinal Poisson counts and demonstrate the computation of correlations among such count responses through an example with T = 4 time intervals, such as 4 weeks, as the duration of the longitudinal study. Here, if an individual patient reports a problem (in terms of counts), say at time intervals 1, 3, and 4 (i.e., in the first, third and fourth weeks), then the 3 count responses collected at these 3 times/weeks would be unevenly spaced. Clearly, this individual had nothing to report at time point 2, i.e., in the second week, and hence these 3 responses are considered to be complete. Here, we emphasize that this ‘no response’ in the second week for the individual is neither a missing response (a so-called non-response) nor can it be quantified as a zero count, because no probability can be assigned to a non-existing event. As far as the total number of time intervals is concerned, it can be large, but it is usually small in a longitudinal setup. However, for accuracy of correlations, one can make each interval small, leading to a large value of T. For inferences, the regression parameters are estimated by using the well-known GQL (generalized quasi-likelihood) approach. For the estimation of the unevenly spaced pair-wise correlation index parameters we use a standardized method of moments.
The performance of the proposed estimation approaches is examined through an intensive simulation study. The results of this paper should be useful to bio-medical practitioners either currently dealing with this type of unevenly spaced count data or planning for data collection on a similar study.
References
Book
01 Jan 1987
TL;DR: This book covers basic approaches to missing data (complete-case and available-case analysis, weighting methods, single imputation), likelihood-based approaches including maximum likelihood for general patterns of missing data with ignorable nonresponse, Bayes and multiple imputation, and applications to common models, including nonignorable missing-data models.
Abstract: Preface. PART I: OVERVIEW AND BASIC APPROACHES. Introduction. Missing Data in Experiments. Complete-Case and Available-Case Analysis, Including Weighting Methods. Single Imputation Methods. Estimation of Imputation Uncertainty. PART II: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA. Theory of Inference Based on the Likelihood Function. Methods Based on Factoring the Likelihood, Ignoring the Missing-Data Mechanism. Maximum Likelihood for General Patterns of Missing Data: Introduction and Theory with Ignorable Nonresponse. Large-Sample Inference Based on Maximum Likelihood Estimates. Bayes and Multiple Imputation. PART III: LIKELIHOOD-BASED APPROACHES TO THE ANALYSIS OF MISSING DATA: APPLICATIONS TO SOME COMMON MODELS. Multivariate Normal Examples, Ignoring the Missing-Data Mechanism. Models for Robust Estimation. Models for Partially Classified Contingency Tables, Ignoring the Missing-Data Mechanism. Mixed Normal and Nonnormal Data with Missing Values, Ignoring the Missing-Data Mechanism. Nonignorable Missing-Data Models. References. Author Index. Subject Index.

18,201 citations

Journal Article
TL;DR: This article proposes an extension of generalized linear models to the analysis of longitudinal data, introducing a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.
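The independence case mentioned in this abstract can be sketched in a few lines. The code below is an illustrative reduction, not the paper's general method: it assumes a logit link and binary responses, fixes the working correlation at independence (so no correlation parameter is estimated), and uses a subject-clustered sandwich covariance. The function name, the simulated data, and the way within-subject correlation is induced are all assumptions made for the example.

```python
import numpy as np

def gee_independence_logit(X, y, subject, n_iter=25):
    """Logit-link estimating equations with an independence working
    correlation and a subject-clustered sandwich covariance (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    subject = np.asarray(subject)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        var = mu * (1.0 - mu)
        score = X.T @ (y - mu)            # estimating function at beta
        info = X.T @ (X * var[:, None])   # its negative derivative
        beta = beta + np.linalg.solve(info, score)

    # Recompute the pieces at the final estimate for the sandwich covariance.
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    resid = y - mu
    meat = np.zeros_like(info)
    for s in np.unique(subject):
        rows = subject == s
        u = X[rows].T @ resid[rows]       # per-subject contribution
        meat += np.outer(u, u)
    info_inv = np.linalg.inv(info)
    return beta, info_inv @ meat @ info_inv

# Hypothetical data: 500 subjects, 4 occasions, correlated binary responses
# whose marginal means follow the logit model exactly.
rng = np.random.default_rng(3)
n_subjects, n_times = 500, 4
subject = np.repeat(np.arange(n_subjects), n_times)
x = rng.standard_normal(n_subjects * n_times)
X = np.column_stack([np.ones_like(x), x])
true_beta = np.array([-0.5, 1.0])
mu_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
# Mixing a shared and an occasion-specific uniform keeps each threshold
# variable uniform, so the marginal means are unchanged.
shared = np.repeat(rng.random(n_subjects), n_times)
own = rng.random(n_subjects * n_times)
u = np.where(rng.random(n_subjects * n_times) < 0.5, shared, own)
y = (u < mu_true).astype(float)

beta_hat, cov_hat = gee_independence_logit(X, y, subject)
print(beta_hat, np.sqrt(np.diag(cov_hat)))
```

The point estimates coincide with ordinary logistic regression; what the clustering changes is the covariance, which sums estimating-function contributions within each subject before forming the outer products.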

17,111 citations

Journal Article
TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.
Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.

8,197 citations

Journal Article
TL;DR: This article proposes a single global test statistic, using all of the available data, for whether multivariate data with missing values are missing completely at random (MCAR).
Abstract: A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the variables in the data set. One way of assessing this is to compare the means of recorded values of each variable between groups defined by whether other variables in the data set are missing or not. Although informative, this procedure yields potentially many correlated statistics for testing MCAR, resulting in multiple-comparison problems. This article proposes a single global test statistic for MCAR that uses all of the available data. The asymptotic null distribution is given, and the small-sample null distribution is derived for multivariate normal data with a monotone pattern of missing data. The test reduces to a standard t test when the data are bivariate with missing data confined to a single variable. A limited simulation study of empirical sizes for the test applied to normal and nonnormal data suggests th...
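The simple mean comparison described in this abstract, which is also the form the test is said to take for bivariate data with missing values confined to one variable, can be sketched as a pooled-variance two-sample t statistic. This is not the global statistic proposed in the article; the function name and the simulated missingness mechanisms are hypothetical.

```python
import numpy as np

def mean_comparison_t(y1, y2_missing):
    """Pooled-variance t statistic comparing the mean of y1 between cases
    where y2 is observed and cases where y2 is missing."""
    y1 = np.asarray(y1, dtype=float)
    y2_missing = np.asarray(y2_missing, dtype=bool)
    a, b = y1[~y2_missing], y1[y2_missing]
    n_a, n_b = a.size, b.size
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))

rng = np.random.default_rng(2)
n = 2_000
y1 = rng.standard_normal(n)
# Missingness of y2 unrelated to the data (MCAR-like, assumed form).
missing_mcar = rng.random(n) < 0.3
# Missingness of y2 that becomes more likely as y1 grows (not MCAR).
missing_dep = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * y1))

t_mcar = mean_comparison_t(y1, missing_mcar)
t_dep = mean_comparison_t(y1, missing_dep)
print(t_mcar, t_dep)
```

A statistic far from zero, as for the second mechanism here, suggests that missingness is related to the observed variable. With more than two variables this comparison multiplies into many correlated statistics, which is the problem the article's single global test is meant to avoid.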

6,045 citations
