Journal ArticleDOI

Akaike's Information Criterion in Generalized Estimating Equations

Wei Pan1
01 Mar 2001-Biometrics (John Wiley & Sons, Ltd)-Vol. 57, Iss: 1, pp 120-125
TL;DR: This work proposes a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term.
Abstract: Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is nonlikelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.
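As a compact restatement of the modification described in the abstract, the two criteria can be written side by side. This is a sketch: the notation ($\hat\beta(R)$ for the GEE estimate under working correlation $R$, $Q(\cdot\,; I, D)$ for the quasi-likelihood under working independence, $\hat\Omega_I$ for the model-based information under independence, $\hat V_r$ for the robust covariance estimate) follows the paper's full text as far as it can be recovered from this page.

```latex
\begin{align*}
\mathrm{AIC}    &= -2\,L(\hat\beta; D) + 2p, \\
\mathrm{QIC}(R) &= -2\,Q(\hat\beta(R); I, D)
                   + 2\,\operatorname{trace}\bigl(\hat\Omega_I \hat V_r\bigr).
\end{align*}
```

The likelihood $L$ is replaced by $Q$, and the penalty $2p$ by $2\,\operatorname{trace}(\hat\Omega_I \hat V_r)$.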


Citations
Journal ArticleDOI
TL;DR: The information-theoretic (I-T) approaches to valid inference are outlined including a review of some simple methods for making formal inference from all the hypotheses in the model set (multimodel inference).
Abstract: We briefly outline the information-theoretic (I-T) approaches to valid inference including a review of some simple methods for making formal inference from all the hypotheses in the model set (multimodel inference). The I-T approaches can replace the usual t tests and ANOVA tables that are so inferentially limited, but still commonly used. The I-T methods are easy to compute and understand and provide formal measures of the strength of evidence for both the null and alternative hypotheses, given the data. We give an example to highlight the importance of deriving alternative hypotheses and representing these as probability models. Fifteen technical issues are addressed to clarify various points that have appeared incorrectly in the recent literature. We offer several remarks regarding the future of empirical science and data analysis under an I-T framework.

3,105 citations

Book
01 Jan 2007
TL;DR: In this article, the authors introduce the concept of risk in count response models and assess the performance of count models, including Poisson regression, negative binomial regression, and truncated count models.
Abstract: Preface 1. Introduction 2. The concept of risk 3. Overview of count response models 4. Methods of estimation and assessment 5. Assessment of count models 6. Poisson regression 7. Overdispersion 8. Negative binomial regression 9. Negative binomial regression: modeling 10. Alternative variance parameterizations 11. Problems with zero counts 12. Censored and truncated count models 13. Handling endogeneity and latent class models 14. Count panel models 15. Bayesian negative binomial models Appendix A. Constructing and interpreting interactions Appendix B. Data sets and Stata files References Index.

2,967 citations

MonographDOI
01 Jan 2011

1,461 citations

Journal ArticleDOI
TL;DR: The generalized estimating equation (GEE) approach of Zeger and Liang facilitates analysis of data collected in longitudinal, nested, or repeated measures designs, in part because it permits specification of a working correlation matrix that accounts for the form of within-subject correlation of responses on dependent variables of many different distributions, including normal, binomial, and Poisson.
Abstract: The generalized estimating equation (GEE) approach of Zeger and Liang facilitates analysis of data collected in longitudinal, nested, or repeated measures designs. GEEs use the generalized linear model to estimate more efficient and unbiased regression parameters relative to ordinary least squares regression in part because they permit specification of a working correlation matrix that accounts for the form of within-subject correlation of responses on dependent variables of many different distributions, including normal, binomial, and Poisson. The author briefly explains the theory behind GEEs and their beneficial statistical properties and limitations and compares GEEs to suboptimal approaches for analyzing longitudinal data through use of two examples. The first demonstration applies GEEs to the analysis of data from a longitudinal lab study with a counted response variable; the second demonstration applies GEEs to analysis of data with a normally distributed response variable from subjects nested with...

1,271 citations


Cites background or methods from "Akaike's Information Criterion in G..."

  • ...Despite the advances of Zheng (2000) and Pan (2001), goodness-of-fit statistics for GEEs that would function as the equivalent to measures such as the magnitude of the squared differences of observed versus predicted values or dispersion measures are not widely accepted for most classes of…...

    [...]

  • ...For cases in which users may be undecided between two structures, Pan (2001) proposed a test that extends Akaike’s information criterion to allow comparison of covariance matrices under GEE models to the covariance matrix generated from a model that assumes no correlation within cluster....

    [...]

  • ...“The goal of selecting a working correlation structure is to estimate β more efficiently” (Pan, 2001, p. 122), and incorrect specification of the correlation structure can affect the efficiency of the parameter estimates (Fitzmaurice, 1995)....

    [...]

  • ...The goodness-of-fit measures of Zheng (2000) (marginal R2 and the concordance correlation) and Pan (2001) presented here have the benefit of simplicity and ease of interpretation, but they have not been used extensively in biostatistics and health research literature, in which GEEs originated and…...

    [...]

  • ...Attention to issues of model selection and fitting issues for GEEs has lagged behind the attention paid to extending distributional assumptions and other refinements in the variance estimation processes (Pan, 2001)....

    [...]

Journal ArticleDOI
TL;DR: An SAS macro is developed and presented here that creates an RCS function of continuous exposures, displays graphs showing the dose‐response association with 95 per cent confidence interval between one main continuous exposure and an outcome when performing linear, logistic, or Cox models, as well as linear and logistic‐generalized estimating equations.
Abstract: Categorizing a continuous exposure in regression models when non-linear dose-response associations are expected has been widely criticized. As one alternative, restricted cubic spline (RCS) functions are powerful tools (i) to characterize a dose-response association between a continuous exposure and an outcome, (ii) to visually and/or statistically check the assumption of linearity of the association, and (iii) to minimize residual confounding when adjusting for a continuous exposure. Because their implementation with SAS® software is limited, we developed and present here an SAS macro that (i) creates an RCS function of continuous exposures, (ii) displays graphs showing the dose-response association with 95 per cent confidence interval between one main continuous exposure and an outcome when performing linear, logistic, or Cox models, as well as linear and logistic-generalized estimating equations, and (iii) provides statistical tests for overall and non-linear associations. We illustrate the SAS macro using the third National Health and Nutrition Examination Survey data to investigate adjusted dose-response associations (with different models) between calcium intake and bone mineral density (linear regression), folate intake and hyperhomocysteinemia (logistic regression), and serum high-density lipoprotein cholesterol and cardiovascular mortality (Cox model).

1,185 citations


Cites methods from "Akaike's Information Criterion in G..."

  • ...The value of AIC is provided for all types of regressions proposed by the SAS macro, except for GEE models where AIC methods cannot be used directly [39]....

    [...]

References
Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for generalized linear models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).

23,215 citations


"Akaike's Information Criterion in G..." refers background or methods in this paper

  • ...…in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is non-likelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the…...

    [...]

  • ...…correlation matrix R, there is no guarantee that a corresponding quasi-likelihood exists unless certain conditions are satisfied (McCullagh and Nelder). Furthermore, even if it exists, in general it is difficult to construct. How to construct a quasi-likelihood with a general working correlation…...

    [...]

  • ...Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is non-likelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.

Key words: AIC; GEE; GLM; Model selection; Quasi-likelihood.

Introduction. Correlated response data arise often from biomedical studies. An example to be studied is the Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) (Klein et al.), where a binary response variable is the presence of diabetic retinopathy in each of the two eyes from each participant in the study. Since the two observations on the two eyes from the same participant tend to be correlated, statistical analyses have to take proper account of this correlation. Since the publication of the seminal paper by Liang and Zeger, the generalized estimating equation (GEE) approach has become increasingly important in handling such correlated data.

Model selection is an important issue in almost any practical data analysis. A common problem is variable selection in regression: given a large group of covariates (including some higher-order terms), one needs to select a subset to be included in the regression model. In the WESDR, thirteen potential risk factors were collected, and we need to determine which of these factors are to be included. It is well known that in observational studies such as the WESDR, excluding some important risk factors (i.e., confounders) may result in misleading estimates of the effects of other risk factors. On the other hand, including all covariates may lead to an overly complex model that is difficult to interpret and has less precise parameter estimates. There is an extensive model-selection literature in statistics (e.g., Miller and references therein), but mainly for the classic linear regression with independent data. One powerful and widely used model-selection criterion is Akaike's Information Criterion (AIC) (Akaike, 1973). AIC is based on the likelihood and asymptotic properties of the maximum likelihood estimator (MLE). Since no distribution is assumed in GEE, there is no likelihood defined, and thus AIC cannot be directly used. On the other hand, the issue of model selection in GEE has been largely neglected. The goal of this paper is to propose an extension of AIC to GEE. It involves using the quasi-likelihood constructed from the estimating equations (Wedderburn). Since in general the GEE estimator has different asymptotic properties from those of the MLE, a modification to the penalty term in the usual AIC is also necessary.

This paper is organized as follows. We first briefly review the GEE and quasi-likelihood, and then propose a modification to AIC in GEE. Simulation results are then presented to show its performance in selecting the working correlation matrix and selecting covariates in GEE. The method is then applied to the WESDR data, followed by a brief discussion.

AIC in GEE: GEE. Suppose we have a random sample of observations from $n$ individuals. For each individual $i$ we have a vector of responses $Y_i = (Y_{i1}, \dots, Y_{in_i})'$ and corresponding covariates $X_i = (X_{i1}, \dots, X_{in_i})'$, where each $Y_{ij}$ is a scalar and $X_{ij}$ a $p$-vector. In general, the components of $Y_i$ are correlated, but $Y_i$ and $Y_k$ are independent for any $i \neq k$, conditional on the covariates. We use $D = \{(Y_1, X_1), \dots, (Y_n, X_n)\}$ to denote the data at hand. To model the relation between the response and covariates, one can use a regression model similar to the generalized linear models, $g(\mu_i) = X_i\beta$, where $\mu_i = E(Y_i \mid X_i)$, $g$ is a specified link function, and $\beta$ is a $p$-vector of unknown regression coefficients to be estimated. The
GEE approach estimates $\beta$ by solving the following estimating equations (Liang and Zeger):

$$S(\beta; R, D) = \sum_{i=1}^{n} D_i' V_i^{-1} (Y_i - \mu_i) = 0,$$

where $D_i = D_i(\beta) = \partial\mu_i(\beta)/\partial\beta'$ and $V_i$ is a working covariance matrix of $Y_i$. $V_i$ can be expressed in terms of a working correlation matrix $R = R(\alpha)$: $V_i = A_i^{1/2} R(\alpha) A_i^{1/2}$, where $A_i$ is a diagonal matrix with elements $\mathrm{Var}(Y_{ij}) = \phi V(\mu_{ij})$, which is specified as a function of the mean $\mu_{ij}$. The $\alpha$ may be some unknown parameters involved in the working correlation structure, which can be estimated through the method of moments or another set of estimating equations.

An attractive point of the GEE approach is that it yields a consistent estimator of $\beta$ even when the working correlation matrix $R$ is misspecified (Liang and Zeger). For instance, it is often convenient to use a working independence model, where $R = I$. Some other popular choices include compound symmetry (CS) (i.e., exchangeable), with $R_{ij} = \alpha$ for any $i \neq j$, or first-order autoregressive (AR), with $R_{ij} = \alpha^{|i-j|}$, where $R_{ij}$ denotes the $(i,j)$th element of $R$. Due to its simplicity, the working independence model is attractive. Many studies have shown that $\hat\beta$ obtained under the independence model is relatively efficient (Zeger; McDonald), at least when the correlation between responses is not large. Another compelling reason for using the working independence model is in partly conditional modeling of means for longitudinal data (Pepe and Anderson). However, for time-varying or cluster-specific covariates, Fitzmaurice (1995) showed that the resulting estimator from the independence model may be very inefficient compared to the estimator obtained by using the correct correlation structure. Hence, this poses a model-selection problem in selecting the working correlation structure. Of course, we may also need to decide which covariates are to be included in the regression model $g(\mu_i) = X_i\beta$. Below we propose a quasi-likelihood-based model-selection criterion that can be applied to address the above issues.

Quasi-likelihood. Now we need to briefly review the quasi-likelihood. For the moment, suppose we only have
a scalar response variable $y$. We first construct the quasi-likelihood function for the mean parameter $\mu = E(y)$ and dispersion parameter $\phi$; then we will write it in terms of the regression parameter $\beta$. Based on the model specification $E(y) = \mu$ and $\mathrm{Var}(y) = \phi V(\mu)$, the (log) quasi-likelihood function is (McCullagh and Nelder)

$$Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)}\, dt.$$

For instance, with grouped binary data $y$ following a binomial distribution with $n$ trials and mean $\mu$, it is often specified that $V(\mu) = \mu(1 - \mu/n)$; then, up to a constant, $Q(\mu, \phi; y) = L(\mu; y)/\phi$, where $L(\mu; y) = y \log\{\mu/(n - \mu)\} + n \log(n - \mu)$ is the log-likelihood for the binomial distribution. When $\phi = 1$, the quasi-likelihood $Q$ reduces to $L$. However, $\phi \neq 1$ is extremely useful in modeling overdispersion, which commonly occurs in practice. Some common examples of the quasi-likelihood are given in McCullagh and Nelder.

With a $p$-covariate $x$ and a specified regression model $E(y) = \mu = g^{-1}(x'\beta)$ and $\mathrm{Var}(y) = \phi V(\mu)$, the quasi-likelihood can be written down as a function of the regression coefficients $\beta$: $Q(\beta, \phi; (y, x)) = Q(g^{-1}(x'\beta), \phi; y)$. In the current context, if the working independence model $R = I$ is used, the working assumption is that the paired observations $(Y_{ij}, X_{ij})$ in $D$ are independent. Hence, the quasi-likelihood based on $D$ is

$$Q(\beta, \phi; I, D) = \sum_{i=1}^{n} \sum_{j=1}^{n_i} Q(\beta, \phi; (Y_{ij}, X_{ij})).$$

It is easy to verify that the left-hand side of the GEE, $S(\beta; I, D)$, is equivalent to $\partial Q(\beta, \phi; I, D)/\partial\beta$. Thus the GEE can be regarded as a quasi-likelihood score equation. However, if we use a more general working correlation matrix $R$, there is no guarantee that a corresponding quasi-likelihood exists unless certain conditions are satisfied (McCullagh and Nelder). Furthermore, even if it exists, in general it is difficult to construct. How to construct a quasi-likelihood with a general working correlation matrix is beyond the scope of this paper. The main goal of this paper is to propose a criterion based on $Q(\hat\beta, \phi; I, D)$, the quasi-likelihood under the working independence model with $\hat\beta$ estimated using any general working correlation structure in GEE.

AIC and a modification to AIC in GEE. We first briefly review the derivation of AIC, which will motivate our modification to AIC. A more rigorous and general discussion is
available from Linhart and Zucchini. For simplicity of notation, we first assume that the dispersion parameter $\phi$ is known; hence we can ignore it in the quasi-likelihood function. We will discuss the situation when $\phi$ is unknown at the end of this section.

Suppose we have a candidate model $M$ and the true model $M^*$, with log-likelihood functions $L(\beta; D)$ and $L(\beta^*; D)$, respectively. Throughout, we assume that each model can be indexed by the parameter vector. A well-known measure of separation between two models is given by the Kullback-Leibler information (Kullback and Leibler), also known as the cross-entropy. The Kullback-Leibler information between $M$ and $M^*$ is

$$\Delta(\beta, \beta^*) = E_{M^*}\{-2 L(\beta; D)\},$$

where the expectation $E_{M^*}$ is taken with respect to the true distribution of $D$, i.e., under model $M^*$. From a set of candidate models $\mathcal{M}$, in which each can be indexed by $\beta$, we would like to choose the model with the smallest $\Delta(\beta, \beta^*)$. However, in practice, since both $\beta$ and $\beta^*$ are unknown, we have to estimate $\Delta$. AIC was motivated as an asymptotically unbiased estimator of $E_{M^*}\{\Delta(\hat\beta, \beta^*)\}$, where $\hat\beta$ is the MLE under any candidate model in $\mathcal{M}$ and the expectation is taken over the random $\hat\beta$. Akaike proposed using AIC as a model-selection criterion:

$$\mathrm{AIC} = -2 L(\hat\beta; D) + 2p,$$

where $p$ is the dimension of $\beta$. Model selection is accomplished by selecting from $\mathcal{M}$ the model that minimizes AIC.

Since GEE is non-likelihood based, we do not have a likelihood function in this context. However, we may have a quasi-likelihood. We propose to replace the likelihood $L$ in $\Delta$ by the quasi-likelihood $Q$ under the working independence model and define a new discrepancy as

$$\Delta_I(\beta, \beta^*) = E_{M^*}\{-2 Q(\beta; I, D)\}.$$

We assume that any quasi-likelihood model in $\mathcal{M}$ can be indexed by the parameter vector $\beta$, and $\beta^*$ is the corresponding parameter for the quasi-likelihood model induced by the true data-generating model $M^*$. For simplicity, with a slight abuse of notation, we suppress the dependence of $\Delta_I$ on the true model $M^*$. It is well known that

$$E_{M^*}\left\{\frac{\partial Q(\beta; I, D)}{\partial\beta}\right\}\Big|_{\beta=\beta^*} = 0, \qquad \Omega_I = E_{M^*}\left\{-\frac{\partial^2 Q(\beta; I, D)}{\partial\beta\,\partial\beta'}\right\}\Big|_{\beta=\beta^*} = \sum_{i=1}^{n} D_i' V_i^{-1} D_i$$

(with $V_i = A_i$ under working independence), and the latter is positive semi-definite. Under suitable conditions one can exchange the order of the integration and differentiation. Then $\beta^*$ is a local minimizer of $\Delta_I(\beta, \beta^*)$ with regard to $\beta$. In other words, for any $\beta$ in a neighborhood of $\beta^*$, we have $\Delta_I(\beta, \beta^*) \geq \Delta_I(\beta^*, \beta^*)$. This implies that the discrepancy $\Delta_I$ is well defined for all the models close to the true model. Though we cannot prove that $\beta^*$ is in general a global minimizer of $\Delta_I(\beta, \beta^*)$, in the common situation that the marginal quasi-likelihood $Q(\beta; (Y_{ij}, X_{ij}))$ is equal to the log-likelihood $L(\beta; (Y_{ij}, X_{ij}))$, it is straightforward to verify that $\beta^*$ is indeed a global minimizer of $\Delta_I$, due to the fact that $E_{M^*} L(\beta^*; (Y_{ij}, X_{ij})) \geq E_{M^*} L(\beta; (Y_{ij}, X_{ij}))$ for any $\beta$ (see, e.g., Lehmann).

Now suppose the GEE estimator $\hat\beta(R)$ is obtained using any general working correlation structure $R$. Following the idea of deriving the proposition of Linhart and Zucchini, which is for minimum discrepancy estimators, we can approximate $E_{M^*}\{\Delta_I(\hat\beta(R), \beta^*)\}$ as

$$E_{M^*}\{\Delta_I(\hat\beta(R), \beta^*)\} \approx E_{M^*}\{-2 Q(\hat\beta(R); I, D)\} + 2 E_{M^*}\{(\hat\beta(R) - \beta^*)' S(\hat\beta(R); I, D)\} + 2\,\mathrm{trace}(\Omega_I J),$$

where $J = \mathrm{Cov}(\hat\beta(R))$, which can be consistently estimated by the robust or sandwich covariance estimator, say $\hat V_r$ (Liang and Zeger). $\Omega_I$ can also be consistently estimated by its empirical estimator $\hat\Omega_I = -\partial^2 Q(\beta; I, D)/\partial\beta\,\partial\beta'|_{\beta=\hat\beta}$. Note that for $\hat\beta = \hat\beta(R)$, we have $S(\hat\beta; R, D) = 0$ but not necessarily $S(\hat\beta; I, D) = 0$ unless $R = I$. By ignoring the second term, which is difficult to estimate, we have an estimator of the right-hand side:

$$\mathrm{QIC}(R) = -2 Q(\hat\beta(R); I, D) + 2\,\mathrm{trace}(\hat\Omega_I \hat V_r).$$

This is our proposed quasi-likelihood under the independence model criterion (QIC) for GEE. Our simulation results (see the Simulations section) show that ignoring the second term does not dramatically, but does somewhat, influence the performance of QIC($R$), and QIC($I$) is the best. Note that if the working independence model is used in GEE, by the consistency of $\hat\Omega_I$ and $\hat V_r$ and the fact that $S(\hat\beta(I); I, D) = 0$, we know QIC($I$) is an asymptotically unbiased estimator of $E_{M^*}\{\Delta_I(\hat\beta(I), \beta^*)\}$. Furthermore, $\hat\Omega_I$ and $\hat V_r$ are directly available from the model-fitting results in many statistical packages, such as SAS and S-Plus. Hence, we recommend the routine use of QIC($I$) whenever possible. QIC can also be applied to select a working correlation structure in GEE: one needs to calculate the QIC for various candidate working correlation structures and then pick the one with the smallest QIC. Note
that here the goal of selecting a working correlation structure is to estimate $\beta$ more efficiently.

In practice, since $\phi$ is unknown, we plug in $\hat\phi$, which is estimated from the largest model available. In variable selection, that means we estimate $\phi$ based on the regression model including all covariates. This is similar to estimating the dispersion parameter in linear regression with Mallows's $C_p$. A more general but also more difficult approach is to use the extended quasi-likelihood (McCullagh and Nelder), which we do not pursue here.

Remarks. When all modeling specifications in GEE are correct, $\hat\Omega_I^{-1}$ and $\hat V_r$ are asymptotically equivalent and $\mathrm{trace}(\hat\Omega_I \hat V_r) \approx \mathrm{trace}(I) = p$. Then QIC reduces to AIC. In GEE with correlated data, one may take $\mathrm{QIC}_u(R) = -2 Q(\hat\beta(R); I, D) + 2p$ as an approximation to QIC($R$), and thus $\mathrm{QIC}_u(R)$ can be potentially useful in variable selection. However, it is easy to see that $\mathrm{QIC}_u(R)$ cannot be applied to select the working correlation matrix $R$.

Our main motivation for defining the discrepancy $\Delta_I$ using $Q(\beta; I, D)$ is the latter's simplicity and uniqueness. However, as suggested by one referee, it may be possible to define a more general discrepancy as $\Delta_R(\beta, \beta^*) = E_{M^*}\{-2 Q(\beta; R, D)\}$. But note that $Q(\beta; R, D)$ may not be unique and in general can be calculated as a path-dependent line integral (McCullagh and Nelder). Nevertheless, according to a theorem of Hanfelt and Liang (see also Li), $\Delta_R$ is still a well-defined discrepancy in the sense of the local-minimizer inequality above.

Simulations. Simulation studies were conducted to investigate the performance of our proposed model-selection criterion QIC in selecting the working correlation structure and selecting the covariates in a marginal logistic regression model. We used the same true model as in Fitzmaurice (1995). The response variable $Y_{it}$ is binary, and its marginal mean $\mu_{it}$ follows a logistic regression in which $\mathrm{logit}(\mu_{it})$ is linear in a binary covariate $x_{it}$ and time $t$, for $i = 1, \dots, n$, where the $x_{it}$ are iid Bernoulli. The true correlation matrix is CS. We used a large correlation and moderate sample sizes. The joint distribution of the $Y_i$ was simulated from Bahadur's representation (see
Fitzmaurice (1995) for more details). For each sample size, our proposed method is most likely to correctly select the CS from the three given correlation structures. Since the distributional form of the data is known, we can also compute the MLE and thus AIC. For comparison, we also report the results of using AIC by assuming various correlation matrices. Unsurprisingly, AIC is more efficient than QIC, probably for two reasons. First, the MLE of $\beta$ is more efficient than the GEE estimator. Second, information on the true correlation structure is embedded in the likelihood function in AIC but not directly in the quasi-likelihood $Q(\hat\beta; I, D)$ in QIC. As mentioned earlier, the strength of QIC is that it is non-likelihood based, whereas in practice the likelihood approach is often too restrictive, with its strong distributional assumption for correlated categorical data.

Now we consider variable selection with an expanded full model: a marginal logistic regression on $x_{it}$, $t$, and two additional covariates, where $x_{it}$ and the regression coefficients are as before, and the two additional covariates are iid uniform and independent of $x_{it}$. For simplicity, we consider five non-nested candidate models with various subsets of covariates included. The results of using QIC with different working correlation matrices are shown in the table of selection frequencies. The performance of the three QICs with different working correlation matrices is close, but QIC(Ind) appears to be the best. This is probably related to the error introduced by ignoring the second term in the approximation for QIC(CS) and QIC(AR). For comparison, we also list the results of using AIC under the correct and incorrect correlation structures. Surprisingly, QIC(Ind) turns out to be comparable with AIC(CS). When the distributional assumptions are violated, the performance of AIC deteriorates, as demonstrated by AIC(Ind) and AIC(AR), which incorrectly assume the independence and AR correlation matrices, respectively. We also did simulation studies to investigate the QIC's performance in selecting the working correlation matrix in modeling a partly conditional mean for longitudinal
data (Pepe and Anderson) and in variable selection for correlated, overdispersed, grouped binary data. The results (not shown here) also appeared to be promising.

An Example. We apply the method to the WESDR (Klein et al.). The study goal was to determine the risk factors for diabetic retinopathy. The binary response is the presence of diabetic retinopathy in each of two eyes from each of the individuals in the study. There are thirteen potential risk factors. As shown in Barnhart and Williamson, a univariate analysis was conducted to investigate the marginal association between the response variable and each risk factor. It was found that a subset of them are marginally associated with the response variable. Barnhart and Williamson included only four risk factors (duration of diabetes in years, glycosylated hemoglobin level, diastolic blood pressure, and body mass index), plus two quadratic terms of duration of diabetes and body mass index, in their final model. Now we consider adding all or some of the four removed covariates (i.e., intraocular pressure, systolic blood pressure, pulse rate, and proteinuria) into Barnhart and Williamson's model, which yields a set of candidate models. Note that these models cannot be ordered as a nested sequence, and one advantage of using a flexible model-selection criterion such as QIC is its ability to compare non-nested models. Due to the possible correlation between the two observations on the two eyes from the same participant, GEE is used to fit the marginal logistic regression model, and QIC is applied to do model selection, all under the working independence model. The selected top four models, along with the full model and Barnhart and Williamson's model, are listed in the table of QIC values. The p-values associated with the GEE estimates are also presented. According to the QIC values, the top four models are very close but different from Barnhart and Williamson's model in that proteinuria is included in the former four models. From that table we can see that proteinuria is an important and statistically significant risk factor, and adding intraocular pressure or systolic blood pressure into the model may also improve its performance.

Discussion. For likelihood-based methods, there are many well-studied model-selection criteria, such as AIC. But for non-likelihood-based methods, such as GEE, there is a lack of literature on model selection. In this article we have proposed a new criterion, QIC, that works for GEE. The QIC involves using the quasi-likelihood constructed under the working independence model and the naive and robust covariance estimates of the estimated regression coefficients. Although using other, more general quasi-likelihoods seems possible, we choose to use the quasi-likelihood under the working independence model due to its simplicity. However, QIC allows one to use any general working correlation structure to estimate the parameters in GEE. In simulation studies we found that the QIC works well in variable selection and in selecting the working correlation matrix. We were particularly impressed with the performance of QIC($I$) in variable selection. Further applications warrant future studies...
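The QIC computation described in the excerpt above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the paper's implementation: it treats a binary response with a logit link under working independence with the dispersion fixed at 1 (so the quasi-likelihood coincides with the Bernoulli log-likelihood), and the function names, the random-intercept data generator, and all numeric settings are hypothetical choices made only for this example, not the simulation design of the paper.

```python
import numpy as np


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def fit_independence_logit(X, y, max_iter=100, tol=1e-10):
    """Solve the working-independence GEE S(beta; I, D) = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = expit(X @ beta)
        score = X.T @ (y - mu)                         # S(beta; I, D)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # minus the Hessian of Q
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def qic_independence(X, y, cluster):
    """QIC(I) = -2 Q + 2 trace(Omega_I V_r), plus the simpler QIC_u = -2 Q + 2 p."""
    beta = fit_independence_logit(X, y)
    mu = expit(X @ beta)
    p = X.shape[1]
    # Quasi-likelihood under independence (Bernoulli log-likelihood, phi = 1).
    quasi = float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))
    # Model-based information under independence.
    omega = X.T @ (X * (mu * (1.0 - mu))[:, None])
    # Robust (sandwich) covariance: bread * meat * bread, summing scores by cluster.
    meat = np.zeros((p, p))
    for c in np.unique(cluster):
        rows = cluster == c
        s = X[rows].T @ (y[rows] - mu[rows])
        meat += np.outer(s, s)
    bread = np.linalg.inv(omega)
    v_robust = bread @ meat @ bread
    penalty = float(np.trace(omega @ v_robust))
    return {
        "beta": beta,
        "quasi": quasi,
        "penalty": penalty,
        "qic": -2.0 * quasi + 2.0 * penalty,
        "qicu": -2.0 * quasi + 2.0 * p,
    }


# Hypothetical clustered binary data: a shared random intercept induces
# within-cluster correlation. All numeric settings here are illustrative.
rng = np.random.default_rng(0)
n_clusters, n_times = 200, 3
cluster = np.repeat(np.arange(n_clusters), n_times)
t = np.tile(np.arange(n_times), n_clusters).astype(float)
x1 = rng.binomial(1, 0.5, size=cluster.size).astype(float)
u = rng.normal(0.0, 1.0, size=n_clusters)[cluster]
y = rng.binomial(1, expit(0.25 - 0.25 * x1 - 0.25 * t + u)).astype(float)
X = np.column_stack([np.ones_like(t), x1, t])

result = qic_independence(X, y, cluster)
print(result)
```

To compare candidate covariate sets in this sketch, one would call `qic_independence` with different column subsets of `X` and prefer the smallest `qic`. With positively correlated responses the trace penalty will generally differ from `p`, which is the gap between QIC and QIC_u noted in the remarks.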

    [...]

Proceedings Article
01 Jan 1973
TL;DR: The classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion to provide answers to many practical problems of statistical model fitting.
Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.

18,539 citations


"Akaike's Information Criterion in G..." refers methods in this paper

  • ...Zeger, S. L., Liang, K.-Y., and Albert, P. S. Models for longitudinal data: a generalized estimating equation approach. Biometrics. Table: Frequency of the set of variables selected by QIC vs. AIC for the marginal logistic model from independent replications. The true model has two covariates, and AIC(CS) is calculated correctly using the CS correlation matrix. Criteria compared: QIC(Ind), QIC(CS), QIC(AR), AIC(Ind), AIC(CS), AIC(AR). Table: QIC and robust p-values for each covariate in the top four models and the other two models with the WESDR data. Covariates: intraocular pressure, systolic blood pressure, pulse rate, proteinuria, duration of diabetes, glycosylated hemoglobin, diastolic blood pressure, body mass index, (duration of diabetes)², (body mass index)². Criterion: QIC(Ind)...

    [...]

  • ...One powerful and widely used model-selection criterion is Akaike’s Information Criterion (AIC) (Akaike, 1973)....

    [...]

  • ...Correlated response data are common in biomedical studies Regression analysis based on the generalized estimating equations GEE is an increasingly important method for such data However there seems to be few model selection criteria available in GEE The well known Akaike Information Criterion AIC cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is non likelihood based We propose a modi cation to AIC where the likelihood is replaced by the quasi likelihood and a proper adjustment is made for the penalty term Its performance is investigated through simulation studies For illustration the method is applied to a real data set Key words AIC GEE GLM Model selection Quasi likelihood Introduction Correlated response data arise often from biomedical studies An example to be studied is the Wisconsin Epidemiologic Study of Diabetic Retinopathy WESDR Klein et al where a binary response variable is the presence of diabetic retinopathy in each of the two eyes from each participant in the study Since the two observations on the two eyes from the same participant tend to be correlated statistical analyses have to take a proper account of this correlation Since the publication of the seminal paper by Liang and Zeger the generalized estimating equation GEE approach has become increasingly important in handling such correlated data Model selection is an important issue in almost any practical data analysis A common problem is variable selection in regression given a large group of covariates including some higher order terms one needs to select a subset to be included in the regression model In the WESDR thirteen potential risk factors were collected and we need to determine which of these factors are to be included It is well known that in observational studies as the WESDR excluding some important risk factors i e confounders may result in misleading estimates of the e ects of other risk factors On the other hand including all 
covariates may lead to an overly complex model that is difficult to interpret and has less precise parameter estimates. There is an extensive model-selection literature in statistics (e.g., Miller and references therein), but mainly for classic linear regression with independent data. One powerful and widely used model-selection criterion is Akaike's Information Criterion (AIC; Akaike). AIC is based on the likelihood and asymptotic properties of the maximum likelihood estimator (MLE). Since no distribution is assumed in GEE, there is no likelihood defined; thus AIC cannot be directly used. On the other hand, the issue of model selection in GEE has been largely neglected. The goal of this paper is to propose an extension of AIC to GEE. It involves using the quasi-likelihood constructed from the estimating equations (Wedderburn). Since in general the GEE estimator has different asymptotic properties from those of the MLE, a modification to the penalty term in the usual AIC is also necessary. This paper is organized as follows. We first briefly review GEE and the quasi-likelihood, then propose a modification to AIC in GEE. Simulation results are then presented to show its performance in selecting the working correlation matrix and selecting covariates in GEE. Finally, the method is applied to the WESDR data, followed by a brief discussion. AIC in GEE. GEE. Suppose we have a random sample of observations from $n$ individuals. For each individual $i$ we have a vector of responses $Y_i = (Y_{i1}, \ldots, Y_{in_i})'$ and corresponding covariates $X_i = (X_{i1}, \ldots, X_{in_i})'$, where each $Y_{ij}$ is a scalar and $X_{ij}$ a $p$-vector. In general, the components of $Y_i$ are correlated, but $Y_i$ and $Y_k$ are independent for any $i \neq k$, conditional on the covariates. We use $D = \{(Y_1, X_1), \ldots, (Y_n, X_n)\}$ to denote the data at hand. To model the relation between the response and covariates, one can use a regression model similar to the generalized linear models: $g(\mu_i) = X_i \beta$, where $\mu_i = E(Y_i \mid X_i)$, $g$ is a specified link function, and $\beta = (\beta_1, \ldots, \beta_p)'$ is a vector of unknown regression coefficients to be estimated. The 
GEE approach estimates $\beta$ by solving the following estimating equations (Liang and Zeger): $S(\beta; R, D) = \sum_{i=1}^{n} D_i' V_i^{-1} (Y_i - \mu_i) = 0$, where $D_i = D_i(\beta) = \partial \mu_i / \partial \beta'$ and $V_i$ is a working covariance matrix of $Y_i$. $V_i$ can be expressed in terms of a working correlation matrix $R = R(\alpha)$: $V_i = A_i^{1/2} R(\alpha) A_i^{1/2}$, where $A_i$ is a diagonal matrix with elements $\mathrm{Var}(Y_{ij}) = \phi V(\mu_{ij})$, which is specified as a function of the mean $\mu_{ij}$. There may be some unknown parameters $\alpha$ involved in the working correlation structure, which can be estimated through the method of moments or another set of estimating equations. An attractive point of the GEE approach is that it yields a consistent estimator of $\beta$ even when the working correlation matrix $R$ is misspecified (Liang and Zeger). For instance, it is often convenient to use a working independence model, where $R = I$. Some other popular choices include compound symmetry (CS; i.e., exchangeable), with $R_{ij} = \rho$ for any $i \neq j$, or first-order autoregressive (AR-1), with $R_{ij} = \rho^{|i-j|}$, where $R_{ij}$ denotes the $(i,j)$th element of $R$. Due to its simplicity, the working independence model is attractive. Many studies have shown that $\hat\beta$ obtained under the independence model is relatively efficient (Zeger; McDonald), at least when the correlation between responses is not large. Another compelling reason for using the working independence model is in partly conditional modeling of means for longitudinal data (Pepe and Anderson). However, for time-varying or cluster-specific covariates, Fitzmaurice showed that the resulting estimator from the independence model may be very inefficient compared with the estimator obtained by using the correct correlation structure. Hence this poses a model-selection problem in selecting the working correlation structure. Of course, we may also need to decide which covariates are to be included in the regression model $g(\mu_i) = X_i \beta$. Below we propose a quasi-likelihood-based model-selection criterion that can be applied to address the above issues. Quasi-likelihood. Now we need to briefly review the quasi-likelihood. For the moment, suppose we only have 
a scalar response variable $y$. We first construct the quasi-likelihood function for the mean parameter $\mu = E(y)$ and dispersion parameter $\phi$; then we will write it in terms of the regression parameter $\beta$. Based on the model specification $E(y) = \mu$ and $\mathrm{Var}(y) = \phi V(\mu)$, the log quasi-likelihood function is (McCullagh and Nelder) $Q(\mu, \phi; y) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)} \, dt$. For instance, with grouped binary data $y \sim \mathrm{Bin}(n, \mu/n)$, it is often specified that $V(\mu) = \mu(1 - \mu/n)$; then, up to a constant, $Q(\mu, \phi; y) = L(\mu; y)/\phi$, where $L(\mu; y) = y \log\{\mu/(n - \mu)\} + n \log(n - \mu)$ is the log-likelihood for the binomial distribution. When $\phi = 1$, the quasi-likelihood $Q$ reduces to $L$. However, $\phi$ is extremely useful in modeling overdispersion, which commonly occurs in practice. Some common examples of the quasi-likelihood are given in McCullagh and Nelder. With a $p$-vector covariate $x$ and a specified regression model $E(y) = \mu = g^{-1}(x'\beta)$ and $\mathrm{Var}(y) = \phi V(\mu)$, the quasi-likelihood can be written as a function of the regression coefficients: $Q(\beta, \phi; (y, x)) = Q(g^{-1}(x'\beta), \phi; y)$. In the current context, if the working independence model $R = I$ is used, the working assumption is that the paired observations $(Y_{ij}, X_{ij})$ in $D$ are independent. Hence the quasi-likelihood based on $D$ is $Q(\beta, \phi; I, D) = \sum_{i=1}^{n} \sum_{j=1}^{n_i} Q(\beta, \phi; (Y_{ij}, X_{ij}))$. It is easy to verify that the left-hand side of the GEE, $S(\beta; I, D)$, is equivalent to $\partial Q(\beta, \phi; I, D) / \partial \beta$. Thus the GEE can be regarded as a quasi-likelihood score equation. However, if we use a more general working correlation matrix $R$, there is no guarantee that a corresponding quasi-likelihood exists unless certain conditions are satisfied (McCullagh and Nelder). Furthermore, even if it exists, in general it is difficult to construct. How to construct a quasi-likelihood with a general working correlation matrix is beyond the scope of this paper. The main goal of this paper is to propose a criterion based on $Q(\hat\beta, \phi; I, D)$, the quasi-likelihood under the working independence model, with $\hat\beta$ estimated using any general working correlation structure in GEE. AIC and a Modification to AIC in GEE. We first briefly review the derivation of AIC, which will motivate our modification to AIC. A more rigorous and general discussion is 
available from Linhart and Zucchini. For simplicity of notation, we first assume that the dispersion parameter $\phi$ is known; hence we can ignore it in the quasi-likelihood function. We will discuss the situation when $\phi$ is unknown at the end of this section. Suppose we have a candidate model $M$ and the true model $M^*$, with log-likelihood functions $L(\beta; D)$ and $L(\beta^*; D)$, respectively. Throughout, we assume that each model can be indexed by the parameter vector $\beta$. A well-known measure of separation between two models is given by the Kullback-Leibler information (Kullback and Leibler), also known as the cross-entropy. The Kullback-Leibler information between $M$ and $M^*$ is $\Delta(\beta, \beta^*) = E_{M^*}[-2 L(\beta; D)]$, where the expectation $E_{M^*}$ is taken with respect to the true distribution of $D$, i.e., under model $M^*$. From a set of candidate models $\mathcal{M}$, in which each can be indexed by $\beta$, we would like to choose the model with the smallest $\Delta(\beta, \beta^*)$. However, in practice, since both $\beta$ and $\beta^*$ are unknown, we have to estimate $\Delta(\beta, \beta^*)$. AIC was motivated as an asymptotically unbiased estimator of $E_{M^*}[\Delta(\hat\beta, \beta^*)]$, where $\hat\beta$ is the MLE under any candidate model in $\mathcal{M}$ and the expectation is taken over the random $\hat\beta$. Akaike proposed using AIC as a model-selection criterion: $\mathrm{AIC} = -2 L(\hat\beta; D) + 2p$, where $p$ is the dimension of $\beta$. Model selection is accomplished by selecting from $\mathcal{M}$ the one that minimizes AIC. Since GEE is non-likelihood based, we do not have a likelihood function in this context. However, we may have a quasi-likelihood. We propose to replace the likelihood $L$ by the quasi-likelihood $Q$ under the working independence model and define a new discrepancy as $\Delta(\beta, \beta^*; I) = E_{M^*}[-2 Q(\beta; I, D)]$. We assume that any quasi-likelihood model in $\mathcal{M}$ can be indexed by the parameter vector $\beta$, and $\beta^*$ is the corresponding parameter for the quasi-likelihood model induced by the true data-generating model $M^*$. For simplicity, with a slight abuse of notation, we suppress the dependence of $\Delta(\beta, \beta^*; I)$ on the true model $M^*$. It is well known that $E_{M^*}[\partial Q(\beta; I, D) / \partial \beta]|_{\beta = \beta^*} = 0$ and $\Omega_I = -E_{M^*}[\partial^2 Q(\beta; I, D) / \partial \beta \, \partial \beta']|_{\beta = \beta^*} = \sum_{i=1}^{n} D_i' V_i^{-1} D_i$, and the latter is positive semi-definite. Under suitable conditions, one can exchange the order of integration and differentiation. Then $\beta^*$ is a local minimizer of $\Delta(\beta, \beta^*; I)$ with regard to $\beta$. In other words, for any $\beta$ in a neighborhood of $\beta^*$, we have $\Delta(\beta, \beta^*; I) \ge \Delta(\beta^*, \beta^*; I)$. This implies that the discrepancy $\Delta(\beta, \beta^*; I)$ is well defined for all the models close to the true model. Though we cannot prove that $\beta^*$ is in general a global minimizer of $\Delta(\beta, \beta^*; I)$, in the common situation that the marginal quasi-likelihood $Q(\beta; (Y_{ij}, X_{ij}))$ is equal to the log-likelihood $L(\beta; (Y_{ij}, X_{ij}))$, it is straightforward to verify that $\beta^*$ is indeed a global minimizer of $\Delta(\beta, \beta^*; I)$, due to the fact that $E_{M^*}[L(\beta^*; (Y_{ij}, X_{ij}))] \ge E_{M^*}[L(\beta; (Y_{ij}, X_{ij}))]$ for any $\beta$ (see, e.g., Lehmann). Now suppose the GEE estimator $\hat\beta = \hat\beta(R)$ is obtained using any general working correlation structure $R$. Following the idea of deriving a proposition of Linhart and Zucchini, which is for minimum discrepancy estimators, we can approximate $E_{M^*}[\Delta(\hat\beta, \beta^*; I)]$ as $E_{M^*}[\Delta(\hat\beta, \beta^*; I)] \approx E_{M^*}[-2 Q(\hat\beta; I, D)] + 2 E_{M^*}[(\hat\beta - \beta^*)' S(\hat\beta; I, D)] + 2 \, \mathrm{trace}(\Omega_I J)$, where $J = \mathrm{Cov}(\hat\beta)$, which can be consistently estimated by the robust or sandwich covariance estimator, say $\hat V_r$ (Liang and Zeger). $\Omega_I$ can also be consistently estimated by its empirical estimator $\hat\Omega_I = -\partial^2 Q(\beta; I, D) / \partial \beta \, \partial \beta' |_{\beta = \hat\beta}$. Note that for $\hat\beta(R)$ we have $S(\hat\beta(R); R, D) = 0$, but not necessarily $S(\hat\beta(R); I, D) = 0$ unless $R = I$. By ignoring the second term, which is difficult to estimate, we have an estimator of the right-hand side of this approximation: $\mathrm{QIC}(R) = -2 Q(\hat\beta(R); I, D) + 2 \, \mathrm{trace}(\hat\Omega_I \hat V_r)$. This is our proposed quasi-likelihood under the independence model criterion (QIC) for GEE. Our simulation results show that ignoring the second term does not dramatically, but does somewhat, influence the performance of $\mathrm{QIC}(R)$, and $\mathrm{QIC}(I)$ is the best. Note that if the working independence model is used in GEE, then by the consistency of $\hat\Omega_I$ and $\hat V_r$ and the fact that $S(\hat\beta(I); I, D) = 0$, we know that $\mathrm{QIC}(I)$ is an asymptotically unbiased estimator of $E_{M^*}[\Delta(\hat\beta, \beta^*; I)]$. Furthermore, $\hat\Omega_I$ and $\hat V_r$ are directly available from the model-fitting results in many statistical packages, such as SAS and S-Plus. Hence we recommend the routine use of $\mathrm{QIC}(I)$ whenever possible. QIC can also be applied to select a working correlation structure in GEE: one needs to calculate the QIC for various candidate working correlation structures and then pick the one with the smallest QIC. Note 
that here the goal of selecting a working correlation structure is to estimate $\beta$ more efficiently. In practice, since $\phi$ is unknown, we plug in $\hat\phi$, which is estimated from the largest model available. In variable selection, that means we estimate $\phi$ based on the regression model including all covariates. This is similar to estimating the dispersion parameter in linear regression with Mallows's $C_p$. A more general but also more difficult approach is to use the extended quasi-likelihood (McCullagh and Nelder), which we do not pursue here. Remarks. When all modeling specifications in GEE are correct, $\hat\Omega_I^{-1}$ and $\hat V_r$ are asymptotically equivalent and $\mathrm{trace}(\hat\Omega_I \hat V_r) \approx \mathrm{trace}(I) = p$. Then QIC reduces to AIC. In GEE with correlated data, one may take $\mathrm{QIC}_u(R) = -2 Q(\hat\beta(R); I, D) + 2p$ as an approximation to $\mathrm{QIC}(R)$, and thus $\mathrm{QIC}_u(R)$ can be potentially useful in variable selection. However, it is easy to see that $\mathrm{QIC}_u(R)$ cannot be applied to select the working correlation matrix $R$. Our main motivation for defining the discrepancy $\Delta(\beta, \beta^*; I)$ using $Q(\beta; I, D)$ is the latter's simplicity and uniqueness. However, as suggested by one referee, it may be possible to define a more general discrepancy as $\Delta(\beta, \beta^*; R) = E_{M^*}[-2 Q(\beta; R, D)]$. But note that $Q(\beta; R, D)$ may not be unique and in general can only be calculated as a path-dependent line integral (McCullagh and Nelder). Nevertheless, according to a theorem of Hanfelt and Liang (see also Li), $\Delta(\beta, \beta^*; R)$ is still a well-defined discrepancy in the sense of the local-minimizer inequality above. Simulations. Simulation studies were conducted to investigate the performance of our proposed model-selection criterion QIC in selecting the working correlation structure and selecting the covariates in a marginal logistic regression model. We used the same true model as in Fitzmaurice. The response variable $Y_{it}$ is binary, and its marginal mean is $\mu_{it}$ with $\mathrm{logit}(\mu_{it}) = \beta_0 + \beta_1 x_{1it} + \beta_2 t$ for $i = 1, \ldots, n$, where the $x_{1it}$ are iid Bernoulli. The true correlation matrix is CS. We used a large correlation and a moderate sample size $n$. The joint distribution of the $Y_i$ was simulated from Bahadur's representation (see 
Fitzmaurice for more details). For each sample size $n$, our proposed method is most likely to correctly select CS from the three given correlation structures, as the corresponding table shows. Since the distributional form of the data is known, we can also compute the MLE and thus AIC. For comparison, we also include the results of using AIC by assuming various correlation matrices. Unsurprisingly, AIC is more efficient than QIC, probably for two reasons. First, the MLE of $\beta$ is more efficient than the GEE estimator. Second, information on the true correlation structure is embedded in the likelihood function in AIC but not directly in the quasi-likelihood $Q(\hat\beta; I, D)$ in QIC. As mentioned earlier, the strength of QIC is that it is non-likelihood based, whereas in practice the likelihood approach is often too restrictive, with its strong distributional assumptions for correlated categorical data. Now we consider variable selection with an expanded full model, $\mathrm{logit}(\mu_{it}) = \beta_0 + \beta_1 x_{1it} + \beta_2 t + \beta_3 x_{2it} + \beta_4 x_{3it}$ for $i = 1, \ldots, n$, where $x_{1it}$ is as before, and $x_{2it}$ and $x_{3it}$ are iid uniform and independent of $x_{1it}$. For simplicity, we consider five non-nested candidate models with various subsets of covariates included. The results of using QIC with different working correlation matrices are shown in the corresponding table. The performance of the three QICs with different working correlation matrices is close, but QIC(Ind) appears to be the best. This is probably related to the error introduced by ignoring the second term of the approximation for QIC(CS) and QIC(AR-1). For comparison, we also list the results of using AIC under the correct and incorrect correlation structures. Surprisingly, QIC(Ind) turns out to be comparable with AIC(CS). When the distributional assumptions are violated, the performance of AIC deteriorates, as demonstrated by AIC(Ind) and AIC(AR-1), which incorrectly assume the independence and AR-1 correlation matrices, respectively. We also did simulation studies to investigate QIC's performance in selecting the working correlation matrix in modeling a partly conditional mean for longitudinal 
data (Pepe and Anderson) and in variable selection for correlated, overdispersed, grouped binary data. The results (not shown here) also appeared promising. An Example. We apply the method to the WESDR (Klein et al.). The study goal was to determine the risk factors for diabetic retinopathy. The binary response is the presence of diabetic retinopathy in each of the two eyes of each individual in the study. There are thirteen potential risk factors. As shown in Barnhart and Williamson, a univariate analysis was conducted to investigate the marginal association between the response variable and each risk factor, and a subset of them was found to be marginally associated with the response variable. Barnhart and Williamson included only four risk factors (duration of diabetes in years, glycosylated hemoglobin level, diastolic blood pressure, and body mass index) plus two quadratic terms (of duration of diabetes and body mass index) in their final model. Now we consider adding all or some of the four removed covariates (i.e., intraocular pressure, systolic blood pressure, pulse rate, and proteinuria) to Barnhart and Williamson's model, which yields a set of candidate models. Note that these models cannot be ordered as a nested sequence, and one advantage of using a flexible model-selection criterion such as QIC is its ability to compare non-nested models. Because of the possible correlation between the two observations on the two eyes from the same participant, GEE is used to fit the marginal logistic regression model and QIC is applied to do model selection, all under the working independence model. The selected top four models, along with the full model and Barnhart and Williamson's model, are listed in the corresponding table. The p-values associated with the GEE estimates are also presented. According to the QIC values, the top four models are very close but differ from Barnhart and Williamson's model in that proteinuria is included in the former four models. From the table we can see that proteinuria is an important and statistically significant risk factor, and adding intraocular pressure or systolic blood pressure to the model may also improve its performance. Discussion. For likelihood-based methods there are many well-studied model-selection criteria, such as AIC. But for non-likelihood-based methods such as GEE, there is a lack of literature on model selection. In this article we have proposed a new criterion, QIC, that works for GEE. The QIC involves using the quasi-likelihood constructed under the working independence model and the naive and robust covariance estimates of the estimated regression coefficients. Although using other, more general quasi-likelihoods seems possible, we choose to use the quasi-likelihood under the working independence model due to its simplicity. However, QIC allows one to use any general working correlation structure to estimate the parameters in GEE. In simulation studies we found that QIC works well in variable selection and in selecting the working correlation matrix. We were particularly impressed with the performance of QIC(I) in variable selection. Further applications warrant future studies...
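
The criterion described in the excerpt above can be illustrated with a minimal sketch, assuming a marginal logistic model fitted under working independence with the dispersion fixed at 1, so that the quasi-likelihood coincides with the Bernoulli log-likelihood. The function names, the simulated data (a shared random intercept per cluster rather than the Bahadur representation used in the paper), and the chosen coefficients and sample sizes are all hypothetical and serve only to show how the naive information matrix, the sandwich covariance, and the two penalties fit together.

```python
import numpy as np


def fit_independence_logit(X, y, max_iter=50, tol=1e-10):
    """Solve the working-independence GEE for a logit link by Newton-Raphson.

    With R = I and the canonical link, the estimating equation reduces to
    X'(y - mu) = 0, the ordinary logistic score equation.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - mu)
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def qic_independence(X, y, cluster):
    """Return Q, the trace penalty, QIC(I) and QIC_u for a binary response."""
    p = X.shape[1]
    beta = fit_independence_logit(X, y)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    # Quasi-likelihood under independence (phi = 1 assumed): Bernoulli log-likelihood.
    quasi = np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))
    # Naive (model-based) information, the empirical Omega_I.
    omega = X.T @ (X * (mu * (1.0 - mu))[:, None])
    # Middle of the sandwich: outer products of cluster-level score contributions.
    meat = np.zeros((p, p))
    for c in np.unique(cluster):
        rows = cluster == c
        s = X[rows].T @ (y[rows] - mu[rows])
        meat += np.outer(s, s)
    omega_inv = np.linalg.inv(omega)
    v_robust = omega_inv @ meat @ omega_inv  # robust covariance estimate V_r
    penalty = np.trace(omega @ v_robust)
    return {
        "Q": quasi,
        "penalty": penalty,
        "QIC": -2.0 * quasi + 2.0 * penalty,
        "QICu": -2.0 * quasi + 2.0 * p,
    }


# Hypothetical correlated binary data: three time points per cluster and a
# shared random intercept that induces within-cluster correlation.
rng = np.random.default_rng(0)
n_clusters, n_times = 200, 3
cluster = np.repeat(np.arange(n_clusters), n_times)
t = np.tile(np.arange(1, n_times + 1), n_clusters)
x1 = rng.integers(0, 2, size=cluster.size)
b = np.repeat(rng.normal(0.0, 1.0, n_clusters), n_times)
eta = 0.25 - 0.25 * x1 - 0.25 * t + b
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
X = np.column_stack([np.ones(cluster.size), x1, t]).astype(float)

result = qic_independence(X, y, cluster)
print(result)
```

To compare candidate covariate sets in this sketch, one would call `qic_independence` on each design matrix and prefer the smallest QIC. With positively correlated responses the trace penalty typically differs from the number of parameters, which is the gap between QIC and QIC_u.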

    [...]

Journal ArticleDOI
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.

17,111 citations

Book ChapterDOI
01 Jan 1973
TL;DR: In this paper, it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion.
Abstract: In this paper it is shown that the classical maximum likelihood principle can be considered to be a method of asymptotic realization of an optimum estimate with respect to a very general information theoretic criterion. This observation shows an extension of the principle to provide answers to many practical problems of statistical model fitting.

15,424 citations