Models for longitudinal data: a generalized estimating equation approach.

doi:10.2307/2531734


Journal Article•DOI•

Scott L. Zeger¹, Kung-Yee Liang¹, Paul S. Albert¹•Institutions (1)

Johns Hopkins University¹

01 Dec 1988-Biometrics (Biometrics)-Vol. 44, Iss: 4, pp 1049-1060

TL;DR: This article discusses extensions of generalized linear models for the analysis of longitudinal data in which heterogeneity in regression parameters is explicitly modelled and uses a generalized estimating equation approach to fit both classes of models for discrete and continuous outcomes.


Abstract: This article discusses extensions of generalized linear models for the analysis of longitudinal data. Two approaches are considered: subject-specific (SS) models in which heterogeneity in regression parameters is explicitly modelled; and population-averaged (PA) models in which the aggregate response for the population is the focus. We use a generalized estimating equation approach to fit both classes of models for discrete and continuous outcomes. When the subject-specific parameters are assumed to follow a Gaussian distribution, simple relationships between the PA and SS parameters are available. The methods are illustrated with an analysis of data on mother's smoking and children's respiratory disease.
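The "simple relationships between the PA and SS parameters" mentioned in the abstract can be illustrated for one common case. The sketch below assumes a logistic model with a single Gaussian random intercept of standard deviation `sigma`, and uses the widely cited approximation beta_PA ≈ beta_SS / sqrt(1 + c² sigma²) with c = 16√3/(15π). All function names, and the parameter values in the usage note, are illustrative assumptions and are not taken from the page; the code compares the approximation with a direct numerical average over the random intercept.

```python
import math

# Constant in the logistic-normal approximation, c = 16*sqrt(3)/(15*pi) (about 0.588).
C = 16 * math.sqrt(3) / (15 * math.pi)


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n=2001, width=8.0):
    """Population-averaged probability E[expit(eta + b)], b ~ N(0, sigma^2).

    Computed with a plain trapezoid rule over +/- width standard deviations.
    """
    if sigma == 0:
        return expit(eta)
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        b = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += weight * expit(eta + b) * density
    return total * h


def pa_slope_numeric(beta0, beta1, sigma):
    """PA log-odds ratio for a binary covariate (x = 1 versus x = 0)."""
    return logit(marginal_prob(beta0 + beta1, sigma)) - logit(marginal_prob(beta0, sigma))


def pa_slope_approx(beta1, sigma):
    """Approximate PA slope obtained by attenuating the SS slope."""
    return beta1 / math.sqrt(1.0 + C ** 2 * sigma ** 2)


if __name__ == "__main__":
    # Hypothetical SS parameters chosen only for illustration.
    print(pa_slope_numeric(-1.0, 0.5, 1.5), pa_slope_approx(0.5, 1.5))
```

With a nonzero `sigma`, both functions return a slope smaller in magnitude than the SS slope `beta1`, which is the attenuation that distinguishes population-averaged from subject-specific coefficients in this setting.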


Citations


Book•

Econometric Analysis of Cross Section and Panel Data

[...]

Jeffrey M. Wooldridge

01 Jan 2001

TL;DR: This is the essential companion to Jeffrey Wooldridge's widely-used graduate text Econometric Analysis of Cross Section and Panel Data (MIT Press, 2001).


Abstract: The second edition of this acclaimed graduate text provides a unified treatment of two methods used in contemporary econometric research, cross section and panel data methods. By focusing on assumptions that can be given behavioral content, the book maintains an appropriate level of rigor while emphasizing intuitive thinking. The analysis covers both linear and nonlinear models, including models with dynamics and/or individual heterogeneity. In addition to general estimation frameworks (particularly methods of moments and maximum likelihood), specific linear and nonlinear methods are covered in detail, including probit and logit models and their multivariate, Tobit models, models for count data, censored and missing data schemes, causal (or treatment) effects, and duration analysis. Econometric Analysis of Cross Section and Panel Data was the first graduate econometrics text to focus on microeconomic data structures, allowing assumptions to be separated into population and sampling assumptions. This second edition has been substantially updated and revised. Improvements include a broader class of models for missing data problems; more detailed treatment of cluster problems, an important topic for empirical researchers; expanded discussion of "generalized instrumental variables" (GIV) estimation; new coverage (based on the author's own recent research) of inverse probability weighting; a more complete framework for estimating treatment effects with panel data; and a firmly established link between econometric approaches to nonlinear panel data and the "generalized estimating equation" literature popular in statistics and other fields. New attention is given to explaining when particular econometric methods can be applied; the goal is not only to tell readers what does work, but why certain "obvious" procedures do not. The numerous included exercises, both theoretical and computer-based, allow the reader to extend methods covered in the text and discover new insights.


28,298 citations

Journal Article•DOI•

Missing data: Our view of the state of the art.

[...]

Joseph L. Schafer, John W. Graham

01 Jun 2002-Psychological Methods

TL;DR: Two highly recommended general approaches are presented, maximum likelihood (ML) and Bayesian multiple imputation (MI); newer procedures, including some for missing data that are not MAR, may eventually extend these methods, which currently represent the state of the art.


Abstract: Statistical procedures for missing data have vastly improved, yet misconception and unsound practice still abound. The authors frame the missing-data problem, review methods, offer advice, and raise issues that remain unresolved. They clear up common misunderstandings regarding the missing at random (MAR) concept. They summarize the evidence against older procedures and, with few exceptions, discourage their use. They present, in both technical and practical language, 2 general approaches that come highly recommended: maximum likelihood (ML) and Bayesian multiple imputation (MI). Newer developments are discussed, including some for dealing with missing data that are not MAR. Although not yet in the mainstream, these procedures may eventually extend the ML and MI methods that currently represent the state of the art.


10,568 citations

Cites methods from "Models for longitudinal data: a gen..."

...Multilevel linear models can be fit with HLM (Bryk, Raudenbush, & Congdon, 1996), MLWin (Multilevel Models Project, 1996), the SAS procedure PROC MIXED (Littell, Milliken, Stroup, & Wolfinger, 1996), Stata (Stata, 2001), and the lme function in S-PLUS (Insightful, 2001)....
[...]
...Their method is an extension of generalized estimating equations (GEE), a popular technique for modeling marginal or population-averaged relationships between a response variable and predictors (Zeger et al., 1988)....
[...]
...If one dispenses with the full parametric model, estimation procedures with incomplete data are still possible, but they typically require the missing values to be MCAR rather than MAR (K. H. Yuan & Bentler, 2000; Zeger, Liang, & Albert, 1988)....
[...]

Journal Article•DOI•

Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy

[...]

Robert J. Sampson¹, Stephen W. Raudenbush¹, Felton Earls¹•Institutions (1)

Michigan State University¹

15 Aug 1997-Science

TL;DR: Multilevel analyses showed that a measure of collective efficacy yields a high between-neighborhood reliability and is negatively associated with variations in violence, when individual-level characteristics, measurement error, and prior violence are controlled.


Abstract: It is hypothesized that collective efficacy, defined as social cohesion among neighbors combined with their willingness to intervene on behalf of the common good, is linked to reduced violence. This hypothesis was tested on a 1995 survey of 8782 residents of 343 neighborhoods in Chicago, Illinois. Multilevel analyses showed that a measure of collective efficacy yields a high between-neighborhood reliability and is negatively associated with variations in violence, when individual-level characteristics, measurement error, and prior violence are controlled. Associations of concentrated disadvantage and residential instability with violence are largely mediated by collective efficacy.


10,498 citations

Journal Article•DOI•

Statistical Analysis With Missing Data

[...]

Nicole A. Lazar¹•Institutions (1)

Carnegie Mellon University¹

01 Nov 2003-Technometrics

TL;DR: Generalized Estimating Equations is a good introductory book for analyzing continuous and discrete correlated data using GEE methods and provides good guidance for analyzing correlated data in biomedical studies and survey studies.


Abstract: (2003). Statistical Analysis With Missing Data. Technometrics: Vol. 45, No. 4, pp. 364-365.


6,960 citations

Book•

Multilevel Statistical Models

[...]

Harvey Goldstein¹•Institutions (1)

University of Bristol¹

01 Jan 1987

TL;DR: In this article, the authors present a general classification notation for multilevel models and a discussion of the general structure and maximum likelihood estimation for a multi-level model, as well as the adequacy of Ordinary Least Squares estimates.


Abstract: Contents Dedication Preface Acknowledgements Notation A general classification notation and diagram Glossary Chapter 1 An introduction to multilevel models 1.1 Hierarchically structured data 1.2 School effectiveness 1.3 Sample survey methods 1.4 Repeated measures data 1.5 Event history and survival models 1.6 Discrete response data 1.7 Multivariate models 1.8 Nonlinear models 1.9 Measurement errors 1.10 Cross classifications and multiple membership structures. 1.11 Factor analysis and structural equation models 1.12 Levels of aggregation and ecological fallacies 1.13 Causality 1.14 The latent normal transformation and missing data 1.15 Other texts 1.16 A caveat Chapter 2 The 2-level model 2.1 Introduction 2.2 The 2-level model 2.3 Parameter estimation 2.4 Maximum likelihood estimation using Iterative Generalised Least Squares (IGLS) 2.5 Marginal models and Generalized Estimating Equations (GEE) 2.6 Residuals 2.7 The adequacy of Ordinary Least Squares estimates. 2.8 A 2-level example using longitudinal educational achievement data 2.9 General model diagnostics 2.10 Higher level explanatory variables and compositional effects 2.11 Transforming to normality 2.12 Hypothesis testing and confidence intervals 2.13 Bayesian estimation using Markov Chain Monte Carlo (MCMC) 2.14 Data augmentation Appendix 2.1 The general structure and maximum likelihood estimation for a multilevel model Appendix 2.2 Multilevel residuals estimation Appendix 2.3 Estimation using profile and extended likelihood Appendix 2.4 The EM algorithm Appendix 2.5 MCMC sampling Chapter 3. Three level models and more complex hierarchical structures. 3.1 Complex variance structures 3.2 A 3-level complex variation model example. 3.3 Parameter Constraints 3.4 Weighting units 3.5 Robust (Sandwich) Estimators and Jacknifing 3.6 The bootstrap 3.7 Aggregate level analyses 3.8 Meta analysis 3.9 Design issues Chapter 4. 
Multilevel Models for discrete response data 4.1 Generalised linear models 4.2 Proportions as responses 4.3 Examples 4.4 Models for multiple response categories 4.5 Models for counts 4.6 Mixed discrete - continuous response models 4.7 A latent normal model for binary responses 4.8 Partitioning variation in discrete response models Appendix 4.1. Generalised linear model estimation Appendix 4.2 Maximum likelihood estimation for generalised linear models Appendix 4.3 MCMC estimation for generalised linear models Appendix 4.4. Bootstrap estimation for generalised linear models Chapter 5. Models for repeated measures data 5.1 Repeated measures data 5.2 A 2-level repeated measures model 5.3 A polynomial model example for adolescent growth and the prediction of adult height 5.4 Modelling an autocorrelation structure at level 1. 5.5 A growth model with autocorrelated residuals 5.6 Multivariate repeated measures models 5.7 Scaling across time 5.8 Cross-over designs 5.9 Missing data 5.10 Longitudinal discrete response data Chapter 6. Multivariate multilevel data 6.1 Introduction 6.2 The basic 2-level multivariate model 6.3 Rotation Designs 6.4 A rotation design example using Science test scores 6.5 Informative response selection: subject choice in examinations 6.6 Multivariate structures at higher levels and future predictions 6.7 Multivariate responses at several levels 6.8 Principal Components analysis Appendix 6.1 MCMC algorithm for a multivariate normal response model with constraints Chapter 7. Latent normal models for multivariate data 7.1 The normal multilevel multivariate model 7.2 Sampling binary responses 7.3 Sampling ordered categorical responses 7.4 Sampling unordered categorical responses 7.5 Sampling count data 7.6 Sampling continuous non-normal data 7.7 Sampling the level 1 and level 2 covariance matrices 7.8 Model fit 7.9 Partially ordered data 7.10 Hybrid normal/ordered variables 7.11 Discussion Chapter 8. 
Multilevel factor analysis, structural equation and mixture models 8.1 A 2-stage 2-level factor model 8.2 A general multilevel factor model 8.3 MCMC estimation for the factor model 8.4 Structural equation models 8.5 Discrete response multilevel structural equation models 8.6 More complex hierarchical latent variable models 8.7 Multilevel mixture models Chapter 9. Nonlinear multilevel models 9.1 Introduction 9.2 Nonlinear functions of linear components 9.3 Estimating population means 9.4 Nonlinear functions for variances and covariances 9.5 Examples of nonlinear growth and nonlinear level 1 variance Appendix 9.1 Nonlinear model estimation Chapter 10. Multilevel modelling in sample surveys 10.1 Sample survey structures 10.2 Population structures 10.3 Small area estimation Chapter 11 Multilevel event history and survival models 11.1 Introduction 11.2 Censoring 11.3 Hazard and survival funtions 11.4 Parametric proportional hazard models 11.5 The semiparametric Cox model 11.6 Tied observations 11.7 Repeated events proportional hazard models 11.8 Example using birth interval data 11.9 Log duration models 11.10 Examples with birth interval data and children s activity episodes 11.11 The grouped discrete time hazards model 11.12 Discrete time latent normal event history models Chapter 12. Cross classified data structures 12.1 Random cross classifications 12.2 A basic cross classified model 12.3 Examination results for a cross classification of schools 12.4 Interactions in cross classifications 12.5 Cross classifications with one unit per cell 12.6 Multivariate cross classified models 12.7 A general notation for cross classifications 12.8 MCMC estimation in cross classified models Appendix 12.1 IGLS Estimation for cross classified data. 
Chapter 13 Multiple membership models 13.1 Multiple membership structures 13.2 Notation and classifications for multiple membership structures 13.3 An example of salmonella infection 13.4 A repeated measures multiple membership model 13.5 Individuals as higher level units 13.5.1 Example of research grant awards 13.6 Spatial models 13.7 Missing identification models Appendix 13.1 MCMC estimation for multiple membership models. Chapter 14 Measurement errors in multilevel models 14.1 A basic measurement error model 14.2 Moment based estimators 14.3 A 2-level example with measurement error at both levels. 14.4 Multivariate responses 14.5 Nonlinear models 14.6 Measurement errors for discrete explanatory variables 14.7 MCMC estimation for measurement error models Appendix 14.1 Measurement error estimation 14.2 MCMC estimation for measurement error models Chapter 15. Smoothing models for multilevel data. 15.1 Introduction 15.2. Smoothing estimators 15.3 Smoothing splines 15.4 Semi parametric smoothing models 15.5 Multilevel smoothing models 15.6 General multilevel semi-parametric smoothing models 15.7 Generalised linear models 15.8 An example Fixed Random 15.9 Conclusions Chapter 16. Missing data, partially observed data and multiple imputation 16.1 Creating a completed data set 16.2 Joint modelling for missing data 16.3 A two level model with responses of different types at both levels. 
16.4 Multiple imputation 16.5 A simulation example of multiple imputation for missing data 16.6 Longitudinal data with attrition 16.7 Partially known data values 16.8 Conclusions Chapter 17 Multilevel models with correlated random effects 17.1 Non-independence of level 2 residuals 17.2 MCMC estimation for non-independent level 2 residuals 17.3 Adaptive proposal distributions in MCMC estimation 17.4 MCMC estimation for non-independent level 1 residuals 17.5 Modelling the level 1 variance as a function of explanatory variables with random effects 17.6 Discrete responses with correlated random effects 17.7 Calculating the DIC statistic 17.8 A growth data set 17.9 Conclusions Chapter 18. Software for multilevel modelling References Author index Subject index


5,839 citations


References


Book•

Generalized Linear Models

[...]

Peter McCullagh¹, John A. Nelder•Institutions (1)

Imperial College London¹

01 Jan 1983

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).


Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).


23,215 citations

Journal Article•DOI•

Longitudinal data analysis using generalized linear models

[...]

Kung Yee Liang¹, Scott L. Zeger¹•Institutions (1)

Johns Hopkins University¹

01 Apr 1986-Biometrika

TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.


Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.


17,111 citations

"Models for longitudinal data: a gen..." refers background or methods in this paper

...A generalized estimating equations approach (Liang and Zeger, 1986) useful for fitting both SS and PA models is then discussed in Section 3....
[...]
...We then solve the GEE (3.1) as discussed in Liang and Zeger (1986)....
[...]
...However, only an approximation for V_i is necessary to obtain consistent and nearly efficient inferences for β using the GEE approach when the number of subjects, K, is large relative to the number of observations per subject, n_i, and F is given (Liang and Zeger, 1986)....
[...]
...Liang and Zeger (1986), Zeger and Liang (1986), Stram, Wei, and Ware (1988), and Moulton (unpublished Ph.D. dissertation, The Johns Hopkins University, 1986) have previously discussed examples of PA models....
[...]

Journal Article•DOI•

Random-effects models for longitudinal data

[...]

Nan M. Laird, James H. Ware

01 Dec 1982-Biometrics

TL;DR: In this article, a unified approach to fitting two-stage random-effects models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed.


Abstract: Models for the analysis of longitudinal data must recognize the relationship between serial observations on the same unit. Multivariate models with general covariance structure are often difficult to apply to highly unbalanced data, whereas two-stage random-effects models can be used easily. In two-stage models, the probability distributions for the response vectors of different individuals belong to a single family, but some random-effects parameters vary across individuals, with a distribution specified at the second stage. A general family of models is discussed, which includes both growth models and repeated-measures models as special cases. A unified approach to fitting these models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed. Two examples are taken from a current epidemiological study of the health effects of air pollution.


8,410 citations

Journal Article•DOI•

Longitudinal data analysis for discrete and continuous outcomes.

[...]

Scott L. Zeger, Kung-Yee Liang

01 Mar 1986-Biometrics

TL;DR: A class of generalized estimating equations (GEEs) for the regression parameters is proposed, extensions of those used in quasi-likelihood methods which have solutions which are consistent and asymptotically Gaussian even when the time dependence is misspecified as the authors often expect.


Abstract: Longitudinal data sets are comprised of repeated observations of an outcome and a set of covariates for each of many subjects. One objective of statistical analysis is to describe the marginal expectation of the outcome variable as a function of the covariates while accounting for the correlation among the repeated observations for a given subject. This paper proposes a unifying approach to such analysis for a variety of discrete and continuous outcomes. A class of generalized estimating equations (GEEs) for the regression parameters is proposed. The equations are extensions of those used in quasi-likelihood (Wedderburn, 1974, Biometrika 61, 439-447) methods. The GEEs have solutions which are consistent and asymptotically Gaussian even when the time dependence is misspecified as we often expect. A consistent variance estimate is presented. We illustrate the use of the GEE approach with longitudinal data from a study of the effect of mothers' stress on children's morbidity.


7,080 citations

"Models for longitudinal data: a gen..." refers background in this paper

...Liang and Zeger (1986), Zeger and Liang (1986), Stram, Wei, and Ware (1988), and Moulton (unpublished Ph.D. dissertation, The Johns Hopkins University, 1986) have previously discussed examples of PA models....
[...]

Journal Article•DOI•

Maximum likelihood estimation of misspecified models

[...]

Halbert White

01 Jan 1982-Econometrica

TL;DR: In this article, the consequences and detection of model misspecification when using maximum likelihood techniques for estimation and inference are examined, and the properties of the quasi-maximum likelihood estimator and the information matrix are exploited to yield several useful tests.


Abstract: This paper examines the consequences and detection of model misspecification when using maximum likelihood techniques for estimation and inference. The quasi-maximum likelihood estimator (QMLE) converges to a well defined limit, and may or may not be consistent for particular parameters of interest. Standard tests (Wald, Lagrange Multiplier, or Likelihood Ratio) are invalid in the presence of misspecification, but more general statistics are given which allow inferences to be drawn robustly. The properties of the QMLE and the information matrix are exploited to yield several useful tests for model misspecification.


4,867 citations