A comparison of some methods to analyze repeated measures ordinal categorical data

Yaobing Sui and Walter W. Stroup
Vol. 13, pp. 98-115
Abstract: 
Recent advances in statistical software made possible by the rapid development of computer technology in the past decade have made many new procedures available to data analysts. We focus in this paper on methods for ordinal categorical data with repeated measures that can be implemented using SAS. These procedures are illustrated using data from an animal health experiment. The responses, measured as severity of symptoms on an ordinal scale, are recorded for test animals over time. The experiment was designed to estimate treatment and time effects on the severity of symptoms. The data were analyzed with various approaches using PROC MIXED, PROC NLMIXED, PROC GENMOD, and the GLIMMIX macro. In this paper, we compare the strengths and weaknesses of these different methods.


Kansas State University Libraries
New Prairie Press
Conference on Applied Statistics in Agriculture 2001 - 13th Annual Conference Proceedings
Yaobing Sui
Walter W. Stroup
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
Sui, Yaobing and Stroup, Walter W. (2001). "A COMPARISON OF SOME METHODS TO ANALYZE REPEATED MEASURES ORDINAL CATEGORICAL DATA," Conference on Applied Statistics in Agriculture. https://doi.org/10.4148/2475-7772.1219

by Yaobing Sui and Walter W. Stroup
Department of Biometry, University of Nebraska, Lincoln, NE 68583-0712
1. Introduction
Consider an experiment in which three treatments are compared. There are r blocks of three animals each, formed using criteria relevant to the experiment. Within each block, one animal is assigned at random to each treatment. Animals are measured at "week 0," the time the treatments first take effect, and again at weeks 4 and 12. The variables measured include weight, presence or absence of disease symptoms, and severity of symptoms, classified as "worse," "no change," or "better." This type of experiment is called a repeated measures experiment. The focus of this paper is on repeated measures analysis of the last two types of data in the above list: categorical data that are either binary or ordinal.
Repeated measures data, also known as longitudinal data, come from experiments in which observations are made on subjects at regular, planned times. These experiments have two or more treatments and are set up using familiar designs: randomized complete or incomplete block designs, if blocking is appropriate; row-column designs such as Latin Squares, when appropriate; or completely randomized assignment of experimental units to treatments when blocking is not required. Repeated measures designs are widely used throughout the life sciences.
Repeated measures analysis is fairly well understood for normally distributed data, but less so for categorical data. However, recent developments in methodology and statistical computing software have greatly increased the number of tools available to categorical data analysts. The purpose of this paper is to review these tools, what we currently know of their advantages and disadvantages, and what we still need to learn about them.
Regardless of whether the observations are normally distributed, or categorical, or have some other distribution, a general approach to repeated measures analysis based on the linear mixed model uses the following general form:
observation = between subjects systematic effects + between subjects random variation
+ within subjects systematic effects + within subjects random variation
For non-normal data, a function of the observation, e.g. the link function in a generalized linear mixed model, often replaces the literal observation in the above model.
In the example that begins this section, the between subjects systematic effects are for block and treatment, and the between subjects random effects correspond to block x treatment random effects - i.e. the between subjects model is identical to the model one would use for a randomized complete block analysis of variance. The within subjects systematic effects are the main effects of time and the treatment x time interaction. Within subjects random variation - formally, block x time within treatment variation - is essentially whatever is left unexplained, i.e. variation among the measurements at different times on the same experimental unit not explained by systematic effects already specified.
Formally, for normal errors, the model equation is:
$$Y_{ijk} = \mu + \tau_i + r_j + b_{ij} + \omega_k + (\tau\omega)_{ik} + e_{ijk},$$
where $Y_{ijk}$ is the observation on the ith treatment, jth block at the kth week (or, more generally, time), $\mu$ is the intercept, $\tau_i$ is the ith treatment main effect, $r_j$ is the jth block effect, $b_{ij}$ is the ijth block-treatment random effect, assumed i.i.d. $N(0, \sigma_b^2)$, $\omega_k$ is the kth time main effect, $(\tau\omega)_{ik}$ is the ikth time-treatment interaction effect, and $e_{ijk}$ is the ijkth within subject error. The $e_{ijk}$ are assumed multivariate normal and, at least potentially, correlated.
There are two main distinguishing features of repeated measures analysis:
1. The primary objective is to see if changes over time are the same for each treatment, i.e. to assess the time x treatment interaction.
2. The errors, $e_{ijk}$, are correlated. Specifically, let $e_{ij}' = [e_{ij1}, e_{ij2}, \ldots, e_{ijK}]$ be the vector of within subjects errors, where K is the number of time periods observed. Then $e_{ij} \sim MVN(0, \Sigma)$, where $\Sigma$ is the covariance matrix reflecting the correlation structure. The vector $e' = [e_{11}', \ldots, e_{1r}', \ldots, e_{a1}', \ldots, e_{ar}']$ is thus distributed with a block-diagonal covariance matrix, i.e. $e \sim MVN(0, I_{ar} \otimes \Sigma)$, where a is the number of treatments.
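The block-diagonal structure of the within subjects errors can be made concrete with a small sketch. The code below builds the Kronecker product of an identity matrix with a K x K covariance matrix for a = 3 treatments and K = 3 times. The first-order autoregressive form of the covariance matrix, the variance and correlation values, and the choice of r = 2 blocks are illustrative assumptions, not taken from the paper; the function names are hypothetical. An AR(1) pattern also ignores the unequal spacing of weeks 0, 4, and 12, so it is a simplification.

```python
def ar1_covariance(n_times, variance, rho):
    """Assumed form of Sigma: first-order autoregressive, n_times x n_times."""
    return [[variance * rho ** abs(s - t) for t in range(n_times)]
            for s in range(n_times)]


def block_diagonal_covariance(n_subjects, sigma):
    """Build I_n (Kronecker) Sigma: one copy of Sigma per subject, zeros elsewhere."""
    k = len(sigma)
    size = n_subjects * k
    cov = [[0.0] * size for _ in range(size)]
    for subject in range(n_subjects):
        offset = subject * k
        for s in range(k):
            for t in range(k):
                cov[offset + s][offset + t] = sigma[s][t]
    return cov


# a = 3 treatments, r = 2 blocks (assumed), K = 3 measurement times
sigma = ar1_covariance(n_times=3, variance=1.0, rho=0.5)
cov = block_diagonal_covariance(n_subjects=3 * 2, sigma=sigma)
```

Errors from the same animal share a nonzero covariance, while every entry linking two different animals is zero, which is what the block-diagonal form expresses.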
With normal errors, repeated measures analysis can be implemented with mixed model software such as PROC MIXED. The main issues in using PROC MIXED for repeated measures analysis involve choosing an appropriate covariance model for $\Sigma$, realistically approximating the error degrees of freedom for various tests, and adjusting for potential bias of standard errors and test statistics that results from estimating the components of $\Sigma$. Readers seeking more detail on the use of PROC MIXED for repeated measures analysis are referred to Littell et al. (1996). Carlin and Louis (1996) discussed covariance model selection issues. Kenward and Roger (1997) discussed standard error bias and degrees of freedom issues and presented approximations now available with PROC MIXED. Guerin and Stroup (2000) presented an extensive simulation study documenting the small sample behavior of PROC MIXED under various options.
Models with non-normal errors, including categorical data, require some modifications. To make these modifications more understandable, one can re-express the normal errors model in terms that make it more amenable to the required changes. Specifically, define the linear mixed model in terms of the distribution of the random model effects and in terms of the conditional distribution of the observations given the random model effects. That is,
$$y \mid u \sim MVN(X\beta + Zu, R) \quad \text{and} \quad u \sim MVN(0, G).$$
The linear mixed model is a model of the conditional mean of the observation vector, y, given the random effects, u. For non-normal data, one adapts the generalized linear model approach used for categorical models such as logistic regression and log-linear models. Specifically, drop the assumption of multivariate normality for $y \mid u$ and use $X\beta + Zu$ to model a function of the conditional mean, $E(y \mid u)$, called the link function in generalized linear models. This results in the generalized linear mixed model (GLMM), widely discussed in the statistical literature of the 1980's through the present. See, for example, Breslow and Clayton (1993). The GLMM is thus described as follows:
1. The distribution of the random effects: $u \sim MVN(0, G)$.
2. The conditional distribution of the observations, y, given the random effects, u. For categorical data, this distribution is typically assumed Poisson (for log-linear models fit to contingency tables), binomial (for logistic models), or multinomial (for extensions of logit models when there are more than two categories). Quasi-likelihood methods allow the use of GLMM-based analysis even when one can only specify the expected value and variance of $y \mid u$ rather than the distribution per se.
3. The inverse link, $E(y \mid u) = h(X\beta + Zu)$. The inverse link may be the inverse of the link function, or the inverse link may be a set of functions, as is the case for some multinomial models. With the latter case, there is no one-to-one relationship between the conditional mean and the link. When a one-to-one relationship does exist, the GLMM can be described in terms of the link function, that is, $\eta = X\beta + Zu$, where $\eta = g[E(y \mid u)]$ is the link function.
For the randomized complete block design with repeated measures described above, the GLMM would thus be
$$\eta_{ijk} = \mu + \tau_i + r_j + b_{ij} + \omega_k + (\tau\omega)_{ik},$$
where $\eta_{ijk}$ is the link function, $g[E(Y_{ijk} \mid b_{ij})]$, and the terms of the right-hand side of the model are defined as they were with the linear mixed model given previously. Alternatively, one can use the inverse link
$$E(Y_{ijk} \mid b_{ij}) = h[\mu + \tau_i + r_j + b_{ij} + \omega_k + (\tau\omega)_{ik}].$$
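As a minimal sketch of how the link and inverse link relate in this model, the code below assumes a binary response with the logit link, so that the inverse link is the logistic function. All effect values are hypothetical numbers chosen only for illustration, and the function names are not from the paper or from SAS.

```python
import math


def linear_predictor(mu, tau_i, r_j, b_ij, omega_k, tau_omega_ik):
    """eta_ijk: sum of the fixed effects and the block x treatment random effect."""
    return mu + tau_i + r_j + b_ij + omega_k + tau_omega_ik


def logit(p):
    """Link function g: maps a probability to the linear predictor scale."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link h: maps the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical effect values for one treatment x block x week combination
eta = linear_predictor(mu=-0.5, tau_i=0.8, r_j=0.1,
                       b_ij=-0.2, omega_k=0.3, tau_omega_ik=0.0)
conditional_mean = inverse_logit(eta)  # E(Y_ijk | b_ij) under the logit link
```

Because the logit link is one-to-one, applying `logit` to `conditional_mean` recovers `eta`, which is the situation in which the model can be stated equivalently through either the link or the inverse link.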
Several options exist in SAS for fitting categorical repeated measures models. PROC GENMOD can be used to fit log-linear models. For binomial data only, GENMOD can also fit certain GLMM's for repeated measures using the method of generalized estimating equations (Zeger et al., 1988), commonly referred to as GEE's. The GLIMMIX macro can also fit repeated measures GLMM's to binomial data. GLIMMIX uses a pseudo-likelihood approach (Wolfinger and O'Connell, 1993) that is similar to the quasi-likelihood approach described by Breslow and Clayton (1993), but somewhat more general. GLIMMIX is not as restrictive as the GENMOD GEE option in terms of the types of covariance models available. PROC NLMIXED, introduced in SAS Version 8, can estimate repeated measures GLMM's for multinomial data in addition to models for binomial data. It uses a maximum likelihood algorithm based on Gaussian quadrature. With some programming ingenuity, NLMIXED can fit certain covariance structures, although convergence can be an issue with more complex structures.
The next section describes in more detail SAS-based methods useful for categorical
repeated measures data, with a focus on ordinal data. Section 3 presents an example from an
animal health experiment. Section 4 presents some tentative simulation results. These will be
pursued in far more detail in work now in progress.
2. Review of Methods
Table 1 shows the data for the experiment described at the beginning of Section 1 in contingency table form. Each cell contains the number of animals in a given treatment x week x response category combination. This section describes the methods available in SAS to analyze these data.
The simplest categorical data analysis approach is to compute the Cochran-Mantel-Haenszel statistic to test treatment x response category association. A statistically significant result constitutes evidence of a treatment effect, assuming that the association does not change over weeks. SAS PROC FREQ can compute the Cochran-Mantel-Haenszel test. It can also compute the Breslow-Day statistic for no three-way treatment x response category x week association (i.e. no change in treatment x response association over weeks) if the treatment x response table is 2 x 2, but not for the more general case, such as the 3 x 3 shown here. See Agresti (1996) for a more in-depth discussion of the contingency table approach.
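To illustrate the arithmetic behind the Cochran-Mantel-Haenszel test, the sketch below handles only the simplest case: a 2 x 2 treatment x response table within each week, with weeks as strata. It is not the general 3 x 3 computation that the example in this paper requires, it omits any continuity correction, and the counts are invented for illustration rather than taken from Table 1. The function name is hypothetical.

```python
import math


def cmh_2x2(strata):
    """CMH statistic for a list of 2x2 tables [[a, b], [c, d]], one per stratum.

    Returns the statistic and its p-value from a chi-square distribution
    with 1 degree of freedom.
    """
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        expected = row1 * col1 / n  # E(a) under no association
        variance = row1 * row2 * col1 * col2 / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
    statistic = diff_sum ** 2 / var_sum
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = two treatments, columns = two response
# categories, one table per week
weeks = [
    [[10, 5], [4, 11]],
    [[8, 7], [5, 10]],
    [[9, 6], [3, 12]],
]
statistic, p_value = cmh_2x2(weeks)
```

The statistic pools the observed-minus-expected differences across weeks before squaring, which is why the test presumes that the treatment x response association has the same direction in every week.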
Alternatively, the contingency table approach can be implemented using a log-linear model. For the above example, the log-linear model is
$$\log(\lambda_{ijk}) = \mu + \tau_i + \omega_j + (\tau\omega)_{ij} + c_k + (\tau c)_{ik} + (\tau\omega c)_{ijk},$$
where $\lambda_{ijk}$ is the expected count of the ijkth treatment x week x response category combination, and $\tau$, $\omega$, and $c$ refer to treatment, week, and response category effects, respectively. The two effects of primary interest are the three-way association effects and, assuming the three-way effects, $(\tau\omega c)_{ijk}$, are zero, the two-way treatment x response category effects. The test of $H_0$: all $(\tau\omega c)_{ijk} = 0$ is equivalent to the Breslow-Day test, but more general because it is not restricted to 2 x 2 treatment x response category cases. The test of $H_0$: all $(\tau c)_{ik} = 0$ is equivalent to the Cochran-Mantel-Haenszel test. PROC GENMOD can do all the required computations for the log-linear model.
While the log-linear model is easy to compute, the contingency table approach may not take correlation among repeated measurements on the same experimental unit into account realistically. Agresti (1996) presents the logic of the contingency table approach when there are two times, but the logic does not necessarily extend to three or more times. Approaches using GEE's or other GLMM methods with more flexibility in specifying the covariance structure are, at least in theory, preferable.
In SAS, for binary data only, GEE's can be implemented using the REPEATED statement in PROC GENMOD. This approach is limited in that it assumes no random model effects. The model is thus
$$\eta_{ijk} = \mu + \tau_i + r_j + \omega_k + (\tau\omega)_{ik},$$
where $\eta_{ijk}$ is usually either the logit or probit link, and $\tau$, $r$, and $\omega$ refer to treatment, block, and week effects, respectively. The logit link is defined as
$$\text{logit}(\pi_{ijk}) = \log\left(\frac{\pi_{ijk}}{1 - \pi_{ijk}}\right),$$
where $\pi_{ijk}$ is the probability of the outcome of interest occurring for the ijkth treatment x block x week combination. The probit link is defined as $\text{probit}(\pi_{ijk}) = \Phi^{-1}(\pi_{ijk})$, where $\Phi^{-1}$ is the inverse cumulative standard normal distribution. The observations are assumed to have a covariance matrix $R = DPD$, where
$$D = \text{diag}\left[\sqrt{\frac{\pi_{ijk}(1 - \pi_{ijk})}{n_{ijk}}}\right]$$
and $n_{ijk}$ is the number of Bernoulli trials observed on the ijkth treatment x block x week combination. The form of D given here is specific to the binomial distribution. In general, D is a diagonal matrix whose elements are determined by the variance function for each treatment x block x week combination. P is a working correlation matrix. Working correlation matrices are not true correlation matrices, but their structure follows common correlated error
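The logit and probit links and the covariance form R = DPD used in the GEE approach can be sketched numerically. The code below assumes that D holds the square roots of the binomial variances, so that the diagonal of R reproduces those variances, and it assumes an exchangeable working correlation purely for illustration. The probabilities, trial counts, and correlation value are invented, and the function names are hypothetical rather than part of SAS.

```python
import math
from statistics import NormalDist


def logit(p):
    """Logit link: log odds of the outcome of interest."""
    return math.log(p / (1.0 - p))


def probit(p):
    """Probit link: inverse cumulative standard normal distribution."""
    return NormalDist().inv_cdf(p)


def exchangeable(n_times, rho):
    """Assumed working correlation P: equal correlation between all pairs of times."""
    return [[1.0 if s == t else rho for t in range(n_times)]
            for s in range(n_times)]


def working_covariance(probs, n_trials, corr):
    """R = D P D, with D = diag(sqrt(pi * (1 - pi) / n)) for binomial data."""
    d = [math.sqrt(p * (1.0 - p) / n) for p, n in zip(probs, n_trials)]
    k = len(d)
    return [[d[s] * corr[s][t] * d[t] for t in range(k)] for s in range(k)]


# Hypothetical probabilities and trial counts for one subject at three times
probs = [0.3, 0.5, 0.6]
n_trials = [10, 10, 10]
R = working_covariance(probs, n_trials, exchangeable(3, rho=0.4))
```

With this construction the diagonal of R depends only on the mean through the binomial variance function, while the working correlation supplies the off-diagonal pattern.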

References
Littell et al. (1996). SAS System for Mixed Models.
Agresti, A. (1996). An Introduction to Categorical Data Analysis.
Zeger et al. (1988). Models for longitudinal data: a generalized estimating equation approach.
Kenward and Roger (1997). Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood.