NBER WORKING PAPER SERIES
Analysis of Covariance with Qualitative Data
Gary Chamberlain
Working Paper No. 325
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge MA 02138
March 1979
The research reported here is part of the NBER's research
program in Labor Economics. Any opinions expressed are
those of the author and not those of the National Bureau
of Economic Research. Financial Support was provided by
the National Science Foundation (Grant No. SOC77'—l562').
Analysis of Covariance with Qualitative Data
ABSTRACT
In data with a group structure, incidental parameters are included
to control for missing variables. Applications include longitudinal
data and sibling data. In general, the joint maximum likelihood
estimator of the structural parameters is not consistent as the
number of groups increases, with a fixed number of observations per
group. Instead, a conditional likelihood function is maximized,
conditional on sufficient statistics for the incidental parameters.
In the logit case, a standard conditional logit program can be used.
Another solution is a random effects model, in which the distribution
of the incidental parameters may depend upon the exogenous variables.
Gary Chamberlain
Department of Economics
Harvard University
Littauer Center
Cambridge, MA 02138
617/495-3203
ANALYSIS OF COVARIANCE WITH QUALITATIVE DATA
by
Gary Chamberlain
Harvard University
1. Introduction
This paper deals with data that has a group structure. A simple
example in the context of a linear regression model is

$$E(y_{it} \mid x_{it}, \alpha_i) = \beta' x_{it} + \alpha_i \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T),$$

where there are T observations within each of N groups. The $\alpha_i$
are group specific parameters. Our primary concern is with the estimation
of $\beta$, a parameter vector common to all groups. The role of the
$\alpha_i$ is to control for group specific effects, i.e., for omitted
variables that are constant within a group. The regression function that
does not condition on the group will not in general identify $\beta$:

$$E(y_{it} \mid x_{it}) \neq \beta' x_{it}.$$

In this case there is an omitted variable bias.
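A small simulation can make the omitted variable bias concrete. The sketch below is not from the paper: it assumes a scalar $x$, T = 2, and a hypothetical design in which $x_{it} = \alpha_i + u_{it}$, so that the group effects are correlated with $x$; all function names are illustrative. The pooled regression of $y$ on $x$ ignores the group, while the within-group (analysis of covariance) regression sweeps out $\alpha_i$ by subtracting group means.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs with an intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n_groups=20000, t_obs=2, beta=1.0, seed=0):
    """Assumed design: alpha_i ~ N(0,1), x_it = alpha_i + u_it, eps_it ~ N(0,1)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)
        xs = [alpha + rng.gauss(0.0, 1.0) for _ in range(t_obs)]
        ys = [beta * x + alpha + rng.gauss(0.0, 1.0) for x in xs]
        groups.append((xs, ys))
    return groups


def pooled_slope(groups):
    """Regression of y on x that ignores the group structure."""
    xs = [x for g in groups for x in g[0]]
    ys = [y for g in groups for y in g[1]]
    return ols_slope(xs, ys)


def within_slope(groups):
    """Regression of y_it - ybar_i on x_it - xbar_i (group means removed)."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


groups = simulate()
print(round(pooled_slope(groups), 3), round(within_slope(groups), 3))
```

Under this assumed design, $\mathrm{Cov}(x_{it}, \alpha_i)/\mathrm{Var}(x_{it}) = 1/2$, so the pooled slope should settle near $\beta + 0.5 = 1.5$, while the within slope should settle near $\beta = 1$ as N grows with T fixed.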
An important application is generated by longitudinal or panel data,
in which there are two or more observations on each individual. Then the
group is the individual, and the $\alpha_i$ capture individual differences. If
these person effects are correlated with $x$, then a regression function that
fails to control for them will not identify $\beta$. In another important
application the group is a family, with observations on two or more
siblings within the family. Then the $\alpha_i$ capture omitted variables
that are family specific, and they give a concrete representation to
family background.
We shall assume that observations from different groups are independent.
Then the $\alpha_i$ are incidental parameters (Neyman and Scott [1948]), and
$\beta$, which is common to the independent sampling units, is a vector of
structural parameters. In the application to sibling data, T is small,
typically T=2, whereas there may be a large number of families. Small T and
large N are also characteristic of many of the currently available longitudinal
data sets. So a basic statistical issue is to develop an estimator for $\beta$
that has good properties in this case. In particular, the estimator ought
to be consistent as $N \to \infty$ for fixed T.
It is well known that analysis of covariance in the linear regression
model does have this consistency property. The problem of finding consistent
estimators in other models is non-trivial, however, since the number of
incidental parameters increases with sample size. We shall work with
the following probability model: $y_{it}$ is a binary variable with

$$\mathrm{Prob}(y_{it} = 1 \mid x_{it}, \alpha_i) = F(\beta' x_{it} + \alpha_i),$$

where $F(\cdot)$ is a cumulative distribution function such as a unit normal or
a logistic. For example, $y$ may indicate labor force participation,
unemployment, job change, marital status, health status, or a college
degree. Section 2 considers maximum likelihood (ML) estimation of the fixed
effects version of this model. A simple algorithm is available which
involves a weighted analysis of covariance at each iteration. The ML
estimator of $\beta$ is not consistent (for fixed T), however, and we present a
simple example with T=2 in which the ML estimator of $\beta$ converges to $2\beta$.
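The $2\beta$ result can be previewed numerically. The sketch below assumes a specific design that is ours, not necessarily the paper's: logistic $F$, T = 2, scalar $x$ with $x_{i1} = 0$ and $x_{i2} = 1$, and $\alpha_i \sim N(0, 1)$. Groups with $y_{i1} = y_{i2}$ drive $\hat\alpha_i$ to $\pm\infty$ and carry no information about $\beta$. For the remaining groups, the first-order condition for $\alpha_i$ is $F(\alpha_i) + F(\beta + \alpha_i) = 1$, so $\hat\alpha_i = -\beta/2$, and the concentrated joint likelihood is maximized at $\hat\beta_{ML} = 2\log(n_{01}/n_{10})$, where $n_{01}$ and $n_{10}$ count groups with $y = (0,1)$ and $y = (1,0)$. Because $n_{01}/n_{10} \to e^{\beta}$, this gives $\hat\beta_{ML} \to 2\beta$. Function names are hypothetical.

```python
import math
import random


def logistic(z):
    """Logistic CDF F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def count_switchers(n_groups, beta, seed=0):
    """Simulate y_it ~ Bernoulli(F(beta * x_it + alpha_i)) with x_i1 = 0, x_i2 = 1 (assumed design)."""
    rng = random.Random(seed)
    n01 = n10 = 0
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)                # assumed distribution of group effects
        y1 = rng.random() < logistic(alpha)         # x_i1 = 0
        y2 = rng.random() < logistic(beta + alpha)  # x_i2 = 1
        if not y1 and y2:
            n01 += 1
        elif y1 and not y2:
            n10 += 1
        # groups with y1 == y2 are uninformative: their alpha-hat diverges
    return n01, n10


def joint_ml_beta(n01, n10):
    """Joint (fixed effects) ML estimate after concentrating out alpha_i = -beta/2."""
    return 2.0 * math.log(n01 / n10)


beta = 1.0
n01, n10 = count_switchers(100_000, beta)
print(n01, n10, joint_ml_beta(n01, n10))
```

With $\beta = 1$ the printed estimate should be near 2 rather than 1, and increasing `n_groups` does not remove the gap, since the bias comes from the incidental $\alpha_i$, not from sampling noise.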
Section 3 presents one solution to this problem by working with a
conditional likelihood function that conditions on sufficient statistics
for the incidental parameters. This likelihood function does not depend
upon the incidental parameters, and hence standard asymptotic theory for
maximum likelihood estimation applies. This approach is applied to a
multinomial logit model for grouped data and to the multivariate log-linear
probability model. Section 4 develops an alternative approach, based on
a random effects model in which the incidental parameters are assumed to
follow a distribution. The important point here is that the distribution
of the $\alpha_i$ is not assumed to be independent of $x$; otherwise the problem of
omitted variable bias would be assumed away from the beginning. Throughout
the paper we shall use the familiar linear regression case to guide the
exposition.
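In the logit case, $\sum_t y_{it}$ is a sufficient statistic for $\alpha_i$, which is why conditioning on it yields a likelihood free of the incidental parameters. The sketch below, with a scalar $x$ and hypothetical helper names, computes $\mathrm{Prob}(y_i \mid \sum_t y_{it}, x_i, \alpha_i)$ by brute force from the joint probabilities and compares it with the $\alpha$-free expression $\exp(\beta \sum_t x_{it} y_{it}) / \sum_{d:\, \sum_t d_t = \sum_t y_{it}} \exp(\beta \sum_t x_{it} d_t)$ for several values of $\alpha_i$.

```python
import itertools
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def joint_prob(y, x, beta, alpha):
    """Prob(y_i1, ..., y_iT | x_i, alpha_i) with independent logit observations."""
    p = 1.0
    for yt, xt in zip(y, x):
        f = logistic(beta * xt + alpha)
        p *= f if yt else 1.0 - f
    return p


def same_sum_sequences(y):
    """All 0/1 sequences of length T with the same number of ones as y."""
    k = sum(y)
    return [d for d in itertools.product((0, 1), repeat=len(y)) if sum(d) == k]


def conditional_prob_bruteforce(y, x, beta, alpha):
    """Prob(y | sum_t y_t, x, alpha), computed from the joint probabilities."""
    denom = sum(joint_prob(d, x, beta, alpha) for d in same_sum_sequences(y))
    return joint_prob(y, x, beta, alpha) / denom


def conditional_prob_logit(y, x, beta):
    """Alpha-free conditional logit probability."""
    def score(d):
        return math.exp(beta * sum(dt * xt for dt, xt in zip(d, x)))
    return score(y) / sum(score(d) for d in same_sum_sequences(y))


x = [0.3, -1.2, 0.8]
y = (1, 0, 1)
beta = 0.7
for alpha in (-2.0, 0.0, 3.0):
    print(alpha, conditional_prob_bruteforce(y, x, beta, alpha), conditional_prob_logit(y, x, beta))
```

The two columns agree for every $\alpha$, because the factor $\exp(\alpha_i \sum_t y_{it})$ and the normalizing terms $\prod_t (1 + e^{\beta x_{it} + \alpha_i})$ cancel in the ratio. For T = 2 with $y = (0, 1)$ the conditional probability reduces to $F(\beta(x_{i2} - x_{i1}))$.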
2. Fixed Effects: Maximization of the Joint Likelihood Function
We shall begin with a brief review of the linear regression case.
Let

$$y_{it} = \beta' x_{it} + \alpha_i + \varepsilon_{it},$$

where $\varepsilon_{it}$ is i.i.d. $N(0, \sigma^2)$. So in addition to assuming independence across
the groups, we are assuming that observations within a group are independent
as well, conditional on the group effects. The dependence of different
observations within a group is assumed to be due to their common dependence
on the group specific $\alpha_i$. More general forms of dependence are, of course,
possible; for example, there could be serial correlation in addition to
the $\alpha_i$ in the longitudinal case.
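The dependence structure just described can be checked by simulation. The sketch below is illustrative only: it sets $\beta' x_{it} = 0$ for simplicity and uses hypothetical parameter names `sd_alpha` and `sd_eps` for the standard deviations of $\alpha_i$ and $\varepsilon_{it}$. Observations in the same group are then correlated only through $\alpha_i$, with correlation $\sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma^2)$.

```python
import random


def within_group_correlation(n_groups=50_000, sd_alpha=1.0, sd_eps=1.0, seed=1):
    """Sample correlation of y_i1 and y_i2 when y_it = alpha_i + eps_it (beta'x_it set to zero)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, sd_alpha)  # shared group effect
        # eps_i1 and eps_i2 are independent given alpha_i
        pairs.append((alpha + rng.gauss(0.0, sd_eps), alpha + rng.gauss(0.0, sd_eps)))
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / (v1 * v2) ** 0.5


# Theory for the defaults: 1.0 / (1.0 + 1.0) = 0.5
print(within_group_correlation())
print(within_group_correlation(sd_alpha=0.0))  # no group effect: correlation near zero
```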
Maximum likelihood for this model is simply a multiple regression of
$y$ on $x$ and a set of group indicator dummy variables. A useful computational
simplification is that the ML estimator of $\beta$ can be obtained from a
regression of $y_{it} - \bar y_i$ on $x_{it} - \bar x_i$, where $\bar y_i$ and $\bar x_i$ are
group means ($\bar y_i = \sum_t y_{it}/T$). In the case of T=2, this is equivalent
to a regression of $y_{i2} - y_{i1}$ on $x_{i2} - x_{i1}$.
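For T = 2 the equivalence is an algebraic identity: $x_{i2} - \bar x_i = (x_{i2} - x_{i1})/2 = -(x_{i1} - \bar x_i)$, and likewise for $y$, so both regressions, taken through the origin, give $\sum_i \Delta x_i \Delta y_i / \sum_i \Delta x_i^2$. The sketch below checks this on a small made-up data set with scalar $x$; the data and function names are hypothetical.

```python
def within_slope(xs_by_group, ys_by_group):
    """Regression of y_it - ybar_i on x_it - xbar_i through the origin (scalar x)."""
    sxy = sxx = 0.0
    for xs, ys in zip(xs_by_group, ys_by_group):
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


def first_difference_slope(xs_by_group, ys_by_group):
    """Regression of y_i2 - y_i1 on x_i2 - x_i1 through the origin (T = 2 only)."""
    dx = [xs[1] - xs[0] for xs in xs_by_group]
    dy = [ys[1] - ys[0] for ys in ys_by_group]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)


# Made-up data: N = 4 groups, T = 2 observations each.
x = [[1.0, 2.5], [0.0, -1.0], [3.0, 3.5], [2.0, 0.5]]
y = [[2.0, 4.1], [5.0, 3.2], [1.0, 2.2], [0.0, -2.9]]
print(within_slope(x, y), first_difference_slope(x, y))
```

The two estimates coincide up to floating-point rounding. For T greater than 2 the first-difference and within regressions generally differ, so only the demeaned form generalizes the dummy-variable regression.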
Since we have

$$y_{i2} - y_{i1} = \beta'(x_{i2} - x_{i1}) + \varepsilon_{i2} - \varepsilon_{i1},$$