Analysis of Covariance With Qualitative Data

Gary Chamberlain
- 01 Jan 1980 - 
- Vol. 47, Iss: 1, pp 225-238
TLDR
Analysis of covariance in the linear regression model yields an estimator that is consistent as the number of groups grows with a fixed number of observations per group. Finding consistent estimators in other models is non-trivial, however, since the number of incidental parameters increases with sample size.
Abstract
This paper deals with data that has a group structure. A simple example in the context of a linear regression model is $E(y_{it} \mid x, \beta, \alpha) = \beta' x_{it} + \alpha_i$ $(i = 1, \ldots, N;\ t = 1, \ldots, T)$, where there are T observations within each of N groups. The $\alpha_i$ are group specific parameters. Our primary concern is with the estimation of $\beta$, a parameter vector common to all groups. The role of the $\alpha_i$ is to control for group specific effects; i.e., for omitted variables that are constant within a group. The regression function that does not condition on the group will not in general identify $\beta$: $E(y_{it} \mid x, \beta) \neq \beta' x_{it}$. In this case there is an omitted variable bias. An important application is generated by longitudinal or panel data, in which there are two or more observations on each individual. Then the group is the individual, and the $\alpha_i$ capture individual differences. If these person effects are correlated with x, then a regression function that fails to control for them will not identify $\beta$. In another important application the group is a family, with observations on two or more siblings within the family. Then the $\alpha_i$ capture omitted variables that are family specific, and they give a concrete representation to family background. We shall assume that observations from different groups are independent. Then the $\alpha_i$ are incidental parameters (Neyman and Scott (1948)), and $\beta$, which is common to the independent sampling units, is a vector of structural parameters. In the application to sibling data, T is small, typically T = 2, whereas there may be a large number of families. Small T and large N are also characteristic of many of the currently available longitudinal data sets. So a basic statistical issue is to develop an estimator for $\beta$ that has good properties in this case. In particular, the estimator ought to be consistent as $N \to \infty$ for fixed T. It is well-known that analysis of covariance in the linear regression model does have this consistency property.
The problem of finding consistent estimators in other models is non-trivial, however, since the number of incidental parameters is increasing with sample size. We shall work with the following probability model: $y_{it}$ is a binary variable with


NBER WORKING PAPER SERIES
Analysis of Covariance with Qualitative Data
Gary Chamberlain
Working Paper No. 325
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge MA 02138
March 1979
The research reported here is part of the NBER's research
program in Labor Economics. Any opinions expressed are
those of the author and not those of the National Bureau
of Economic Research. Financial Support was provided by
the National Science Foundation (Grant No. SOC77'—l562').

ABSTRACT
In data with a group structure, incidental parameters are included
to control for missing variables. Applications include longitudinal
data and sibling data. In general, the joint maximum likelihood
estimator of the structural parameters is not consistent as the
number of groups increases, with a fixed number of observations per
group. Instead a conditional likelihood function is maximized,
conditional on sufficient statistics for the incidental parameters.
In the logit case, a standard conditional logit program can be used.
Another solution is a random effects model, in which the distribution
of the incidental parameters may depend upon the exogenous variables.
Gary Chamberlain
Department of Economics
Harvard University
Littauer Center
Cambridge, MA 02138
617/495-3203

ANALYSIS OF COVARIANCE WITH QUALITATIVE DATA
by
Gary Chamberlain
Harvard University
1. Introduction
This paper deals with data that has a group structure. A simple
example in the context of a linear regression model is
$E(y_{it} \mid x, \beta, \alpha) = \beta' x_{it} + \alpha_i$
$(i = 1, \ldots, N;\ t = 1, \ldots, T)$,
where there are T observations within each of N groups. The $\alpha_i$
are group specific parameters. Our primary concern is with the
estimation of $\beta$, a parameter vector common to all groups. The
role of the $\alpha_i$ is to control for group specific effects; i.e.,
for omitted variables that are constant within a group. The regression
function that does not condition on the group will not in general
identify $\beta$:
$E(y_{it} \mid x, \beta) \neq \beta' x_{it}$.
In this case there is an omitted variable bias.
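The source of this bias can be sketched with the law of iterated expectations. The block below is only an illustration: it treats $\alpha_i$ as a random draw with a conditional mean given x, which is an assumption made here for exposition rather than part of the fixed effects setup above.

```latex
\begin{align*}
E(y_{it} \mid x, \beta)
  &= E\bigl[\, E(y_{it} \mid x, \beta, \alpha) \,\bigm|\, x, \beta \bigr] \\
  &= \beta' x_{it} + E(\alpha_i \mid x, \beta).
\end{align*}
```

Under that assumption, the pooled regression function differs from $\beta' x_{it}$ by the term $E(\alpha_i \mid x, \beta)$, which varies with x whenever the group effects are correlated with x.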
An important application is generated by longitudinal or panel data,
in which there are two or more observations on each individual. Then the
group is the individual, and the $\alpha_i$ capture individual
differences. If these person effects are correlated with x, then a
regression function that fails to control for them will not identify
$\beta$. In another important application the group is a family, with
observations on two or more siblings within the family. Then the
$\alpha_i$ capture omitted variables that are family specific, and they
give a concrete representation to family background.
We shall assume that observations from different groups are independent.
Then the $\alpha_i$ are incidental parameters (Neyman and Scott [1948]),
and $\beta$, which is common to the independent sampling units, is a
vector of structural parameters. In the application to sibling data, T is
small, typically T=2, whereas there may be a large number of families.
Small T and large N are also characteristic of many of the currently
available longitudinal data sets. So a basic statistical issue is to
develop an estimator for $\beta$ that has good properties in this case.
In particular, the estimator ought to be consistent as $N \to \infty$
for fixed T.
It is well-known that analysis of covariance in the linear regression
model does have this consistency property. The problem of finding
consistent estimators in other models is non-trivial, however, since the
number of incidental parameters is increasing with sample size. We shall
work with the following probability model: $y_{it}$ is a binary variable
with
$\mathrm{Prob}(y_{it} = 1 \mid x, \beta, \alpha) = F(\beta' x_{it} + \alpha_i)$,
where $F(\cdot)$ is a cumulative distribution function such as a unit
normal or a logistic. For example, y may indicate labor force
participation, unemployment, job change, marital status, health status,
or a college degree. Section 2 considers maximum likelihood (ML)
estimation of the fixed effects version of this model. A simple algorithm
is available which involves a weighted analysis of covariance at each
iteration. The ML estimator of $\beta$ is not consistent (for fixed T),
however, and we present a simple example with T=2 in which the ML
estimator of $\beta$ converges to $2\beta$.
Section 3 presents one solution to this problem by working with a
conditional likelihood function that conditions on sufficient statistics
for the incidental parameters. This likelihood function does not depend
upon the incidental parameters, and hence standard asymptotic theory for
maximum likelihood estimation applies. This approach is applied to a
multinomial logit model for grouped data and to the multivariate
log-linear probability model. Section 4 develops an alternative approach,
based on a random effects model in which the incidental parameters are
assumed to follow a distribution. The important point here is that the
distribution of the $\alpha_i$ is not assumed to be independent of x;
otherwise the problem of omitted variable bias would be assumed away from
the beginning. Throughout the paper we shall use the familiar linear
regression case to guide the exposition.
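The contrast between joint and conditional ML with T=2 can be illustrated with a small simulation. This is a minimal sketch under assumptions not stated in the text above: F is logistic, x is scalar with $x_{i1} = 0$ and $x_{i2} = 1$, and the $\alpha_i$ are drawn from a standard normal purely to generate data. In that design only groups with $y_{i1} \neq y_{i2}$ are informative about $\beta$. For those groups the joint ML estimate of $\alpha_i$ is $-\beta/2$, so the joint ML estimator reduces to $2\log(n_{01}/n_{10})$, while the conditional ML estimator is $\log(n_{01}/n_{10})$. Here $n_{01}$ and $n_{10}$ count groups with $(y_{i1}, y_{i2}) = (0, 1)$ and $(1, 0)$. The function and variable names are hypothetical.

```python
import math
import random


def logistic(z):
    """Logistic cumulative distribution function F(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def discordant_counts(n_groups, beta, seed=0):
    """Simulate T=2 binary panels and count groups with (0,1) and (1,0)."""
    rng = random.Random(seed)
    n01 = n10 = 0
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)                  # assumed group effect
        y1 = rng.random() < logistic(alpha)          # period 1: x_i1 = 0
        y2 = rng.random() < logistic(alpha + beta)   # period 2: x_i2 = 1
        if y2 and not y1:
            n01 += 1
        elif y1 and not y2:
            n10 += 1
    return n01, n10


BETA = 1.0
n01, n10 = discordant_counts(200_000, BETA)

# Conditional ML: P(y = (0,1) | y1 + y2 = 1) = exp(beta) / (1 + exp(beta)),
# which does not involve alpha_i, so the estimate is the log odds of n01.
beta_conditional = math.log(n01 / n10)

# Joint ML in this design: closed form, exactly twice the conditional one.
beta_joint_ml = 2.0 * beta_conditional

print(f"true beta      = {BETA}")
print(f"conditional ML = {beta_conditional:.3f}")
print(f"joint ML       = {beta_joint_ml:.3f}")
```

With a large number of groups the conditional estimate should settle near the true value and the joint estimate near twice that value, which is the inconsistency described above.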
2. Fixed Effects: Maximization of the Joint Likelihood Function
We shall begin with a brief review of the linear regression case.
Let
$y_{it} = \beta' x_{it} + \alpha_i + \varepsilon_{it}$,
where $\varepsilon_{it}$ is i.i.d. $N(0, \sigma^2)$. So in addition to
assuming independence across the groups, we are assuming that
observations within a group are independent as well, conditional on the
group effects. The dependence of different observations within a group is
assumed to be due to their common dependence on the group specific
$\alpha_i$. More general forms of dependence are, of course, possible;
for example, there could be serial correlation in addition to the
$\alpha_i$ in the longitudinal case.
Maximum likelihood for this model is simply a multiple regression of
y on x and a set of group indicator dummy variables. A useful
computational simplification is that the ML estimator of $\beta$ can be
obtained from a regression of $y_{it} - \bar{y}_i$ on
$x_{it} - \bar{x}_i$, where $\bar{y}_i$ and $\bar{x}_i$ are group means
($\bar{y}_i = \sum_t y_{it}/T$).
In the case of T=2, this is equivalent to a regression of
$y_{i2} - y_{i1}$ on $x_{i2} - x_{i1}$. Since we have
$y_{i2} - y_{i1} = \beta'(x_{i2} - x_{i1}) + \varepsilon_{i2} - \varepsilon_{i1}$,
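The within-group and first-difference regressions described above can be checked numerically. The following is a minimal sketch, assuming a scalar x, T=2, and group effects that are correlated with x by construction ($x_{it} = \alpha_i + \text{noise}$). The data-generating choices and all function names are hypothetical, not taken from the paper.

```python
import random


def simulate_panel(n_groups, beta, seed=0):
    """Return a list of (xs, ys) pairs, one per group, with T=2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        alpha = rng.gauss(0.0, 1.0)
        # x is correlated with the group effect (assumed design).
        xs = [alpha + rng.gauss(0.0, 1.0) for _ in range(2)]
        ys = [beta * x + alpha + rng.gauss(0.0, 1.0) for x in xs]
        data.append((xs, ys))
    return data


def pooled_slope(data):
    """OLS slope of y on x with an intercept, ignoring the groups."""
    xs = [x for g in data for x in g[0]]
    ys = [y for g in data for y in g[1]]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(data):
    """Slope from regressing y - group mean on x - group mean."""
    sxy = sxx = 0.0
    for xs, ys in data:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


def first_difference_slope(data):
    """Slope from regressing y_i2 - y_i1 on x_i2 - x_i1 (no intercept)."""
    sxy = sum((xs[1] - xs[0]) * (ys[1] - ys[0]) for xs, ys in data)
    sxx = sum((xs[1] - xs[0]) ** 2 for xs, _ in data)
    return sxy / sxx


BETA = 1.0
panel = simulate_panel(20_000, BETA)
b_pooled = pooled_slope(panel)
b_within = within_slope(panel)
b_fd = first_difference_slope(panel)
print(f"pooled = {b_pooled:.3f}, within = {b_within:.3f}, first difference = {b_fd:.3f}")
```

In this design the pooled slope should be pushed away from the true value by the omitted group effect, while the within and first-difference slopes coincide (up to rounding) and land close to it.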
