
Robust Inference with Multi-way Clustering

Abstract
In this article we propose a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM. This variance estimator enables cluster-robust inference when there is two-way or multiway clustering that is nonnested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g., Liang and Zeger 1986; Arellano 1987) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state–year effects example of Bertrand, Duflo, and Mullainathan (2004) to two dimensions; and by application to studies in the empirical literature where two-way clustering is present.


TECHNICAL WORKING PAPER SERIES
ROBUST INFERENCE WITH MULTI-WAY CLUSTERING
A. Colin Cameron
Jonah B. Gelbach
Douglas L. Miller
Technical Working Paper 327
http://www.nber.org/papers/T0327
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
September 2006
This paper has benefitted from presentations at the University of California - Berkeley, the University of
California - Riverside, and Dartmouth College. Miller gratefully acknowledges funding from the National
Institute on Aging, through Grant Number T32-AG00186 to the NBER.
©2006 by A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller. All rights reserved. Short sections
of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit,
including © notice, is given to the source.

Robust Inference with Multi-way Clustering
A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller
NBER Technical Working Paper No. 327
September 2006
JEL No. C14, C21, C52
ABSTRACT
In this paper we propose a new variance estimator for OLS as well as for nonlinear estimators such
as logit, probit and GMM, that provides cluster-robust inference when there is two-way or
multi-way clustering that is non-nested. The variance estimator extends the standard cluster-robust
variance estimator or sandwich estimator for one-way clustering (e.g. Liang and Zeger (1986),
Arellano (1987)) and relies on similar relatively weak distributional assumptions. Our method is
easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust
standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo
analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends
the state-year effects example of Bertrand et al. (2004) to two dimensions; and by application to two
studies in the empirical public/labor literature where two-way clustering is present.
A. Colin Cameron
Department of Economics
UC Davis
Davis, CA 95616
accameron@ucdavis.edu
Jonah Gelbach
Department of Economics
University of Maryland
College Park, MD 20742
and
College of Law
Florida State University
425 West Jefferson Street
Tallahassee, FL 32303
and NBER
gelbach@glue.umd.edu
Douglas Miller
Department of Economics
UC Davis
Davis, CA 95616
and NBER
dlmiller@ucdavis.edu

1. Introduction
A key component of empirical research is conducting accurate statistical inference. One
challenge to this is the possibility of clustered (or non-independent) errors. In this
paper we propose a new variance estimator for commonly used estimators, such as OLS,
probit, and logit, that provides cluster-robust inference when there is multi-way non-
nested clustering. The variance estimator extends the standard cluster-robust variance
estimator for one-way clustering, and relies on similar relatively weak distributional
assumptions. Our method is easily implemented in any statistical package that provides
cluster-robust standard errors with one-way clustering.¹
Controlling for clustering can be very important, as failure to do so can lead to
massively under-estimated standard errors and consequent over-rejection using standard
hypothesis tests. Moulton (1986, 1990) demonstrated that this problem arose in a much
wider range of settings than had been appreciated by microeconometricians. More
recently Bertrand, Duflo and Mullainathan (2004) and Kezdi (2004) emphasized that
with state-year panel or repeated cross-section data, clustering can be present even after
including state and year effects and valid inference requires controlling for clustering
within state. These papers, like most previous analysis, focus on one-way clustering.
In this paper we consider inference when there is nonnested multi-way clustering.
The method is useful in many applications, including:
1. Clustering due to sample design may be combined with grouping on a key regressor
for reasons other than sample design. For example, clustering may occur at
the level of a Primary Sampling Unit as well as at the level of an industry-level
regressor.
2. The survey design of the Current Population Survey (CPS) uses a rotating panel
structure, with households resurveyed for a number of months. Researchers using
data on households or individuals and concerned about within state-year clustering
(perhaps because of important state-year variables or instruments) should also
account for household-level clustering across the two years of the panel structure.
Then they need to account for clustering across both dimensions.
3. In a state-year panel setting, we may want to cluster at the state level to permit
valid inference if there is within-state autocorrelation in the errors. If there is also
geographic-based spatial correlation, a similar issue may be at play with respect
to the within-year cross-state errors (Conley 1999). In this case, researchers may
wish to cluster at the year level as well as at the state level.
¹ An ado file for multi-way clustering in Stata is available at the following website:
www.econ.ucdavis.edu/faculty/dlmiller/statafiles/index.htm

4. More generally this situation arises when there is clustering at both a cross-section
level and temporal level. For example, finance applications may call for clustering
at the firm level and at the time (e.g., day) level. Petersen (2006) compares a
number of approaches for OLS estimation in this panel setting.²
5. Even in a cross-section study clustering may arise at several levels simultaneously.
For example a model may have geographic-level regressors, industry-level regres-
sors and occupation-level regressors.
6. Clustering may arise due to discrete regressors. Moulton (1986) considered inference
in this case, using an error components model. More recently, Card and Lee
(2004) argue that in a regression discontinuity framework where the
treatment-determining variable is discrete, the observations should be clustered at the level
of the right-hand side variable. If additionally interest lies in a “primary” dimension
of clustering (e.g., state or village), then there is clustering in more than one
dimension.
Our method builds on that for one-way cluster-robust inference. Initial controls for
one-way clustering relied on strong assumptions on the dgp for the error term, such
as a one-way random effects error model. This has been superseded by computation
of “cluster-robust” standard errors that rely on much weaker assumptions: errors are
independent but not identically distributed across clusters and can have quite general
patterns of within cluster correlation and heteroskedasticity. These standard errors
generalize those of White (1980) for independent heteroskedastic errors. Key references
include White (1984) for a multivariate dependent variable, Liang and Zeger (1986)
for estimation in a generalized estimating equations setting, and Arellano (1987) and
Hansen (2005) for the fixed effects estimator in linear panel models. Wooldridge (2003)
provides a survey, and Wooldridge (2002) and Cameron and Trivedi (2005) give textbook
treatments.
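For reference, the one-way cluster-robust variance estimator for OLS that this literature builds on is commonly written as below. The notation is an assumption made here for illustration: $X_g$ and $\hat{u}_g$ stack the regressors and OLS residuals for cluster $g$, there are $G$ clusters, and finite-sample corrections are omitted.

```latex
\hat{V}[\hat{\beta}]
  = (X'X)^{-1}
    \left( \sum_{g=1}^{G} X_g' \hat{u}_g \hat{u}_g' X_g \right)
    (X'X)^{-1}
```

When every cluster contains a single observation, the middle term reduces to $\sum_i \hat{u}_i^2 x_i x_i'$, which is the heteroskedasticity-robust estimator of White (1980).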
For two-way or multi-way clustering that is nested, one simply clusters at the highest
level of aggregation. For example, with individual-level data and clustering on both
household and state one should cluster on state. Pepper (2002) provides an example.
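Whether two clustering dimensions are nested can be checked directly from the data. The following is a minimal Python sketch, assuming each observation carries one label per dimension; the function name `is_nested` and the sample labels are hypothetical.

```python
def is_nested(inner_ids, outer_ids):
    """Return True if every inner cluster falls within exactly one outer cluster."""
    parent = {}
    for inner, outer in zip(inner_ids, outer_ids):
        # Record the first outer cluster seen for this inner cluster,
        # and fail as soon as the same inner cluster shows up in another one.
        if parent.setdefault(inner, outer) != outer:
            return False
    return True


# Households sit inside states, so household-within-state clustering is nested.
households = ["h1", "h1", "h2", "h3"]
states = ["CA", "CA", "CA", "NY"]
household_in_state = is_nested(households, states)

# Years cut across states, so state and year clustering is non-nested.
years = [2001, 2001, 2002]
panel_states = ["CA", "NY", "CA"]
year_in_state = is_nested(years, panel_states)
```

In the nested case the advice above applies and one clusters on the outer dimension; in the non-nested case a multi-way approach is needed.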
If multi-way clustering is non-nested, the existing approach is to specify a multi-
way error components model with iid errors. Moulton (1986) considered clustering due
to grouping of three regressors (schooling, age and weeks worked) in a cross-section
log earnings regression. Davis (2002) modelled film attendance data clustered by film,
theater and time and provided a quite general way to implement feasible GLS even
with clustering in many dimensions. But these models impose strong assumptions,
including homoskedasticity and errors equicorrelated within cluster. And even the two-
way random effects model for linear regression is typically not included in standard
econometrics packages.

² We thank Mitchell Petersen for sending us a copy of his paper. One of the methods he uses is that
proposed in this paper for OLS with two-way clustering. Petersen cites as his source for this method a
paper by Thompson (2005) that we were unaware of until after working out our theoretical results and
doing substantial Monte Carlo work. Sometime after we described our work to Petersen, he informed us
that Thompson (2006) had been posted on the internet. Thompson (2006) correctly derives the formula
for OLS in the two-way case, but the theoretical discussion does not address the general multi-way case
and nonlinear estimators that we also consider. Thompson’s Monte Carlo results are basically consistent
with ours, though they are somewhat narrower in scope.

In this paper we take a less parametric cluster-robust approach that generalizes
one-way cluster-robust standard errors to the non-nested multi-way clustering case.
Our new estimator is easy to implement. In the two-way clustering case, we obtain
three different cluster-robust “variance” matrices for the estimator by one-way clustering
in, respectively, the first dimension, the second dimension, and by the intersection of the
first and second dimensions (sometimes referred to as first-by-second, as in “state-by-
year”, clustering). Then we add the first two variance matrices and subtract the third.
In the three-way clustering case there is an analogous formula, with seven one-way
cluster robust variance matrices computed and combined.
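The recipe above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' Stata implementation: the function names are hypothetical, no finite-sample correction is applied, and the three-way case is handled by inclusion-exclusion over the cluster dimensions, which produces the seven one-way matrices mentioned above.

```python
from itertools import combinations

import numpy as np


def cluster_codes(id_columns):
    """Integer label per observation for the intersection of the given id columns."""
    lookup = {}
    keys = zip(*[np.asarray(col).tolist() for col in id_columns])
    return np.array([lookup.setdefault(key, len(lookup)) for key in keys])


def one_way_vcov(X, resid, codes):
    """One-way cluster-robust sandwich for OLS (no finite-sample correction)."""
    bread = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]               # row i is x_i * u_i
    summed = np.zeros((codes.max() + 1, X.shape[1]))
    np.add.at(summed, codes, scores)          # sum the scores within each cluster
    return bread @ (summed.T @ summed) @ bread


def multiway_vcov(X, resid, id_columns):
    """Combine one-way matrices by inclusion-exclusion over cluster dimensions.

    Two dimensions give 3 terms (first + second - intersection); three give 7.
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    n_dims = len(id_columns)
    V = np.zeros((X.shape[1], X.shape[1]))
    for r in range(1, n_dims + 1):
        sign = 1.0 if r % 2 == 1 else -1.0    # add odd-order terms, subtract even
        for subset in combinations(range(n_dims), r):
            codes = cluster_codes([id_columns[d] for d in subset])
            V += sign * one_way_vcov(X, resid, codes)
    return V


# Illustrative use on simulated state-year style data (all values are made up).
rng = np.random.default_rng(0)
n = 200
state = rng.integers(0, 10, size=n)
year = rng.integers(0, 8, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
V_two_way = multiway_vcov(X, resid, [state, year])
```

Because each term is an ordinary one-way cluster-robust matrix, the same combination can be assembled from the output of any package that already reports one-way clustered variance matrices.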
The methods and supporting theory for two-way and multi-way clustering and for
both OLS and quite general nonlinear estimators are presented in Section 2. Like the
one-way cluster-robust method, our methods assume that the number of clusters goes
to infinity. This assumption does become more binding with multi-way clustering. For
example, in the two-way case it is assumed that min(G, H) → ∞, where there are G
clusters in dimension 1 and H clusters in dimension 2. In Section 3 we present two
different Monte Carlo experiments. The first is based on a two-way random effects
model and some extensions of that model. The second follows the general approach of
Bertrand et al. (2004) in investigating a placebo law in an earnings regression, except
that in our example the induced error dependence is two-way (over both states and
years) rather than one-way. Section 4 presents two empirical examples, Hersch (1998)
using OLS and Gruber and Madrian (1995) using both probit and OLS, where we
contrast results obtained using conventional one-way clustering to those allowing for
two-way clustering.³
Section 5 concludes.
2. Cluster-Robust Inference
This section emphasizes the OLS estimator, for simplicity. We begin with a review of
one-way clustering, before considering in turn two-way clustering and multi-way
clustering. The section concludes with extension from OLS to m-estimators, such as probit
and logit, and GMM estimators.
³ We thank Marianne Bertrand, Esther Duflo, Sendhil Mullainathan, and Joni Hersch for assisting
us in replicating their data sets.

References
Wooldridge (2002). Econometric Analysis of Cross Section and Panel Data.
White (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.
Liang and Zeger (1986). Longitudinal Data Analysis Using Generalized Linear Models.
Bertrand, Duflo and Mullainathan (2004). How Much Should We Trust Differences-In-Differences Estimates?
Cameron and Trivedi (2005). Microeconometrics: Methods and Applications.
Frequently Asked Questions (11)
Q1. What are the contributions in this paper?

In this paper the authors propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested.

Commonly-used examples of nonlinear estimators to which this method can be applied are nonlinear-least squares, just-identified instrumental variables estimation, logit, probit and Poisson. 

The Wald test based on assuming iid errors is exactly T distributed with (GH − 3) degrees of freedom under the current dgp, so that even in the smallest design with G = H = 10 the theoretical rejection rate is 5.3% (since Pr [|t| > 1.96|t ∼ T (97)] = 0.053), still quite close to 5%. 

One possibility is to adapt the random effects model to allow dampening serial correlation in the error, similar to the dgp used by Kezdi (2004) and Hansen (2005) in studying one-way clustering, with addition of a common shock. 

Then rejection rates may exceed 5%, as even with a Gaussian dgp, the Wald test statistic has a distribution fatter than the standard normal, due to the need to estimate the unknown error variance (even if the standard error estimate is unbiased). 

For methods 1-3 with larger designs, specifically G ×H > 1600, the authors use only 1,000 simulations due to computational cost; the 95% confidence interval is (3.6%, 6.4%). 

In a variety of Monte Carlo experiments and replications, the authors find that accounting for multi-way clustering can have important quantitative impacts on the estimated standard errors.

A practical matter that can arise when implementing the two-way robust estimator is that the resulting variance estimate $\hat{V}[\hat{\beta}]$ may have negative elements on the diagonal.
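
A variance matrix with a negative diagonal element cannot be positive semi-definite. One remedy sometimes used in that situation is to drop the negative eigenvalues. The Python sketch below illustrates that idea; it is an assumption made for illustration rather than a procedure stated in this excerpt, and the function name `clip_to_psd` and the example matrix are hypothetical.

```python
import numpy as np


def clip_to_psd(V):
    """Replace negative eigenvalues of a symmetric matrix with zero."""
    V = np.asarray(V, dtype=float)
    V = (V + V.T) / 2.0                       # guard against tiny asymmetries
    eigvals, eigvecs = np.linalg.eigh(V)      # eigendecomposition for symmetric input
    eigvals = np.clip(eigvals, 0.0, None)     # set negative eigenvalues to zero
    return (eigvecs * eigvals) @ eigvecs.T    # rebuild U diag(lambda) U'


# A made-up "variance" matrix with one negative diagonal element.
V_example = np.array([[-0.1, 0.0],
                      [0.0, 1.0]])
V_fixed = clip_to_psd(V_example)
has_negative_diagonal = bool(np.any(np.diag(V_example) < 0))
```

Checking the diagonal first, as `has_negative_diagonal` does here, makes it easy to report when such an adjustment was needed.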

The second follows the general approach of Bertrand et al. (2004) in investigating a placebo law in an earnings regression, except that in the authors' example the induced error dependence is two-way (over both states and years) rather than one-way.

The maximum possible increase in standard errors due to error correlation at the household level is about forty percent (corresponding to a doubling of the variance estimate: √ 2 = 1.41). 

The N × N selection matrix $S^{GH}$ may be large in some problems, however, and even if N is manageable many users will prefer to use readily available software that calculates cluster-robust standard errors for one-way clustering.
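
The link between the selection-matrix view and the one-way software route can be checked on a toy example. The Python sketch below assumes the selection matrix has entry (i, j) equal to 1 when observations i and j share a cluster in at least one dimension; the variable names and the six-observation labels are hypothetical.

```python
import numpy as np


def same_cluster(ids):
    """N x N 0/1 matrix: entry (i, j) is 1 when observations i and j share a cluster."""
    ids = np.asarray(ids)
    return (ids[:, None] == ids[None, :]).astype(int)


# Hypothetical cluster labels for six observations in two non-nested dimensions.
g = np.array([0, 0, 1, 1, 2, 2])
h = np.array([0, 1, 0, 1, 0, 1])

S_G = same_cluster(g)            # share a cluster in dimension 1
S_H = same_cluster(h)            # share a cluster in dimension 2
S_G_and_H = S_G * S_H            # share a cluster in both (the intersection)
S_GH = np.maximum(S_G, S_H)      # share a cluster in at least one dimension

# Inclusion-exclusion: "either" = "first" + "second" - "both".
identity_holds = np.array_equal(S_GH, S_G + S_H - S_G_and_H)
```

Since the selection matrix splits into three one-way pieces, the N × N matrix never has to be formed: three one-way cluster-robust runs carry the same information, which matters because storage grows with the square of N.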