What are some examples of nonlinear estimators to which this method can be applied?

Commonly used examples of nonlinear estimators to which this method can be applied are nonlinear least squares, just-identified instrumental variables estimation, logit, probit, and Poisson.

What is the Wald test for iid errors?

The Wald test based on assuming iid errors is exactly T distributed with (GH − 3) degrees of freedom under the current dgp, so that even in the smallest design with G = H = 10 the theoretical rejection rate is 5.3% (since Pr[|t| > 1.96 | t ∼ T(97)] = 0.053), still quite close to 5%.
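The 5.3% figure can be checked numerically. The following is a minimal sketch, not taken from the paper: it integrates the Student t density with Simpson's rule using only the standard library, and the helper names `t_pdf` and `two_sided_tail` are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_tail(crit, df, steps=2000):
    """Pr[|t| > crit] via Simpson's rule on [0, crit] (steps must be even)."""
    h = crit / steps
    total = t_pdf(0.0, df) + t_pdf(crit, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3   # probability mass in [-crit, crit]
    return 1 - central


G = H = 10                 # smallest design quoted in the text
df = G * H - 3             # 97 degrees of freedom
rate = two_sided_tail(1.96, df)
print(f"rejection rate with {df} df: {rate:.3f}")
```

The same calculation with a larger G × H moves the rate closer to the nominal 5%, since the t distribution approaches the standard normal.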

What is the way to reduce the rejection rate of the random effects model?

One possibility is to adapt the random effects model to allow dampening serial correlation in the error, similar to the dgp used by Kezdi (2004) and Hansen (2005) in studying one-way clustering, with addition of a common shock.

What is the reason for the rejection rate of the Wald test statistic?

Then rejection rates may exceed 5%, as even with a Gaussian dgp, the Wald test statistic has a distribution with fatter tails than the standard normal, due to the need to estimate the unknown error variance (even if the standard error estimate is unbiased).

What is the 95% confidence interval for the Wald test?

For methods 1-3 with larger designs, specifically G × H > 1600, the authors use only 1,000 simulations due to computational cost; the 95% confidence interval for the simulated rejection rate is then (3.6%, 6.4%).
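The quoted interval is consistent with a normal-approximation binomial interval around a nominal 5% rejection rate. The sketch below assumes that this is how the interval was obtained; the answer above does not spell out the formula.

```python
import math

nominal = 0.05    # nominal test size
n_sims = 1000     # number of Monte Carlo replications quoted in the text

# Assumption: standard error of a simulated rejection frequency, sqrt(p(1-p)/n)
se = math.sqrt(nominal * (1 - nominal) / n_sims)
lower = nominal - 1.96 * se
upper = nominal + 1.96 * se
print(f"({lower:.1%}, {upper:.1%})")
```

A simulated rejection rate outside this band would therefore be evidence that the true size differs from 5%.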

What is the effect of multi-way clustering?

In a variety of Monte Carlo experiments and replications, the authors find that accounting for multi-way clustering can have important quantitative impacts on the estimated standard errors.

What is the problem with the two-way robust estimator?

A practical matter that can arise when implementing the two-way robust estimator is that the resulting variance estimate V̂[β̂] may have negative elements on the diagonal.

How much is the maximum possible increase in standard errors due to error correlation at the household level?

The maximum possible increase in standard errors due to error correlation at the household level is about forty percent (corresponding to a doubling of the variance estimate: √ 2 = 1.41).

What is the simplest way to calculate the cluster-robust standard errors?

The N × N selection matrix S^{GH} may be large in some problems, however, and even if N is manageable many users will prefer to use readily available software that calculates cluster-robust standard errors for one-way clustering.

(Open Access) Robust Inference with Multi-way Clustering (2011) | A. Colin Cameron

Q: What are the contributions in this paper?

In this paper the authors propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested.

TECHNICAL WORKING PAPER SERIES

ROBUST INFERENCE WITH MULTI-WAY CLUSTERING

A. Colin Cameron

Jonah B. Gelbach

Douglas L. Miller

Technical Working Paper 327

http://www.nber.org/papers/T0327

NATIONAL BUREAU OF ECONOMIC RESEARCH

1050 Massachusetts Avenue

Cambridge, MA 02138

September 2006

This paper has benefitted from presentations at the University of California - Berkeley, the University of California - Riverside, and Dartmouth College. Miller gratefully acknowledges funding from the National Institute on Aging, through Grant Number T32-AG00186 to the NBER.

of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit,

including © notice, is given to the source.

Robust Inference with Multi-way Clustering

A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller

NBER Technical Working Paper No. 327

September 2006

JEL No. C14, C21, C52

ABSTRACT

In this paper we propose a new variance estimator for OLS as well as for nonlinear estimators such as logit, probit and GMM, that provides cluster-robust inference when there is two-way or multi-way clustering that is non-nested. The variance estimator extends the standard cluster-robust variance estimator or sandwich estimator for one-way clustering (e.g. Liang and Zeger (1986), Arellano (1987)) and relies on similar relatively weak distributional assumptions. Our method is easily implemented in statistical packages, such as Stata and SAS, that already offer cluster-robust standard errors when there is one-way clustering. The method is demonstrated by a Monte Carlo analysis for a two-way random effects model; a Monte Carlo analysis of a placebo law that extends the state-year effects example of Bertrand et al. (2004) to two dimensions; and by application to two studies in the empirical public/labor literature where two-way clustering is present.

A. Colin Cameron

Department of Economics

UC Davis

Davis, CA 95616

accameron@ucdavis.edu

Jonah Gelbach

Department of Economics

University of Maryland

College Park, MD 20742

and

College of Law

Florida State University

425 West Jefferson Street

Tallahassee, FL 32303

and NBER

gelbach@glue.umd.edu

Douglas Miller

Department of Economics

UC Davis

Davis, CA 95616

and NBER

dlmiller@ucdavis.edu

1. Introduction

A key component of empirical research is conducting accurate statistical inference. One challenge to this is the possibility of clustered (or non-independent) errors. In this paper we propose a new variance estimator for commonly used estimators, such as OLS, probit, and logit, that provides cluster-robust inference when there is multi-way non-nested clustering. The variance estimator extends the standard cluster-robust variance estimator for one-way clustering, and relies on similar relatively weak distributional assumptions. Our method is easily implemented in any statistical package that provides cluster-robust standard errors with one-way clustering.

Controlling for clustering can be very important, as failure to do so can lead to massively under-estimated standard errors and consequent over-rejection using standard hypothesis tests. Moulton (1986, 1990) demonstrated that this problem arose in a much wider range of settings than had been appreciated by microeconometricians. More recently Bertrand, Duflo and Mullainathan (2004) and Kezdi (2004) emphasized that with state-year panel or repeated cross-section data, clustering can be present even after including state and year effects and valid inference requires controlling for clustering within state. These papers, like most previous analysis, focus on one-way clustering.

In this paper we consider inference when there is non-nested multi-way clustering. The method is useful in many applications, including:

1. Clustering due to sample design may be combined with grouping on a key regressor for reasons other than sample design. For example, clustering may occur at the level of a Primary Sampling Unit as well as at the level of an industry-level regressor.

2. The survey design of the Current Population Survey (CPS) uses a rotating panel structure, with households resurveyed for a number of months. Researchers using data on households or individuals and concerned about within state-year clustering (perhaps because of important state-year variables or instruments) should also account for household-level clustering across the two years of the panel structure. Then they need to account for clustering across both dimensions.

3. In a state-year panel setting, we may want to cluster at the state level to permit valid inference if there is within-state autocorrelation in the errors. If there is also geographic-based spatial correlation, a similar issue may be at play with respect to the within-year cross-state errors (Conley 1999). In this case, researchers may wish to cluster at the year level as well as at the state level.

4. More generally this situation arises when there is clustering at both a cross-section level and temporal level. For example, finance applications may call for clustering at the firm level and at the time (e.g., day) level. Petersen (2006) compares a number of approaches for OLS estimation in this panel setting.

5. Even in a cross-section study clustering may arise at several levels simultaneously. For example a model may have geographic-level regressors, industry-level regressors and occupation-level regressors.

6. Clustering may arise due to discrete regressors. Moulton (1986) considered inference in this case, using an error components model. More recently, Card and Lee (2004) argue that in a regression discontinuity framework where the treatment-determining variable is discrete, the observations should be clustered at the level of the right-hand side variable. If additionally interest lies in a “primary” dimension of clustering (e.g., state or village), then there is clustering in more than one dimension.

An ado file for multi-way clustering in Stata is available at the following website: www.econ.ucdavis.edu/faculty/dlmiller/statafiles/index.htm

Our method builds on that for one-way cluster-robust inference. Initial controls for one-way clustering relied on strong assumptions on the dgp for the error term, such as a one-way random effects error model. This has been superseded by computation of “cluster-robust” standard errors that rely on much weaker assumptions: errors are independent but not identically distributed across clusters and can have quite general patterns of within cluster correlation and heteroskedasticity. These standard errors generalize those of White (1980) for independent heteroskedastic errors. Key references include White (1984) for a multivariate dependent variable, Liang and Zeger (1986) for estimation in a generalized estimating equations setting, and Arellano (1987) and Hansen (2005) for the fixed effects estimator in linear panel models. Wooldridge (2003) provides a survey, and Wooldridge (2002) and Cameron and Trivedi (2005) give textbook treatments.

For two-way or multi-way clustering that is nested, one simply clusters at the highest level of aggregation. For example, with individual-level data and clustering on both household and state one should cluster on state. Pepper (2002) provides an example.
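Whether two clustering dimensions are nested can be checked directly from the data. The following is a minimal sketch; the function name `is_nested` and the toy labels are hypothetical, and it assumes each observation carries one label per dimension.

```python
def is_nested(inner, outer):
    """Return True if every inner cluster lies within exactly one outer cluster."""
    parent = {}
    for a, b in zip(inner, outer):
        # setdefault records the first outer label seen for this inner cluster
        if parent.setdefault(a, b) != b:
            return False
    return True


# Hypothetical individual-level data: households sit inside states,
# so clustering on state alone is enough.
household = ["h1", "h1", "h2", "h3"]
state = ["CA", "CA", "CA", "NY"]
year = [1990, 1991, 1990, 1991]

print(is_nested(household, state))  # household is nested in state
print(is_nested(state, year))       # state and year are non-nested
```

When the check fails in both directions, the dimensions are non-nested and one-way clustering at a single level no longer covers both sources of dependence.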

If multi-way clustering is non-nested, the existing approach is to specify a multi-way error components model with iid errors. Moulton (1986) considered clustering due to grouping of three regressors (schooling, age and weeks worked) in a cross-section log earnings regression. Davis (2002) modelled film attendance data clustered by film, theater and time and provided a quite general way to implement feasible GLS even with clustering in many dimensions. But these models impose strong assumptions, including homoskedasticity and errors equicorrelated within cluster. And even the two-way random effects model for linear regression is typically not included in standard econometrics packages.

We thank Mitchell Petersen for sending us a copy of his paper. One of the methods he uses is that proposed in this paper for OLS with two-way clustering. Petersen cites as his source for this method a paper by Thompson (2005) that we were unaware of until after working out our theoretical results and doing substantial Monte Carlo work. Sometime after we described our work to Petersen, he informed us that Thompson (2006) had been posted on the internet. Thompson (2006) correctly derives the formula for OLS in the two-way case, but the theoretical discussion does not address the general multi-way case and nonlinear estimators that we also consider. Thompson’s Monte Carlo results are basically consistent with ours, though they are somewhat narrower in scope.

In this paper we take a less parametric cluster-robust approach that generalizes one-way cluster-robust standard errors to the non-nested multi-way clustering case.

Our new estimator is easy to implement. In the two-way clustering case, we obtain three different cluster-robust “variance” matrices for the estimator by one-way clustering in, respectively, the first dimension, the second dimension, and by the intersection of the first and second dimensions (sometimes referred to as first-by-second, as in “state-by-year”, clustering). Then we add the first two variance matrices and subtract the third. In the three-way clustering case there is an analogous formula, with seven one-way cluster-robust variance matrices computed and combined.
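The add-and-subtract recipe can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the authors' implementation: it handles a single regressor with no intercept, takes the fitted residuals as given, applies no finite-sample correction, and the names `one_way_variance` and `multiway_variance` are hypothetical. Signs follow inclusion-exclusion over sets of dimensions, which yields three terms for two-way clustering and seven for three-way clustering.

```python
from collections import defaultdict
from itertools import combinations


def one_way_variance(x, u, labels):
    """One-way cluster-robust variance of the slope in y = x*b + u.

    Assumes a single regressor and no intercept (illustrative simplification);
    u holds fitted residuals and labels gives each observation's cluster id.
    """
    scores = defaultdict(float)
    for xi, ui, g in zip(x, u, labels):
        scores[g] += xi * ui          # within-cluster sum of x_i * u_i
    sxx = sum(xi * xi for xi in x)    # scalar version of X'X
    return sum(s * s for s in scores.values()) / (sxx * sxx)


def multiway_variance(x, u, dims):
    """Combine one-way variances over every non-empty set of dimensions."""
    n = len(x)
    total = 0.0
    for k in range(1, len(dims) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0   # add odd-sized sets, subtract even
        for subset in combinations(range(len(dims)), k):
            # cluster on the intersection of the chosen dimensions
            labels = [tuple(dims[d][i] for d in subset) for i in range(n)]
            total += sign * one_way_variance(x, u, labels)
    return total


# Toy data (hypothetical): four observations, two states by two years.
x = [1.0, 2.0, 3.0, 4.0]
u = [0.5, -0.5, 1.0, -1.0]
state = ["a", "a", "b", "b"]
year = [1, 2, 1, 2]

v_two_way = multiway_variance(x, u, [state, year])
print(v_two_way)
```

Because one term is subtracted, nothing in this construction forces the result to be positive, which is why the combined estimate can turn out negative in some samples.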

The methods and supporting theory for two-way and multi-way clustering and for both OLS and quite general nonlinear estimators are presented in Section 2. Like the one-way cluster-robust method, our methods assume that the number of clusters goes to infinity. This assumption does become more binding with multi-way clustering. For example, in the two-way case it is assumed that min(G, H) → ∞, where there are G clusters in dimension 1 and H clusters in dimension 2. In Section 3 we present two different Monte Carlo experiments. The first is based on a two-way random effects model and some extensions of that model. The second follows the general approach of Bertrand et al. (2004) in investigating a placebo law in an earnings regression, except that in our example the induced error dependence is two-way (over both states and years) rather than one-way. Section 4 presents two empirical examples, Hersch (1998) using OLS and Gruber and Madrian (1995) using both probit and OLS, where we contrast results obtained using conventional one-way clustering to those allowing for two-way clustering. Section 5 concludes.

2. Cluster-Robust Inference

This section emphasizes the OLS estimator, for simplicity. We begin with a review of one-way clustering, before considering in turn two-way clustering and multi-way clustering. The section concludes with extension from OLS to m-estimators, such as probit and logit, and GMM estimators.

We thank Marianne Bertrand, Esther Duflo, Sendhil Mullainathan, and Joni Hersch for assisting us in replicating their data sets.

