
Bootstrap-Based Improvements for Inference with Clustered Errors


NBER TECHNICAL WORKING PAPER SERIES
BOOTSTRAP-BASED IMPROVEMENTS FOR INFERENCE WITH CLUSTERED
ERRORS
A. Colin Cameron
Jonah B. Gelbach
Douglas L. Miller
Technical Working Paper 344
http://www.nber.org/papers/t0344
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
September 2007
We thank an anonymous referee and participants at The Australian National University, UC Berkeley,
UC Riverside, Dartmouth College, Florida State University, Indiana University, and MIT for useful
comments. Miller acknowledges funding from the National Institute on Aging, through Grant Number
T32-AG00186 to the NBER. The views expressed herein are those of the author(s) and do not necessarily
reflect the views of the National Bureau of Economic Research.
© 2007 by A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller. All rights reserved. Short
sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided
that full credit, including © notice, is given to the source.

Bootstrap-Based Improvements for Inference with Clustered Errors
A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller
NBER Technical Working Paper No. 344
September 2007
JEL No. C12, C15, C21
ABSTRACT
Researchers have increasingly realized the need to account for within-group dependence in estimating
standard errors of regression parameter estimates. The usual solution is to calculate cluster-robust
standard errors that permit heteroskedasticity and within-cluster error correlation, but presume that
the number of clusters is large. Standard asymptotic tests can over-reject, however, with few (5-30)
clusters. We investigate inference using cluster bootstrap-t procedures that provide asymptotic refinement.
These procedures are evaluated using Monte Carlos, including the example of Bertrand, Duflo and
Mullainathan (2004). Rejection rates of ten percent using standard methods can be reduced to the nominal
size of five percent using our methods.
A. Colin Cameron
Department of Economics
UC Davis
One Shields Avenue
Davis, CA 95616
accameron@ucdavis.edu
Jonah B. Gelbach
Department of Economics
University of Arizona
1130 E. Helen Street
Tucson, AZ 85721-0108
gelbach@email.arizona.edu
Douglas L. Miller
Department of Economics
UC Davis
One Shields Avenue
Davis, CA 95616-8578
and NBER
dlmiller@ucdavis.edu

1 Introduction
Microeconometrics researchers have increasingly realized the essential need to account for any within-group dependence in estimating standard errors of regression parameter estimates. In many settings the default OLS standard errors that ignore such clustering can greatly underestimate the true OLS standard errors, as emphasized by Moulton (1986, 1990).
A common correction is to compute cluster-robust standard errors that generalize the White (1980) heteroskedastic-consistent estimate of OLS standard errors to the clustered setting. This permits both error heteroskedasticity and quite flexible error correlation within cluster, unlike a much more restrictive random effects or error components model. In econometrics this adjustment was proposed by White (1984) and Arellano (1987), and it is implemented in STATA, for example, using the cluster option. In the statistics literature these are called sandwich standard errors, proposed by Liang and Zeger (1986) for generalized estimating equations, and they are implemented in SAS, for example, within the GENMOD procedure. A recent brief survey is given in Wooldridge (2003).
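As a concrete illustration of the sandwich correction described above, here is a minimal Python sketch (function and variable names are hypothetical, not from the paper), including a finite-sample correction of the kind applied by common software implementations:

```python
import numpy as np

def cluster_robust_vcov(X, y, cluster):
    """OLS with cluster-robust (sandwich) variance, a minimal sketch.

    Permits heteroskedasticity and arbitrary error correlation within
    each cluster; assumes independence across clusters.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over clusters of X_g' u_g u_g' X_g
    meat = np.zeros((k, k))
    clusters = np.unique(cluster)
    G = len(clusters)
    for g in clusters:
        idx = cluster == g
        s = X[idx].T @ resid[idx]  # k-vector of within-cluster score sums
        meat += np.outer(s, s)
    # A common finite-sample correction (as used by e.g. STATA's cluster option)
    c = (G / (G - 1)) * ((n - 1) / (n - k))
    return beta, c * XtX_inv @ meat @ XtX_inv
```

The returned matrix replaces the default OLS variance estimate; its diagonal gives cluster-robust squared standard errors.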
Not all empirical studies use appropriate corrections for clustering. In particular, for fixed effects panel models the errors are usually correlated even after control for fixed effects, yet many studies either provide no control for serial correlation or erroneously cluster at too fine a level. Kézdi (2004) demonstrated the usefulness of cluster-robust standard errors in this setting and contrasted these with other standard errors based on stronger distributional assumptions. Bertrand, Duflo, and Mullainathan (2004), henceforth BDM (2004), focused on implications for difference-in-differences (DID) studies using variation across states and years. Then the regressor of interest is an indicator variable that is highly correlated within cluster (state), so there is great need to correct standard errors for clustering. The clustering should be on state, rather than on state-year.
A practical limitation of inference with cluster-robust standard errors is that the asymptotic justification assumes that the number of clusters goes to infinity. Yet in some applications there may be few clusters. For example, this happens if clustering is on region and there are few regions. With a small number of clusters the cluster-robust standard errors are downwards biased. Bias corrections have been proposed in the statistics literature; see Kauermann and Carroll (2001), Mancl and DeRouen (2001), and Bell and McCaffrey (2002). Angrist and Lavy (2002) in an applied study find that bias adjustment of cluster-robust standard errors can make quite a difference. But even after appropriate bias correction, with few clusters the usual Wald statistics for hypothesis testing with asymptotic standard normal or chi-square critical values over-reject. BDM (2004) demonstrate through a Monte Carlo experiment that the Wald test based on (unadjusted) cluster-robust standard errors over-rejects if standard normal critical values are used. Donald and Lang (2007) also demonstrate this and propose, for DID studies with policy invariant within state, an alternative two-step GLS estimator that leads to T-distributed Wald tests in some special circumstances. Ibragimov and Müller (2007) propose an alternate approach based on separate estimation within each group. They separate the data into independent groups, estimate the model within each group, average the separate estimates and divide by the sample standard deviation of these estimates, and then compare against critical values from a T-distribution. This approach holds promise for settings with few groups and where model identification and a central limit theorem holds within each group. Our proposed method does not require the latter two conditions, can be used to test multiple hypotheses, and is based on the parameter estimator commonly used in practice.

In this paper we investigate whether bootstrapping to obtain asymptotic refinement leads to improved inference for OLS estimation with cluster-

Citations
Journal ArticleDOI

A Practitioner’s Guide to Cluster-Robust Inference

TL;DR: This work considers statistical inference for regression when data are grouped into clusters, with regression model errors independent across clusters but correlated within clusters; in such settings default standard errors can greatly overstate estimator precision.
Journal ArticleDOI

Robust Inference with Multi-way Clustering

TL;DR: The authors proposed a variance estimator for the OLS estimator as well as for nonlinear estimators such as logit, probit, and GMM that enables cluster-robust inference when there is two-way or multiway clustering that is nonnested.
Book

Mostly harmless econometrics

TL;DR: The core methods in today's econometric toolkit are linear regression for statistical control, instrumental variables methods for the analysis of natural experiments, and differences-in-differences methods that exploit policy changes.
Journal ArticleDOI

Teacher training, teacher quality and student achievement

TL;DR: This article studies the effects of various types of education and training on the productivity of teachers in promoting student achievement. The authors did not find a consistent relationship between formal professional development training and teacher productivity, and found no evidence that teachers' pre-service training or college entrance exam scores are related to productivity.
BookDOI

Does Management Matter? Evidence from India

TL;DR: In this article, the authors run a management field experiment on large Indian textile firms, providing free consulting on modern management practices to a randomly chosen set of treatment plants and compared their performance to the control plants.
References
Book

An introduction to the bootstrap

TL;DR: This article presents bootstrap methods for estimation, using simple arguments, with Minitab macros for implementing these methods, as well as some examples of how these methods could be used for estimation purposes.
Journal ArticleDOI

A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity

Halbert White
- 01 May 1980 - 
TL;DR: In this article, a parameter covariance matrix estimator which is consistent even when the disturbances of a linear regression model are heteroskedastic is presented; the estimator does not depend on a formal model of the structure of the heteroskedasticity.
Journal ArticleDOI

Longitudinal data analysis using generalized linear models

TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Journal ArticleDOI

Bootstrap Methods: Another Look at the Jackknife

TL;DR: In this article, the authors discuss the problem of estimating the sampling distribution of a pre-specified random variable R(X, F) on the basis of the observed data x.
Journal ArticleDOI

How Much Should We Trust Differences-In-Differences Estimates?

TL;DR: In this article, the authors randomly generate placebo laws in state-level data on female wages from the Current Population Survey and use OLS to compute the DD estimate of its "effect" as well as the standard error of this estimate.
Frequently Asked Questions (19)
Q1. What are the contributions in this paper?

The primary contribution of this paper is to use bootstrap procedures to obtain more accurate cluster-robust inference when there are few clusters. 

The variation the authors use is one that applies weights of +1 and -1 with equal probability, and uses residuals from OLS estimation that imposes the null hypothesis. 

The standard method for resampling that preserves the within-cluster features of the error is a pairs cluster bootstrap that resamples at the cluster level, so that if the gth cluster is selected then all data (dependent and regressor variables) in that cluster appear in the resample. 
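A minimal Python sketch of one such pairs-cluster resample (hypothetical names, not the authors' code; equal-sized clusters are assumed so the resample has the original sample size):

```python
import numpy as np

def pairs_cluster_resample(X, y, cluster, rng):
    """One pairs-cluster bootstrap draw: resample whole clusters with
    replacement, keeping each cluster's (y, X) block intact."""
    ids = np.unique(cluster)
    draw = rng.choice(ids, size=len(ids), replace=True)
    # Stack the row indices of every drawn cluster, in draw order
    rows = np.concatenate([np.flatnonzero(cluster == g) for g in draw])
    return X[rows], y[rows]
```

Repeating this B times and re-estimating on each pseudo-sample gives the pairs cluster bootstrap distribution.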

The data are clustered into G independent groups, so the resampling method should be one that assumes independence across clusters but preserves correlation within clusters. 

The obvious method is a residual cluster bootstrap that resamples with replacement from the original sample residual vectors to give residuals {û*₁, ..., û*_G} and hence pseudo-sample {(ŷ*₁, X₁), ..., (ŷ*_G, X_G)} where ŷ*_g = X′_g β̂ + û*_g. 
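The construction ŷ*_g = X′_g β̂ + û*_g can be sketched as follows (a hypothetical helper, not from the paper; equal-sized clusters are assumed so that residual vectors can be swapped across clusters):

```python
import numpy as np

def residual_cluster_resample(X, beta_hat, resid, cluster, rng):
    """One residual-cluster bootstrap draw: hold X fixed, resample whole
    residual vectors across clusters with replacement, and rebuild
    y*_g = X_g' beta_hat + u*_g."""
    ids = np.unique(cluster)
    draw = rng.choice(ids, size=len(ids), replace=True)
    y_star = X @ beta_hat
    # Give cluster g the residual vector of the drawn source cluster
    for g, src in zip(ids, draw):
        y_star[cluster == g] += resid[cluster == src]
    return y_star
```

Each pseudo-sample keeps the regressors constant and varies only the dependent variable through the resampled residual vectors.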

A practical limitation of inference with cluster-robust standard errors is that the asymptotic justification assumes that the number of clusters goes to infinity. 

One important conclusion of BDM (2004) is that for few (six) clusters the cluster-robust estimator performs poorly, and for moderate (ten and twenty) numbers of clusters their bootstrap-based method also does poorly. 

In particular, one can hold regressors X constant throughout the pseudo-samples, while resampling the residuals which can be then used to construct new values of the dependent variable y. 

A common correction is to compute cluster-robust standard errors that generalize the White (1980) heteroskedastic-consistent estimate of OLS standard errors to the clustered setting. 

They find that (1) default standard errors do poorly; (2) cluster-robust standard errors do well for all but G = 6; and (3) their bootstrap, which the authors discuss in their section 3.1, does poorly for low numbers of clusters, with actual rejection rates 0.44, 0.23 and 0.13 for G = 6, 10 and 20, respectively. 

The authors use three different cluster bootstrap resampling methods: the pairs cluster bootstrap, the residual cluster bootstrap with H0 imposed, and the wild cluster bootstrap with H0 imposed. 
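To illustrate the third of these methods, here is a compact, assumption-laden sketch (not the authors' code) of a wild cluster bootstrap-t test of H0: β_j = 0, with the null imposed and Rademacher (±1) weights drawn per cluster:

```python
import numpy as np

def wild_cluster_bootstrap_t(X, y, cluster, j=1, B=399, seed=0):
    """Wild cluster bootstrap-t p-value for H0: beta_j = 0 (a sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, k = X.shape
    ids = np.unique(cluster)
    G = len(ids)

    def fit(ym):
        # OLS plus cluster-robust ("sandwich") variance on the full model
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ ym
        u = ym - X @ b
        meat = np.zeros((k, k))
        for g in ids:
            s = X[cluster == g].T @ u[cluster == g]
            meat += np.outer(s, s)
        V = (G / (G - 1)) * XtX_inv @ meat @ XtX_inv
        return b, V

    b, V = fit(y)
    w = b[j] / np.sqrt(V[j, j])  # Wald statistic in the original sample

    # Restricted fit with beta_j = 0 imposed, as the text recommends
    Xr = np.delete(X, j, axis=1)
    br = np.linalg.lstsq(Xr, y, rcond=None)[0]
    ur = y - Xr @ br
    fitted = Xr @ br

    count = 0
    pos = np.searchsorted(ids, cluster)  # cluster index of each observation
    for _ in range(B):
        sign = rng.choice([-1.0, 1.0], size=G)  # Rademacher weight per cluster
        y_star = fitted + ur * sign[pos]
        b_s, V_s = fit(y_star)
        w_star = b_s[j] / np.sqrt(V_s[j, j])  # null is true in each resample
        if abs(w_star) >= abs(w):
            count += 1
    return (count + 1) / (B + 1)
```

Because each resample is generated with β_j = 0, the bootstrap distribution of w* approximates the null distribution of the Wald statistic, and the returned p-value is the symmetric two-sided rejection frequency.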

The remaining bootstrap-t methods all yield rejection rates less than 0.08, with the residual cluster bootstrap-t and wild cluster bootstrap-t doing best. 

If the authors instead bootstrap this Wald statistic with B = 999 replications, the pairs cluster bootstrap-t yields p = 0.209, the residual cluster bootstrap-t gives p = 0.112, and the wild cluster bootstrap-t gives a p-value of 0.070. 

The authors believe that the p-value for the pairs cluster bootstrap is implausibly large, for reasons discussed in the BDM replication, while the other two bootstraps lead to plausible p-values that, as expected, are larger than those obtained by using asymptotic normal critical values. 

Donald and Lang (2007) also demonstrate this and propose, for DID studies with policy invariant within state, an alternative two-step GLS estimator that leads to T-distributed Wald tests in some special circumstances. 

An alternative method with asymptotic refinement is the bias-corrected accelerated (BCA) procedure, defined in Efron (1987), Hall (1992, pp. 128-141), and Cameron, Gelbach, and Miller (2006). 

The bootstrap-t procedure directly bootstraps w which is asymptotically pivotal since the standard normal has no unknown parameters. 

In contrast to the bootstrap-t procedure, it does not offer asymptotic refinement, and so may perform worse. (Alternative names used in the literature include cluster bootstrap, case bootstrap, nonparametric bootstrap, and nonoverlapping block bootstrap.) 
