
A Practitioner’s Guide to Cluster-Robust Inference

Journal of Human Resources (University of Wisconsin Press), Vol. 50, No. 2, 31 March 2015, pp. 317-372
A. Colin Cameron and Douglas L. Miller
Abstract
We consider statistical inference for regression when data are grouped into clusters, with
regression model errors independent across clusters but correlated within clusters. Examples
include data on individuals with clustering on village or region or other category such as
industry, and state-year differences-in-differences studies with clustering on state. In such
settings default standard errors can greatly overstate estimator precision. Instead, if the number
of clusters is large, statistical inference after OLS should be based on cluster-robust standard
errors. We outline the basic method as well as many complications that can arise in practice.
These include cluster-specific fixed effects, few clusters, multi-way clustering, and estimators
other than OLS.
Colin Cameron is a Professor in the Department of Economics at UC Davis. Doug Miller is
an Associate Professor in the Department of Economics at UC Davis. They thank four
referees and the journal editor for very helpful comments and guidance, participants at the
2013 California Econometrics Conference, a workshop sponsored by the U.K. Programme
Evaluation for Policy Analysis, seminars at the University of Southern California and at the
University of Uppsala, and the many people who over time have sent them cluster-related
puzzles (the solutions to some of which appear in this paper). Doug Miller acknowledges
financial support from the Center for Health and Wellbeing at the Woodrow Wilson School of
Public Policy at Princeton University.

I. Introduction
In an empiricist’s day-to-day practice, most effort is spent on getting unbiased or
consistent point estimates. That is, a lot of attention focuses on the parameters ($\hat{\beta}$). In this
paper we focus on getting accurate statistical inference, a fundamental component of which is
obtaining accurate standard errors ($se$, the estimated standard deviation of $\hat{\beta}$). We begin with
the basic reminder that empirical researchers should also really care about getting this part
right. An asymptotic 95% confidence interval is $\hat{\beta} \pm 1.96 \times se$, and hypothesis testing is
typically based on the Wald “t-statistic” $w = (\hat{\beta} - \beta_0)/se$. Both $\hat{\beta}$ and $se$ are critical
ingredients for statistical inference, and we should be paying as much attention to getting a
good $se$ as we do to obtaining $\hat{\beta}$.
In this paper, we consider statistical inference in regression models where observations
can be grouped into clusters, with model errors uncorrelated across clusters but correlated
within cluster. One leading example of “clustered errors” is individual-level cross-section data
with clustering on geographical region, such as village or state. Then model errors for
individuals in the same region may be correlated, while model errors for individuals in
different regions are assumed to be uncorrelated. A second leading example is panel data. Then
model errors in different time periods for a given individual (e.g., person or firm or region) may
be correlated, while model errors for different individuals are assumed to be uncorrelated.
Failure to control for within-cluster error correlation can lead to very misleadingly
small standard errors, and consequent misleadingly narrow confidence intervals, large
t-statistics and low p-values. It is not unusual to have applications where standard errors that
control for within-cluster correlation are several times larger than default standard errors that
ignore such correlation. As shown below, the need for such control increases not only with
the size of the within-cluster error correlation, but also with the size of the within-cluster
correlation of the regressors and with the number of observations within a cluster.
A leading example, highlighted by Moulton (1986, 1990), is when interest lies in measuring the
effect of a policy variable, or other aggregated regressor, that takes the same value for all
observations within a cluster.
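To get a feel for magnitudes before the formal treatment, one can use the Moulton-type approximation to the variance inflation factor presented in Section II,

$$\tau \;\simeq\; 1 + \rho_x \rho_u (\bar{N}_g - 1),$$

where $\rho_x$ is the within-cluster correlation of the regressor, $\rho_u$ is the within-cluster error correlation, and $\bar{N}_g$ is the average cluster size. The following numbers are purely illustrative: for an aggregated policy regressor ($\rho_x = 1$), a modest error correlation $\rho_u = 0.05$, and 81 observations per cluster, $\tau \simeq 1 + 1 \times 0.05 \times 80 = 5$, so correct standard errors are $\sqrt{5} \approx 2.2$ times as large as the default ones.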
One way to control for clustered errors in a linear regression model is to additionally
specify a model for the within-cluster error correlation, consistently estimate the parameters of
this error correlation model, and then estimate the original model by feasible generalized least
squares (FGLS) rather than ordinary least squares (OLS). Examples include random effects
estimators and, more generally, random coefficient and hierarchical models. If all goes well
this provides valid statistical inference, as well as estimates of the parameters of the original
regression model that are more efficient than OLS. However, these desirable properties hold
only under the very strong assumption that the model for within-cluster error correlation is
correctly specified.
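As an illustration of this FGLS route, the sketch below fits a random-intercept (random-effects) model using statsmodels' MixedLM on simulated equicorrelated data; the variable names and parameter values are our own illustrative choices, not from the paper.

```python
# A minimal sketch of the random-effects (FGLS) route: a random-intercept
# model estimated on simulated data with equicorrelated within-cluster errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
G, m = 50, 10                                        # 50 clusters of 10 observations
g = np.repeat(np.arange(G), m)                       # cluster identifier
x = rng.normal(size=G * m)
u = rng.normal(size=G)[g] + rng.normal(size=G * m)   # cluster effect + idiosyncratic error
y = 1.0 * x + u

X = sm.add_constant(x)
re_fit = sm.MixedLM(y, X, groups=g).fit()            # random intercept for each cluster
print(re_fit.summary())
```

The efficiency gain and the validity of the resulting standard errors both hinge on this random-intercept (equicorrelated) error model being correct.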
A more recent method to control for clustered errors is to estimate the regression model
with limited or no control for within-cluster error correlation, and then post-estimation obtain
“cluster-robust” standard errors proposed by White (1984, p.134-142) for OLS with a
multivariate dependent variable (directly applicable to balanced clusters); by Liang and Zeger
(1986) for linear and nonlinear models; and by Arellano (1987) for the fixed effects estimator
in linear panel models. These cluster-robust standard errors do not require specification of a
model for within-cluster error correlation, but do require the additional assumption that the
number of clusters, rather than just the number of observations, goes to infinity.
Cluster-robust standard errors are now widely used, popularized in part by Rogers
(1993) who incorporated the method in Stata, and by Bertrand, Duflo and Mullainathan (2004)
who pointed out that many differences-in-differences studies failed to control for clustered
errors, and those that did often clustered at the wrong level. Cameron and Miller (2011) and
Wooldridge (2003, 2006) provide surveys, and lengthy expositions are given in Angrist and
Pischke (2009) and Wooldridge (2010).
One goal of this paper is to provide the practitioner with the methods to implement
cluster-robust inference. To this end we include in the paper reference to relevant Stata
commands (for version 13), since Stata is the computer package most often used in applied
microeconometrics research. And we will post on our websites more expansive Stata code and
the datasets used in this paper. A second goal is to present how to deal with complications such
as determining when there is a need to cluster, incorporating fixed effects, and inference when
there are few clusters. A third goal is to provide an exposition of the underlying econometric
theory, as this can aid in understanding complications. In practice the most difficult
complication to deal with can be “few” clusters; see Section VI. There is no clear-cut definition
of “few”; depending on the situation “few” may range from less than 20 to less than 50 clusters
in the balanced case.
We focus on OLS, for simplicity and because this is the most commonly-used
estimation method in practice. Section II presents the basic results for OLS with clustered
errors. In principle, implementation is straightforward as econometrics packages include
cluster-robust as an option for the commonly-used estimators; in Stata it is the
vce(cluster) option. The remainder of the survey concentrates on complications that
often arise in practice. Section III addresses how the addition of fixed effects impacts
cluster-robust inference. Section IV deals with the obvious complication that it is not always
clear what to cluster over. Section V considers clustering when there is more than one way to
do so and these ways are not nested in each other. Section VI considers how to adjust inference
when there are just a few clusters as, without adjustment, test statistics based on the
cluster-robust standard errors over-reject and confidence intervals are too narrow. Section VII
presents extensions to the full range of estimators: instrumental variables, nonlinear models
such as logit and probit, and generalized method of moments. Section VIII presents both
empirical examples and real-data based simulations. Concluding thoughts are given in Section
IX.
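For readers working outside Stata, the following minimal sketch shows one way to obtain cluster-robust OLS standard errors in Python via statsmodels; the simulated data and all parameter values are our own, purely for illustration.

```python
# Cluster-robust OLS standard errors with statsmodels: the analogue of
# Stata's vce(cluster) option, shown on simulated clustered data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
G, m = 40, 25
g = np.repeat(np.arange(G), m)                             # cluster identifier
x = rng.normal(size=G)[g] + 0.5 * rng.normal(size=G * m)   # regressor correlated within cluster
u = rng.normal(size=G)[g] + rng.normal(size=G * m)         # error correlated within cluster
y = 2.0 + 1.0 * x + u

X = sm.add_constant(x)
fit_default = sm.OLS(y, X).fit()                           # default i.i.d. standard errors
fit_cluster = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": g})
print(fit_default.bse[1], fit_cluster.bse[1])              # cluster-robust SE is larger here
```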
II. Cluster-Robust Inference
In this section we present the fundamentals of cluster-robust inference. For these basic
results we assume that the model does not include cluster-specific fixed effects, that it is clear
how to form the clusters, and that there are many clusters. We relax these conditions in
subsequent sections.
Clustered errors have two main consequences: they (usually) reduce the precision of $\hat{\beta}$,
and the standard estimator for the variance of $\hat{\beta}$, $\widehat{\mathrm{V}}[\hat{\beta}]$, is (usually) biased downward from the
true variance. Computing cluster-robust standard errors is a fix for the latter issue. We illustrate
these issues, initially in the context of a very simple model and then in the following subsection
in a more typical model.
A. A Simple Example
For simplicity, we begin with OLS with a single regressor that is nonstochastic, and
assume no intercept in the model. The results extend to multiple regression with stochastic
regressors.
Let $y_i = \beta x_i + u_i$, $i = 1, \ldots, N$, where $x_i$ is nonstochastic and $\mathrm{E}[u_i] = 0$. The OLS
estimator $\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2$ can be re-expressed as $\hat{\beta} - \beta = \sum_i x_i u_i / \sum_i x_i^2$, so in general

$$\mathrm{V}[\hat{\beta}] = \mathrm{E}[(\hat{\beta} - \beta)^2] = \mathrm{V}\Big[\sum\nolimits_i x_i u_i\Big] \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2. \qquad (1)$$

If errors are uncorrelated over $i$, then $\mathrm{V}[\sum_i x_i u_i] = \sum_i \mathrm{V}[x_i u_i] = \sum_i x_i^2 \mathrm{V}[u_i]$. In the
simplest case of homoskedastic errors, $\mathrm{V}[u_i] = \sigma^2$ and (1) simplifies to $\mathrm{V}[\hat{\beta}] = \sigma^2 / \sum_i x_i^2$.
If instead errors are heteroskedastic, then (1) becomes
$$\mathrm{V}_{\text{het}}[\hat{\beta}] = \Big(\sum\nolimits_i x_i^2\, \mathrm{E}[u_i^2]\Big) \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2,$$

using $\mathrm{V}[u_i] = \mathrm{E}[u_i^2]$ since $\mathrm{E}[u_i] = 0$. Implementation seemingly requires consistent
estimates of each of the $N$ error variances $\mathrm{E}[u_i^2]$. In a very influential paper, one that extends
naturally to the clustered setting, White (1980) noted that instead all that is needed is an
estimate of the scalar $\sum_i x_i^2 \mathrm{E}[u_i^2]$, and that one can simply use $\sum_i x_i^2 \hat{u}_i^2$, where
$\hat{u}_i = y_i - \hat{\beta} x_i$ is the OLS residual, provided $N \to \infty$. This leads to the estimated variance

$$\widehat{\mathrm{V}}_{\text{het}}[\hat{\beta}] = \Big(\sum\nolimits_i x_i^2 \hat{u}_i^2\Big) \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2.$$

The resulting standard error for $\hat{\beta}$ is often called a robust standard error, though a better, more
precise term is heteroskedastic-robust standard error.
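As an illustration, here is a minimal numpy sketch of this estimate for the simple single-regressor, no-intercept model above; the data-generating process is simulated with our own illustrative values.

```python
# Direct implementation of the heteroskedastic-robust variance estimate
# for the single nonstochastic regressor model with no intercept.
import numpy as np

rng = np.random.default_rng(2)
N = 1000
x = rng.normal(size=N)
u = (1 + np.abs(x)) * rng.normal(size=N)   # heteroskedastic, mean-zero errors
y = 1.0 * x + u

beta_hat = np.sum(x * y) / np.sum(x**2)    # OLS slope
uhat = y - beta_hat * x                    # OLS residuals
V_het = np.sum(x**2 * uhat**2) / np.sum(x**2) ** 2
print(beta_hat, np.sqrt(V_het))            # heteroskedastic-robust standard error
```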
What if errors are correlated over $i$? In the most general case where all errors are
correlated with each other,

$$\mathrm{V}\Big[\sum\nolimits_i x_i u_i\Big] = \sum\nolimits_i \sum\nolimits_j \mathrm{Cov}[x_i u_i,\, x_j u_j] = \sum\nolimits_i \sum\nolimits_j x_i x_j\, \mathrm{E}[u_i u_j],$$

so

$$\mathrm{V}_{\text{cor}}[\hat{\beta}] = \Big(\sum\nolimits_i \sum\nolimits_j x_i x_j\, \mathrm{E}[u_i u_j]\Big) \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2.$$

The obvious extension of White (1980) is to use $\widehat{\mathrm{V}}[\hat{\beta}] = \big(\sum_i \sum_j x_i x_j \hat{u}_i \hat{u}_j\big)/\big(\sum_i x_i^2\big)^2$, but this
equals zero since $\sum_i x_i \hat{u}_i = 0$. Instead one needs to first set a large fraction of the error
correlations $\mathrm{E}[u_i u_j]$ to zero. For time series data with errors assumed to be correlated only up
to, say, $m$ periods apart as well as heteroskedastic, White’s result can be extended to yield a
heteroskedastic- and autocorrelation-consistent (HAC) variance estimate; see Newey and West
(1987).
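A brief sketch of HAC (Newey-West) standard errors using statsmodels follows; the AR(1) error design and the lag choice are our own illustrative assumptions, not prescriptions.

```python
# HAC (Newey-West) standard errors for time-series errors correlated over
# nearby periods, illustrated with simulated AR(1) errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal()   # serially correlated errors
y = 1.0 * x + u

X = sm.add_constant(x)
fit_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(fit_hac.bse)                         # heteroskedastic- and autocorrelation-consistent SEs
```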
In this paper we consider clustered errors, with $\mathrm{E}[u_i u_j] = 0$ unless observations $i$ and $j$
are in the same cluster (such as the same region). Then

$$\mathrm{V}_{\text{clu}}[\hat{\beta}] = \Big(\sum\nolimits_i \sum\nolimits_j x_i x_j\, \mathrm{E}[u_i u_j]\, \mathbf{1}[i,j \text{ in same cluster}]\Big) \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2, \qquad (2)$$

where the indicator function $\mathbf{1}[A]$ equals 1 if event $A$ happens and equals 0 if event $A$ does
not happen. Provided the number of clusters goes to infinity, we can use the variance estimate

$$\widehat{\mathrm{V}}_{\text{clu}}[\hat{\beta}] = \Big(\sum\nolimits_i \sum\nolimits_j x_i x_j \hat{u}_i \hat{u}_j\, \mathbf{1}[i,j \text{ in same cluster}]\Big) \Big/ \Big(\sum\nolimits_i x_i^2\Big)^2. \qquad (3)$$

This estimate is called a cluster-robust estimate, though more precisely it is heteroskedastic-
and cluster-robust. This estimate reduces to $\widehat{\mathrm{V}}_{\text{het}}[\hat{\beta}]$ in the special case that there is only one
observation in each cluster.

Typically $\widehat{\mathrm{V}}_{\text{clu}}[\hat{\beta}]$ exceeds $\widehat{\mathrm{V}}_{\text{het}}[\hat{\beta}]$ due to the addition of terms when $i \neq j$. The
amount of increase is larger (1) the more positively associated are the regressors across
observations in the same cluster (via $x_i x_j$ in (3)), (2) the more correlated are the errors (via
$\mathrm{E}[u_i u_j]$ in (2)), and (3) the more observations there are in the same cluster (via $\mathbf{1}[i,j \text{ in same cluster}]$ in (3)).
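For the simple model, the double sum in (3) collapses to a sum over clusters of squared within-cluster terms, which the following numpy sketch exploits; the simulated data and values are our own, purely for illustration.

```python
# Direct implementation of the cluster-robust variance estimate (3) for the
# single-regressor model, computed cluster by cluster.
import numpy as np

rng = np.random.default_rng(4)
G, m = 50, 20
g = np.repeat(np.arange(G), m)
x = rng.normal(size=G)[g] + rng.normal(size=G * m)   # regressor correlated within cluster
u = rng.normal(size=G)[g] + rng.normal(size=G * m)   # error correlated within cluster
y = 1.0 * x + u

beta_hat = np.sum(x * y) / np.sum(x**2)
uhat = y - beta_hat * x
# sum_i sum_j x_i x_j uhat_i uhat_j 1[i,j in same cluster]
#   = sum_g ( sum_{i in cluster g} x_i uhat_i )^2
s_g = np.array([np.sum(x[g == c] * uhat[g == c]) for c in range(G)])
V_clu = np.sum(s_g**2) / np.sum(x**2) ** 2
V_het = np.sum(x**2 * uhat**2) / np.sum(x**2) ** 2
print(np.sqrt(V_clu), np.sqrt(V_het))                # cluster-robust SE exceeds het-robust here
```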
There are several take-away messages. First, there can be a great loss of efficiency in OLS
estimation if errors are correlated within cluster rather than completely uncorrelated.
Intuitively, if errors are positively correlated within cluster then an additional observation in
the cluster no longer provides a completely independent piece of new information. Second,
failure to control for this within-cluster error correlation can lead to using standard errors that
are too small, with consequent overly-narrow confidence intervals, overly-large t-statistics,
and over-rejection of true null hypotheses. Third, it is straightforward to obtain cluster-robust
standard errors, though they do rely on the assumption that the number of clusters goes to
infinity (see Section VI for the few clusters case).
B. Clustered Errors and Two Leading Examples
Let $i$ denote the $i$th of $N$ individuals in the sample, and $g$ denote the $g$th of $G$
clusters. Then for individual $i$ in cluster $g$ the linear model with (one-way) clustering is

$$y_{ig} = \mathbf{x}_{ig}'\boldsymbol{\beta} + u_{ig}, \qquad (4)$$

where $\mathbf{x}_{ig}$ is a $K \times 1$ vector. As usual it is assumed that $\mathrm{E}[u_{ig}|\mathbf{x}_{ig}] = 0$. The key assumption
is that errors are uncorrelated across clusters, while errors for individuals belonging to the same
cluster may be correlated. Thus

$$\mathrm{E}[u_{ig} u_{jg'} \,|\, \mathbf{x}_{ig}, \mathbf{x}_{jg'}] = 0, \quad \text{unless } g = g'. \qquad (5)$$
1. Example 1: Individuals in Cluster
Hersch (1998) uses cross-section individual-level data to estimate the impact of job
injury risk on wages. Since there is no individual-level data on job injury rate, a more
aggregated measure such as job injury risk in the individual’s industry is used as a regressor.
Then for individual $i$ (with $N = 5960$) in industry $g$ (with $G = 211$),

$$y_{ig} = \beta \times x_g + \mathbf{z}_{ig}'\boldsymbol{\gamma} + u_{ig}.$$

The regressor $x_g$ is perfectly correlated within industry. The error term will be
positively correlated within industry if the model systematically overpredicts (or
underpredicts) wages in a given industry. In this case default OLS standard errors will be
downward biased.
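The sketch below mimics this design with simulated data: the regressor is constant within each "industry" cluster, and the default and cluster-robust standard errors diverge sharply. The cluster counts and error magnitudes are our own illustrative choices, not Hersch's (1998) data.

```python
# A Hersch-type design: the regressor is constant within cluster, so default
# OLS standard errors are badly downward biased relative to cluster-robust ones.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
G, m = 211, 28                                 # roughly 211 industries, 28 workers each
g = np.repeat(np.arange(G), m)
x_g = rng.normal(size=G)[g]                    # industry-level regressor, constant within cluster
u = 0.5 * rng.normal(size=G)[g] + rng.normal(size=G * m)
y = 1.0 * x_g + u

X = sm.add_constant(x_g)
fit_default = sm.OLS(y, X).fit()
fit_cluster = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": g})
print(fit_default.bse[1], fit_cluster.bse[1])  # default SE understates the true variability
```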
References

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. "How Much Should We Trust Differences-in-Differences Estimates?" Quarterly Journal of Economics 119(1): 249-275.

Liang, Kung-Yee, and Scott L. Zeger. 1986. "Longitudinal Data Analysis Using Generalized Linear Models." Biometrika 73(1): 13-22.

Newey, Whitney K., and Kenneth D. West. 1987. "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica 55(3): 703-708.

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica 48(4): 817-838.

Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
