
Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches

01 Jan 2009 - Review of Financial Studies (Oxford University Press) - Vol. 22, Iss. 1, pp. 435-480

TL;DR: In this article, the authors examine the different methods used in the literature and explain when the different approaches yield the same (and correct) standard errors and when they diverge, and give researchers guidance for their use.


Abstract: In both corporate finance and asset pricing empirical work, researchers are often confronted with panel data. In these data sets, the residuals may be correlated across firms and across time, and OLS standard errors can be biased. Historically, the two literatures have used different solutions to this problem. Corporate finance has relied on clustered standard errors, while asset pricing has used the Fama-MacBeth procedure to estimate standard errors. This paper examines the different methods used in the literature and explains when the different methods yield the same (and correct) standard errors and when they diverge. The intent is to provide intuition as to why the different approaches sometimes give different answers and give researchers guidance for their use.


Summary (4 min read)


Introduction

It is well known that OLS standard errors are unbiased when the residuals are independent and identically distributed.
Thirty-four percent of the papers estimated both the coefficients and the standard errors using the Fama-MacBeth procedure (Fama and MacBeth, 1973).
There are two general forms of dependence which are most common in finance applications.
The residuals of a given firm may be correlated across years (time series dependence).
Of the most common approaches used in the literature and examined in this paper, only clustered standard errors are unbiased as they account for the residual dependence created by the firm effect.

A)

To provide intuition on why the standard errors produced by OLS are incorrect and how alternative estimation methods correct this problem, it is helpful to very briefly review the expression for the variance of the estimated coefficients.
This is the standard OLS formula and is based on the assumption that the errors are independent and identically distributed (Greene, 1990).
Each observation of the dependent variable is a monthly equity return.
Since the adjustment in the standard error, and the bias in White standard errors, is a function of the monthly auto-correlation in the Xs (a large number) times the auto-correlation in the residuals (zero), the standard errors clustered by firm are equal to the White standard errors.
If the time effect influenced each firm in a given month by the same amount, the time dummies would absorb the effect and clustering by time would not change the reported standard errors.

Var[β_OLS]

I use the assumption that residuals are independent across firms in deriving the second line.
To understand this intuition, consider the extreme case where the independent variables and residuals are perfectly correlated across time (i.e. ρ_X = 1 and ρ_ε = 1).
The basic program which I used to simulate the data and estimate the coefficients and standard errors is posted on my web site.
The estimated standard error will shrink accordingly, and incorrectly.
The correlation of the residuals within cluster is the problem the clustered standard errors are designed to correct.
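For reference, the variance expression behind this intuition can be written out as below. This is a sketch for the case of a fixed firm effect with N firms and T years per firm, where ρ_X and ρ_ε are the within-firm correlations of the independent variable and the residual. The symbols σ_X² and σ_ε² for the two variances are notation assumed here.

```latex
\operatorname{Var}\!\left[\hat\beta_{OLS}-\beta\right]
  = \frac{\sigma_\varepsilon^{2}}{N\,T\,\sigma_X^{2}}
    \Bigl(1 + (T-1)\,\rho_X\,\rho_\varepsilon\Bigr)
```

In the extreme case ρ_X = ρ_ε = 1 the bracket equals T, so the variance is σ_ε²/(N σ_X²): each additional year adds no information, yet the OLS formula keeps dividing by NT. As a milder illustration, with T = 10 and ρ_X = ρ_ε = 0.5 the bracket is 1 + 9(0.25) = 3.25, so the true standard error is about √3.25 ≈ 1.80 times the OLS standard error.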

B)

Testing the Standard Error Estimates by Simulation.
The estimated standard errors are extremely close to the true standard errors and the number of statistically significant t-statistics is close to one percent across the simulations (using a 1 percent critical value).
Once the firm effect is temporary, the OLS standard errors again underestimate the true standard errors even when firm dummies are included in the regression (Wooldridge, 2003; Baker, Stein, and Wurgler, 2003).
In the asset pricing example, these standard errors were identical to the standard errors clustered by time, since there was no firm effect (Table 6).
The results are similar for firm size, firm age, asset tangibility (the ratio of property, plant, and equipment to assets), and R&D expenditure.
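The simulation exercise described in this section can be sketched in a few lines. The following is a minimal illustration, not the author's posted program. It assumes a fixed firm effect in both the independent variable and the residual, total standard deviations of 1 for X and 2 for the residual, and a hand-written cluster-robust sandwich estimator with no finite-sample correction. The function names `simulate_firm_effect_panel` and `ols_and_clustered_se` are hypothetical.

```python
import numpy as np


def simulate_firm_effect_panel(n_firms, n_years, rho_x, rho_e, beta=1.0, seed=0):
    """Simulate y = beta * x + e with a fixed firm effect in x and in e.

    rho_x and rho_e are the fractions of Var(x) and Var(e) due to the firm
    component. Total std of x is 1 and of e is 2 (assumed for illustration).
    """
    rng = np.random.default_rng(seed)
    firm = np.repeat(np.arange(n_firms), n_years)
    n = n_firms * n_years
    x = (rng.normal(0, np.sqrt(rho_x), n_firms)[firm]
         + rng.normal(0, np.sqrt(1 - rho_x), n))
    e = 2 * (rng.normal(0, np.sqrt(rho_e), n_firms)[firm]
             + rng.normal(0, np.sqrt(1 - rho_e), n))
    return firm, x, beta * x + e


def ols_and_clustered_se(x, y, cluster):
    """Return (slope, OLS SE, SE clustered on `cluster`) for y on [1, x]."""
    X = np.column_stack([np.ones(len(x)), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    n, k = X.shape
    # Conventional OLS variance: s^2 (X'X)^-1, valid only for i.i.d. errors.
    v_ols = (resid @ resid) / (n - k) * xtx_inv
    # Sum the scores x_it * e_it within each cluster before squaring, so the
    # within-cluster cross products enter the middle of the sandwich.
    _, idx = np.unique(cluster, return_inverse=True)
    scores = np.zeros((idx.max() + 1, k))
    np.add.at(scores, idx, X * resid[:, None])
    v_cl = xtx_inv @ (scores.T @ scores) @ xtx_inv
    return b[1], np.sqrt(v_ols[1, 1]), np.sqrt(v_cl[1, 1])


firm, x, y = simulate_firm_effect_panel(500, 10, rho_x=0.5, rho_e=0.5)
slope, se_ols, se_firm = ols_and_clustered_se(x, y, firm)
print(f"slope={slope:.3f}  OLS SE={se_ols:.4f}  clustered SE={se_firm:.4f}")
```

With half of each variance coming from the firm component, the clustered standard error should come out well above the OLS one. Setting both fractions to zero should bring the two close together.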

C)

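An alternative way to estimate the regression coefficients and standard errors when the residuals are not independent is the Fama-MacBeth approach (Fama and MacBeth, 1973).
And the estimated variance of the Fama-MacBeth estimate is calculated as: $$S^2(\hat\beta_{FM}) = \frac{1}{T}\sum_{t=1}^{T}\frac{(\hat\beta_t-\hat\beta_{FM})^2}{T-1}$$ where $\hat\beta_t$ is the coefficient estimated from the cross-section in period t and $\hat\beta_{FM}$ is the average of these estimates over the T periods.
This is rarely done in the finance literature.
The GLS estimates are more efficient than the OLS estimates (with or without firm dummies) when the residuals are correlated (compare Table 5, Panels A and B).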
If the firm effect is temporary, then the residuals are still correlated within cluster and this is the source of the bias in the standard errors.
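The Fama-MacBeth procedure discussed in this section can be illustrated with a short sketch: run one cross-sectional regression per period, average the slopes, and use the dispersion of the period slopes to form the standard error. The function name `fama_macbeth` and the single-regressor setup are assumptions made for illustration.

```python
import numpy as np


def fama_macbeth(x, y, period):
    """Fama-MacBeth slope and standard error for one regressor.

    For each period, regress y on x (with an intercept) across firms.
    The estimate is the mean of the period slopes; its variance is the
    sample variance of the period slopes divided by the number of periods.
    """
    periods = np.unique(period)
    slopes = np.empty(len(periods))
    for j, t in enumerate(periods):
        mask = period == t
        xd = x[mask] - x[mask].mean()
        yd = y[mask] - y[mask].mean()
        slopes[j] = (xd @ yd) / (xd @ xd)
    n_periods = len(slopes)
    beta_fm = slopes.mean()
    var_fm = ((slopes - beta_fm) ** 2).sum() / (n_periods - 1) / n_periods
    return beta_fm, np.sqrt(var_fm)


# Two periods with cross-sectional slopes of exactly 1 and 3.
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 0.0, 3.0, 6.0])
period = np.array([0, 0, 0, 1, 1, 1])
print(fama_macbeth(x, y, period))
```

In this toy example the two period slopes are 1 and 3, so the estimate is 2 and the standard error is 1. Because the variance uses only the period-to-period variation in the slopes, a firm effect that shifts every period's slope in the same direction is not reflected in it.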

MacBeth coefficient estimates.

This result is the same as the expression for the variance of the OLS coefficient (see equation 7).
The Fama-MacBeth standard errors are biased in exactly the same way as the OLS estimates.
In both cases, the magnitude of the bias is a function of the serial correlation of both the independent variable and the residual within a cluster and the number of time periods per firm.

D)

Since the average first-order auto-correlation is negative, the adjusted Fama-MacBeth standard errors are even more biased than the unadjusted standard errors.
To verify that this is correct, I re-ran the simulation using 20 years of data per firm and the average estimated serial correlation moved closer to zero, rising from -0.1157 to -0.0556.
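A short worked calculation may help show why a negative estimated auto-correlation pushes the adjustment in the wrong direction. It assumes one common form of the correction, in which the Fama-MacBeth variance is scaled by (1+θ)/(1−θ), with θ the first-order auto-correlation of the yearly slope estimates. Both the symbol θ and this specific form are assumptions made here for illustration.

```latex
S^2_{adj}(\hat\beta_{FM}) = S^2(\hat\beta_{FM})\,\frac{1+\theta}{1-\theta}

\theta = -0.1157:\quad \frac{1-0.1157}{1+0.1157} \approx 0.793,
\qquad \sqrt{0.793} \approx 0.890

\theta = -0.0556:\quad \frac{1-0.0556}{1+0.0556} \approx 0.895,
\qquad \sqrt{0.895} \approx 0.946
```

Under this assumed form, a negative θ shrinks standard errors that are already too small, by roughly 11 percent in the first case and roughly 5 percent in the second.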

E)

Incorrect Standard Error Estimates in Published Papers.
As part of my literature survey, I looked for papers which ran a regression of one persistent firm characteristic on other persistent firm characteristics (i.e. the serial correlation of the variables is large and dies away slowly as the lag increases).
Both of these papers correct the Fama-MacBeth standard errors for the first order auto-correlation of the estimated slopes.
Pastor and Veronesi (2003) report that this does not change their answer.
I will show in Section V-C that this correction still produces biased standard errors and this probably explains Pastor and Veronesi's finding that the adjustment has little effect on their estimated standard errors.
Baker and Wurgler (2002) estimate both White and Fama-MacBeth standard errors but do not report the Fama-MacBeth standard errors since they are the same as the White standard errors.

F)

An alternative approach for addressing the correlation of errors across observations is the Newey-West procedure (Newey and West, 1987).
Thus having a lag length of less than the maximum (T-1) will cause the Newey-West standard errors to underestimate the true standard error when the firm effect is fixed.
When I drew observations as a cluster (e.g. I drew 500 firms with replacement and took all 10 years for any firm which was drawn), the estimated standard errors are the same as the clustered standard errors (e.g. 0.0505 for bootstrap versus 0.0508 for clustered).
Newey and West show that if M is allowed to grow at the correct rate with the sample size (T), then their estimate is consistent.
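The cluster-level resampling mentioned in this section (drawing whole firms with replacement and keeping every year for each drawn firm) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: a single regressor, a slope re-estimated by OLS on each resample, and the hypothetical function name `cluster_bootstrap_se`.

```python
import numpy as np


def cluster_bootstrap_se(x, y, firm, n_boot=200, seed=0):
    """Bootstrap SE of the OLS slope, resampling whole firms.

    Each replication draws firms with replacement and keeps all of the
    observations for every drawn firm, so within-firm dependence is
    preserved in the resampled data.
    """
    rng = np.random.default_rng(seed)
    ids = np.unique(firm)
    rows = [np.flatnonzero(firm == i) for i in ids]
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.integers(0, len(ids), size=len(ids))
        idx = np.concatenate([rows[d] for d in draw])
        xd = x[idx] - x[idx].mean()
        yd = y[idx] - y[idx].mean()
        slopes[b] = (xd @ yd) / (xd @ xd)
    return slopes.std(ddof=1)


# Small demonstration panel: 100 firms, 10 years, firm effect in x and e.
rng = np.random.default_rng(42)
firm = np.repeat(np.arange(100), 10)
x = rng.normal(size=100)[firm] + rng.normal(size=1000)
y = x + rng.normal(size=100)[firm] + rng.normal(size=1000)
print(f"cluster bootstrap SE: {cluster_bootstrap_se(x, y, firm):.4f}")
```

Resampling individual observations instead of firms would break up the within-firm correlation and would be expected to behave like the unadjusted standard errors.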

III)

To demonstrate how the techniques work in the presence of a time effect, I generated data sets which contain only a time effect (observations on different firms within the same year are correlated).
The expressions for the standard errors in the presence of only a time effect are correct once I exchange N and T.
A) Clustered Standard Error Estimates.
The problem arises due to the limited number of clusters (e.g. years).
To explore this issue, I simulated data sets of 5,000 observations with the number of years (or clusters) ranging from 5 to 100.
The bias in the clustered standard error estimates declines with the number of clusters, dropping from 27 percent when there are 5 years (or clusters) to 3 percent when there are 40 years and to 1 percent when there are 100 years.

IV)

Estimating Standard Errors in the Presence of a Fixed Firm and Time Effect.
Since researchers do not always know the precise form of the dependence, a less parametric approach may be preferred.
To illustrate the performance of standard errors clustered by firm, year, or both, I simulated data sets with a fixed firm and time effect.
Clustering by two dimensions produces less biased standard errors.
In my simulations, the number of t-statistics which are greater than 2.58 rises to 5% when the number of firms or time periods falls to 10 (see Thompson, 2005 for more complete results).
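Clustering by two dimensions can be sketched by combining three one-way sandwich estimates: the variance clustered by firm, plus the variance clustered by time, minus the heteroscedasticity-robust (White) variance. The code below is an illustration under stated assumptions, not the exact estimator used in the simulations: it assumes one observation per firm-year (so the firm-by-year intersection is a single observation), applies no finite-sample correction, and uses the hypothetical names `sandwich` and `two_way_clustered_se`.

```python
import numpy as np


def sandwich(X, resid, groups):
    """One-way cluster-robust variance with scores summed within `groups`."""
    xtx_inv = np.linalg.inv(X.T @ X)
    _, idx = np.unique(groups, return_inverse=True)
    scores = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(scores, idx, X * resid[:, None])
    return xtx_inv @ (scores.T @ scores) @ xtx_inv


def two_way_clustered_se(x, y, firm, year):
    """OLS of y on [1, x] with SEs clustered by both firm and year."""
    X = np.column_stack([np.ones(len(x)), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    # Treating each observation as its own cluster gives the White estimate,
    # which is subtracted so the own-observation terms are not counted twice.
    each_obs = np.arange(len(y))
    v = (sandwich(X, resid, firm) + sandwich(X, resid, year)
         - sandwich(X, resid, each_obs))
    # In small samples a diagonal element can be negative, giving NaN here.
    return b, np.sqrt(np.diag(v))


# Demonstration: 50 firms, 20 years, firm and time effects in x and e.
rng = np.random.default_rng(7)
firm = np.repeat(np.arange(50), 20)
year = np.tile(np.arange(20), 50)
x = rng.normal(size=50)[firm] + rng.normal(size=20)[year] + rng.normal(size=1000)
e = rng.normal(size=50)[firm] + rng.normal(size=20)[year] + rng.normal(size=1000)
coef, se = two_way_clustered_se(x, x + e, firm, year)
print(f"slope={coef[1]:.3f}  two-way clustered SE={se[1]:.4f}")
```

As the text notes, this approach relies on having enough clusters in each dimension. With few firms or few years the estimate can be noisy.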

V) Estimating Standard Errors in the Presence of a Temporary Firm Effect

The analysis thus far has assumed that the firm effect is fixed.
The dependence between residuals may decay as the time between them increases (e.g. ρ(ε t , ε t-k ) may decline with k).
Assuming homoscedasticity makes the interpretation of the results simpler.
In addition, if the performance of the different standard error estimates depends on the permanence of the firm effect, researchers need to know this.

VI)

I used simulated data in the previous sections.
In real world applications, the authors may have priors about the data's structure (are firm effects or time effects more important, are they permanent or temporary), but they do not know the data structure for certain.
This way I can demonstrate how the different methods for estimating standard errors compare, confirm that the methods used by some published papers have produced significantly biased results, and show what the authors can learn from the different standard errors estimates.
The constant is calculated as the average of the yearly intercepts.
Thus the Fama-MacBeth R² does not include the explanatory power of time dummies.

VII) Conclusions.

It is well known from first-year econometrics classes that OLS and White standard errors are biased when the residuals are not independent.
The standard errors clustered by firm are unbiased and produce correctly sized confidence intervals whether the firm effect is permanent or temporary.
Alternatively, researchers can cluster by multiple dimensions, assuming there are a sufficient number of clusters in each dimension.
The fraction of the independent variable's variance which is due to a firm specific component varies across the columns of the table from 0 percent (no firm effect) to 75 percent.
The second entry is the standard deviation of the coefficient estimated by Fama-MacBeth.


Figures (16)

Table 6: Asset Pricing Application Equity Returns and Asset Tangibility

Figure 2: Distribution of Simulated T-Statistics

Figure 1: Residual Cross Product Matrix Assumptions About Zero Covariances

Figure 6: Residual Cross Product Matrix Firm and Time Effects

Table 4: Estimating Standard Errors with a Time Effect Fama-MacBeth Standard Errors

Figure 5: True Standard Errors and Clustered Standard Errors as a function of cluster size (T)

Table 5: Estimated Standard Errors with a Non-Fixed Firm Effect Panel A: OLS and Clustered Standard Errors

Figure 7: Clustered T-Statistics in the Presence of a Firm and a Time Effect Clustered by Firm, by Time, or Both

Table 1: Estimating Standard Errors with a Firm Effect OLS and Clustered Standard Errors

Figure 4: Relative Performance of OLS, Clustered, and Newey-West Standard Errors

Table 2: Estimating Standard Errors with a Firm Effect Fama-MacBeth Standard Errors

Figure 8: Residuals and Independent Variables Auto-Correlation: Asset Pricing Example Panel A: Within Firm

Table 7: Corporate Finance Application Capital Structure Regressions (1965-2003)

Table 3: Estimating Standard Errors with a Time Effect OLS and Clustered Standard Errors

Figure 9: Residuals and Independent Variables Auto-Correlation: Corporate Finance Example Panel A: Within Firm

Figure 3: Bias in Estimated Standard Errors as a function of years per cluster


NBER WORKING PAPER SERIES

ESTIMATING STANDARD ERRORS IN FINANCE PANEL DATA SETS: COMPARING APPROACHES

Mitchell A. Petersen

Working Paper 11280

http://www.nber.org/papers/w11280

NATIONAL BUREAU OF ECONOMIC RESEARCH

1050 Massachusetts Avenue

Cambridge, MA 02138

April 2005

I thank the Financial Institutions and Markets Research Center at Northwestern University’s Kellogg School for support. In writing this paper, I have benefitted greatly from discussions with Kent Daniel, Mariassunta Giannetti, Toby Moskowitz, Joshua Rauh, Michael Roberts, Paola Sapienza, Doug Staiger, and Annette Vissing-Jorgensen as well as the comments of seminar participants at the Federal Reserve Bank of Chicago, Northwestern University, and the Universities of Chicago, Columbia, and Iowa. The research assistance of Sungjoon Park, Nick Halpern, and Casey Liang is greatly appreciated. The views expressed herein are those of the author(s) and do not necessarily reflect the views of the National Bureau of Economic Research.

may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches

Mitchell A. Petersen

NBER Working Paper No. 11280

April 2005, Revised June 2006

JEL No. G1, G3, C1

ABSTRACT

In both corporate finance and asset pricing empirical work, researchers are often confronted with panel data. In these data sets, the residuals may be correlated across firms and across time, and OLS standard errors can be biased. Historically, the two literatures have used different solutions to this problem. Corporate finance has relied on Rogers standard errors, while asset pricing has used the Fama-MacBeth procedure to estimate standard errors. This paper will examine the different methods used in the literature and explain when the different methods yield the same (and correct) standard errors and when they diverge. The intent is to provide intuition as to why the different approaches sometimes give different answers and thus give researchers guidance for their use.

Mitchell A. Petersen

Kellogg Graduate School of Management

Northwestern University

2001 Sheridan Road

Evanston, IL 60208

and NBER

petersen@northwestern.edu

I searched papers published in the Journal of Finance, the Journal of Financial Economics, and the Review of Financial Studies in the years 2001-2004 for a description of how the coefficients and standard errors were estimated in a panel data set. I included both linear regressions as well as non-linear techniques such as logits and tobits in my survey. Panel data sets are data sets where observations can be grouped into clusters (e.g. multiple observations per firm, per industry, per year, or per country). I included only papers which report at least five observations in each dimension (e.g. firms and years). 207 papers met the selection criteria. Papers which did not report the method for estimating the standard errors, or reported correcting the standard errors only for heteroscedasticity (i.e. White standard errors which are not robust to within cluster dependence) are coded as not having corrected the standard errors for within cluster dependence. Where the paper’s description was ambiguous, I contacted the authors.

Although White or OLS standard errors may be correct, many of the published papers report regressions where I would expect the residuals to be correlated across observations on the same firm in different years (e.g. bid-ask spread regressed on exchange dummies, stock price, volatility, and average daily volume or leverage regressed on the market to book ratio and firm size) or correlated across observations on different firms in the same year (e.g. equity returns regressed on earnings surprises). In these cases, the bias in the standard errors can be quite large. See Section VI for two illustrations.

I) Introduction

It is well known that OLS standard errors are unbiased when the residuals are independent and identically distributed. When the residuals are correlated across observations, OLS standard errors can be biased and either over or underestimate the true variability of the coefficient estimates. Although the use of panel data sets (e.g. data sets that contain observations on multiple firms in multiple years) is common in finance, the ways that researchers have addressed possible biases in the standard errors vary widely and in many cases are incorrect. In recently published finance papers which include a regression on panel data, forty-two percent of the papers did not adjust the standard errors for possible dependence in the residuals. Approaches for estimating the coefficients and standard errors in the presence of within cluster correlation varied among the remaining papers. Thirty-four percent of the papers estimated both the coefficients and the standard errors using the Fama-MacBeth procedure (Fama and MacBeth, 1973). Twenty-nine percent of the papers included dummy variables for each cluster (e.g. fixed effects or within estimation). The next two most common methods used OLS (or an analogous method) to estimate the coefficients but reported standard errors adjusted for correlation within a cluster. Seven percent of the papers adjusted the standard errors using the Newey-West procedure (Newey and West, 1987) modified for use in a panel data set, while 23 percent of the papers reported clustered standard errors (Williams, 2000; Rogers, 1993; Andrews, 1991; Moulton, 1990; Arellano, 1987; Moulton, 1986; Liang and Zeger, 1986), which are White standard errors adjusted to account for possible correlation within a cluster. These are also called Rogers standard errors in the finance literature.

Although the literature has used a diversity of methods to estimate standard errors in panel data sets, the chosen method is often incorrect and the literature provides little guidance to researchers as to which method should be used. In addition, some of the advice in the literature is simply wrong. Since the methods sometimes produce incorrect estimates, it is important to understand how the methods compare and how to select the correct one. This is the paper’s objective.

There are two general forms of dependence which are most common in finance applications. They will serve as the basis for the analysis. The residuals of a given firm may be correlated across years (time series dependence). I will call this an unobserved firm effect (see Wooldridge, 2002). Alternatively, the residuals of a given year may be correlated across different firms (cross-sectional dependence). I will call this a time effect. I will simulate panel data with both forms of dependence, first individually and then jointly. With the simulated data, I can estimate the coefficients and standard errors using each of the methods and compare their relative performance.
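The two forms of dependence can be written compactly as an error-components model. The notation below is a sketch consistent with the description above, with symbol names chosen here for illustration: γ_i and μ_i are firm components, δ_t and ζ_t are time components, and η_it and ν_it are idiosyncratic terms, all assumed mutually independent.

```latex
Y_{it} = X_{it}\,\beta + \varepsilon_{it}, \qquad
\varepsilon_{it} = \gamma_i + \delta_t + \eta_{it}, \qquad
X_{it} = \mu_i + \zeta_t + \nu_{it}
```

Setting δ_t = ζ_t = 0 leaves only a firm effect, and the correlation between two residuals of the same firm is then σ_γ²/σ_ε². Setting γ_i = μ_i = 0 leaves only a time effect, and residuals of different firms in the same year are correlated instead.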

Section II contains the standard error estimates in the presence of an unobserved firm effect. My results show that both OLS and the Fama-MacBeth standard errors are biased downward. The Newey-West standard errors, as modified for panel data, are also biased but the bias is small. Of the most common approaches used in the literature and examined in this paper, only clustered standard errors are unbiased as they account for the residual dependence created by the firm effect. In Section III, the same analysis is conducted with an unobserved time effect instead of a firm effect. Since the Fama-MacBeth procedure is designed to address a time effect, the Fama-MacBeth standard errors are unbiased. The intuition of these first two sections carries over to Section IV, where I simulate data with both a firm and a time effect.

I initially specified the firm effect as a constant (i.e. it does not decay over time). In practice, the firm effect may decay and so the correlation between residuals declines as the time between them grows. In Section V, I simulate data with a more general correlation structure. This allows me to compare OLS, clustered, and Fama-MacBeth standard errors in a more general setting. Simulating the temporary firm effect also allows me to examine the relative accuracy of two additional methods for adjusting standard errors: fixed effects (firm dummies) and adjusted Fama-MacBeth standard errors, whose use is becoming more common. I show that including firm dummies eliminates the bias in OLS standard errors only when the firm effect is fixed. I also show that even after adjusting Fama-MacBeth standard errors, as suggested by some authors (Cochrane, 2001), they are still biased.

Most papers do not report standard errors estimated by multiple methods. Thus in Section VI, I apply the various estimation techniques to two real data sets and compare their relative performance. This serves two purposes. First, it demonstrates that the methods used in some published papers produce biases in the standard errors and t-statistics which are very large. This is why using the correct method to estimate standard errors is important. Second, examining actual data allows me to show how differences in standard error estimates can provide information about the deficiencies in a model and directions for improving it.


Frequently Asked Questions (8)

Q1. What contributions have the authors mentioned in the paper "Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches"?

This paper will examine the different methods used in the literature and explain when the different methods yield the same (and correct) standard errors and when they diverge. The intent is to provide intuition as to why the different approaches sometimes give different answers and thus give researchers guidance for their use.

Q2. What is the independence assumption used to move from the first to the second line in equation 3?

The independence assumption is used to move from the first to the second line in equation (3) (i.e., the covariance between residuals is zero).

Q3. How can the authors estimate the coefficients of a random effects model?

By estimating a generalized least squares version of the random effects model (i.e. a panel data set with an unobserved firm effect), more efficient coefficient estimates can be obtained (see Wooldridge, 2002).

Q4. Why does the firm effect not appear in the estimated variance?

Since the firm effect influences both the yearly coefficient estimate and the sample average of the yearly coefficient estimates, it does not appear in the estimated variance.

Q5. How can the authors determine the nature of the dependence in the residuals?

By examining how standard errors change when clustering by firm or time (i.e. compare columns I to II and I to III), the authors can determine the nature of the dependence which remains in the residuals, and this can guide them on how to improve their models.

Q6. How many percent of the papers did not adjust the standard errors for possible dependence in the residuals?

In recently published finance papers which include a regression on panel data, forty-two percent of the papers did not adjust the standard errors for possible dependence in the residuals.

Q7. How many percent of the papers estimated the coefficients and standard errors?

Thirty-four percent of the papers estimated both the coefficients and the standard errors using the Fama-MacBeth procedure (Fama-MacBeth, 1973).

Q8. How many percent of the variability in the independent variable is due to the time effect?

I allowed the fraction of variability in both the residual and the independent variable which is due to the time effect to range from zero to seventy-five percent in twenty-five percent increments.
