Estimating Standard Errors in Finance Panel Data Sets: Comparing Approaches
Summary
Introduction
- It is well known that OLS standard errors are unbiased when the residuals are independent and identically distributed.
- Thirty-four percent of the papers estimated both the coefficients and the standard errors using the Fama-MacBeth procedure (Fama and MacBeth, 1973).
- There are two general forms of dependence which are most common in finance applications.
- The residuals of a given firm may be correlated across years (time series dependence).
- Of the most common approaches used in the literature and examined in this paper, only clustered standard errors are unbiased as they account for the residual dependence created by the firm effect.
A)
- To provide intuition on why the standard errors produced by OLS are incorrect and how alternative estimation methods correct this problem, it is helpful to very briefly review the expression for the variance of the estimated coefficients.
- This is the standard OLS formula and is based on the assumption that the errors are independent and identically distributed (Greene, 1990).
- Each observation of the dependent variable is a monthly equity return.
- Since the adjustment in the standard error, and the bias in White standard errors, is a function of the monthly auto-correlation in the Xs (a large number) times the auto-correlation in the residuals (zero), the standard errors clustered by firm are equal to the White standard errors.
- If the time effect influenced each firm in a given month by the same amount, the time dummies would absorb the effect and clustering by time would not change the reported standard errors.
Var [ βOLS
- I use the assumption that residuals are independent across firms in deriving the second line.
- To understand this intuition, consider the extreme case where the independent variables and residuals are perfectly correlated across time (i.e. ρ_X = 1 and ρ_ε = 1).
- The basic program which I used to simulate the data and estimate the coefficients and standard errors is posted on my web site.
- The estimated standard error will shrink accordingly, and incorrectly.
- The correlation of the residuals within cluster is the problem the clustered standard errors are designed to correct.
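To make the roles of T, ρ_X, and ρ_ε concrete, the sketch below assumes the variance of the OLS coefficient under a fixed firm effect takes the form usually given for this result, σ_ε² / (σ_X² N T) · (1 + (T − 1) ρ_X ρ_ε). Under that assumption the true standard error exceeds the OLS standard error by the square root of the bracketed term. The function name is hypothetical and chosen only for illustration.

```python
import math


def true_to_ols_se_ratio(n_years, rho_x, rho_eps):
    """Ratio of the true standard error to the OLS standard error.

    Assumes a fixed firm effect, so the within-firm correlations of X and
    of the residual equal rho_x and rho_eps at every lag (an assumption of
    this sketch).
    """
    inflation = 1.0 + (n_years - 1) * rho_x * rho_eps
    return math.sqrt(inflation)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 1.0):
        ratio = true_to_ols_se_ratio(10, rho, rho)
        print(f"rho_X = rho_eps = {rho:.1f}: true SE / OLS SE = {ratio:.3f}")
```

With ρ_X = ρ_ε = 1 and ten years per firm the ratio is √10, roughly 3.16: each firm then contributes one independent observation, not ten. If either correlation is zero the ratio is one and the OLS standard errors need no correction.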
B) Testing the Standard Error Estimates by Simulation
- The estimated standard errors are extremely close to the true standard errors and the number of statistically significant t-statistics is close to one percent across the simulations (using a 1 percent critical value).
- Once the firm effect is temporary, the OLS standard errors again underestimate the true standard errors even when firm dummies are included in the regression (Wooldridge, 2003; Baker, Stein, and Wurgler, 2003).
- In the asset pricing example, these standard errors were identical to the standard errors clustered by time, since there was no firm effect (Table 6).
- The results are similar for firm size, firm age, asset tangibility (the ratio of property, plant, and equipment to assets), and R&D expenditure.
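A simulation of this kind can be sketched in a few lines. The version below is a minimal illustration, not the program posted by the author: it assumes a single regressor, a true slope of 1, a residual standard deviation of 2, and half of the variance of both X and the residual coming from a fixed firm effect. The clustered estimator is the plain sandwich form with no finite-sample adjustment, and all function names are hypothetical.

```python
import numpy as np


def draw_with_firm_effect(n_firms, n_years, rho, sd, rng):
    """Variable with std dev `sd`; a share `rho` of its variance is a fixed firm effect."""
    firm = rng.standard_normal((n_firms, 1)) * np.sqrt(rho)
    idio = rng.standard_normal((n_firms, n_years)) * np.sqrt(1.0 - rho)
    return sd * (firm + idio)


def ols_and_clustered_se(x, y):
    """Slope of y on x (intercept included), OLS SE, and SE clustered by firm.

    x and y are (n_firms, n_years) arrays; each row is one cluster.
    """
    xd, yd = x - x.mean(), y - y.mean()
    sxx = (xd ** 2).sum()
    beta = (xd * yd).sum() / sxx
    resid = yd - beta * xd
    se_ols = np.sqrt((resid ** 2).sum() / (x.size - 2) / sxx)
    # Sum x*e within each firm before squaring, so that the within-firm
    # cross products enter the variance estimate.
    firm_scores = (xd * resid).sum(axis=1)
    se_cluster = np.sqrt((firm_scores ** 2).sum()) / sxx
    return beta, se_ols, se_cluster


def monte_carlo(n_reps=300, n_firms=500, n_years=10, rho_x=0.5, rho_eps=0.5, seed=0):
    """Return (true SE, mean OLS SE, mean clustered SE) across simulated panels."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_reps, 3))
    for r in range(n_reps):
        x = draw_with_firm_effect(n_firms, n_years, rho_x, 1.0, rng)
        eps = draw_with_firm_effect(n_firms, n_years, rho_eps, 2.0, rng)
        out[r] = ols_and_clustered_se(x, 1.0 * x + eps)
    # The "true" SE is the standard deviation of the slope across simulations.
    return out[:, 0].std(ddof=1), out[:, 1].mean(), out[:, 2].mean()


if __name__ == "__main__":
    true_se, ols_se, cl_se = monte_carlo()
    print(f"true SE {true_se:.4f} | OLS SE {ols_se:.4f} | clustered SE {cl_se:.4f}")
```

Under these assumptions the average OLS standard error should fall well short of the dispersion of the slope estimates, while the firm-clustered standard error should track it closely.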
C)
- An alternative way to estimate the regression coefficients and standard errors when the residuals are not independent is the Fama-MacBeth approach (Fama and MacBeth, 1973).
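- The estimated variance of the Fama-MacBeth estimate is calculated as $S^2(\hat{\beta}_{FM}) = \frac{1}{T} \sum_{t=1}^{T} \frac{(\hat{\beta}_t - \hat{\beta}_{FM})^2}{T-1}$, where $\hat{\beta}_t$ is the slope from the cross-sectional regression in year $t$ and $\hat{\beta}_{FM}$ is the average of the yearly slopes.
- This is rarely done in the finance literature.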
- The GLS estimates are more efficient than the OLS estimates (with or without firm dummies) when the residuals are correlated (compare Table 5, Panels A and B).
- If the firm effect is temporary, then the residuals are still correlated within cluster and this is the source of the bias in the standard errors.
MacBeth coefficient estimates.
- This result is the same as the expression for the variance of the OLS coefficient (see equation 7).
- The Fama-MacBeth standard errors are biased in exactly the same way as the OLS standard errors.
- In both cases, the magnitude of the bias is a function of the serial correlation of both the independent variable and the residual within a cluster and the number of time periods per firm.
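The Fama-MacBeth procedure discussed in this section can be sketched as follows: run one cross-sectional regression per year, average the yearly slopes, and take the standard error from the time-series dispersion of those slopes. This is a minimal single-regressor version for a balanced panel; the function name and array layout are assumptions made for the example.

```python
import numpy as np


def fama_macbeth(x, y):
    """Fama-MacBeth slope and standard error for a single regressor.

    x and y are (n_firms, n_years) arrays; column t holds year t's cross section.
    """
    # Demeaning within each year is equivalent to including a yearly intercept.
    xd = x - x.mean(axis=0)
    yd = y - y.mean(axis=0)
    yearly_betas = (xd * yd).sum(axis=0) / (xd ** 2).sum(axis=0)
    n_years = yearly_betas.size
    beta_fm = yearly_betas.mean()
    # Variance of the mean of the yearly slopes: s^2 / T, with s^2 using T - 1.
    se_fm = yearly_betas.std(ddof=1) / np.sqrt(n_years)
    return beta_fm, se_fm, yearly_betas


if __name__ == "__main__":
    # Toy panel: 3 firms, 3 years, with exact yearly slopes of 1, 2, and 3.
    x = np.array([[0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [2.0, 0.0, 1.0]])
    y = x * np.array([1.0, 2.0, 3.0])
    beta_fm, se_fm, yearly = fama_macbeth(x, y)
    print(yearly, beta_fm, se_fm)
```

Because the standard error uses only the variation of the yearly slopes around their own mean, any component that shifts every year's slope by the same amount cannot show up in it. That is the mechanism behind the bias when there is a fixed firm effect.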
D)
- Since the average first-order auto-correlation is negative, the adjusted Fama-MacBeth standard errors are even more biased than the unadjusted standard errors.
- To verify that this is correct, I re-ran the simulation using 20 years of data per firm and the average estimated serial correlation moved closer to zero, rising from -0.1157 to -0.0556.
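The movement of the estimated serial correlation toward zero with a longer panel is what small-sample bias in an estimated autocorrelation would produce. The sketch below illustrates that effect under an assumption that is not stated in the text above: once the common firm effect is removed by demeaning, the yearly slopes behave like independent draws. The function name is hypothetical.

```python
import numpy as np


def mean_first_order_autocorr(n_years, n_reps=20000, seed=0):
    """Average sample first-order autocorrelation of n_years i.i.d. draws."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_reps, n_years))
    lead, lag = draws[:, 1:], draws[:, :-1]
    lead = lead - lead.mean(axis=1, keepdims=True)
    lag = lag - lag.mean(axis=1, keepdims=True)
    # Pearson correlation between consecutive draws, computed row by row.
    corr = (lead * lag).sum(axis=1) / np.sqrt(
        (lead ** 2).sum(axis=1) * (lag ** 2).sum(axis=1)
    )
    return corr.mean()


if __name__ == "__main__":
    for n_years in (10, 20):
        print(n_years, round(mean_first_order_autocorr(n_years), 4))
```

Even though the true autocorrelation of the draws is zero, the average estimate is negative in short samples and moves toward zero as the number of years doubles.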
E) Incorrect Standard Error Estimates in Published Papers
- As part of my literature survey, I looked for papers which ran a regression of one persistent firm characteristic on other persistent firm characteristics (i.e. the serial correlation of the variables is large and dies away slowly as the lag increases).
- Both of these papers correct the Fama-MacBeth standard errors for the first-order auto-correlation of the estimated slopes.
- Pastor and Veronesi (2003) report that this does not change their answer.
- I will show in Section V-C that this correction still produces biased standard errors and this probably explains Pastor and Veronesi's finding that the adjustment has little effect on their estimated standard errors.
- Baker and Wurgler (2002) estimate both White and Fama-MacBeth standard errors but do not report the Fama-MacBeth standard errors since they are the same as the White standard errors.
F)
- An alternative approach for addressing the correlation of errors across observations is the Newey-West procedure (Newey and West, 1987).
- Thus having a lag length of less than the maximum (T-1) will cause the Newey-West standard errors to underestimate the true standard error when the firm effect is fixed.
- When I drew observations as a cluster (e.g. I drew 500 firms with replacement and took all 10 years for any firm which was drawn), the estimated standard errors are the same as the clustered standard errors (e.g. 0.0505 for bootstrap versus 0.0508 for clustered).
- Newey and West show that if M is allowed to grow at the correct rate with the sample size (T), then their estimate is consistent.
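The effect of the lag length can be illustrated with the Bartlett weights that the Newey-West estimator applies, w(j) = 1 − j/(M + 1). The calculation below is a stylized population-moment sketch, not a simulation: it assumes a fixed firm effect (so the product ρ_X ρ_ε is the same at every lag), assumes the weights are applied to within-firm lags only, and ignores sampling noise. Function names are hypothetical.

```python
import math


def bartlett_weight(lag, max_lag):
    """Newey-West weight on the autocovariance at `lag` when the lag length is `max_lag`."""
    return 1.0 - lag / (max_lag + 1.0)


def nw_to_true_se_ratio(n_years, max_lag, rho_x, rho_eps):
    """Stylized ratio of the Newey-West SE to the true SE under a fixed firm effect."""
    rho = rho_x * rho_eps
    # With weights of one at every lag the factor is 1 + (T - 1) * rho.
    true_factor = 1.0 + (n_years - 1) * rho
    # Newey-West down-weights lag j and drops every lag beyond max_lag.
    nw_factor = 1.0 + 2.0 * rho * sum(
        bartlett_weight(j, max_lag) * (n_years - j) / n_years
        for j in range(1, max_lag + 1)
    )
    return math.sqrt(nw_factor / true_factor)


if __name__ == "__main__":
    for max_lag in (0, 3, 6, 9):
        ratio = nw_to_true_se_ratio(10, max_lag, 0.5, 0.5)
        print(f"lag length {max_lag}: NW SE / true SE = {ratio:.3f}")
```

In this stylized calculation the ratio rises with the lag length but stays below one even at the maximum lag of T − 1, because the weights on the cross products are less than one. Clustering by firm corresponds to a weight of one on every within-firm cross product.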
III)
- To demonstrate how the techniques work in the presence of a time effect, I generated data sets which contain only a time effect (observations on different firms within the same year are correlated).
- The expressions for the standard errors in the presence of only a time effect are correct once I exchange N and T.
A) Clustered Standard Error Estimates
- The problem arises due to the limited number of clusters (e.g. years).
- To explore this issue, I simulated data sets of 5,000 observations with the number of years (or clusters) ranging from 5 to 100.
- The bias in the clustered standard error estimates declines with the number of clusters, dropping from 27 percent when there are 5 years (or clusters) to 3 percent when there are 40 years to 1 percent when there are 100 years.
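The few-clusters problem can be explored with a small simulation along the lines described above. The sketch below holds the sample at 5,000 observations, varies the number of years, and compares the average year-clustered standard error with the dispersion of the slope estimates. The time-effect share of 25 percent, the residual standard deviation of 2, the true slope of 1, and the absence of any finite-sample adjustment are assumptions of this example, so its output is not expected to reproduce the percentages quoted above.

```python
import numpy as np


def clustered_by_year_ratio(n_years, n_obs=5000, time_share=0.25, n_reps=400, seed=0):
    """Mean year-clustered SE divided by the true SE, in data with only a time effect."""
    rng = np.random.default_rng(seed)
    n_firms = n_obs // n_years

    def draw():
        # Unit-variance variable; `time_share` of its variance is common to a year.
        common = rng.standard_normal((1, n_years)) * np.sqrt(time_share)
        idio = rng.standard_normal((n_firms, n_years)) * np.sqrt(1.0 - time_share)
        return common + idio

    betas, ses = np.empty(n_reps), np.empty(n_reps)
    for r in range(n_reps):
        x = draw()
        y = x + 2.0 * draw()
        xd, yd = x - x.mean(), y - y.mean()
        sxx = (xd ** 2).sum()
        betas[r] = (xd * yd).sum() / sxx
        # Sum x*e within each year (column) before squaring.
        year_scores = (xd * (yd - betas[r] * xd)).sum(axis=0)
        ses[r] = np.sqrt((year_scores ** 2).sum()) / sxx
    return ses.mean() / betas.std(ddof=1)


if __name__ == "__main__":
    for n_years in (5, 40, 100):
        print(n_years, round(clustered_by_year_ratio(n_years), 3))
```

A ratio below one means the clustered standard errors are too small on average; the shortfall should be largest with 5 clusters and shrink as the number of years grows.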
IV) Estimating Standard Errors in the Presence of a Fixed Firm and Time Effect
- Since researchers do not always know the precise form of the dependence, a less parametric approach may be preferred.
- To illustrate the performance of standard errors clustered by firm, year, or both, I simulated data sets with a fixed firm and time effect.
- Clustering by two dimensions produces less biased standard errors.
- In my simulations, the number of t-statistics which are greater than 2.58 rises to 5% when the number of firms or time periods falls to 10 (see Thompson, 2005 for more complete results).
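One common way to cluster by two dimensions combines three one-way estimates: the variance clustered by firm, plus the variance clustered by time, minus the White variance (which both clustered estimates contain). The sketch below applies that construction to a single regressor in a balanced panel. It is an illustration under those assumptions, with a hypothetical function name and no finite-sample adjustments.

```python
import numpy as np


def two_way_clustered_se(x, y):
    """SEs for the slope of y on x: White, by firm, by year, and by both.

    x and y are (n_firms, n_years) arrays from a balanced panel.
    """
    xd, yd = x - x.mean(), y - y.mean()
    sxx = (xd ** 2).sum()
    beta = (xd * yd).sum() / sxx
    scores = xd * (yd - beta * xd)
    v_white = (scores ** 2).sum() / sxx ** 2
    v_firm = (scores.sum(axis=1) ** 2).sum() / sxx ** 2
    v_year = (scores.sum(axis=0) ** 2).sum() / sxx ** 2
    # Both clustered estimates include the own-observation terms, so the
    # White variance is subtracted once. In small samples the result can be
    # negative, in which case the square root below is nan.
    v_both = v_firm + v_year - v_white
    variances = {"white": v_white, "firm": v_firm, "year": v_year, "both": v_both}
    return beta, {name: float(np.sqrt(v)) for name, v in variances.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_firms, n_years = 200, 50

    def draw():
        # Equal thirds of the variance: firm effect, time effect, idiosyncratic.
        return (
            rng.standard_normal((n_firms, 1))
            + rng.standard_normal((1, n_years))
            + rng.standard_normal((n_firms, n_years))
        ) / np.sqrt(3.0)

    x = draw()
    y = x + 2.0 * draw()
    print(two_way_clustered_se(x, y))
```

Comparing the four numbers side by side shows which dimension of dependence matters: a firm-clustered standard error well above the White one points to a firm effect, and likewise for time.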
V) Estimating Standard Errors in the Presence of a Temporary Firm Effect
- The analysis thus far has assumed that the firm effect is fixed.
- The dependence between residuals may decay as the time between them increases (e.g. ρ(ε_t, ε_{t-k}) may decline with k).
- Assuming homoscedasticity makes the interpretation of the results simpler.
- In addition, if the performance of the different standard error estimates depends on the permanence of the firm effect, researchers need to know this.
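A decaying dependence of this kind can be generated in several ways. The sketch below uses one simple choice that is an assumption of this example, not necessarily the data structure used in the paper: the firm component follows a stationary AR(1) process with persistence `phi`, and independent noise is added on top. Under that assumption the correlation between residuals k years apart is `firm_share * phi ** k`, and setting `phi = 1` recovers a fixed firm effect. Function names are hypothetical.

```python
import numpy as np


def simulate_temporary_firm_effect(n_firms, n_years, firm_share, phi, rng):
    """Residuals = AR(1) firm component (variance share `firm_share`) + i.i.d. noise."""
    firm = np.empty((n_firms, n_years))
    firm[:, 0] = rng.standard_normal(n_firms)
    for t in range(1, n_years):
        shock = rng.standard_normal(n_firms)
        # Scaling the shock keeps the component's variance at one every year.
        firm[:, t] = phi * firm[:, t - 1] + np.sqrt(1.0 - phi ** 2) * shock
    idio = rng.standard_normal((n_firms, n_years))
    return np.sqrt(firm_share) * firm + np.sqrt(1.0 - firm_share) * idio


def lag_correlation(resid, k):
    """Pooled correlation between residuals k >= 1 years apart within a firm."""
    return np.corrcoef(resid[:, k:].ravel(), resid[:, :-k].ravel())[0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    resid = simulate_temporary_firm_effect(20000, 6, firm_share=0.5, phi=0.7, rng=rng)
    for k in (1, 2, 3):
        print(k, round(lag_correlation(resid, k), 3), round(0.5 * 0.7 ** k, 3))
```

Printing the simulated and implied correlations side by side gives a quick check that the generated residuals have the intended decay before they are fed into any standard error comparison.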
VI)
- I used simulated data in the previous sections.
- In real world applications, the authors may have priors about the data's structure (are firm effects or time effects more important, are they permanent or temporary), but they do not know the data structure for certain.
- This way I can demonstrate how the different methods for estimating standard errors compare, confirm that the methods used by some published papers have produced significantly biased results, and show what the authors can learn from the different standard errors estimates.
- The constant is calculated as the average of the yearly intercepts.
- Thus the Fama-MacBeth R² does not include the explanatory power of time dummies.
VII) Conclusions.
- It is well known from first-year econometrics classes that OLS and White standard errors are biased when the residuals are not independent.
- The standard errors clustered by firm are unbiased and produce correctly sized confidence intervals whether the firm effect is permanent or temporary.
- Alternatively, researchers can cluster by multiple dimensions, assuming there are a sufficient number of clusters in each dimension.
- The fraction of the independent variable's variance which is due to a firm specific component varies across the columns of the table from 0 percent (no firm effect) to 75 percent.
- The second entry is the standard deviation of the coefficient estimated by Fama-MacBeth.
Frequently Asked Questions (8)
Q2. Which assumption is used to move from the first to the second line in equation 3?
The move from the first to the second line in equation (3) uses the assumption that the residuals are independent (i.e., the covariance between residuals is zero).
Q3. How can the authors estimate the coefficients of a random effects model?
By estimating a generalized least squares version of the random effects model (i.e. a panel data set with an unobserved firm effect), more efficient coefficient estimates can be obtained (see Wooldridge, 2002).
Q4. Why does the firm effect not appear in the estimated variance?
Since the firm effect influences both the yearly coefficient estimate and the sample average of the yearly coefficient estimates, it does not appear in the estimated variance.
Q5. How can the authors determine the nature of the dependence in the residuals?
By examining how standard errors change when the authors cluster by firm or time (i.e. compare columns I to II and I to III), the authors can determine the nature of the dependence which remains in the residuals, and this can guide them on how to improve their models.
Q6. What percentage of the papers did not adjust the standard errors for possible dependence in the residuals?
In recently published finance papers which include a regression on panel data, forty-two percent of the papers did not adjust the standard errors for possible dependence in the residuals.
Q7. What percentage of the papers used the Fama-MacBeth procedure to estimate the coefficients and standard errors?
Thirty-four percent of the papers estimated both the coefficients and the standard errors using the Fama-MacBeth procedure (Fama and MacBeth, 1973).
Q8. What percentage of the variability in the independent variable is due to the time effect?
I allowed the fraction of variability in both the residual and the independent variable which is due to the time effect to range from zero to seventy-five percent in twenty-five percent increments.