Out-of-Sample Forecast Tests Robust to the Choice of Window Size
Summary (3 min read)
1 Introduction
- This paper proposes new methodologies for evaluating the out-of-sample forecasting performance of economic models.
- The novelty of the methodologies that the authors propose is that they are robust to the choice of the estimation and evaluation window size.
- The choice of the estimation window size has always been a concern for practitioners, since the use of different window sizes may lead to different empirical results in practice.
- The procedures that the authors propose make empirical conclusions robust to this choice by evaluating the models' forecasting performance for a variety of estimation window sizes, and then taking summary statistics of the resulting sequence.
- Rather than reporting a test for a single window size, the paper proposes to take summary statistics of tests of predictive ability computed over several estimation window sizes.
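The idea of computing a test over several window sizes and then summarizing the sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the simulated data, the grid of window sizes, the zero-forecast benchmark, the naive variance estimate, and the use of the maximum and the mean of the absolute statistics as the summary statistics are all assumptions made for the example.

```python
import numpy as np

def rolling_forecast_errors(y, x, R):
    """One-step-ahead errors from a rolling OLS regression of y[t+1] on x[t],
    using the R most recent observation pairs, plus errors of a zero forecast."""
    T = len(y)
    errs_model, errs_bench = [], []
    for t in range(R, T - 1):
        xs = x[t - R:t]            # predictors x[t-R], ..., x[t-1]
        ys = y[t - R + 1:t + 1]    # targets    y[t-R+1], ..., y[t]
        X = np.column_stack([np.ones(R), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        forecast = beta[0] + beta[1] * x[t]
        errs_model.append(y[t + 1] - forecast)
        errs_bench.append(y[t + 1])  # hypothetical benchmark: forecast of zero
    return np.array(errs_model), np.array(errs_bench)

def loss_diff_stat(y, x, R):
    """Standardized mean squared-error loss difference for window size R.
    Uses a naive variance (no HAC, no parameter-uncertainty correction)."""
    e_m, e_b = rolling_forecast_errors(y, x, R)
    d = e_b ** 2 - e_m ** 2
    return np.sqrt(len(d)) * d.mean() / d.std(ddof=1)

def window_robust_summary(y, x, windows):
    """Compute the statistic for every window size and summarize the sequence."""
    stats = np.array([loss_diff_stat(y, x, R) for R in windows])
    return {"sup": np.abs(stats).max(), "ave": np.abs(stats).mean(), "stats": stats}

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
T = 200
x = rng.standard_normal(T)
y = np.empty(T)
y[0] = rng.standard_normal()
y[1:] = 0.3 * x[:-1] + rng.standard_normal(T - 1)

windows = range(30, 171, 10)  # hypothetical grid of window sizes
result = window_robust_summary(y, x, windows)
print(result["sup"], result["ave"])
```

Because the summary statistics search over window sizes, they should be compared with critical values that account for that search, such as those derived in the paper, rather than with the usual standard normal ones.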
2 Robust Tests of Predictive Accuracy When the Win-
- The authors assume that the researcher is interested in evaluating the performance of h-steps-ahead direct forecasts for the scalar variable $y_{t+h}$ using a vector of predictors $x_t$, under either a rolling, recursive or fixed window direct forecast scheme.
- The methods proposed in this paper can be applied to out-of-sample tests of equal predictive ability, forecast rationality and unbiasedness.
- First, if the researcher tries several window sizes and then reports the empirical evidence based on the window size that provides him the best empirical evidence in favor of predictive ability, his test may be oversized.
- The following proposition states the general intuition behind the approach proposed in this paper.
- For each of the cases that the authors consider.
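The oversizing that results from reporting the most favorable window size can be illustrated with a small simulation. This is a stylized sketch under stated assumptions, not the paper's Monte Carlo design: loss differences are drawn as i.i.d. standard normal (so the null of equal accuracy holds), there is no parameter estimation, and the sample size, window sizes, and the 1.96 critical value are hypothetical choices.

```python
import numpy as np

def t_stat(d):
    """Standardized sample mean of the loss differences."""
    return np.sqrt(len(d)) * d.mean() / d.std(ddof=1)

def snooping_rejection_rates(T=200, windows=(40, 80, 120, 160),
                             n_sim=2000, crit=1.96, seed=0):
    """Rejection frequency under the null for (a) one window size fixed in
    advance and (b) the largest statistic found across all window sizes."""
    rng = np.random.default_rng(seed)
    reject_fixed = 0
    reject_best = 0
    for _ in range(n_sim):
        d = rng.standard_normal(T)  # mean-zero loss differences: the null is true
        # For window size R, the evaluation sample is the last T - R observations.
        stats = np.array([abs(t_stat(d[R:])) for R in windows])
        reject_fixed += int(stats[0] > crit)
        reject_best += int(stats.max() > crit)
    return reject_fixed / n_sim, reject_best / n_sim

fixed_rate, best_rate = snooping_rejection_rates(n_sim=500)
print(fixed_rate, best_rate)
```

By construction the rejection rate based on the best window can never be lower than the rate for the single pre-specified window, which is the mechanism behind the size distortion described above.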
2.1 Non-Nested Model Comparisons
- Traditionally, researchers interested in doing inference about the relative forecasting performance of competing, non-nested models rely on Diebold and Mariano's (1995), West's (1996) and McCracken's (2000) test statistics.
- The test statistic that they propose relies on the sample average of the sequence of standardized out-of-sample loss differences, eq. (1): $\Delta L_T(R) \equiv \frac{1}{\hat{\sigma}_R} P^{-1/2} \sum_{t=R}^{T} \Delta L_{t+h}(\hat{\theta}_{t,R}, \hat{\gamma}_{t,R})$ (5), where $\hat{\sigma}_R^2$ is a consistent estimate of the long run variance matrix of the out-of-sample loss differences.
- Consistent estimates of $\sigma^2$ that take into account parameter estimation uncertainty in recursive windows are provided by West (1996), and in rolling and fixed windows by McCracken (2000, p. 203, eqs. 5 and 6).
- In particular, a leading case where (6) can be used is when the same loss function is used for estimation and evaluation.
- The asymptotic normality result does not hinge on whether or not two models are nested but rather on whether or not the disturbance terms of the two models are numerically identical in population under the null hypothesis.
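The standardized loss-difference statistic for a single window size can be sketched as below. The sketch assumes a Bartlett-kernel (Newey-West type) estimate of the long run variance with a user-chosen bandwidth `q`, and it deliberately ignores the parameter estimation uncertainty corrections of West (1996) and McCracken (2000); the function names and the example numbers are hypothetical.

```python
import numpy as np

def long_run_variance(d, q):
    """Bartlett-kernel estimate of the long run variance of the series d,
    using autocovariances up to lag q (q = 0 gives the sample variance)."""
    d = np.asarray(d, dtype=float)
    P = len(d)
    u = d - d.mean()
    lrv = u @ u / P
    for i in range(1, q + 1):
        gamma_i = u[i:] @ u[:-i] / P          # lag-i sample autocovariance
        lrv += 2.0 * (1.0 - i / (q + 1.0)) * gamma_i
    return lrv

def standardized_loss_difference(d, q):
    """P^{-1/2} times the sum of loss differences, divided by the estimated
    long run standard deviation."""
    d = np.asarray(d, dtype=float)
    P = len(d)
    return d.sum() / (np.sqrt(P) * np.sqrt(long_run_variance(d, q)))

# Hypothetical out-of-sample loss differences for one window size.
d = np.array([0.5, -0.2, 0.1, 0.4, -0.3, 0.6])
print(standardized_loss_difference(d, q=2))
```

With `q = 0` the statistic reduces to the familiar ratio of the scaled sample mean to the sample standard deviation, which is a convenient sanity check.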
2.2 Nested Models Comparison
- For the case of nested models comparison, the authors follow Clark and McCracken (2001).
- Let Model 1 be the parsimonious model, and Model 2 be the larger model that nests Model 1.
- Let $y_{t+h}$ denote the variable to be forecast and let the period-t forecasts of $y_{t+h}$ from the two models be denoted by $\hat{y}_{1,t+h}$ and $\hat{y}_{2,t+h}$: the first ("small") model uses $k_1$ regressors $x_{1,t}$ and the second ("large") model uses $k_1 + k_2 = k$ regressors $x_{1,t}$ and $x_{2,t}$.
- In particular, their assumptions hold for one-step-ahead forecast errors (h = 1) from linear, homoskedastic models, OLS estimation, and MSE loss function (as discussed in Clark and McCracken (2001), the loss function used for estimation has to be the same as the loss function used for evaluation).
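For a single window size, a nested-model comparison based on forecast errors can be sketched as follows. The summary above does not state which statistic is used, so the sketch assumes the encompassing statistic usually referred to as ENC-NEW and attributed to Clark and McCracken (2001), computed from the out-of-sample errors of the small model (`e1`) and the large model (`e2`); the function name and the numbers are hypothetical.

```python
import numpy as np

def enc_new(e1, e2):
    """ENC-NEW-type statistic: P * mean(e1^2 - e1*e2) / mean(e2^2),
    where e1 and e2 are out-of-sample forecast errors of the small and
    large model over the same P evaluation periods."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    if e1.shape != e2.shape:
        raise ValueError("e1 and e2 must cover the same evaluation periods")
    P = len(e1)
    return P * np.mean(e1 ** 2 - e1 * e2) / np.mean(e2 ** 2)

# Hypothetical errors: the large model halves every error of the small one.
e1 = np.array([1.0, -1.0, 2.0, -2.0])
e2 = 0.5 * e1
print(enc_new(e1, e2))
```

The statistic is zero when both models produce identical errors and grows when the small model's errors are correlated with the gap between the two forecasts. Its limiting distribution under the null is non-standard, so it should not be compared with normal critical values.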
2.3 Regression-Based Tests of Predictive Ability
- Under the widely used MSFE loss, optimal forecasts have a variety of properties.
- The following are special cases of regression-based tests of predictive ability: (i) Forecast Unbiasedness Tests: $\hat{L}_{t+h} = \hat{v}_{t+h}$; (ii) Mincer and Zarnowitz's (1969) Tests (or Efficiency Tests): $\hat{L}_{t+h} = \hat{v}_{t+h} X_t$, where $X_t$ is a vector of predictors known at time t (see also Chao, Corradi and Swanson, 2001).
- Which are similar to those discussed for eq. (5).
- West and McCracken (1998) have shown that it is very important to allow for a general variance estimator that takes into account estimation uncertainty and/or correcting the statistics by the necessary adjustments.
- The procedures that the authors propose can also be applied to Patton and Timmermann's (2007) generalized forecast error.
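The two special cases listed above can be sketched for a single window size as a regression of the forecast errors on predictors known at the forecast date, followed by a Wald statistic for the hypothesis that all coefficients are zero. This is one common regression form and only a rough sketch: it uses a naive homoskedastic OLS covariance and therefore ignores exactly the estimation-uncertainty adjustments that West and McCracken (1998) show to be important. All function names and the simulated data are assumptions.

```python
import numpy as np

def wald_zero_coefficients(v, X):
    """Regress forecast errors v on X and return the Wald statistic for
    'all coefficients are zero' together with the OLS coefficients."""
    v = np.asarray(v, dtype=float)
    X = np.asarray(X, dtype=float)
    P, k = X.shape
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ coef
    s2 = resid @ resid / (P - k)              # naive error variance
    cov = s2 * np.linalg.inv(X.T @ X)         # naive OLS covariance
    wald = float(coef @ np.linalg.solve(cov, coef))
    return wald, coef

def unbiasedness_stat(v):
    """Forecast unbiasedness: regress the errors on a constant only."""
    return wald_zero_coefficients(v, np.ones((len(v), 1)))

def mincer_zarnowitz_stat(v, forecast):
    """Efficiency: regress the errors on a constant and the forecast itself."""
    X = np.column_stack([np.ones(len(v)), forecast])
    return wald_zero_coefficients(v, X)

# Simulated forecasts and errors (an assumption for illustration only).
rng = np.random.default_rng(1)
forecast = rng.standard_normal(100)
errors = rng.standard_normal(100)
wald_u, coef_u = unbiasedness_stat(errors)
wald_mz, coef_mz = mincer_zarnowitz_stat(errors, forecast)
print(wald_u, wald_mz)
```

In the unbiasedness case the Wald statistic equals the square of the usual t-statistic on the mean forecast error, which gives a quick way to check the computation.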
3 Robust Tests of Predictive Accuracy When the Win-
- All the tests considered so far rely on the assumption that the window is a fixed fraction of the total sample size, asymptotically.
- When the window size diverges to infinity, the correlation between the rolling regression estimator and the regressor vanishes even when the regressor is not strictly exogenous.
- When x1t is null, the second term on the right-hand side of equation (20) is zero even when x2t is not strictly exogenous, and the authors' adjustment term and theirs become identical.
5 Empirical evidence
- The poor forecasting ability of economic models of exchange rate determination has been recognized since the works by Meese and Rogoff (1983a,b), who established that a random walk forecasts exchange rates better than any economic model in the short run.
- Let $\pi_t$ denote the inflation rate in the home country, $\pi_t^*$ denote the inflation rate in the foreign country, $\bar{\pi}$ denote the target level of inflation in each country, $y_t^{gap}$ denote the output gap in the home country and $y_t^{gap*}$ denote the output gap in the foreign country.
- The benchmark model, against which the forecasts of both models (27) and (28) are evaluated, is the random walk, according to which the exchange rate changes are forecast to be zero.
- Data on interest rates were incomplete for Portugal and the Netherlands, so the authors do not report UIRP results for these countries.
- This suggests that the empirical evidence in favor of predictive ability may be driven by the existence of instabilities in the predictive ability, for which rolling windows of small size are advantageous.
6 Conclusions
- This paper proposes new methodologies for evaluating economic models' forecasting performance that are robust to the choice of the estimation window size.
- These methodologies are noteworthy since they allow researchers to reach empirical conclusions that do not depend on a specific estimation window size.
- The authors show that tests traditionally used by forecasters suffer from size distortions if researchers report, in reality, the best empirical result over various window sizes, but without taking into account the search procedure when doing inference in practice.
- Traditional tests may also lack power to detect predictive ability when implemented for an "ad-hoc" choice of the window size.
- Finally, their empirical results demonstrate that the recent empirical evidence in favor of exchange rate predictability is even stronger when allowing a wider search over window sizes.
Frequently Asked Questions (11)
Q2. What are the methods proposed in this paper?
The methods proposed in this paper can be applied to out-of-sample tests of equal predictive ability, forecast rationality and unbiasedness.
Q3. What is the way to avoid snooping over the choices of $\underline{\mu}$ and $\bar{\mu}$?
To avoid data snooping over the choices of $\underline{\mu}$ and $\bar{\mu}$, the authors recommend that researchers impose symmetry by fixing $\bar{\mu} = 1 - \underline{\mu}$, and use $\underline{\mu} = 0.15$ in practice.
Q4. What is the novelty of the methodologies that the authors propose?
The novelty of the methodologies that the authors propose is that they are robust to the choice of the estimation and evaluation window size.
Q5. What is the framework for evaluating forecast errors?
The framework allows for linear and non-linear models estimated by any extremum estimator (e.g. OLS, GMM and MLE), the data to have serial correlation and heteroskedasticity as long as stationarity is satisfied (which rules out unit roots and structural breaks), and forecast errors (which can be either one period or multi-period) evaluated using continuously differentiable loss functions, such as MSE.
Q6. What is the simplest way to determine if t+h(R) has zero mean?
Assumption (a) is necessary for t+h(R) to have zero mean and is satisfied under the assumption discussed by Clark and West (x1t is not null) or under the assumption that x2t is strictly exogenous.
Q7. What is the asymptotic normality result of a model?
The asymptotic normality result does not hinge on whether or not two models are nested but rather on whether or not the disturbance terms of the two models are numerically identical in population under the null hypothesis.
Q8. What is the significance of the variance estimator?
West and McCracken (1998) have shown that it is very important to allow for a general variance estimator that takes into account estimation uncertainty and/or correcting the statistics by the necessary adjustments.
Q9. What does the evidence in favor of predictive ability suggest?
This suggests that the empirical evidence in favor of predictive ability may be driven by the existence of instabilities in the predictive ability, for which rolling windows of small size are advantageous.
Q10. What is the way to test the regressors?
Before the authors get into details, a word of caution: the authors' setup requires strict exogeneity of the regressors, which is a very strong assumption in time series applications.
Q11. What is the evidence for the ad-hoc window size?
The evidence highlights the sharp sensitivity of power of all the tests to the timing of the break relative to the forecast evaluation window, and shows that, in the presence of instabilities, their proposed tests tend to be more powerful than some of the tests based on an ad-hoc window size, whose power properties crucially depend on the window size.