
Out-of-Sample Forecast Tests Robust to the Choice of Window Size

01 Aug 2011 · Journal of Business & Economic Statistics (Taylor & Francis Group) · Vol. 30, Iss. 3, pp. 432-453
TL;DR: In this paper, the authors proposed new methodologies for evaluating economic models' out-of-sample forecasting performance that are robust to the choice of the estimation window size, and evaluated the predictive ability of forecasting models over a wide range of window sizes.
Abstract: This article proposes new methodologies for evaluating economic models’ out-of-sample forecasting performance that are robust to the choice of the estimation window size. The methodologies involve evaluating the predictive ability of forecasting models over a wide range of window sizes. The study shows that the tests proposed in the literature may lack the power to detect predictive ability and might be subject to data snooping across different window sizes if used repeatedly. An empirical application shows the usefulness of the methodologies for evaluating exchange rate models’ forecasting ability.

Summary

1 Introduction

  • This paper proposes new methodologies for evaluating the out-of-sample forecasting performance of economic models.
  • The novelty of the methodologies that the authors propose is that they are robust to the choice of the estimation and evaluation window size.
  • The choice of the estimation window size has always been a concern for practitioners, since the use of different window sizes may lead to different empirical results in practice.
  • The procedures that the authors propose ensure robustness by evaluating the models' forecasting performance for a variety of estimation window sizes, and then taking summary statistics of this sequence.
  • The paper instead proposes to take summary statistics of tests of predictive ability computed over several estimation window sizes (a minimal sketch of this idea follows this list).
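As a rough illustration of this summary-statistic idea, the hedged sketch below computes a simple forecast-comparison t-statistic for every window size in a grid and then reports the maximum (sup-type) and average of the resulting sequence. The function names, the grid, and the plain (non-HAC) t-statistic are illustrative assumptions, not the authors' exact procedure or critical values.

```python
import numpy as np

def dm_tstat(loss_diff):
    """Plain Diebold-Mariano-type t-statistic on a sequence of loss differences."""
    d = np.asarray(loss_diff, dtype=float)
    return np.sqrt(len(d)) * d.mean() / d.std(ddof=1)

def robust_summary_over_windows(make_loss_diff, window_sizes):
    """Evaluate the test statistic for each candidate window size R and summarize.

    make_loss_diff(R) should return the out-of-sample loss differences obtained
    when the models are estimated with window size R (user-supplied step).
    """
    stats = np.array([dm_tstat(make_loss_diff(R)) for R in window_sizes])
    # Summary statistics of the sequence over window sizes: sup-type and average-type.
    return {"sup": np.max(np.abs(stats)), "ave": stats.mean()}
```

The critical values of such sup- and average-type statistics are nonstandard; the paper derives the appropriate limiting distributions, which this sketch does not attempt to reproduce.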

2 Robust Tests of Predictive Accuracy When the Window Size is Large

  • The authors assume that the researcher is interested in evaluating the performance of $h$-steps-ahead direct forecasts for the scalar variable $y_{t+h}$ made using a vector of predictors $x_t$, under either a rolling, recursive or fixed window direct forecast scheme.
  • The methods proposed in this paper can be applied to out-of-sample tests of equal predictive ability, forecast rationality and unbiasedness.
  • First, if the researcher tries several window sizes and then reports the empirical evidence based on the window size that provides the strongest evidence in favor of predictive ability, the resulting test may be oversized.
  • The following proposition states the general intuition behind the approach proposed in this paper.
  • This intuition is then specialized to each of the cases that the authors consider.

2.1 Non-Nested Model Comparisons

  • Traditionally, researchers interested in doing inference about the relative forecasting performance of competing, non-nested models rely on Diebold and Mariano's (1995), West's (1996) and McCracken's (2000) test statistics.
  • The test statistic that they propose relies on the sample average of the sequence of standardized out-of-sample loss differences in eq. (1): $L_T(R) \equiv \widehat{\sigma}_R^{-1} P^{-1/2} \sum_{t=R}^{T} L_{t+h}(\widehat{\theta}_{t,R}, \widehat{\gamma}_{t,R})$, (5) where $\widehat{\sigma}_R^2$ is a consistent estimate of the long-run variance of the out-of-sample loss differences (a minimal numerical sketch follows this list).
  • Consistent estimates of $\sigma^2$ that take into account parameter estimation uncertainty are provided by West (1996) for recursive windows, and by McCracken (2000, p. 203, eqs. 5 and 6) for rolling and fixed windows.
  • In particular, a leading case where (6) can be used is when the same loss function is used for estimation and evaluation.
  • The asymptotic normality result does not hinge on whether or not two models are nested but rather on whether or not the disturbance terms of the two models are numerically identical in population under the null hypothesis.
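To make eq. (5) concrete, here is a hedged sketch of the statistic for a single window size R: the scaled sample average of the out-of-sample loss differences, standardized by a Newey-West-type long-run variance estimate. The bandwidth choice and function names are illustrative assumptions; in particular, the sketch does not implement the West (1996) or McCracken (2000) parameter-uncertainty corrections mentioned above.

```python
import numpy as np

def newey_west_variance(d, bandwidth):
    """Newey-West estimate of the long-run variance of demeaned loss differences."""
    d = np.asarray(d, dtype=float)
    d = d - d.mean()
    P = len(d)
    var = d @ d / P
    for i in range(1, bandwidth + 1):
        w = 1.0 - i / (bandwidth + 1.0)          # Bartlett kernel weight
        var += 2.0 * w * (d[i:] @ d[:-i]) / P
    return var

def lt_statistic(loss_model1, loss_model2, bandwidth=4):
    """L_T(R) = sigma_hat^{-1} * P^{-1/2} * sum over t of the loss differences."""
    d = np.asarray(loss_model1, dtype=float) - np.asarray(loss_model2, dtype=float)
    P = len(d)
    sigma2 = newey_west_variance(d, bandwidth)
    return d.sum() / (np.sqrt(sigma2) * np.sqrt(P))
```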

2.2 Nested Models Comparison

  • For the case of nested models comparison, the authors follow Clark and McCracken (2001).
  • Let Model 1 be the parsimonious model, and Model 2 be the larger model that nests Model 1.
  • Let $y_{t+h}$ denote the variable to be forecast and let the period-$t$ forecasts of $y_{t+h}$ from the two models be denoted by $\widehat{y}_{1,t+h}$ and $\widehat{y}_{2,t+h}$: the first ("small") model uses $k_1$ regressors $x_{1,t}$, and the second ("large") model uses $k_1 + k_2 = k$ regressors $x_{1,t}$ and $x_{2,t}$.
  • In particular, their assumptions hold for one-step-ahead forecast errors ($h = 1$) from linear, homoskedastic models, OLS estimation, and an MSE loss function (as discussed in Clark and McCracken (2001), the loss function used for estimation has to be the same as the loss function used for evaluation); a hedged illustration of a nested comparison follows this list.
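As an illustration of a nested-model comparison, the sketch below computes the Clark and West (2007) MSPE-adjusted loss differences, a closely related statistic referenced elsewhere on this page; it is an assumed stand-in, not the Clark and McCracken (2001) statistic or its critical values. The adjustment subtracts the extra noise that estimating the larger model's additional parameters introduces under the null that the small model is true.

```python
import numpy as np

def clark_west_mspe_adjusted(y, f_small, f_large):
    """MSPE-adjusted loss differences and a simple t-statistic for nested models."""
    y = np.asarray(y, dtype=float)
    e1 = y - np.asarray(f_small, dtype=float)    # forecast errors of the nested model
    e2 = y - np.asarray(f_large, dtype=float)    # forecast errors of the larger model
    adj = (np.asarray(f_small, dtype=float) - np.asarray(f_large, dtype=float)) ** 2
    d = e1 ** 2 - (e2 ** 2 - adj)                # adjusted loss differences
    tstat = np.sqrt(len(d)) * d.mean() / d.std(ddof=1)
    return d, tstat
```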

2.3 Regression-Based Tests of Predictive Ability

  • Under the widely used MSFE loss, optimal forecasts have a variety of properties.
  • The following are special cases of regression-based tests of predictive ability: (i) Forecast Unbiasedness Tests: $\widehat{L}_{t+h} = \widehat{v}_{t+h}$; (ii) Mincer and Zarnowitz's (1969) Tests (or Efficiency Tests): $\widehat{L}_{t+h} = \widehat{v}_{t+h} X_t$, where $X_t$ is a vector of predictors known at time $t$ (see also Chao, Corradi and Swanson, 2001); a minimal regression sketch follows this list.
  • The relevant asymptotic results and variance estimates are similar to those discussed for eq. (5).
  • West and McCracken (1998) have shown that it is very important to allow for a general variance estimator that takes into account estimation uncertainty and/or to correct the statistics by the necessary adjustments.
  • The procedures that the authors propose can also be applied to Patton and Timmermann's (2007) generalized forecast error.
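Below is a minimal sketch of the regression-based checks in (i) and (ii): regress the forecast errors on a constant (unbiasedness) or on a constant and predictors known at time $t$ (efficiency) and test that the coefficients are zero. Variable names are illustrative and, as the bullet above stresses, the plain OLS variance used here ignores parameter estimation uncertainty; the West and McCracken (1998) corrections are deliberately omitted.

```python
import numpy as np

def rationality_regression(forecast_errors, predictors=None):
    """OLS of forecast errors on a constant (and optional predictors); returns t-stats."""
    v = np.asarray(forecast_errors, dtype=float)
    X = np.ones((len(v), 1))                               # constant: unbiasedness test
    if predictors is not None:                             # add X_t: efficiency test
        X = np.column_stack([X, np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    resid = v - X @ beta
    sigma2 = resid @ resid / (len(v) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se                                 # coefficients and t-statistics
```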

3 Robust Tests of Predictive Accuracy When the Window Size is Small

  • All the tests considered so far rely on the assumption that the window is a fixed fraction of the total sample size, asymptotically.
  • When the window size diverges to infinity, the correlation between the rolling regression estimator and the regressor vanishes even when the regressor is not strictly exogenous.
  • When $x_{1t}$ is null, the second term on the right-hand side of equation (20) is zero even when $x_{2t}$ is not strictly exogenous, and the authors' adjustment term and Clark and West's (2007) adjustment term become identical.

5 Empirical Evidence

  • The poor forecasting ability of economic models of exchange rate determination has been recognized since the works by Meese and Rogoff (1983a,b), who established that a random walk forecasts exchange rates better than any economic model in the short run.
  • Let $\pi_t$ denote the inflation rate in the home country, $\pi_t^*$ the inflation rate in the foreign country, $\bar{\pi}$ the target level of inflation in each country, $y_t^{gap}$ the output gap in the home country and $y_t^{gap*}$ the output gap in the foreign country.
  • The benchmark model, against which the forecasts of both models (27) and (28) are evaluated, is the random walk, according to which exchange rate changes are forecast to be zero (a schematic sketch of this comparison follows this list).
  • Data on interest rates were incomplete for Portugal and the Netherlands, so the authors do not report UIRP results for these countries.
  • This suggests that the empirical evidence in favor of predictive ability may be driven by the existence of instabilities in the predictive ability, for which rolling windows of small size are advantageous.
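The mechanics of this comparison can be sketched as follows: for each rolling window size R, forecast exchange-rate changes with (i) the random walk, whose forecast of the change is zero, and (ii) a linear model in economic fundamentals (e.g., Taylor-rule terms such as inflation and output-gap differentials), and record the out-of-sample MSFE difference. The data, model specification, and names below are illustrative assumptions, not the paper's exact specifications (27)-(28) or its inference procedure.

```python
import numpy as np

def msfe_gain_by_window(ds, fundamentals, window_sizes):
    """MSFE(random walk) - MSFE(fundamentals model), by rolling window size.

    ds:           exchange-rate changes, ds[t+1] is the target to forecast
    fundamentals: predictors known at time t (one row per period)
    """
    ds = np.asarray(ds, dtype=float)
    X = np.column_stack([np.ones(len(ds)), np.asarray(fundamentals, dtype=float)])
    gains = {}
    for R in window_sizes:
        se_rw, se_model = [], []
        for t in range(R, len(ds) - 1):
            # estimate ds_{s+1} = X_s' beta + error on the R most recent pairs
            beta, *_ = np.linalg.lstsq(X[t - R:t], ds[t - R + 1:t + 1], rcond=None)
            se_model.append((ds[t + 1] - X[t] @ beta) ** 2)
            se_rw.append(ds[t + 1] ** 2)            # random walk forecasts zero change
        gains[R] = np.mean(se_rw) - np.mean(se_model)   # > 0 favors the economic model
    return gains
```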

6 Conclusions

  • This paper proposes new methodologies for evaluating economic models' forecasting performance that are robust to the choice of the estimation window size.
  • These methodologies are noteworthy since they allow researchers to reach empirical conclusions that do not depend on a specific estimation window size.
  • The authors show that tests traditionally used by forecasters suffer from size distortions if researchers in fact report only the best empirical result over various window sizes, without taking the search procedure into account when doing inference in practice.
  • Traditional tests may also lack power to detect predictive ability when implemented for an "ad-hoc" choice of the window size.
  • Finally, their empirical results demonstrate that the recent empirical evidence in favor of exchange rate predictability is even stronger when allowing a wider search over window sizes.


Out-of-Sample Forecast Tests Robust to the Choice of
Window Size
Barbara Rossi and Atsushi Inoue
(ICREA,UPF,CREI,BGSE,Duke) (NC State)
April 1, 2012
Abstract
This paper proposes new methodologies for evaluating out-of-sample forecasting
performance that are robust to the choice of the estimation window size. The method-
ologies involve evaluating the predictive ability of forecasting models over a wide range
of window sizes. We show that the tests proposed in the literature may lack the power
to detect predictive ability and might be subject to data snooping across different
window sizes if used repeatedly. An empirical application shows the usefulness of the
methodologies for evaluating exchange rate models' forecasting ability.
Keywords: Predictive Ability Testing, Forecast Evaluation, Estimation Window.
Acknowledgments: We thank the editor, the associate editor, two referees as well as
S. Burke, M.W. McCracken, J. Nason, A. Patton, K. Sill, D. Thornton and seminar par-
ticipants at the 2010 Econometrics Workshop at the St. Louis Fed, Bocconi University,
U. of Arizona, Pompeu Fabra U., Michigan State U., the 2010 Triangle Econometrics
Conference, the 2011 SNDE Conference, the 2011 Conference in honor of Hal White,
the 2011 NBER Summer Institute and the 2011 Joint Statistical Meetings for useful
comments and suggestions. This research was supported by National Science Founda-
tion grants SES-1022125 and SES-1022159 and North Carolina Agricultural Research
Service Project NC02265.
J.E.L. Codes: C22, C52, C53

1 Introduction
This paper proposes new methodologies for evaluating the out-of-sample forecasting perfor-
mance of economic models. The novelty of the methodologies that we propose is that they
are robust to the choice of the estimation and evaluation window size. The choice of the
estimation window size has always been a concern for practitioners, since the use of different window sizes may lead to different empirical results in practice. In addition, arbitrary choices of window sizes have consequences for how the sample is split into in-sample and
out-of-sample portions. Notwithstanding the importance of the problem, no satisfactory
solution has been proposed so far, and in the forecasting literature it is common to only
report empirical results for one window size. For example, to illustrate the differences in
the window sizes, we draw on the literature on forecasting exchange rates (the empirical
application we will focus on): Meese and Rogoff (1983a) use a window of 93 observations
in monthly data, Chinn (1991) a window size equal to 45 in quarterly data, Qi and Wu
(2003) use a window of 216 observations in monthly data, Cheung et al. (2005) consider
windows of 42 and 59 observations in quarterly data, Clark and West’s (2007) window is 120
observations in monthly data, Gourinchas and Rey (2007) consider a window of 104 obser-
vations in quarterly data, and Molodtsova and Papell (2009) consider a window size of 120
observations in monthly data. This common practice raises two concerns. A first concern is that the “ad hoc” window size used by the researcher may not detect significant predictive ability even if there would be significant predictive ability for some other window size choices. A second concern is the possibility that satisfactory results were obtained simply by
chance, after data snooping over window sizes. That is, the successful evidence in favor of
predictive ability might have been found after trying many window sizes, although only the
results for the successful window size were reported and the search process was not taken
into account when evaluating their statistical significance. Only rarely do researchers check
the robustness of the empirical results to the choice of the window size by reporting results
for a selected choice of window sizes. Ultimately, however, the size of the estimation window
is not a parameter of interest for the researcher: the objective is rather to test predictive
ability and, ideally, researchers would like to reach empirical conclusions that are robust to
the choice of the estimation window size.
This paper views the estimation window as a “nuisance parameter”: we are not interested in selecting the best window; rather, we would like to propose predictive ability tests
that are robust to the choice of the estimation window size. The procedures that we

propose ensure that this is the case by evaluating the models' forecasting performance for
a variety of estimation window sizes, and then taking summary statistics of this sequence.
Our methodology can be applied to most tests of predictive ability that have been proposed
in the literature, such as Diebold and Mariano (1995), West (1996), McCracken (2000) and
Clark and McCracken (2001). We also propose methodologies that can be applied to Mincer
and Zarnowitz's (1969) tests of forecast efficiency, as well as more general tests of forecast
optimality. Our methodologies allow for both rolling and recursive window estimation schemes and let the window size be large relative to the total sample size. Finally, we also
discuss methodologies that can be used in the Giacomini and White’s (2005) and Clark and
West's (2007) frameworks, where the estimation scheme is based on a rolling window with fixed size.
This paper is closely related to the works by Pesaran and Timmermann (2007) and Clark
and McCracken (2009), and more distantly related to Pesaran, Pettenuzzo and Timmermann
(2006) and Giacomini and Rossi (2010). Pesaran and Timmermann (2007) propose cross val-
idation and forecast combination methods that identify the "ideal" window size using sample
information. In other words, Pesaran and Timmermann (2007) extend forecast averaging
procedures to deal with the uncertainty over the size of the estimation window, for example,
by averaging forecasts computed from the same model but over various estimation win-
dow sizes. Their main objective is to improve the models' forecasts. Similarly, Clark and
McCracken (2009) combine rolling and recursive forecasts in the attempt to improve the
forecasting model. Our paper instead proposes to take summary statistics of tests of predic-
tive ability computed over several estimation window sizes. Our objective is not to improve
the forecasting model nor to estimate the ideal window size. Rather, our objective is to
assess the robustness of conclusions of predictive ability tests to the choice of the estimation
window size. Pesaran, Pettenuzzo and Timmermann (2006) have exploited the existence of
multiple breaks to improve forecasting ability; in order to do so, they need to estimate the
process driving the instability in the data. An attractive feature of the procedure we propose
is that it does not need to impose nor determine when the structural breaks have happened.
Giacomini and Rossi (2010) propose techniques to evaluate the relative performance of com-
peting forecasting models in unstable environments, assuming a “given” estimation window
size. In this paper, our goal is instead to ensure that forecasting ability tests be robust to the
choice of the estimation window size. That is, the procedures that we propose in this paper
are designed for determining whether findings of predictive ability are robust to the choice of the window size, not for determining at which point in time the predictive ability shows up:

the latter is a very different issue, important as well, and was discussed in Giacomini and
Rossi (2010). Finally, this paper is linked to the literature on data snooping: if researchers
report empirical results for just one window size (or a couple of them) when they actually
considered many possible window sizes prior to reporting their results, their inference will
be incorrect. This paper provides a way to account for data snooping over several window
sizes and removes the arbitrary decision of the choice of the window length.
After the first version of this paper was submitted, we became aware of independent
work by Hansen and Timmermann (2011). Hansen and Timmermann (2011) propose a
sup-type test similar to ours, although they focus on p-values of the Diebold and Mariano’s
(1995) test statistic estimated via a recursive window estimation procedure for nested models’
comparisons. They provide analytic power calculations for the test statistic. Our approach
is more generally applicable: it can be used for inference on out-of-sample models' forecast
comparisons and to test forecast optimality where the estimation scheme can be either rolling,
fixed or recursive, and the window size can be either a fixed fraction of the total sample size or finite. Also, Hansen and Timmermann (2011) do not consider the effects of time-varying
predictive ability on the power of the test.
We show the usefulness of our methods in an empirical analysis. The analysis re-evaluates
the predictive ability of models of exchange rate determination by verifying the robustness
of the recent empirical evidence in favor of models of exchange rate determination (e.g.,
Molodtsova and Papell, 2009, and Engel, Mark and West, 2007) to the choice of the window
size. Our results reveal that the forecast improvements found in the literature are much
stronger when allowing for a search over several window sizes. As shown by Pesaran and
Timmermann (2005), the choice of the window size depends on the nature of the possible
model instability and the timing of the possible breaks. In particular, a large window is
preferable if the data generating process is stationary but comes at the cost of lower power,
since there are fewer observations in the evaluation window. Similarly, a shorter window may
be more robust to structural breaks, although it may not provide as precise an estimation as
larger windows if the data are stationary. The empirical evidence shows that instabilities are
widespread for exchange rate models (see Rossi, 2006), which might justify why in several
cases we find improvements in economic models' forecasting ability relative to the random
walk for small window sizes.
The paper is organized as follows. Section 2 proposes a framework for tests of predictive
ability when the window size is a fixed fraction of the total sample size. Section 3 presents tests of predictive ability when the window size is a fixed constant relative to the total sample

size. Section 4 shows some Monte Carlo evidence on the performance of our procedures in small samples, Section 5 presents the empirical results, and Section 6 concludes.
2 Robust Tests of Predictive Accuracy When the Window Size is Large
Let $h \geq 1$ denote the (finite) forecast horizon. We assume that the researcher is interested in evaluating the performance of $h$-steps-ahead direct forecasts for the scalar variable $y_{t+h}$ made using a vector of predictors $x_t$, under either a rolling, recursive or fixed window direct forecast scheme. We assume that the researcher has $P$ out-of-sample predictions available, where the first prediction is made based on an estimate from a sample $1, 2, \ldots, R$, such that the last out-of-sample prediction is made based on an estimate from a sample $T-R+1, \ldots, R+P-1 = T$, where $R+P+h-1 = T+h$ is the size of the available sample. The methods proposed in this paper can be applied to out-of-sample tests of equal predictive ability, forecast rationality and unbiasedness.
In order to present the main idea underlying the methods proposed in this paper, let us focus on the case where researchers are interested in evaluating the forecasting performance of two competing models: Model 1, involving parameters $\theta$, and Model 2, involving parameters $\gamma$. The parameters can be estimated either with a rolling, fixed or a recursive window estimation scheme. In the rolling window forecast method, the true but unknown models' parameters are estimated by $\widehat{\theta}_{t,R}$ and $\widehat{\gamma}_{t,R}$ using samples of $R$ observations dated $t-R+1, \ldots, t$, for $t = R, R+1, \ldots, T$. In the recursive window estimation method, the models' parameters are instead estimated using samples of $t$ observations dated $1, \ldots, t$, for $t = R, R+1, \ldots, T$. In the fixed window estimation method, the models' parameters are estimated only once using observations dated $1, \ldots, R$. Let $\{L^{(1)}_{t+h}(\widehat{\theta}_{t,R})\}_{t=R}^{T}$ and $\{L^{(2)}_{t+h}(\widehat{\gamma}_{t,R})\}_{t=R}^{T}$ denote the sequences of loss functions of models 1 and 2 evaluating $h$-steps-ahead relative out-of-sample forecast errors, and let $\{L_{t+h}(\widehat{\theta}_{t,R}, \widehat{\gamma}_{t,R})\}_{t=R}^{T}$ denote their difference.
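To fix ideas, here is a hedged sketch of how the three estimation schemes just described generate a parameter estimate at every forecast origin $t = R, \ldots, T$. OLS is used purely for concreteness, and the function and argument names are illustrative assumptions.

```python
import numpy as np

def estimates_by_scheme(y, X, R, scheme="rolling"):
    """Parameter estimates at each forecast origin t = R, ..., T (1-indexed in the text)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    T = len(y)
    betas = []
    for t in range(R, T + 1):
        if scheme == "rolling":        # observations t-R+1, ..., t
            lo, hi = t - R, t
        elif scheme == "recursive":    # observations 1, ..., t
            lo, hi = 0, t
        elif scheme == "fixed":        # observations 1, ..., R only, estimated once
            lo, hi = 0, R
        else:
            raise ValueError("scheme must be 'rolling', 'recursive' or 'fixed'")
        beta, *_ = np.linalg.lstsq(X[lo:hi], y[lo:hi], rcond=None)
        betas.append(beta)
    return np.array(betas)             # one row of estimates per forecast origin
```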
Typically, researchers rely on the Diebold and Mariano (1995), West (1996), McCracken (2000) or Clark and McCracken's (2001) test statistics for inference on the forecast error loss differences. For example, in the case of Diebold and Mariano's (1995) and West's (1996) tests, researchers evaluate the two models using the sample average of the sequence of out-of-sample loss differences.

References
  • Newey, W.K. and West, K.D. (1987), "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55(3), 703-708.
  • Billingsley, P. (1968), Convergence of Probability Measures, Wiley.
  • Meyer, B.D. (1995), "Natural and Quasi-Experiments in Economics," Journal of Business & Economic Statistics, 13(2), 151-161.
  • Diebold, F.X. and Mariano, R.S. (1995), "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, 13(3), 253-263.

Frequently Asked Questions (11)
Q1. What contributions have the authors mentioned in the paper "Out-of-sample forecast tests robust to the choice of window size" ?

This paper proposes new methodologies for evaluating out-of-sample forecasting performance that are robust to the choice of the estimation window size. The authors show that the tests proposed in the literature may lack the power to detect predictive ability and might be subject to data snooping across different window sizes if used repeatedly.

The methods proposed in this paper can be applied to out-of-sample tests of equal predictive ability, forecast rationality and unbiasedness. 

To avoid data snooping over the choices of the smallest and largest window sizes considered, the authors recommend that researchers impose symmetry by fixing the upper bound equal to one minus the lower bound (as fractions of the total sample size), and use a lower-bound fraction of 0.15 in practice.

The novelty of the methodologies that the authors propose is that they are robust to the choice of the estimation and evaluation window size. 

The framework allows for linear and non-linear models estimated by any extremum estimator (e.g. OLS, GMM and MLE), the data to have serial correlation and heteroskedasticity as long as stationarity is satisfied (which rules out unit roots and structural breaks), and forecast errors (which can be either one-period or multi-period) evaluated using continuously differentiable loss functions, such as MSE.

Assumption (a) is necessary for the term dated $t+h$ (as a function of $R$) to have zero mean, and it is satisfied under the assumption discussed by Clark and West ($x_{1t}$ is not null) or under the assumption that $x_{2t}$ is strictly exogenous.

The asymptotic normality result does not hinge on whether or not two models are nested but rather on whether or not the disturbance terms of the two models are numerically identical in population under the null hypothesis. 

West and McCracken (1998) have shown that it is very important to allow for a general variance estimator that takes into account estimation uncertainty and/or correcting the statistics by the necessary adjustments. 

This suggests that the empirical evidence in favor of predictive ability may be driven by the existence of instabilities in the predictive ability, for which rolling windows of small size are advantageous. 

Before the authors get into details, a word of caution: their setup requires strict exogeneity of the regressors, which is a very strong assumption in time series applications.

The evidence highlights the sharp sensitivity of power of all the tests to the timing of the break relative to the forecast evaluation window, and shows that, in the presence of instabilities, their proposed tests tend to be more powerful than some of the tests based on an ad-hoc window size, whose power properties crucially depend on the window size.