
Approximately Normal Tests for Equal Predictive Accuracy in Nested Models

TLDR
In this paper, the difference between the mean squared prediction errors (MSPEs) of the parsimonious and larger models is adjusted to account for the noise the larger model introduces into its forecasts by estimating parameters whose population values are zero. Nonstandard limiting distributions derived in Clark and McCracken (2001, 2005a) are then used to argue that standard normal critical values will yield actual sizes close to, but a little less than, nominal size.
Abstract
Forecast evaluation often compares a parsimonious null model to a larger model that nests the null model. Under the null that the parsimonious model generates the data, the larger model introduces noise into its forecasts by estimating parameters whose population values are zero. We observe that the mean squared prediction error (MSPE) from the parsimonious model is therefore expected to be smaller than that of the larger model. We describe how to adjust MSPEs to account for this noise. We propose applying standard methods (West (1996)) to test whether the adjusted mean squared error difference is zero. We refer to nonstandard limiting distributions derived in Clark and McCracken (2001, 2005a) to argue that use of standard normal critical values will yield actual sizes close to, but a little less than, nominal size. Simulation evidence supports our recommended procedure.


TECHNICAL WORKING PAPER SERIES
APPROXIMATELY NORMAL TESTS FOR EQUAL
PREDICTIVE ACCURACY IN NESTED MODELS
Todd E. Clark
Kenneth D. West
Technical Working Paper 326
http://www.nber.org/papers/T0326
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
August 2006
West thanks the National Science Foundation for financial support. We thank Pablo M. Pincheira-Brown,
Philip Hans Franses, Taisuke Nakata, Norm Swanson, participants in a session at the January 2006 meeting
of the Econometric Society and two anonymous referees for helpful comments. The views expressed herein
are solely those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Kansas
City or the Federal Reserve System. The views expressed herein are those of the author(s) and do not
necessarily reflect the views of the National Bureau of Economic Research.
©2006 by Todd E. Clark and Kenneth D. West. All rights reserved. Short sections of text, not to exceed two
paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given
to the source.

Approximately Normal Tests for Equal Predictive Accuracy in Nested Models
Todd E. Clark and Kenneth D. West
NBER Technical Working Paper No. 326
August 2006
JEL No. C22, C53, E17, F37
ABSTRACT
Forecast evaluation often compares a parsimonious null model to a larger model that nests the null
model. Under the null that the parsimonious model generates the data, the larger model introduces
noise into its forecasts by estimating parameters whose population values are zero. We observe that
the mean squared prediction error (MSPE) from the parsimonious model is therefore expected to be
smaller than that of the larger model. We describe how to adjust MSPEs to account for this noise.
We propose applying standard methods (West (1996)) to test whether the adjusted mean squared
error difference is zero. We refer to nonstandard limiting distributions derived in Clark and
McCracken (2001, 2005a) to argue that use of standard normal critical values will yield actual sizes
close to, but a little less than, nominal size. Simulation evidence supports our recommended
procedure.
Todd E. Clark
Research Department
Federal Reserve Bank of Kansas City
Kansas City, MO 64198
todd.e.clark@kc.frb.org
Kenneth D. West
Department of Economics
University of Wisconsin
1180 Observatory Drive
Madison, WI 53706
and NBER
kdwest@wisc.edu

1. INTRODUCTION
Forecast evaluation in economics often involves a comparison of a parsimonious null model to a
larger alternative model that nests the parsimonious model. Such comparisons are common in both asset
pricing and macroeconomic applications. In asset pricing applications, the parsimonious benchmark
model usually is one that posits that an expected return is constant. The larger alternative model attempts
to use time varying variables to predict returns. If the asset in question is equities, for example, a possible
predictor is the dividend-price ratio. In macroeconomic applications, the parsimonious model might be a
univariate autoregression for the variable to be predicted. The larger alternative model might be a
bivariate or multivariate vector autoregression (VAR) that includes lags of some variables in addition to
lags of the variable to be predicted. If the variable to be predicted is inflation, for example, the VAR
might be bivariate and include lags of the output gap along with lags of inflation.
Perhaps the most commonly used statistic for comparisons of predictions from nested models is
mean squared prediction error (MSPE).
In this paper we explore the behavior of standard normal
inference for MSPE in comparisons of nested models.
Our starting point relates to an observation made in our earlier work (Clark and West (2005)):
under the null that the additional parameters in the alternative model do not help prediction, the MSPE of
the parsimonious model should be smaller than that of the alternative. This is true even though the null
states that with parameters set at their population values, the larger model reduces to the parsimonious
model, implying that the two models have equal MSPE when parameters are set at population values.
The intuition for the smaller MSPE for the parsimonious model is that the parsimonious model gains
efficiency by setting to zero parameters that are zero in population, while the alternative introduces noise
into the forecasting process that will, in finite samples, inflate its MSPE. Our earlier paper (Clark and
West, 2005) assumed that the parsimonious model is a random walk. The present paper allows a general
parametric specification for the parsimonious model. This complicates the asymptotic theory, though in
the end our recommendation for applied researchers is a straightforward generalization of our
recommendation in Clark and West (2005).
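To illustrate this noise effect, the following minimal Monte Carlo sketch (ours, not the paper's; the DGP, window sizes, and Python implementation are illustrative assumptions) compares out-of-sample MSPEs when an irrelevant predictor is added to a constant-mean null model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, R, reps = 200, 100, 2000      # total sample, initial estimation sample, Monte Carlo draws
gap = np.empty(reps)
for i in range(reps):
    z = rng.standard_normal(T)   # candidate predictor, irrelevant under the null
    y = rng.standard_normal(T)   # null DGP: y is i.i.d. noise
    e1, e2 = np.empty(T - R), np.empty(T - R)
    for t in range(R, T):        # recursive (expanding-window) forecasts
        # parsimonious model: unconditional mean estimated on data through t-1
        e1[t - R] = y[t] - y[:t].mean()
        # larger model: regress y_s on a constant and z_{s-1}, forecast y_t with z_{t-1}
        X = np.column_stack([np.ones(t - 1), z[:t - 1]])
        b = np.linalg.lstsq(X, y[1:t], rcond=None)[0]
        e2[t - R] = y[t] - (b[0] + b[1] * z[t - 1])
    gap[i] = (e2 ** 2).mean() - (e1 ** 2).mean()
print(gap.mean())  # positive on average: estimation noise inflates the larger model's MSPE
```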
Specifically, we recommend that the point estimate of the difference between the MSPEs of the
two models be adjusted for the noise associated with the larger model’s forecast. We describe a simple
method to do so. We suggest as well that standard procedures (Diebold and Mariano 1995, West 1996)
be used to compute a standard error for the MSPE difference adjusted for such noise. As in Clark and
West (2005), we call the resulting statistic MSPE-adjusted. As has been standard in the literature on
comparing forecasts from nested models since the initial paper by Ashley et al. (1980), we consider one-
sided tests. The alternative is that the large model has smaller MSPE.
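In code, the procedure just described amounts to the following minimal sketch (the function name and one-step-ahead interface are our assumptions; for multistep forecasts the standard error should be HAC, e.g. Newey-West):

```python
import numpy as np

def cw_mspe_adjusted_tstat(y, yhat1, yhat2):
    """MSPE-adjusted t-statistic for comparing nested forecasting models.

    y     : realized values over the forecast sample
    yhat1 : forecasts from the parsimonious (null) model
    yhat2 : forecasts from the larger (alternative) model
    Compare the statistic with one-sided standard normal critical values
    (1.282 at the .10 level, 1.645 at the .05 level).
    """
    e1sq = (y - yhat1) ** 2
    e2sq = (y - yhat2) ** 2
    adj = (yhat1 - yhat2) ** 2        # noise attributable to the larger model's estimated parameters
    f = e1sq - (e2sq - adj)           # period-by-period adjusted MSPE difference
    P = f.shape[0]
    se = f.std(ddof=1) / np.sqrt(P)   # valid for one-step forecasts; use a HAC s.e. for multistep
    return f.mean() / se
```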
In contrast to the simple Clark and West (2005) environment, under our preferred set of technical
conditions the MSPE-adjusted statistic is not asymptotically normal. But we refer to the quantiles of a
certain non-standard distribution studied in Clark and McCracken (2001, 2005a) to argue that standard
normal critical values will yield actual sizes close to, but a little less than, nominal size, for samples
sufficiently large.
Our simulations show that these quantiles are applicable with samples of size typically available.
We report results from 48 sets of simulations on one-step-ahead forecasts, with the sets of simulations
varying largely in terms of sample size, but as well in terms of DGP. In all 48 simulations, use of the .10
normal critical value of 1.282 resulted in actual size between .05 and .10. The median size across the 48
sets was about 0.08. Forecasts generated using rolling regressions generally yielded more accurately
sized tests than those using recursive regressions. Comparable results apply when we use the .05 normal
critical value of 1.645: the median size is about .04. These results are consistent with the simulations in
Clark and McCracken (2001, 2005a).
By contrast, standard normal inference for the raw (unadjusted) difference in MSPEs, called
"MSPE-normal" in our tables, performed abysmally. For one-step-ahead forecasts and nominal .10 tests,
the median size across 48 sets of simulations was less than 0.01, for example. The poor performance is
consistent with the asymptotic theory and simulations in McCracken (2004) and Clark and McCracken
(2001, 2005a).
Of course, one might use simulation-based methods to conduct inference on MSPE-adjusted, or,
for that matter, MSPE-normal. One such method would be a bootstrap, applied in forecasting contexts by
Mark (1995), Kilian (1999), Clark and West (2005), and Clark and McCracken (2005a). Our simulations
find that the bootstrap results in a modest improvement relative to MSPE-adjusted, with a median size
across 48 sets of simulations between 0.09 and 0.10. Another simulation method we examine is to
simulate the non-standard limiting distributions of the tests, as in Clark and McCracken (2005a). We find
that such a simulation-based method also results in modest improvements in size relative to
MSPE-adjusted (median size across 48 sets of simulations about 0.11).
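As a reference point for the bootstrap alternative, a model-based bootstrap in the spirit of Mark (1995) and Kilian (1999) can be sketched as follows; the constant-mean null, residual resampling scheme, and function names are our simplifying assumptions, not the exact design used in the simulations:

```python
import numpy as np

def bootstrap_critical_value(y, z, R, stat_fn, B=499, alpha=0.10, seed=1):
    """Bootstrap critical value for an out-of-sample test statistic.

    Artificial samples are generated under the parsimonious null (here a
    constant mean with i.i.d. resampled residuals), and stat_fn recomputes
    the statistic, e.g. the MSPE-adjusted t-stat, on each sample.
    """
    rng = np.random.default_rng(seed)
    mu, resid = y.mean(), y - y.mean()
    stats = np.empty(B)
    for b in range(B):
        ystar = mu + rng.choice(resid, size=y.shape[0], replace=True)
        stats[b] = stat_fn(ystar, z, R)
    return np.quantile(stats, 1.0 - alpha)   # reject when the sample statistic exceeds this
```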
Our simulations also examine a certain statistic for nested models proposed by Chao, Corradi and
Swanson (2001) (“CCS”, in our tables).
We find CCS performs a little better than does MSPE-adjusted
in terms of size, somewhat more poorly in terms of power. (By construction, size-adjusted power is
identical for MSPE-adjusted and for the simulation-based methods described in the previous paragraph.)
A not-for-publication appendix reports results for multistep forecasts for a subset of the DGPs reported in
our tables. We find that on balance, the bootstrap performs distinctly better than MSPE-adjusted for
relatively small sample sizes, comparably for medium or larger sample sizes; overall, MSPE-adjusted
performs a little better than CCS, a lot better than MSPE-normal.
We interpret these results as supporting the use of MSPE-adjusted, with standard normal critical
values, in forecast comparisons of nested models. MSPE-adjusted allows inference just about as accurate
as the other tests we investigate, with power that is as good or better, and with ease of interpretation that
empirical researchers find appealing.
Readers uninterested in theoretical or simulation details need only read section 2, which outlines
computation of MSPE-adjusted in what we hope is a self-contained way. Section 3 describes the setup
and computation of point estimates. Section 4 describes the theory underlying inference about
MSPE-adjusted. Section 5 describes construction of test statistics. Section 6 presents simulation results.

Citations

Policy Uncertainty and Corporate Investment
TL;DR: The paper finds a strong negative relationship between firm-level capital investment and the aggregate level of uncertainty associated with future policy and regulatory outcomes. The relation between policy uncertainty and capital investment is not uniform in the cross section, being significantly stronger for firms with a higher degree of investment irreversibility and for firms more dependent on government spending.

Out-of-Sample Equity Premium Prediction: Combination Forecasts and Links to the Real Economy
TL;DR: The authors argue that substantial model uncertainty and instability seriously impair the forecasting ability of individual predictive regression models, and they recommend combining individual model forecasts to improve out-of-sample equity premium prediction.

Investor Sentiment Aligned: A Powerful Predictor of Stock Returns
TL;DR: The paper proposes a new investor sentiment index aligned with the purpose of predicting the aggregate stock market. By eliminating a common noise component in sentiment proxies, the new index has much greater predictive power than existing sentiment indices both in and out of sample, and the predictability is both statistically and economically significant.

Forecasting the Equity Risk Premium: The Role of Technical Indicators
TL;DR: Combining information from both technical indicators and macroeconomic variables significantly improves equity risk premium forecasts relative to using either type of information alone, and the substantial countercyclical fluctuations in the equity risk premium appear well captured.
References

A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix
Whitney K. Newey and Kenneth D. West, May 1987
TL;DR: Describes a simple method of calculating a heteroskedasticity and autocorrelation consistent covariance matrix that is positive semi-definite by construction.

Comparing Predictive Accuracy
Francis X. Diebold and Roberto S. Mariano, 1995
TL;DR: Explicit tests of the null hypothesis of no difference in the accuracy of two competing forecasts are proposed; asymptotic and exact finite-sample tests are proposed, evaluated, and illustrated.

Dividend Yields and Expected Stock Returns
Eugene F. Fama and Kenneth R. French, 1988
TL;DR: The power of dividend yields to forecast stock returns, measured by regression R², increases with the return horizon. The authors offer a two-part explanation: high autocorrelation causes the variance of expected returns to grow faster than the return horizon.
Frequently Asked Questions (11)
Q1. What have the authors contributed in "Technical working paper series approximately normal test for equal predictive accuracy in nested models" ?

Under the null that the parsimonious model generates the data, the larger model introduces noise into its forecasts by estimating parameters whose population values are zero. The authors describe how to adjust MSPEs to account for this noise. The authors propose applying standard methods (West (1996)) to test whether the adjusted mean squared error difference is zero.

Perhaps the most commonly used statistic for comparisons of predictions from nested models is mean squared prediction error (MSPE).

The occasional oversizing Clark and McCracken (2001, 2005a) find arises when data-determined lag selection yields significantly misspecified null forecasting models. 

The results for their adjusted MSPE test highlight the potential for noise associated with the additional parameters of the alternative model to create an upward shift in the alternative model's MSPE large enough that the null model has a lower MSPE even when the alternative model is true.

In panel B, the predictand $y_{t+1}$ has conditional heteroskedasticity of the form given in equation (6.3), in which the conditional variance at $t$ is a function of $z_{t-1}^2$.

The authors are about to argue that in nested models, conventional standard errors yield an asymptotic normal approximation that is accurate for practical purposes. 

Across simulations, the implied mean value of the squared difference in fitted values, $P^{-1}\sum_{t=R}^{T}(\hat{y}_{1,t+1}-\hat{y}_{2,t+1})^2$, is 0.25 (= 0.01 - (-0.24)).

In the notation of (3.1) and (3.2), the null and the sample moment used to test the null are $E\,e_{1t}Z_t' = 0$ (5.2) and $P^{-1}\sum_{t=R}^{T}\hat{e}_{1,t+1}Z_{t+1}'$ (CCS) (5.3). The chi-squared test statistic associated with (5.3) was adjusted for uncertainty due to estimation of regression parameters, as described in Chao et al. (2001).
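For illustration, a bare-bones version of the chi-squared statistic built on the sample moment in (5.3), without the Chao et al. (2001) adjustment for parameter-estimation uncertainty that the authors apply, might look like this (the function name and i.i.d. variance estimate are our assumptions):

```python
import numpy as np

def ccs_statistic_unadjusted(e1, Z):
    """Unadjusted CCS-type chi-squared statistic.

    e1 : out-of-sample forecast errors of the null model (length P)
    Z  : matrix of predictors/instruments aligned with the errors (P x k)
    Under H0: E[e_{1t} Z_t'] = 0, the quadratic form is asymptotically
    chi-squared with k degrees of freedom (before any adjustment for
    estimated regression parameters).
    """
    g = Z * e1[:, None]                      # period-by-period moments e_{1,t+1} * Z_{t+1}
    P = e1.shape[0]
    m = np.sqrt(P) * g.mean(axis=0)          # P^{-1/2} times the summed moment
    V = np.atleast_2d(np.cov(g.T, ddof=0))   # variance of the moments (i.i.d. case)
    return m @ np.linalg.solve(V, m)
```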

The approximation that the authors have just discussed, which holds R fixed as P goes to infinity, thereby implying R/P goes to 0, may not be obviously appealing.

The authors find that on balance, the bootstrap performs distinctly better than MSPE-adjusted for relatively small sample sizes, comparably for medium or larger sample sizes; overall, MSPE-adjusted performs a little better than CCS, a lot better than MSPE-normal.

The larger alternative model might be a bivariate or multivariate vector autoregression (VAR) that includes lags of some variables in addition to lags of the variable to be predicted.