
Approximately Normal Tests for Equal Predictive Accuracy in Nested Models

TLDR
In this paper, the difference between the mean squared prediction errors (MSPEs) of the parsimonious and larger models is adjusted to account for the noise the larger model introduces into its forecasts by estimating parameters whose population values are zero. Nonstandard limiting distributions derived in Clark and McCracken (2001, 2005a) are then used to argue that standard normal critical values will yield actual sizes close to, but a little less than, nominal size.
Abstract
Forecast evaluation often compares a parsimonious null model to a larger model that nests the null model. Under the null that the parsimonious model generates the data, the larger model introduces noise into its forecasts by estimating parameters whose population values are zero. We observe that the mean squared prediction error (MSPE) from the parsimonious model is therefore expected to be smaller than that of the larger model. We describe how to adjust MSPEs to account for this noise. We propose applying standard methods (West (1996)) to test whether the adjusted mean squared error difference is zero. We refer to nonstandard limiting distributions derived in Clark and McCracken (2001, 2005a) to argue that use of standard normal critical values will yield actual sizes close to, but a little less than, nominal size. Simulation evidence supports our recommended procedure.


TECHNICAL WORKING PAPER SERIES
APPROXIMATELY NORMAL TESTS FOR EQUAL
PREDICTIVE ACCURACY IN NESTED MODELS
Todd E. Clark
Kenneth D. West
Technical Working Paper 326
http://www.nber.org/papers/T0326
NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
August 2006
West thanks the National Science Foundation for financial support. We thank Pablo M. Pincheira-Brown,
Philip Hans Franses, Taisuke Nakata, Norm Swanson, participants in a session at the January 2006 meeting
of the Econometric Society and two anonymous referees for helpful comments. The views expressed herein
are solely those of the authors and do not necessarily reflect the views of the Federal Reserve Bank of Kansas
City or the Federal Reserve System. The views expressed herein are those of the author(s) and do not
necessarily reflect the views of the National Bureau of Economic Research.
©2006 by Todd E. Clark and Kenneth D. West. All rights reserved. Short sections of text, not to exceed two
paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given
to the source.

Approximately Normal Tests for Equal Predictive Accuracy in Nested Models
Todd E. Clark and Kenneth D. West
NBER Technical Working Paper No. 326
August 2006
JEL No. C22, C53, E17, F37
ABSTRACT
Forecast evaluation often compares a parsimonious null model to a larger model that nests the null
model. Under the null that the parsimonious model generates the data, the larger model introduces
noise into its forecasts by estimating parameters whose population values are zero. We observe that
the mean squared prediction error (MSPE) from the parsimonious model is therefore expected to be
smaller than that of the larger model. We describe how to adjust MSPEs to account for this noise.
We propose applying standard methods (West (1996)) to test whether the adjusted mean squared
error difference is zero. We refer to nonstandard limiting distributions derived in Clark and
McCracken (2001, 2005a) to argue that use of standard normal critical values will yield actual sizes
close to, but a little less than, nominal size. Simulation evidence supports our recommended
procedure.
Todd E. Clark
Research Department
Federal Reserve Bank of Kansas City
Kansas City, MO 64198
todd.e.clark@kc.frb.org
Kenneth D. West
Department of Economics
University of Wisconsin
1180 Observatory Drive
Madison, WI 53706
and NBER
kdwest@wisc.edu

1. INTRODUCTION
Forecast evaluation in economics often involves a comparison of a parsimonious null model to a
larger alternative model that nests the parsimonious model. Such comparisons are common in both asset
pricing and macroeconomic applications. In asset pricing applications, the parsimonious benchmark
model usually is one that posits that an expected return is constant. The larger alternative model attempts
to use time varying variables to predict returns. If the asset in question is equities, for example, a possible
predictor is the dividend-price ratio. In macroeconomic applications, the parsimonious model might be a
univariate autoregression for the variable to be predicted. The larger alternative model might be a
bivariate or multivariate vector autoregression (VAR) that includes lags of some variables in addition to
lags of the variable to be predicted. If the variable to be predicted is inflation, for example, the VAR
might be bivariate and include lags of the output gap along with lags of inflation.
Perhaps the most commonly used statistic for comparisons of predictions from nested models is
mean squared prediction error (MSPE).
In this paper we explore the behavior of standard normal
inference for MSPE in comparisons of nested models.
Our starting point relates to an observation made in our earlier work (Clark and West (2005)):
under the null that the additional parameters in the alternative model do not help prediction, the MSPE of
the parsimonious model should be smaller than that of the alternative. This is true even though the null
states that with parameters set at their population values, the larger model reduces to the parsimonious
model, implying that the two models have equal MSPE when parameters are set at population values.
The intuition for the smaller MSPE for the parsimonious model is that the parsimonious model gains
efficiency by setting to zero parameters that are zero in population, while the alternative introduces noise
into the forecasting process that will, in finite samples, inflate its MSPE. Our earlier paper (Clark and
West, 2005) assumed that the parsimonious model is a random walk. The present paper allows a general
parametric specification for the parsimonious model. This complicates the asymptotic theory, though in
the end our recommendation for applied researchers is a straightforward generalization of our
recommendation in Clark and West (2005).
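To illustrate this noise effect, the following minimal Monte Carlo sketch (ours, not the paper's; the DGP, window sizes, and Python implementation are illustrative assumptions) compares out-of-sample MSPEs when an irrelevant predictor is added to a constant-mean null model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, R, reps = 200, 100, 2000      # total sample, initial estimation sample, Monte Carlo draws
gap = np.empty(reps)
for i in range(reps):
    z = rng.standard_normal(T)   # candidate predictor, irrelevant under the null
    y = rng.standard_normal(T)   # null DGP: y is i.i.d. noise
    e1, e2 = np.empty(T - R), np.empty(T - R)
    for t in range(R, T):        # recursive (expanding-window) forecasts
        # parsimonious model: unconditional mean estimated on data through t-1
        e1[t - R] = y[t] - y[:t].mean()
        # larger model: regress y_s on a constant and z_{s-1}, forecast y_t with z_{t-1}
        X = np.column_stack([np.ones(t - 1), z[:t - 1]])
        b = np.linalg.lstsq(X, y[1:t], rcond=None)[0]
        e2[t - R] = y[t] - (b[0] + b[1] * z[t - 1])
    gap[i] = (e2 ** 2).mean() - (e1 ** 2).mean()
print(gap.mean())  # positive on average: estimation noise inflates the larger model's MSPE
```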
Specifically, we recommend that the point estimate of the difference between the MSPEs of the
two models be adjusted for the noise associated with the larger model’s forecast. We describe a simple
method to do so. We suggest as well that standard procedures (Diebold and Mariano 1995, West 1996)
be used to compute a standard error for the MSPE difference adjusted for such noise. As in Clark and
West (2005), we call the resulting statistic MSPE-adjusted. As has been standard in the literature on
comparing forecasts from nested models since the initial paper by Ashley et al. (1980), we consider one-
sided tests. The alternative is that the large model has smaller MSPE.
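In code, the procedure just described amounts to the following minimal sketch (the function name and one-step-ahead interface are our assumptions; for multistep forecasts the standard error should be HAC, e.g. Newey-West):

```python
import numpy as np

def cw_mspe_adjusted_tstat(y, yhat1, yhat2):
    """MSPE-adjusted t-statistic for comparing nested forecasting models.

    y     : realized values over the forecast sample
    yhat1 : forecasts from the parsimonious (null) model
    yhat2 : forecasts from the larger (alternative) model
    Compare the statistic with one-sided standard normal critical values
    (1.282 at the .10 level, 1.645 at the .05 level).
    """
    e1sq = (y - yhat1) ** 2
    e2sq = (y - yhat2) ** 2
    adj = (yhat1 - yhat2) ** 2        # noise attributable to the larger model's estimated parameters
    f = e1sq - (e2sq - adj)           # period-by-period adjusted MSPE difference
    P = f.shape[0]
    se = f.std(ddof=1) / np.sqrt(P)   # valid for one-step forecasts; use a HAC s.e. for multistep
    return f.mean() / se
```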
In contrast to the simple Clark and West (2005) environment, under our preferred set of technical
conditions the MSPE-adjusted statistic is not asymptotically normal. But we refer to the quantiles of a
certain non-standard distribution studied in Clark and McCracken (2001, 2005a) to argue that standard
normal critical values will yield actual sizes close to, but a little less than, nominal size, for samples
sufficiently large.
Our simulations show that these quantiles are applicable with samples of size typically available.
We report results from 48 sets of simulations on one-step-ahead forecasts, with the sets of simulations
varying largely in terms of sample size, but as well in terms of DGP. In all 48 simulations, use of the .10
normal critical value of 1.282 resulted in actual size between .05 and .10. The median size across the 48
sets was about 0.08. Forecasts generated using rolling regressions generally yielded more accurately
sized tests than those using recursive regressions. Comparable results apply when we use the .05 normal
critical value of 1.645: the median size is about .04. These results are consistent with the simulations in
Clark and McCracken (2001, 2005a).
By contrast, standard normal inference for the raw (unadjusted) difference in MSPEs, called
"MSPE-normal" in our tables, performed abysmally. For one-step-ahead forecasts and nominal .10 tests,
the median size across 48 sets of simulations was less than 0.01, for example. The poor performance is
consistent with the asymptotic theory and simulations in McCracken (2004) and Clark and McCracken
(2001, 2005a).
Of course, one might use simulation-based methods to conduct inference on MSPE-adjusted, or,
for that matter, MSPE-normal. One such method would be a bootstrap, applied in forecasting contexts by
Mark (1995), Kilian (1999), Clark and West (2005), and Clark and McCracken (2005a). Our simulations
find that the bootstrap results in a modest improvement relative to MSPE-adjusted, with a median size
across 48 sets of simulations between 0.09 and 0.10. Another simulation method we examine is to
simulate the non-standard limiting distributions of the tests, as in Clark and McCracken (2005a). We find
that such a simulation-based method also results in modest improvements in size relative to
MSPE-adjusted (median size across 48 sets of simulations about 0.11).
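As a reference point for the bootstrap alternative, a model-based bootstrap in the spirit of Mark (1995) and Kilian (1999) can be sketched as follows; the constant-mean null, residual resampling scheme, and function names are our simplifying assumptions, not the exact design used in the simulations:

```python
import numpy as np

def bootstrap_critical_value(y, z, R, stat_fn, B=499, alpha=0.10, seed=1):
    """Bootstrap critical value for an out-of-sample test statistic.

    Artificial samples are generated under the parsimonious null (here a
    constant mean with i.i.d. resampled residuals), and stat_fn recomputes
    the statistic, e.g. the MSPE-adjusted t-stat, on each sample.
    """
    rng = np.random.default_rng(seed)
    mu, resid = y.mean(), y - y.mean()
    stats = np.empty(B)
    for b in range(B):
        ystar = mu + rng.choice(resid, size=y.shape[0], replace=True)
        stats[b] = stat_fn(ystar, z, R)
    return np.quantile(stats, 1.0 - alpha)   # reject when the sample statistic exceeds this
```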
Our simulations also examine a certain statistic for nested models proposed by Chao, Corradi and
Swanson (2001) (“CCS”, in our tables).
We find CCS performs a little better than does MSPE-adjusted
in terms of size, somewhat more poorly in terms of power. (By construction, size-adjusted power is
identical for MSPE-adjusted and for the simulation-based methods described in the previous paragraph.)
A not-for-publication appendix reports results for multistep forecasts for a subset of the DGPs reported in
our tables. We find that on balance, the bootstrap performs distinctly better than MSPE-adjusted for
relatively small sample sizes, comparably for medium or larger sample sizes; overall, MSPE-adjusted
performs a little better than CCS, a lot better than MSPE-normal.
We interpret these results as supporting the use of MSPE-adjusted, with standard normal critical
values, in forecast comparisons of nested models. MSPE-adjusted allows inference just about as accurate
as the other tests we investigate, with power that is as good or better, and with ease of interpretation that
empirical researchers find appealing.
Readers uninterested in theoretical or simulation details need only read section 2, which outlines
computation of MSPE-adjusted in what we hope is a self-contained way. Section 3 describes the setup
and computation of point estimates. Section 4 describes the theory underlying inference about
MSPE-adjusted. Section 5 describes construction of test statistics. Section 6 presents simulation results.

Citations

Policy Uncertainty and Corporate Investment
TL;DR: The paper finds a strong negative relationship between firm-level capital investment and the aggregate level of uncertainty associated with future policy and regulatory outcomes. The relation between policy uncertainty and capital investment is not uniform in the cross section, being significantly stronger for firms with a higher degree of investment irreversibility and for firms more dependent on government spending.

Out-of-Sample Equity Premium Prediction: Combination Forecasts and Links to the Real Economy
TL;DR: The authors argue that substantial model uncertainty and instability seriously impair the forecasting ability of individual predictive regression models, and they recommend combining individual model forecasts to improve out-of-sample equity premium prediction.

Investor Sentiment Aligned: A Powerful Predictor of Stock Returns
TL;DR: The paper proposes a new investor sentiment index aligned with the purpose of predicting the aggregate stock market. By eliminating a common noise component in sentiment proxies, the new index has much greater predictive power than existing sentiment indices both in and out of sample, and the predictability is both statistically and economically significant.

Forecasting the Equity Risk Premium: The Role of Technical Indicators
TL;DR: Combining information from both technical indicators and macroeconomic variables significantly improves equity risk premium forecasts relative to using either type of information alone, and the substantial countercyclical fluctuations in the equity risk premium appear well captured.
References

A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix
Whitney K. Newey and Kenneth D. West, May 1987
TL;DR: Describes a simple method of calculating a heteroskedasticity and autocorrelation consistent covariance matrix that is positive semi-definite by construction.

Comparing Predictive Accuracy
Francis X. Diebold and Roberto S. Mariano, 1995
TL;DR: Explicit tests of the null hypothesis of no difference in the accuracy of two competing forecasts are proposed; asymptotic and exact finite-sample tests are proposed, evaluated, and illustrated.

Dividend Yields and Expected Stock Returns
Eugene F. Fama and Kenneth R. French, 1988
TL;DR: The power of dividend yields to forecast stock returns, measured by regression R², increases with the return horizon. The authors offer a two-part explanation: high autocorrelation causes the variance of expected returns to grow faster than the return horizon.
Frequently Asked Questions (11)
Q1. What have the authors contributed in "Technical working paper series approximately normal test for equal predictive accuracy in nested models" ?

Under the null that the parsimonious model generates the data, the larger model introduces noise into its forecasts by estimating parameters whose population values are zero. The authors describe how to adjust MSPEs to account for this noise. The authors propose applying standard methods (West (1996)) to test whether the adjusted mean squared error difference is zero.

Perhaps the most commonly used statistic for comparisons of predictions from nested models is mean squared prediction error (MSPE).

The occasional oversizing Clark and McCracken (2001, 2005a) find arises when data-determined lag selection yields significantly misspecified null forecasting models. 

The results for their adjusted MSPE test highlight the potential for noise associated with the additional parameters of the alternative model to create an upward shift in the alternative model's MSPE large enough that the null model has a lower MSPE even when the alternative model is true.

In panel B, the predictand $y_{t+1}$ has conditional heteroskedasticity of the form given in equation (6.3), in which the conditional variance at $t$ is a function of $z_{t-1}^2$.

The authors are about to argue that in nested models, conventional standard errors yield an asymptotic normal approximation that is accurate for practical purposes. 

Across simulations, the implied mean value of the squared difference in fitted values, $P^{-1}\sum_{t=R}^{T}(\hat{y}_{1,t+1}-\hat{y}_{2,t+1})^2$, is 0.25 (= 0.01 - (-0.24)).

In the notation of (3.1) and (3.2), the null and the sample moment used to test the null are $E\,e_{1t}Z_t' = 0$ (5.2) and $P^{-1}\sum_{t=R}^{T}\hat{e}_{1,t+1}Z_{t+1}'$ (CCS) (5.3). The chi-squared test statistic associated with (5.3) was adjusted for uncertainty due to estimation of regression parameters, as described in Chao et al. (2001).
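For illustration, a bare-bones version of the chi-squared statistic built on the sample moment in (5.3), without the Chao et al. (2001) adjustment for parameter-estimation uncertainty that the authors apply, might look like this (the function name and i.i.d. variance estimate are our assumptions):

```python
import numpy as np

def ccs_statistic_unadjusted(e1, Z):
    """Unadjusted CCS-type chi-squared statistic.

    e1 : out-of-sample forecast errors of the null model (length P)
    Z  : matrix of predictors/instruments aligned with the errors (P x k)
    Under H0: E[e_{1t} Z_t'] = 0, the quadratic form is asymptotically
    chi-squared with k degrees of freedom (before any adjustment for
    estimated regression parameters).
    """
    g = Z * e1[:, None]                      # period-by-period moments e_{1,t+1} * Z_{t+1}
    P = e1.shape[0]
    m = np.sqrt(P) * g.mean(axis=0)          # P^{-1/2} times the summed moment
    V = np.atleast_2d(np.cov(g.T, ddof=0))   # variance of the moments (i.i.d. case)
    return m @ np.linalg.solve(V, m)
```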

The approximation that the authors have just discussed, which holds R fixed as P goes to infinity, thereby implying R/P goes to 0, may not be obviously appealing.

The authors find that on balance, the bootstrap performs distinctly better than MSPE-adjusted for relatively small sample sizes, comparably for medium or larger sample sizes; overall, MSPE-adjusted performs a little better than CCS, a lot better than MSPE-normal.

The larger alternative model might be a bivariate or multivariate vector autoregression (VAR) that includes lags of some variables in addition to lags of the variable to be predicted.