MIDAS regressions: Further results and new directions

15 Feb 2007-Econometric Reviews (Taylor & Francis Group)-Vol. 26, Iss: 1, pp 53-90


MIDAS Regressions:
Further Results and New Directions
Eric Ghysels
Arthur Sinko
Rossen Valkanov §
First Draft: February 2002
This Draft: February 7, 2006
Abstract
We explore Mixed Data Sampling (henceforth MIDAS) regression models. The regressions
involve time series data sampled at different frequencies. Volatility and related processes are
our prime focus, though the regression method has wider applications in macroeconomics and
finance, among other areas. The regressions combine recent developments regarding estimation
of volatility and a not so recent literature on distributed lag models. We study various lag
structures to parameterize parsimoniously the regressions and relate them to existing models.
We also propose several new extensions of the MIDAS framework. The paper concludes with an
empirical section where we provide further evidence and new results on the risk-return tradeoff.
We also report empirical evidence on microstructure noise and volatility forecasting.
We thank two Referees and an Associate Editor, Alberto Plazzi, Pedro Santa-Clara as well as seminar
participants at City University of Hong Kong, Emory University, the Federal Reserve Board, ITAM, Korea
University, New York University, Oxford University, Tsinghua University, University of Iowa, UNC, USC,
participants at the Symposium on New Frontiers in Financial Volatility Modelling, Florence, the Academia
Sinica Conference on Analysis of High-Frequency Financial Data and Market Microstructure, Taipei, the
CIREQ-CIRANO-MITACS conference on Financial Econometrics, Montreal and the Research Triangle
Conference, for helpful comments. All remaining errors are our own.
Department of Finance, Kenan-Flagler School of Business and Department of Economics, University of
North Carolina, Gardner Hall CB 3305, Chapel Hill, NC 27599-3305, phone: (919) 966-5325,
e-mail: eghysels@unc.edu.
Department of Economics, University of North Carolina, Gardner Hall CB 3305, Chapel Hill,
NC 27599-3305, e-mail: sinko@email.unc.edu
§ Rady School of Management, UCSD, Pepper Canyon Hall, 3rd Floor, 9500 Gilman Drive, MC 0093,
La Jolla, CA 92093-0093, phone: (858) 534-0898, e-mail: rvalkanov@ucsd.edu.

1 Introduction
The availability of data sampled at different frequencies always presents a dilemma for a
researcher working with time series data. On the one hand, the variables that are available
at high frequency contain potentially valuable information. On the other hand, the researcher
cannot use this high frequency information directly if some of the variables are available at
a lower frequency, because most time series regressions involve data sampled at the same
interval. The common solution in such cases is to “pre-filter” the data so that the left-hand
and right-hand side variables are available at the same frequency. In the process, a lot of
potentially useful information might be discarded, thus rendering the relation between the
variables difficult to detect.$^{1}$ As an alternative, Ghysels, Santa-Clara, and Valkanov (2002),
(2004) and (2005) have recently proposed regressions that directly accommodate variables
sampled at different frequencies. Their MIxed Data Sampling or MIDAS regressions
represent a simple, parsimonious, and flexible class of time series models that allow the
left-hand and right-hand side variables of time series regressions to be sampled at different
frequencies.
Since MIDAS regressions have only recently been introduced, there are a lot of unexplored
questions. The goal of this paper is to explore some of the most pressing issues, to lay
out some new ideas about mixed-frequency regressions, and to present some new empirical
results. Before we start, it is useful to introduce a simple MIDAS regression. Suppose that
a variable $y_t$ is available once between $t-1$ and $t$ (say, monthly), another variable
$x^{(m)}_t$ is observed $m$ times in the same period (say, daily or $m = 22$), and that we are
interested in the dynamic relation between $y_t$ and $x^{(m)}_t$. In other words, we want to
project the left-hand side variable $y_t$ onto a history of lagged observations of
$x^{(m)}_{t-j/m}$. The superscript on $x^{(m)}_{t-j/m}$ denotes the higher sampling frequency
and its exact timing lag is expressed as a fraction of the unit interval between $t-1$ and $t$.
A simple MIDAS model is

$$y_t = \beta_0 + \beta_1 B(L^{1/m}; \theta)\, x^{(m)}_t + \varepsilon^{(m)}_t \qquad (1)$$

for $t = 1, \ldots, T$ and where $B(L^{1/m}; \theta) = \sum_{k=0}^{K} B(k; \theta) L^{k/m}$ and
$L^{1/m}$ is a lag operator such that $L^{1/m} x^{(m)}_t = x^{(m)}_{t-1/m}$, and the lag
coefficients in $B(k; \theta)$ of the corresponding lag operator $L^{k/m}$ are parameterized as
a function of a small-dimensional vector of parameters $\theta$.
$^{1}$ This situation is becoming more frequent now as dramatic improvements in information
gathering have produced new, high-frequency datasets, particularly in the area of financial
econometrics.
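
As an illustration of equation (1), the following Python sketch builds the weighted regressor $B(L^{1/m}; \theta) x^{(m)}_t$ from a high-frequency series. The exponential weight function, the function names, and the convention that $x^{(m)}_t$ is the last high-frequency observation of period $t$ are assumptions made for this example; they are not specifications taken from the text.

```python
import numpy as np

def exp_weights(theta1, theta2, K):
    """Hypothetical lag weights B(k; theta) for k = 0..K, normalized to sum to one."""
    k = np.arange(K + 1)
    z = theta1 * k + theta2 * k**2
    w = np.exp(z - z.max())  # subtract the max for numerical stability
    return w / w.sum()

def midas_regressor(x_high, m, weights):
    """Return sum_k B(k) * x_{t - k/m} for every low-frequency period t.

    x_high is the high-frequency series in time order with m observations
    per low-frequency period. Periods without a full history of K lags
    are skipped (an assumption of this sketch).
    """
    x_high = np.asarray(x_high, dtype=float)
    weights = np.asarray(weights, dtype=float)
    K = len(weights) - 1
    T = len(x_high) // m
    out = []
    for t in range(1, T + 1):
        end = t * m - 1                        # index of x_t^{(m)}
        if end - K < 0:
            continue
        lags = x_high[end - K:end + 1][::-1]   # x_t, x_{t-1/m}, ..., x_{t-K/m}
        out.append(weights @ lags)
    return np.array(out)
```

Given the resulting series, $\beta_0$ and $\beta_1$ enter linearly, so for a fixed $\theta$ they could be obtained by least squares, with $\theta$ chosen by a nonlinear optimizer in an outer loop.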

In the mixed-frequency framework (1), the number of lags of $x^{(m)}_t$ is likely to be
significant. For instance, if monthly observations of $y_t$ are affected by six months' worth of
lagged daily $x^{(m)}_t$'s, we would need 132 lags ($K = 132$) of high-frequency lagged
variables. If the parameters of the lagged polynomial are left unrestricted (or $B(k)$ does not
depend on $\theta$), then there would be a lot of parameters to estimate. As a way of addressing
parameter proliferation, in a MIDAS regression the coefficients of the polynomial in $L^{1/m}$
are captured by a known function $B(L^{1/m}; \theta)$ of a few parameters summarized in a vector
$\theta$. We will discuss several alternative specifications of $B(L^{1/m}; \theta)$ in the
paper. Finally, the parameter $\beta_1$ captures the overall impact of lagged $x^{(m)}_t$'s on
$y_t$. We identify $\beta_1$ by normalizing the function $B(L^{1/m}; \theta)$ to sum up to
unity. While the normalization and the identification of $\beta_1$ are not strictly necessary in
a MIDAS regression, they will be very useful for our applications later in the paper.
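
The role of the normalization can be made concrete with a small Python sketch. Any profile of unnormalized lag coefficients can be split into a scale (playing the role of $\beta_1$) and weights that sum to one (playing the role of $B$). The decaying profile and the function name below are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

def split_scale_and_shape(coeffs):
    """Split lag coefficients into (beta1, B) with B summing to one."""
    coeffs = np.asarray(coeffs, dtype=float)
    beta1 = coeffs.sum()
    if beta1 == 0.0:
        raise ValueError("coefficients sum to zero; the scale is not identified")
    return beta1, coeffs / beta1

m, months = 22, 6
n_lags = m * months                          # six months of daily lags: 132
coeffs = 0.4 * np.exp(-0.05 * np.arange(n_lags))  # hypothetical unrestricted profile

beta1, B = split_scale_and_shape(coeffs)
# Left unrestricted, the profile has n_lags free parameters; a MIDAS weight
# function replaces them with a handful of parameters in theta plus beta1.
```

Without the normalization, multiplying $\beta_1$ by a constant and dividing every weight by the same constant would leave the fitted values unchanged, which is why the sum-to-one restriction pins down $\beta_1$.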
In some specific cases, the results from the MIDAS regressions can be obtained using high-
frequency regressions alone. We work out one such example in the context of volatility
forecasting. While we are able to derive an explicit relation between the MIDAS parameters
and the purely high-frequency model, the relation is already quite complicated in this simple
case. For more interesting applications, such as those we conduct later in the paper, such
a relation is difficult to derive. This finding illustrates another advantage of our approach:
the MIDAS specification captures a very rich dynamic of the high-frequency process in a
very simple and parsimonious fashion. The MIDAS models benefit from several strands
of econometric models. The parameterization of the polynomial is similar in spirit to the
distributed lag models (see e.g. Griliches (1967), Dhrymes (1971) and Sims (1974) for surveys
on distributed lag models). Mixed data sampling regression models share some features
with distributed lag models but also have unique features. For instance, while we use a
parameterization of B(k; θ) that is common in distributed lag models, we also introduce a
new one, called the beta polynomial, that appears well suited to the applications that we
consider. We also discuss MIDAS regressions with step functions introduced in Forsberg and
Ghysels (2004). Their appeal is the use of OLS estimation methods, but this comes at a
cost, namely that parsimony may not be preserved.
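
The step-function idea can be sketched in Python as follows. The sketch assumes that each regressor is a partial sum of the most recent high-frequency observations over a window, and that the windows (1, 5, and 22 observations) and function names are hypothetical choices for illustration. Because the model is then linear in its coefficients, ordinary least squares applies directly, at the price of one coefficient per step.

```python
import numpy as np

def step_regressors(x_high, m, windows):
    """One row per low-frequency period t; one column per window K_i.

    Column i holds the sum of the last K_i high-frequency observations up to
    and including x_t^{(m)} (a timing assumption of this sketch). Periods
    without enough history for the longest window are skipped.
    """
    x_high = np.asarray(x_high, dtype=float)
    T = len(x_high) // m
    k_max = max(windows)
    rows = []
    for t in range(1, T + 1):
        end = t * m - 1
        if end + 1 < k_max:
            continue
        rows.append([x_high[end - k + 1:end + 1].sum() for k in windows])
    return np.array(rows)

def fit_ols(y, X):
    """OLS with an intercept; returns [beta0, beta_1, ..., beta_p]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

With three windows the implied lag weights form a staircase with three levels, so capturing a smoothly decaying profile would require many steps and hence many parameters.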
A convenient parametric function of $B(L^{1/m}; \theta)$ also allows us to directly deal with
lag selection. In an unrestricted case, we have to design a lag selection procedure which can be
particularly difficult in this setup, where we will have to make the choice whether to include,
say, 66 or 67 daily lags in forecasting of a monthly observation $y_t$. The parameterizations
of $B(L^{1/m}; \theta)$ that we propose are quite flexible. For different values of $\theta$,
they can take various shapes. In particular, the parameterized weights can decrease at different
rates as the number of lags increases. Therefore, by estimating $\theta$, we effectively allow
the data to select the number of lags that are needed in the mixed-data relation between $y_t$
and $x_t$. Hence, once we choose the appropriate functional form of $B(L^{1/m}; \theta)$, the
lag length selection in MIDAS is purely data-driven.
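
A minimal Python sketch of this data-driven lag selection follows. It assumes a hypothetical one-parameter exponential weight function and measures the "effective" lag length as the number of leading lags needed to capture a given share of the total weight. Both the weight function and the 99% threshold are assumptions for illustration.

```python
import numpy as np

def decaying_weights(theta, K):
    """Hypothetical weights proportional to exp(-theta * k), k = 0..K, summing to one."""
    w = np.exp(-theta * np.arange(K + 1))
    return w / w.sum()

def effective_lags(weights, mass=0.99):
    """Smallest number of leading lags whose cumulative weight reaches `mass`."""
    cum = np.cumsum(weights)
    return int(np.searchsorted(cum, mass) + 1)

K = 132  # maximum lag allowed, as in the six-month daily example
fast = effective_lags(decaying_weights(0.5, K))    # rapid decay: few lags matter
slow = effective_lags(decaying_weights(0.05, K))   # slow decay: many lags matter
```

The maximum lag $K$ is the same in both cases; only the estimated decay rate changes how many lags carry non-negligible weight, which is the sense in which the choice between, say, 66 and 67 lags never has to be made explicitly.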
Variations of the MIDAS regression (1) have been used by Ghysels, Santa-Clara, and
Valkanov (2002) and Ghysels, Santa-Clara, and Valkanov (2003). More complex specifications
are certainly possible and, in this paper, we propose several natural extensions of the basic
MIDAS regressions. First, on the right-hand side we can include variables sampled at various
frequencies. Second, non-linearities are easy to introduce as demonstrated by Ghysels, Santa-
Clara, and Valkanov (2005) who use one such model. In this paper, we discuss more general
non-linear MIDAS regressions. Third, MIDAS can accommodate tick-by-tick data that are
observed at unequally spaced intervals. Finally, multivariate MIDAS regressions are also
possible. All of these models are new and still unexplored. Some of them present unique
challenges, others are straightforward to estimate.
We revisit two empirical applications that relate to prior studies: (1) the risk-return
trade-off and (2) volatility prediction. Regarding the risk-return trade-off, we present a variation
of the results in Ghysels, Santa-Clara, and Valkanov (2005) and Ghysels, Santa-Clara, and
Valkanov (2003). The first paper uses a MIDAS regression to show that there is a positive
relation between market volatility and return. Expected returns are proxied using monthly
averages while the variance is estimated using daily squared returns over the last year. The
second paper shows that while squared daily returns are good forecasts of future monthly
variances, there are predictors that clearly dominate. Here, we combine the insights from
both papers. First, we look at the risk-return relation at different frequencies, one, two,
three, and four weeks. Second, we use a different polynomial specification from the one
used in Ghysels, Santa-Clara, and Valkanov (2005).$^{2}$ Third, we use several predictors that
Ghysels, Santa-Clara, and Valkanov (2003) show are good at forecasting future volatility
in a MIDAS context. Finally, we use a different dataset from Ghysels, Santa-Clara, and
Valkanov (2005).
$^{2}$ For further evidence on the risk-return trade-off using MIDAS, see e.g. Ángel, Nave, and
Rubio (2004), Wang (2004) and Charoenrook and Conrad (2005). Models of idiosyncratic volatility
using MIDAS appear in e.g. Jiang and Lee (2004) and Brown and Ferreira (2004).

We find that there is a robustly positive and statistically significant risk-return tradeoff
across horizons and across predictors. Remarkably, the tradeoff is significant even for weekly
returns, even though they are noisy proxies of expected returns. However, the relation is
clearer at the two to four week horizon. Surprisingly, we find that variables that are better
at predicting the variance do not necessarily produce better forecasts of expected returns or
better estimates of the risk-return tradeoff. Hence, they must be capturing a component of
the variance that is not priced by the market and consequently that is unrelated to expected
returns.
We also include empirical evidence on the impact of microstructure noise on volatility
prediction. While using high frequency data has some clear advantages, there are some
costs. High frequency sampling may be plagued by microstructure noise. Several papers
have tried to shed light on this issue: Aït-Sahalia, Mykland, and Zhang (2005), Bandi and
Russell (2005b), Bandi and Russell (2005a), Hansen and Lunde (2004), and Zhang, Mykland, and
Aït-Sahalia (2005a), among others, have suggested corrections for microstructure noise. We
assess how much these corrections improve forecasting.
The paper is structured as follows. Section two discusses various polynomial specifications.
Section three shows that the MIDAS framework is very flexible and captures a rich set
of dynamics that would be difficult to obtain using standard same-frequency regressions.
Section four presents various extensions of MIDAS models, such as a generalized MIDAS
regression, non-linear MIDAS regressions, tick-by-tick MIDAS regressions, and multivariate
MIDAS. In section five, we apply some of the generalizations to estimate the relation between
conditional expected return and risk using ten years of daily Dow Jones index return data.
Some of our results confirm previous findings, others are quite surprising and offer new
directions for research. In section six, we offer concluding remarks.
2 Polynomial Specifications
The parameterization of the lagged coefficients of B(k; θ) in a parsimonious fashion is one
of the key MIDAS features. In this section, we discuss various specifications of MIDAS
regression polynomials. A first subsection is devoted to finite polynomials and we discuss
in particular two parameterizations that were used in previous papers and that we will use
in the empirical section of this paper. A second subsection deals with infinite polynomials
