Journal ArticleDOI

Heteroskedasticity Autocorrelation Robust Inference in Time Series Regressions with Missing Data

01 Jun 2019-Econometric Theory (Cambridge University Press)-Vol. 35, Iss: 03, pp 601-629
TL;DR: In this paper, the authors investigate the properties of HAC robust test statistics in stationary weakly dependent time series regression settings when observations are missing and obtain fixed-b asymptotic results.
Abstract: In this paper we investigate the properties of heteroskedasticity autocorrelation (HAC) robust test statistics in stationary, weakly dependent time series regression settings when observations are missing. We focus on statistics constructed using nonparametric kernel HAC estimators, and we obtain fixed-b asymptotic results. We characterize the time series with missing observations as amplitude modulated series following Parzen (1963). For estimation and inference this amounts to plugging in zeros for missing observations. We also investigate an alternative approach where the missing observations are simply ignored. There are three main theoretical findings. First, when the missing process is random and satisfies strong mixing conditions, HAC robust t and Wald statistics computed from the amplitude modulated series follow the usual fixed-b limits as in Kiefer and Vogelsang (2005). Second, when the missing process is non-random, the fixed-b limits depend on the locations of missing observations but are otherwise pivotal. Third, when missing observations are ignored, we obtain the surprising result that the robust t and Wald statistics have the standard fixed-b limits whether the missing process is random or non-random. We discuss methods for obtaining fixed-b critical values, with a focus on bootstrap methods. We find that the naive i.i.d. bootstrap is an effective and practical way to obtain the fixed-b critical values, especially when the bootstrap conditions on the locations of the missing data.
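The two treatments of missing data studied here are easy to state concretely. The sketch below (ours, not the authors' code) contrasts them for a HAC t-test on a mean with a Bartlett kernel and bandwidth held at a fixed fraction b of the sample size; the AR(1) design and 20% Bernoulli missingness are illustrative assumptions.

```python
# Illustrative sketch (not from the paper): amplitude-modulated (AM) vs.
# equal-spacing (ES) treatment of missing observations in a HAC t-test.
import numpy as np

def bartlett_lrv(v, b):
    """Bartlett-kernel long-run variance of a mean-zero series v,
    with lag truncation M = b*T held at a fixed fraction b (fixed-b)."""
    T = len(v)
    M = max(1, int(b * T))
    lrv = v @ v / T                               # gamma_0
    for j in range(1, M + 1):
        w = 1.0 - j / (M + 1)                     # Bartlett weight
        lrv += 2.0 * w * (v[j:] @ v[:-j]) / T     # + 2 * w_j * gamma_j
    return lrv

rng = np.random.default_rng(0)
T, mu = 500, 1.0
y = np.empty(T); y[0] = mu
for t in range(1, T):                             # AR(1) series around mu
    y[t] = mu + 0.5 * (y[t - 1] - mu) + rng.standard_normal()
obs = rng.random(T) > 0.2                         # random (Bernoulli) missingness

# AM approach: treat the series as amplitude modulated (Parzen, 1963) --
# plug in zeros where data are missing, preserving time distances.
a = obs.astype(float)
ybar_am = (a * y).sum() / a.sum()
v_am = a * (y - ybar_am)                          # zeros at missing dates
se_am = np.sqrt(T * bartlett_lrv(v_am, b=0.1)) / a.sum()

# ES approach: delete missing observations and treat the remainder
# as if it were equally spaced.
y_es = y[obs]
v_es = y_es - y_es.mean()
se_es = np.sqrt(bartlett_lrv(v_es, b=0.1) / len(y_es))

t_am = (ybar_am - mu) / se_am                     # compare with fixed-b,
t_es = (y_es.mean() - mu) / se_es                 # not normal, critical values
print(t_am, t_es)
```

As the abstract notes, under random missingness both statistics have the standard fixed-b limits; under non-random missingness the amplitude-modulated limit depends on the locations of the missing observations, and fixed-b critical values can be obtained by a naive i.i.d. bootstrap that conditions on those locations.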
Citations
Journal ArticleDOI
TL;DR: The performance and limitations of smartphones in collecting real-world data (RWD) outside clinic settings are examined in the remote mPower observational study of Parkinson's disease, with attention to appropriate methods for data collection, quality assessment, analysis and interpretation.
Abstract: Remote health assessments that gather real-world data (RWD) outside clinic settings require a clear understanding of appropriate methods for data collection, quality assessment, analysis and interpretation. Here we examine the performance and limitations of smartphones in collecting RWD in the remote mPower observational study of Parkinson’s disease (PD). Within the first 6 months of study commencement, 960 participants had enrolled and performed at least five self-administered active PD symptom assessments (speeded tapping, gait/balance, phonation or memory). Task performance, especially speeded tapping, was predictive of self-reported PD status (area under the receiver operating characteristic curve (AUC) = 0.8) and correlated with in-clinic evaluation of disease severity (r = 0.71; P < 1.8 × 10⁻⁶) as measured by the motor Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS). Although remote assessment requires careful consideration for accurate interpretation of RWD, our results support the use of smartphones and wearables in objective and personalized disease assessments. Smartphone sensors that monitor disease symptoms enable remote assessment of Parkinson’s patients.

38 citations

Posted Content
TL;DR: It is shown how to use conditional independence relations in the data to determine whether a difference in performance between activity tasks performed before and after the participant has taken medication is potentially due to an effect of the medication, to a "time-of-the-day" effect, or to both.
Abstract: In this work we provide two contributions to the analysis of longitudinal data collected by smartphones in mobile health applications. First, we propose a novel statistical approach to disentangle personalized treatment and "time-of-the-day" effects in observational studies. Under the assumption of no unmeasured confounders, we show how to use conditional independence relations in the data to determine whether a difference in performance between activity tasks performed before and after the participant has taken medication is potentially due to an effect of the medication, to a "time-of-the-day" effect, or to both. Second, we show that smartphone data collected from a given study participant can represent a "digital fingerprint" of the participant, and that classifiers of case/control labels, constructed using longitudinal data, can show artificially improved performance when data from each participant are included in both training and test sets. We illustrate our contributions using data collected during the first 6 months of the mPower study.
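The "digital fingerprint" pitfall described above is avoided by splitting train and test sets at the participant level. A minimal sketch (ours, with hypothetical stand-ins for the mPower-style features, labels, and participant IDs) using scikit-learn's group-aware splitter:

```python
# Illustrative sketch: participant-level train/test split so that no
# participant contributes records to both sets, preventing identity leakage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical longitudinal data: X per-record features, y case/control
# labels (constant within participant), groups = participant IDs.
rng = np.random.default_rng(1)
n_participants, records_each = 100, 20
groups = np.repeat(np.arange(n_participants), records_each)
y = np.repeat(rng.integers(0, 2, n_participants), records_each)
X = rng.standard_normal((len(y), 5)) + 0.3 * y[:, None]

# GroupShuffleSplit keeps all records of a participant on one side.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train, test = next(splitter.split(X, y, groups))
clf = LogisticRegression().fit(X[train], y[train])
print(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))
```

A record-level random split on the same data would let the classifier partly memorize participant identities and report an inflated AUC.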

17 citations


Cites methods from "Heteroskedasticity Autocorrelation ..."

  • ...While HAC estimation still assumes the data are equally spaced, it has been shown that application of the Newey-West estimator to time series with missing data (and, hence, unequally spaced) still generates asymptotically consistent estimates of the covariance matrix, as well as reasonable performance in finite sample simulation studies [25, 26]....

    [...]

Journal ArticleDOI
04 Aug 2022-PLOS ONE
TL;DR: A personalized statistical approach is proposed to disentangle putative treatment and “time-of-the-day” effects, leveraging conditional independence relations spanned by causal graphical models involving the treatment, time-of-the-day, and outcome variables.
Abstract: Ideally, a patient’s response to medication can be monitored by measuring changes in performance of some activity. In observational studies, however, any detected association between treatment (“on-medication” vs “off-medication”) and the outcome (performance in the activity) might be due to confounders. In particular, causal inferences at the personalized level are especially vulnerable to confounding effects that arise in a cyclic fashion. For quick-acting medications, effects can be confounded by circadian rhythms and daily routines. Using the time-of-the-day as a surrogate for these confounders and the performance measurements as captured on a smartphone, we propose a personalized statistical approach to disentangle putative treatment and “time-of-the-day” effects that leverages conditional independence relations spanned by causal graphical models involving the treatment, time-of-the-day, and outcome variables. Our approach is based on conditional independence tests implemented via standard and temporal linear regression models. Using synthetic data, we investigate when and how residual autocorrelation can affect the standard tests, and how time series modeling (namely, ARIMA and robust regression via HAC covariance matrix estimators) can remedy these issues. In particular, our simulations illustrate that when patients perform their activities in a paired fashion, positive autocorrelation can lead to conservative results for the standard regression approach (i.e., deflated true positive detection), whereas negative autocorrelation can lead to anticonservative behavior (i.e., inflated false positive detection). The adoption of time series methods, on the other hand, leads to well-controlled type I error rates. We illustrate the application of our methodology with data from a Parkinson’s disease mobile health study.
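The HAC-robust regression check described here is straightforward to sketch (our illustration, with hypothetical variable names, not the paper's code) using statsmodels' Newey-West covariance option:

```python
# Illustrative sketch: regress activity performance on treatment status
# while adjusting for time-of-the-day, using Newey-West (HAC) standard
# errors to absorb residual autocorrelation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
on_med = rng.integers(0, 2, n)            # "on" vs "off" medication
hour = rng.uniform(8, 22, n)              # time-of-the-day surrogate
e = np.empty(n); e[0] = rng.standard_normal()
for t in range(1, n):                     # AR(1) residual autocorrelation
    e[t] = 0.4 * e[t - 1] + rng.standard_normal()
perf = 0.5 * on_med + 0.1 * hour + e      # activity performance

X = sm.add_constant(np.column_stack([on_med, hour]))
fit = sm.OLS(perf, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(fit.summary())                      # HAC t-test on the on_med coefficient
```

With positive residual autocorrelation, the ordinary OLS standard error on the treatment term would be misleading; the HAC version is what the abstract's simulations show to control the type I error rate.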
TL;DR: A method is proposed for constructing confidence intervals that account for many forms of spatial correlation, with new constructions for both the standard error and the critical value.
Abstract: We propose a method for constructing confidence intervals that account for many forms of spatial correlation. The interval has the familiar ‘estimator plus and minus a standard error times a critical value’ form, but we propose new methods for constructing the standard error and the critical value. The standard error is constructed using population principal components from a given ‘worst-case’ spatial correlation model. The critical value is chosen to ensure coverage in a benchmark parametric model for the spatial correlations. The method is shown to control coverage in finite sample Gaussian settings in a restricted but nonparametric class of models and in large samples whenever the spatial correlation is weak, i.e., with average pairwise correlations that vanish as the sample size gets large. We also provide results on the efficiency of the method.
References
ReportDOI
TL;DR: In this article, a simple method of calculating a heteroskedasticity and autocorrelation consistent covariance matrix that is positive semi-definite by construction is described.
Abstract: This paper describes a simple method of calculating a heteroskedasticity and autocorrelation consistent covariance matrix that is positive semi-definite by construction. It also establishes consistency of the estimated covariance matrix under fairly general conditions.
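For reference, the simple estimator described here is the Bartlett-weighted ("Newey-West") form; in our notation, with OLS residuals $\hat{u}_t$, regressors $x_t$, and lag truncation $M$, the declining weights $1 - j/(M+1)$ are what deliver positive semi-definiteness:

$$
\hat{\Omega} = \hat{\Gamma}_0 + \sum_{j=1}^{M}\left(1-\frac{j}{M+1}\right)\left(\hat{\Gamma}_j+\hat{\Gamma}_j'\right),
\qquad
\hat{\Gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}\hat{u}_t\,\hat{u}_{t-j}\,x_t x_{t-j}'.
$$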

18,117 citations


"Heteroskedasticity Autocorrelation ..." refers background in this paper

  • ...This would seem particularly relevant for testing based on HAR variance estimators (e.g., Newey and West, 1987; Andrews, 1991) given that those estimators employ quadratic forms with weights that depend on the time distances of pairs of observations....

    [...]

Journal ArticleDOI
TL;DR: Asymptotically optimal kernel/weighting schemes and bandwidth/lag truncation parameters are obtained, and using these results, data-dependent automatic bandwidth/lag truncation parameters are introduced.
Abstract: This paper is concerned with the estimation of covariance matrices in the presence of heteroskedasticity and autocorrelation of unknown forms. Currently available estimators that are designed for this context depend upon the choice of a lag truncation parameter and a weighting scheme. Results in the literature provide a condition on the growth rate of the lag truncation parameter as T → ∞ that is sufficient for consistency. No results are available, however, regarding the choice of lag truncation parameter for a fixed sample size, regarding data-dependent automatic lag truncation parameters, or regarding the choice of weighting scheme. In consequence, available estimators are not entirely operational and the relative merits of the estimators are unknown. This paper addresses these problems. The asymptotic truncated mean squared errors of estimators in a given class are determined and compared. Asymptotically optimal kernel/weighting scheme and bandwidth/lag truncation parameters are obtained using an asymptotic truncated mean squared error criterion. Using these results, data-dependent automatic bandwidth/lag truncation parameters are introduced. The finite sample properties of the estimators are analyzed via Monte Carlo simulation.
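The automatic bandwidths take a plug-in form: for a kernel with characteristic exponent $q$, the MSE-optimal bandwidth grows like (our notation; the Bartlett and quadratic spectral constants below are the ones usually quoted from this paper)

$$
S_T^{*} = c_k\left(\hat{\alpha}(q)\,T\right)^{1/(2q+1)},
\qquad
S_T^{*} = 1.1447\left(\hat{\alpha}(1)\,T\right)^{1/3}\ \text{(Bartlett)},
\qquad
S_T^{*} = 1.3221\left(\hat{\alpha}(2)\,T\right)^{1/5}\ \text{(QS)},
$$

where $\hat{\alpha}(q)$ is estimated from a simple parametric approximation, e.g. fitted AR(1) models for each series.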

4,219 citations


"Heteroskedasticity Autocorrelation ..." refers background or methods in this paper

  • ..., Newey and West, 1987; Andrews, 1991) given that those estimators employ quadratic forms with weights that depend on the time distances of pairs of observations. One might reasonably conjecture (and we also conjectured) that the ES approach would be problematic. Surprisingly, we find that the ES approach can be justified theoretically with the fixed-b asymptotic framework and works better than one might expect. In practice the AM approach is prominent, but the ES approach is still used. For example, in the statistical package Stata, the command newey with the ‘force’ option computes Newey–West standard errors using the AM approach whereas the command newey2 with the ‘force’ option or the command hacreg computes Newey–West standard errors based on the ES approach. Our work is most closely related to Datta and Du (2012) who also analyze the AM and ES approaches in time series regressions with missing data. Their results provide a good foundation for the traditional small bandwidth asymptotic theory on HAR tests which appeals to consistency of the HAR variance estimators. Our results, on the other hand, are based on the fixed-b asymptotic framework as in Kiefer and Vogelsang (2005). The fixed-b results that we obtain can be viewed as useful refinements to the traditional theory because it is now well established that fixed-b theory provides improved approximations by capturing much of the...

    [...]

  • ...This would seem particularly relevant for testing based on HAR variance estimators (e.g., Newey and West, 1987; Andrews, 1991) given that those estimators employ quadratic forms with weights that depend on the time distances of pairs of observations....

    [...]


MonographDOI
13 Oct 1994

1,150 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a general asymptotic theory of regression for processes which are integrated of order one, including vector autoregressions and multivariate regressions among integrated processes that are driven by innovation sequences.
Abstract: This paper develops a general asymptotic theory of regression for processes which are integrated of order one. The theory includes vector autoregressions and multivariate regressions amongst integrated processes that are driven by innovation sequences which allow for a wide class of weak dependence and heterogeneity. The models studied cover cointegrated systems such as those advanced recently by Granger and Engle and quite general linear simultaneous equations systems with contemporaneous regressor error correlation and serially correlated errors. Problems of statistical testing in vector autoregressions and multivariate regressions with integrated processes are also studied. It is shown that the asymptotic theory for conventional tests involves major departures from classical theory and raises new and important issues of the presence of nuisance parameters in the limiting distribution theory. Unlike many of the time series encountered in the natural sciences, economic time series frequently exhibit characteristics that are widely believed to be intrinsically nonstationary. For example, real macroeconomic variables such as output and consumption typically display a strong secular or growth component as well as cyclical behaviour; and many financial series like common stock prices behave in general as if they had no fixed mean. Recognizing these typical characteristics of economic time series, econometricians have devoted attention to the problem of describing and modelling nonstationarity. In the 1960's important contributions in the area were made by Granger, Hatanaka and their associates in Granger and Hatanaka (1964), Granger and Morgenstern (1963), Brillinger and Hatanaka (1968) and Hatanaka and Suzuki (1967). Later, following the influential work of Box and Jenkins (1976), attention shifted to the role of integrated processes in modelling economic time series. While undoubtedly restricting the class of nonstationary models, integrated processes of the ARIMA type have been found to produce highly satisfactory representations of many observed time series in economics. Quite recently, Nelson and Plosser (1982) have published a detailed empirical study of historical economic time series for the U.S.A. These authors provide some convincing evidence that macroeconomic time series normally thought to be stationary about a time trend are better described as integrated processes with drift. Amongst the latest research in this field have been the studies of cointegration by Granger and Weiss (1983) and Granger and Engle (1985). Two time series are said to be cointegrated if some linear combination of the series has a lower order of integration than the individual series. These authors argue that the notion of (steady state) equilibrium in economics implies the existence of such relationships. Thus, a classical economist's view of the interaction of money growth and price movements would require these series
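The cointegration notion invoked at the end admits a one-line example (standard textbook construction, not quoted from the paper): with

$$
x_t = x_{t-1} + e_t \ \ (\text{so } x_t \text{ is } I(1)),
\qquad
y_t = \gamma x_t + u_t, \quad u_t \sim I(0),
$$

both series are $I(1)$, yet the linear combination $y_t - \gamma x_t = u_t$ is $I(0)$, so the pair is cointegrated in the sense of Granger and Engle.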

800 citations

Journal ArticleDOI
TL;DR: An asymptotic theory is developed for a first-order autoregression with a root near unity, with applications to continuous time estimation and to the analysis of the asymptotic power of tests for a unit root under a sequence of local alternatives.
Abstract: This paper develops an asymptotic theory for a first-order autoregression with a root near unity. Deviations from the unit root theory are measured through a noncentrality parameter. When this parameter is negative we have a local alternative that is stationary; when it is positive the local alternative is explosive; and when it is zero we have the standard unit root theory. Our asymptotic theory accommodates these possibilities and helps to unify earlier theory in which the unit root case appears as a singularity of the asymptotics. The general theory is expressed in terms of functionals of a simple diffusion process. The theory has applications to continuous time estimation and to the analysis of the asymptotic power of tests for a unit root under a sequence of local alternatives.
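Concretely, the noncentrality parameter enters through a local-to-unity root; in the standard notation (ours, not reproduced from the paper, with innovation variance normalized to one):

$$
y_t = \rho_T\,y_{t-1} + \varepsilon_t,
\qquad
\rho_T = 1 + \frac{c}{T},
\qquad
T^{-1/2}\,y_{\lfloor Tr \rfloor} \Rightarrow J_c(r) = \int_0^r e^{c(r-s)}\,dW(s),
$$

with $c < 0$ giving the stationary local alternative, $c > 0$ the explosive one, and $c = 0$ the standard unit root (Brownian motion) limit.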

772 citations