Journal ArticleDOI

Reconstruction of missing data in multivariate processes with applications to causality analysis

TL;DR: The proposed method is developed in the framework of sparse optimization while adopting a parametric approach using vector auto-regressive (VAR) models, where both the temporal and spatial correlations can be exploited for efficient data recovery.
Abstract: Recovery of missing observations in time-series has been a century-long subject of study, giving rise to two broad classes of methods, namely, one that reconstructs data and the other that directly estimates the statistical properties of the data, largely for univariate processes. In this work, we present a data reconstruction technique for multivariate processes. The proposed method is developed in the framework of sparse optimization while adopting a parametric approach using vector auto-regressive (VAR) models, where both the temporal and spatial correlations can be exploited for efficient data recovery. The primary purpose of recovering the missing data in this work is to develop a directed graphical or a network representation of the multivariate process under study. Existing methods for data-driven network reconstruction are built on the assumption of data being available at regular intervals. In this respect, the proposed method offers an effective methodology for reconstructing weighted causal networks from missing data. The scope of this work is restricted to linear, jointly stationary multivariate processes that can be suitably represented by VAR models of finite order and missing data of the random type. Simulation studies on different data generating processes with varying proportions of missing observations illustrate the efficacy of the proposed method in recovering the multivariate signals and thereby reconstructing weighted causal networks.
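
The abstract does not spell out the optimization problem, so the following is only a minimal sketch of the underlying idea, not the authors' algorithm: given a VAR(1) model (assumed known here for illustration), randomly missing samples of a bivariate process are recovered by minimizing the squared one-step VAR prediction residuals over the unknown entries. All model parameters, noise levels, and missing-data proportions below are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact algorithm): recover randomly missing
# samples of a bivariate VAR(1) process by minimizing the one-step VAR
# prediction residuals over the unknown entries. The VAR matrix A is assumed
# known here; the paper works within a sparse-optimization framework instead.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = np.array([[0.6, 0.2],
              [0.0, 0.5]])          # assumed VAR(1) coefficient matrix
N, m = 300, 2
X = np.zeros((N, m))
for t in range(1, N):                # simulate the process
    X[t] = A @ X[t - 1] + 0.1 * rng.standard_normal(m)

miss = rng.random((N, m)) < 0.2      # ~20% randomly missing entries
Y = np.where(miss, np.nan, X)        # observed record with gaps

def objective(z):
    """Sum of squared VAR(1) residuals with the missing entries filled by z."""
    Xhat = Y.copy()
    Xhat[miss] = z
    resid = Xhat[1:] - Xhat[:-1] @ A.T
    return np.sum(resid ** 2)

z0 = np.zeros(miss.sum())            # initial guess for the missing values
res = minimize(objective, z0, method="L-BFGS-B")
Xrec = Y.copy()
Xrec[miss] = res.x
print("RMSE on missing entries:",
      np.sqrt(np.mean((Xrec[miss] - X[miss]) ** 2)))
```
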
Citations
Journal Article
TL;DR: A new frequency-domain approach to describe the relationships (direction of information flow) between multivariate time series based on the decomposition of multivariate partial coherences computed from multivariate autoregressive models is introduced.
Abstract: Abstract. This paper introduces a new frequency-domain approach to describe the relationships (direction of information flow) between multivariate time series based on the decomposition of multivariate partial coherences computed from multivariate autoregressive models. We discuss its application and compare its performance to other approaches to the problem of determining neural structure relations from the simultaneous measurement of neural electrophysiological signals. The new concept is shown to reflect a frequency-domain representation of the concept of Granger causality.

176 citations
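
As a rough illustration of the decomposition described above, the sketch below computes a partial-directed-coherence-style quantity from assumed VAR coefficient matrices; the exact definition and normalization in the cited paper may differ, and the example system is an arbitrary bivariate VAR(1).

```python
# Minimal sketch of a partial-directed-coherence style quantity computed
# from assumed VAR coefficient matrices (illustrative only; see the cited
# paper for the exact definition and normalization).
import numpy as np

def pdc(A_list, freqs):
    """PDC-like measure for VAR coefficients A_list = [A1, ..., Ap].

    Returns an array of shape (len(freqs), m, m) where entry [k, i, j]
    quantifies the influence of series j on series i at normalized
    frequency freqs[k] (0 <= f <= 0.5).
    """
    m = A_list[0].shape[0]
    out = np.zeros((len(freqs), m, m))
    for k, f in enumerate(freqs):
        Abar = np.eye(m, dtype=complex)
        for lag, Alag in enumerate(A_list, start=1):
            Abar -= Alag * np.exp(-2j * np.pi * f * lag)
        denom = np.sqrt(np.sum(np.abs(Abar) ** 2, axis=0))  # column norms
        out[k] = np.abs(Abar) / denom
    return out

# Example: bivariate VAR(1) where series 0 drives series 1
A1 = np.array([[0.5, 0.0],
               [0.4, 0.3]])
freqs = np.linspace(0, 0.5, 6)
print(pdc([A1], freqs)[:, 1, 0])   # nonzero: influence of series 0 on 1
print(pdc([A1], freqs)[:, 0, 1])   # zero: no influence of series 1 on 0
```
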

01 Jan 1992
TL;DR: In this paper, several approaches to the identification problem are presented, including a new method based on the EM (expectation maximization) algorithm, and different approaches are tested and compared using Monte Carlo simulations.
Abstract: Parameter estimation when the measurement information may be incomplete is discussed. An ARX model is used as a basic system representation. The presentation covers both missing output and missing input. First, reconstruction of the missing values is discussed. The reconstruction is based on a state-space formulation of the system, and is performed using Kalman filtering or fixed-interval smoothing formulas. Several approaches to the identification problem are presented, including a new method based on the EM (expectation maximization) algorithm. The different approaches are tested and compared using Monte Carlo simulations. The choice of method is always a tradeoff between estimation accuracy and computational complexity. According to the simulations, the gain in accuracy using the EM method can be considerable if many data are missing.

103 citations
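
The sketch below illustrates only the reconstruction step described in the abstract, for an assumed scalar AR(1) state-space model with known parameters: a Kalman filter that skips the measurement update at missing samples, followed by a fixed-interval (Rauch-Tung-Striebel) smoother. The EM-based parameter estimation discussed in the paper is omitted, and all parameter values are illustrative.

```python
# Minimal sketch: reconstruct missing observations of an assumed scalar AR(1)
# state-space model with a Kalman filter plus fixed-interval (RTS) smoother.
import numpy as np

rng = np.random.default_rng(1)
a, q, r = 0.9, 0.5, 0.1            # AR coefficient, state and measurement noise
N = 200
x = np.zeros(N)
for t in range(1, N):
    x[t] = a * x[t - 1] + np.sqrt(q) * rng.standard_normal()
y = x + np.sqrt(r) * rng.standard_normal(N)
miss = rng.random(N) < 0.3         # ~30% of the outputs are missing
y[miss] = np.nan

# Kalman filter with missing-data handling (skip the update when y is NaN)
xf = np.zeros(N); Pf = np.zeros(N)     # filtered mean / variance
xp = np.zeros(N); Pp = np.zeros(N)     # one-step predicted mean / variance
xf[0], Pf[0] = 0.0, 1.0
for t in range(1, N):
    xp[t] = a * xf[t - 1]
    Pp[t] = a**2 * Pf[t - 1] + q
    if np.isnan(y[t]):
        xf[t], Pf[t] = xp[t], Pp[t]
    else:
        K = Pp[t] / (Pp[t] + r)
        xf[t] = xp[t] + K * (y[t] - xp[t])
        Pf[t] = (1 - K) * Pp[t]

# Rauch-Tung-Striebel fixed-interval smoother
xs = xf.copy()
for t in range(N - 2, -1, -1):
    J = Pf[t] * a / Pp[t + 1]
    xs[t] = xf[t] + J * (xs[t + 1] - xp[t + 1])

print("RMSE of reconstructed missing samples:",
      np.sqrt(np.mean((xs[miss] - x[miss]) ** 2)))
```
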

Journal ArticleDOI
TL;DR: A classification approach is proposed for finding ranges of process inputs that result in corresponding ranges of a process profit function using deep learning.
Abstract: A classification approach is proposed for finding ranges of process inputs that result in corresponding ranges of a process profit function using deep learning. Two deep learning tools are used to ...

13 citations

Journal ArticleDOI
TL;DR: This work demonstrates that the presence of a mismatch between model structure and DGP leads to biased and/or inefficient estimates of causality measures and, hence, the strengths (weights) of causal connections.
Abstract: Multivariable dynamical processes are characterized by complex cause and effect relationships among variables. Reconstruction of these causal connections from data, especially based on the concept of Granger causality (GC), has attracted significant attention in process engineering with applications to interaction assessment, topology reconstruction, and fault detection. The standard practice for reconstruction of GC networks has been along the parametric route that deploys vector autoregressive (VAR) models but without giving due importance to the structural characteristics of data generating process (DGP). In this work, we first demonstrate that the presence of a mismatch between model structure and DGP leads to biased and/or inefficient estimates of causality measures and, hence, the strengths (weights) of causal connections. This issue is further aggravated for small sample sizes wherein additionally spurious causal relationships are detected. In this respect, we present, second, a systematic methodol...

9 citations
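
The abstract concerns bias in estimated causality strengths; as a baseline illustration (not the authors' methodology), the sketch below estimates the strength of a Granger-causal link as the log ratio of residual variances of restricted and full autoregressive models fitted by ordinary least squares. The simulated system, noise level, and lag order are arbitrary choices.

```python
# Minimal illustrative sketch: Granger-causality strength x -> y as the log
# ratio of residual variances of a restricted AR model of y versus a full
# model that also includes lagged x.
import numpy as np

rng = np.random.default_rng(2)
N, p = 2000, 2
x = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + 0.1 * rng.standard_normal()

def lagmat(*series, p=2):
    """Stack p lags of each series into a regressor matrix."""
    cols = [s[p - k : len(s) - k] for s in series for k in range(1, p + 1)]
    return np.column_stack(cols)

target = y[p:]
Z_full = lagmat(y, x, p=p)          # lags of y and x
Z_restr = lagmat(y, p=p)            # lags of y only
res_full = target - Z_full @ np.linalg.lstsq(Z_full, target, rcond=None)[0]
res_restr = target - Z_restr @ np.linalg.lstsq(Z_restr, target, rcond=None)[0]
gc_x_to_y = np.log(np.var(res_restr) / np.var(res_full))
print("Estimated GC strength x -> y:", gc_x_to_y)   # clearly > 0
```
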

Proceedings ArticleDOI
01 Sep 2019
TL;DR: A new definition of direct causality for deterministic linear time-invariant (LTI) dynamical systems is presented, together with a novel causality detection method based on delay estimation from noisy multivariable measurements; the method is well-suited to low-excitation signals and does not require the specification of any model structure.
Abstract: Reconstruction of process topology from cause-effect analysis of measurements finds applications in root-cause analysis, identification of disturbance propagation pathways, estimation of fault propagation times, and interaction assessment. A widely used approach is based on the notion of Granger causality (GC), but it is well-suited only for stationary stochastic processes. GC-based measures and methods, while useful in certain cases, can be highly restrictive and produce misleading results in engineering applications, since changes in process variables are frequently deterministic. The lack of sufficient excitation and the presence of measurement errors further restrict their applicability. In this respect, we present (i) a new definition of direct causality for deterministic linear time-invariant (LTI) dynamical systems, and (ii) a novel causality detection method based on delay estimation from noisy multivariable measurements. Efficient estimates of time delays are obtained from a recently developed non-parametric frequency-domain method based on partial coherence and Hilbert transform relations. In addition to its ability to handle deterministic variations, the proposed causality detection method is well-suited to low-excitation signals and does not require the specification of any model structure. Case studies involving data from synthetic and benchmark processes are presented to illustrate the utility and efficacy of the proposed method.

4 citations
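
The cited method builds on partial coherence and Hilbert transform relations, which are not reproduced here. As a much simpler frequency-domain stand-in, the sketch below estimates a propagation delay between two signals from the slope of the cross-spectral phase over a low-frequency band; the filter settings, band limits, and simulated delay are arbitrary assumptions.

```python
# Minimal sketch: estimate the delay between two signals from the slope of
# the cross-spectral phase (a simplified stand-in for the cited method).
import numpy as np
from scipy import signal

rng = np.random.default_rng(3)
n = 4096
d_true = 5                                        # delay from x to y, in samples
x = signal.lfilter(*signal.butter(4, 0.2), rng.standard_normal(n))
y = np.roll(x, d_true) + 0.05 * rng.standard_normal(n)

f, Pxy = signal.csd(x, y, fs=1.0, nperseg=512)    # Pxy = conj(X) * Y (Welch)
band = (f > 0) & (f < 0.15)                       # band with good coherence
phase = np.unwrap(np.angle(Pxy[band]))            # phase ~ -2*pi*f*delay
slope = np.polyfit(f[band], phase, 1)[0]
print("estimated delay (samples):", -slope / (2 * np.pi))
```
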


Cites background from "Reconstruction of missing data in m..."

  • ...Small sample sizes (D3) and missing observations (D5) can be handled through compressive sensing ideas / sparse optimization techniques with some additional effort [7, 8]....


References
Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients, and a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program (Basis Pursuit in signal processing).
Abstract: Suppose x is an unknown vector in R^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓ^p ball for 0 < p ≤ 1.

18,609 citations
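
A minimal sketch of the recovery principle described above: a sparse vector is reconstructed from far fewer random linear measurements than its length by l1 minimization (Basis Pursuit), cast here as a linear program. The problem sizes and sparsity level are arbitrary illustrative choices.

```python
# Minimal sketch of basis pursuit (l1 minimization) as a linear program:
# recover a k-sparse length-m vector from n << m random measurements.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
m, n, k = 200, 80, 8               # signal length, measurements, sparsity
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((n, m)) / np.sqrt(n)
b = A @ x_true

# min 1'(u + v)  s.t.  A(u - v) = b,  u, v >= 0,  then x = u - v
c = np.ones(2 * m)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
x_hat = res.x[:m] - res.x[m:]
print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```
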

Journal ArticleDOI
TL;DR: In this article, the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation, and measures of causal lag and causal strength can then be constructed.
Abstract: There occurs on some occasions a difficulty in deciding the direction of causality between two related variables and also whether or not feedback is occurring. Testable definitions of causality and feedback are proposed and illustrated by use of simple two-variable models. The important problem of apparent instantaneous causality is discussed and it is suggested that the problem often arises due to slowness in recording information or because a sufficiently wide class of possible causal variables has not been used. It can be shown that the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation. Measures of causal lag and causal strength can then be constructed. A generalisation of this result with the partial cross spectrum is suggested.

16,349 citations
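
A minimal sketch of the two-variable causality test in the Granger sense, using the statsmodels implementation on simulated data in which x drives y with one sample of delay; the column-order convention and the choice of F-test statistic are noted in the comments, and the simulated system is an assumption for illustration.

```python
# Minimal sketch of a two-variable Granger causality test with statsmodels:
# x should help predict y, but not the other way around.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(5)
n = 1000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + 0.1 * rng.standard_normal()

# Convention: the test checks whether the SECOND column Granger-causes the
# first. The p-value of the F-test at lag 1 is extracted from the results.
res_xy = grangercausalitytests(np.column_stack([y, x]), maxlag=1)
res_yx = grangercausalitytests(np.column_stack([x, y]), maxlag=1)
print("p(x -> y):", res_xy[1][0]["ssr_ftest"][1])   # small: causal link
print("p(y -> x):", res_yx[1][0]["ssr_ftest"][1])   # expected insignificant
```
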

Journal ArticleDOI
TL;DR: This paper studies the reliability and efficiency of periodic-signal detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced, using a modified definition of the periodogram that retains the simple statistical behavior of the evenly spaced case.
Abstract: Detection of a periodic signal hidden in noise is frequently a goal in astronomical data analysis. This paper does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced. This choice was made because, of the methods in current use, it appears to have the simplest statistical behavior. A modification of the classical definition of the periodogram is necessary in order to retain the simple statistical behavior of the evenly spaced case. With this modification, periodogram analysis and least-squares fitting of sine waves to the data are exactly equivalent. Certain difficulties with the use of the periodogram are less important than commonly believed in the case of detection of strictly periodic signals. In addition, the standard method for mitigating these difficulties (tapering) can be used just as well if the sampling is uneven. An analysis of the statistical significance of signal detections is presented, with examples.

6,761 citations
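
A minimal usage sketch of the Lomb-Scargle periodogram analyzed in the paper, here via the scipy implementation on unevenly sampled synthetic data; the sampling pattern, noise level, and frequency grid are arbitrary illustrative choices.

```python
# Minimal sketch: detect a periodic signal in unevenly sampled, noisy data
# with the Lomb-Scargle periodogram (scipy implementation).
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(6)
t = np.sort(rng.uniform(0, 100, 400))        # uneven observation times
f_true = 0.23                                # cycles per time unit
y = np.sin(2 * np.pi * f_true * t) + 0.5 * rng.standard_normal(t.size)

freqs = np.linspace(0.01, 1.0, 2000)         # trial frequencies (cycles/unit)
pgram = lombscargle(t, y - y.mean(), 2 * np.pi * freqs, normalize=True)
print("peak at f =", freqs[np.argmax(pgram)])  # close to 0.23
```
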

BookDOI
04 Oct 2007
TL;DR: This reference work and graduate level textbook considers a wide range of models and methods for analyzing and forecasting multiple time series, which include vector autoregressive, cointegrated, vector Autoregressive moving average, multivariate ARCH and periodic processes as well as dynamic simultaneous equations and state space models.
Abstract: This reference work and graduate level textbook considers a wide range of models and methods for analyzing and forecasting multiple time series. The models covered include vector autoregressive, cointegrated, vector autoregressive moving average, multivariate ARCH and periodic processes as well as dynamic simultaneous equations and state space models. Least squares, maximum likelihood, and Bayesian methods are considered for estimating these models. Different procedures for model selection and model specification are treated and a wide range of tests and criteria for model checking are introduced. Causality analysis, impulse response analysis and innovation accounting are presented as tools for structural analysis. The book is accessible to graduate students in business and economics. In addition, multiple time series courses in other fields such as statistics and engineering may be based on it. Applied researchers involved in analyzing multiple time series may benefit from the book as it provides the background and tools for their tasks. It bridges the gap to the difficult technical literature on the topic.

5,244 citations
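
A minimal sketch of the workflow the book covers (least-squares VAR estimation, order selection by an information criterion, and a causality test on the fitted model), using the statsmodels implementation on a simulated bivariate VAR(1); the coefficient matrix and sample size are illustrative assumptions.

```python
# Minimal sketch: fit a VAR by least squares, select the order by AIC, and
# run a Granger-causality test on the fitted model (statsmodels).
import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
n = 1000
A = np.array([[0.6, 0.2],
              [0.0, 0.5]])
data = np.zeros((n, 2))
for t in range(1, n):
    data[t] = A @ data[t - 1] + 0.1 * rng.standard_normal(2)

model = VAR(data)
results = model.fit(maxlags=8, ic="aic")       # order selected by AIC
print("selected order:", results.k_ar)
print(results.coefs[0])                        # estimated A1, close to A
gc = results.test_causality(caused=0, causing=1, kind="f")
print(gc.summary())                            # variable 1 -> variable 0
```
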

Journal ArticleDOI
N. R. Lomb
TL;DR: In this article, the statistical properties of least-squares frequency analysis of unequally spaced data are examined and it is shown that the reduction in the sum of squares at a particular frequency is a χ² variable with two degrees of freedom.
Abstract: The statistical properties of least-squares frequency analysis of unequally spaced data are examined. It is shown that, in the least-squares spectrum of gaussian noise, the reduction in the sum of squares at a particular frequency is a χ² variable with two degrees of freedom. The reductions at different frequencies are not independent, as there is a correlation between the height of the spectrum at any two frequencies, f1 and f2, which is equal to the mean height of the spectrum due to a sinusoidal signal of frequency f1, at the frequency f2. These correlations reduce the distortion in the spectrum of a signal affected by noise. Some numerical illustrations of the properties of least-squares frequency spectra are also given.

4,950 citations