Posted Content•

# Online Non-linear Topology Identification from Graph-connected Time Series

TL;DR: In this paper, the authors proposed an online kernel-based algorithm for topology estimation of non-linear vector autoregressive time series by solving a sparse online optimization framework using the composite objective mirror descent method.

Abstract: Estimating the unknown causal dependencies among graph-connected time series plays an important role in many applications, such as sensor network analysis, signal processing over cyber-physical systems, and finance engineering. Inference of such causal dependencies, often know as topology identification, is not well studied for non-linear non-stationary systems, and most of the existing methods are batch-based which are not capable of handling streaming sensor signals. In this paper, we propose an online kernel-based algorithm for topology estimation of non-linear vector autoregressive time series by solving a sparse online optimization framework using the composite objective mirror descent method. Experiments conducted on real and synthetic data sets show that the proposed algorithm outperforms the state-of-the-art methods for topology estimation.

##### Citations

More filters

•

TL;DR: In this article, an algorithmic framework is proposed to learn time-varying graphs from online data, where different functions regulate different desiderata, e.g., data fidelity, sparsity or smoothness.

Abstract: This work proposes an algorithmic framework to learn time-varying graphs from online data. The generality offered by the framework renders it model-independent, i.e., it can be theoretically analyzed in its abstract formulation and then instantiated under a variety of model-dependent graph learning problems. This is possible by phrasing (time-varying) graph learning as a composite optimization problem, where different functions regulate different desiderata, e.g., data fidelity, sparsity or smoothness. Instrumental for the findings is recognizing that the dependence of the majority (if not all) data-driven graph learning algorithms on the data is exerted through the empirical covariance matrix, representing a sufficient statistic for the estimation problem. Its user-defined recursive update enables the framework to work in non-stationary environments, while iterative algorithms building on novel time-varying optimization tools explicitly take into account the temporal dynamics, speeding up convergence and implicitly including a temporal-regularization of the solution. We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM), where we also introduce ad-hoc vectorization schemes for structured matrices (symmetric, hollows, etc.) which are crucial to perform correct gradient computations, other than enabling to work in low-dimensional vector spaces and hence easing storage requirements. After discussing the theoretical guarantees of the proposed framework, we corroborate it with extensive numerical tests in synthetic and real data.

9 citations

•

TL;DR: In this paper, a method for nonlinear topology identification is proposed, based on the assumption that a collection of time series are generated in two steps: i) a vector autoregressive process in a latent space, and ii) a nonlinear, component-wise, monotonically increasing mapping.

Abstract: A method for nonlinear topology identification is proposed, based on the assumption that a collection of time series are generated in two steps: i) a vector autoregressive process in a latent space, and ii) a nonlinear, component-wise, monotonically increasing observation mapping. The latter mappings are assumed invertible, and are modelled as shallow neural networks, so that their inverse can be numerically evaluated, and their parameters can be learned using a technique inspired in deep learning. Due to the function inversion, the back-propagation step is not straightforward, and this paper explains the steps needed to calculate the gradients applying implicit differentiation. Whereas the model explainability is the same as that for linear VAR processes, preliminary numerical tests show that the prediction error becomes smaller.

1 citations

•

TL;DR: In this article, an online version of classic batch graph learning method was developed to better track graphs in dynamic environments, where graphs evolve in certain patterns such that dynamic priors might be embedded in the online graph learning framework.

Abstract: Inferring the underlying graph topology that characterizes structured data is pivotal to many graph-based models when pre-defined graphs are not available. This paper focuses on learning graphs in the case of sequential data in dynamic environments. For sequential data, we develop an online version of classic batch graph learning method. To better track graphs in dynamic environments, we assume graphs evolve in certain patterns such that dynamic priors might be embedded in the online graph learning framework. When the information of these hidden patterns is not available, we use history data to predict the evolution of graphs. Furthermore, dynamic regret analysis of the proposed method is performed and illustrates that our online graph learning algorithms can reach sublinear dynamic regret. Experimental results support the fact that our method is superior to the state-of-art methods.

•

TL;DR: In this paper, a kernel-based algorithm for graph topology estimation is proposed, which uses a Fourier-based Random feature approximation to tackle the curse of dimensionality associated with the kernel representations.

Abstract: Online topology estimation of graph-connected time series is challenging, especially since the causal dependencies in many real-world networks are nonlinear. In this paper, we propose a kernel-based algorithm for graph topology estimation. The algorithm uses a Fourier-based Random feature approximation to tackle the curse of dimensionality associated with the kernel representations. Exploiting the fact that the real-world networks often exhibit sparse topologies, we propose a group lasso based optimization framework, which is solve using an iterative composite objective mirror descent method, yielding an online algorithm with fixed computational complexity per iteration. The experiments conducted on real and synthetic data show that the proposed method outperforms its competitors.

##### References

More filters

••

01 Dec 2001TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods that provide all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.

Abstract: From the Publisher:
In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs-kernels--for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics.
Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

7,880 citations

•

01 Mar 1990

TL;DR: In this paper, a theory and practice for the estimation of functions from noisy data on functionals is developed, where convergence properties, data based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework.

Abstract: This book serves well as an introduction into the more theoretical aspects of the use of spline models. It develops a theory and practice for the estimation of functions from noisy data on functionals. The simplest example is the estimation of a smooth curve, given noisy observations on a finite number of its values. Convergence properties, data based smoothing parameter selection, confidence intervals, and numerical methods are established which are appropriate to a number of problems within this framework. Methods for including side conditions and other prior information in solving ill posed inverse problems are provided. Data which involves samples of random variables with Gaussian, Poisson, binomial, and other distributions are treated in a unified optimization context. Experimental design questions, i.e., which functionals should be observed, are studied in a general context. Extensions to distributed parameter system identification problems are made by considering implicitly defined functionals.

6,120 citations

••

16 Jul 2001

TL;DR: The result shows that a wide range of problems have optimal solutions that live in the finite dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.

Abstract: Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.

1,707 citations

••

[...]

TL;DR: A regularized model for linear regression with ℓ1 andℓ2 penalties is introduced and it is shown that it has the desired effect of group-wise and within group sparsity.

Abstract: For high-dimensional supervised learning problems, often using problem-specific assumptions can lead to greater accuracy. For problems with grouped covariates, which are believed to have sparse effects both on a group and within group level, we introduce a regularized model for linear regression with l1 and l2 penalties. We discuss the sparsity and other regularization properties of the optimal fit for this model, and show that it has the desired effect of group-wise and within group sparsity. We propose an algorithm to fit the model via accelerated generalized gradient descent, and extend this model and algorithm to convex loss functions. We also demonstrate the efficacy of our model and the efficiency of our algorithm on simulated data. This article has online supplementary material.

1,233 citations

[...]

01 Dec 2006

TL;DR: Conditions on the features of a continuous kernel are investigated so that it may approximate an arbitrary continuous target function uniformly on any compact subset of the input space.

Abstract: In this paper we investigate conditions on the features of a continuous kernel so that it may approximate an arbitrary continuous target function uniformly on any compact subset of the input space. A number of concrete examples are given of kernels with this universal approximating property.

509 citations