Journal ArticleDOI

Applying a kernel function on time-dependent data to provide supervised-learning guarantees

TLDR
A kernel function is applied to reconstruct time-dependent open-ended sequences of observations, also referred to as data streams in the context of Machine Learning, into multidimensional spaces in an attempt to satisfy the data-independence assumption.
Abstract
Highlights: We employ a Monte-Carlo approach to find the best phase space for a given data stream. We propose kFTCV, a novel approach to validate data stream classification. Results show Takens' theorem can transform data streams into independent states. Therefore, we can rely on the SLT framework to ensure learning when dealing with data streams.

The Statistical Learning Theory (SLT) defines five assumptions to ensure learning for supervised algorithms. Data independence is one of these assumptions, since the SLT relies on the Law of Large Numbers to ensure learning bounds. As a consequence, this assumption imposes a strong limitation on guaranteeing learning in time-dependent scenarios. To tackle this issue, some researchers relax this assumption at the cost of invalidating all theoretical results provided by the SLT. In this paper we apply a kernel function, more precisely Takens' immersion theorem, to reconstruct time-dependent open-ended sequences of observations, also referred to as data streams in the context of Machine Learning, into multidimensional spaces (a.k.a. phase spaces) in an attempt to satisfy the data-independence assumption. First, we study the best immersion parameterization for our kernel function using the Distance-Weighted Nearest Neighbors (DWNN) algorithm. Next, we use this best immersion to recursively forecast the next observations up to the prediction horizon, estimated using the Lyapunov exponent. Afterwards, the predicted observations are compared against the expected ones using the Mean Distance from the Diagonal Line (MDDL). Theoretical and experimental results based on a cross-validation strategy provide stronger evidence of generalization, allowing us to conclude that one can learn from time-dependent data after applying our approach. This opens up an important possibility for ensuring supervised learning on time-dependent data, with applications in climate, animal tracking, biology, and other domains.
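The phase-space reconstruction the abstract describes can be illustrated with a minimal sketch of Takens' time-delay embedding; the function, its parameter names, and the sine-wave example below are ours, not the paper's:

```python
import numpy as np

def takens_embedding(series, dim, delay):
    """Time-delay embedding: map a scalar series into dim-dimensional
    states (x_t, x_{t+delay}, ..., x_{t+(dim-1)*delay})."""
    n = len(series) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for these parameters")
    return np.column_stack([series[i * delay:i * delay + n]
                            for i in range(dim)])

# Embed a sine wave with embedding dimension 2 and a quarter-period delay
t = np.arange(1000)
x = np.sin(2 * np.pi * t / 100)
states = takens_embedding(x, dim=2, delay=25)
print(states.shape)  # (975, 2)
```

Each row of the result is one phase-space state; the paper's Monte-Carlo approach searches for the embedding dimension and delay that yield the best such phase space for a given stream.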


Citations
Journal ArticleDOI

On learning guarantees to unsupervised concept drift detection on data streams

TL;DR: This work relies on the Algorithmic Stability framework to prove learning bounds for unsupervised concept drift detection on data streams, and designs the Plover algorithm to detect drifts using different measure functions, such as Statistical Moments and the Power Spectrum.
Book ChapterDOI

The Laws of Large Numbers

TL;DR: One of the fundamental results of Probability Theory is the Strong Law of Large Numbers, and it has many direct applications, such as Monte Carlo estimation theory.
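The Monte Carlo connection mentioned in this TL;DR can be sketched in a few lines; the particular integrand, E[U^2] = 1/3 for U uniform on (0, 1), is our illustrative choice:

```python
import random

# The Strong Law of Large Numbers: the sample mean of i.i.d. draws
# converges almost surely to the expectation. Monte Carlo estimation
# of E[U^2] = 1/3 for U ~ Uniform(0, 1):
def mc_estimate(n):
    return sum(random.random() ** 2 for _ in range(n)) / n

random.seed(42)
for n in (100, 10_000, 1_000_000):
    print(n, mc_estimate(n))  # estimates approach 1/3 as n grows
```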
Journal ArticleDOI

Decomposing time series into deterministic and stochastic influences: A survey

TL;DR: Each decomposition strategy is better suited to particular scenarios; however, without prior knowledge of the data, GHKSS proved to be a fair and general baseline despite its time complexity.
Journal ArticleDOI

Semi-supervised time series classification on positive and unlabeled problems using cross-recurrence quantification analysis

TL;DR: Using the Maximum Diagonal Line of the Cross-Recurrence Quantification Analysis (MDL-CRQA), applied on the time series phase space, as a similarity measure improves classification results for positive-unlabeled (PU) time series compared with the most commonly used time-domain similarity measures.
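The maximum-diagonal-line idea behind this measure can be sketched naively for scalar series (real CRQA typically thresholds distances between phase-space states; the function name and eps threshold are ours):

```python
import numpy as np

def max_diagonal_line(x, y, eps):
    """Cross-recurrence matrix R[i, j] = 1 iff |x_i - y_j| < eps;
    return the length of the longest diagonal line of 1s (MDL)."""
    R = np.abs(x[:, None] - y[None, :]) < eps
    n, m = R.shape
    best = 0
    for k in range(-(n - 1), m):        # scan every diagonal of R
        run = 0
        for v in np.diagonal(R, offset=k):
            run = run + 1 if v else 0   # length of current run of 1s
            best = max(best, run)
    return best

# Identical series recur along the whole main diagonal
a = np.sin(np.linspace(0, 4 * np.pi, 50))
mdl = max_diagonal_line(a, a, 0.1)
print(mdl)  # 50
```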
References
BookDOI

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

TL;DR: Learning with Kernels provides an introduction to SVMs and related kernel methods, covering all of the concepts necessary for a reader with basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms.
Journal ArticleDOI

An efficient k-means clustering algorithm: analysis and implementation

TL;DR: This work presents a simple and efficient implementation of Lloyd's k-means clustering algorithm, which it calls the filtering algorithm, and establishes the practical efficiency of the algorithm's running time.
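For reference, plain Lloyd's iteration, the baseline that the filtering algorithm accelerates, can be sketched as follows; the initialization scheme and toy data are our choices:

```python
import numpy as np

def lloyd_kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means: assign each point to its nearest center,
    then recompute each center as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated blobs: centers should land near (0, 0) and (10, 10)
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
centers, labels = lloyd_kmeans(pts, k=2)
print(np.sort(centers[:, 0]).round(1))
```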
Journal ArticleDOI

Independent coordinates for strange attractors from mutual information.

TL;DR: In this paper, the mutual information I is examined for a model dynamical system and for chaotic data from an experiment on the Belousov-Zhabotinskii reaction.
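A rough histogram-based sketch of using mutual information to pick an embedding delay, as Fraser and Swinney propose; the bin count and sine test signal are our choices:

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate of I(x_t; x_{t+tau}); the first minimum of
    this curve over tau is the suggested embedding delay."""
    pxy, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy /= pxy.sum()                              # joint probabilities
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)     # marginals
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

# Sine wave with period 100: the AMI curve dips where correlation vanishes
t = np.arange(2000)
x = np.sin(2 * np.pi * t / 100)
ami = [mutual_information(x, tau) for tau in range(1, 60)]
best_tau = int(np.argmin(ami)) + 1
print(best_tau)  # near a quarter period (tau = 25)
```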
Book

Nonlinear time series analysis

TL;DR: Using nonlinear methods when determinism is weak, together with selected nonlinear phenomena, is suggested as a viable alternative to linear methods.