Open Access, Journal Article

Experimental comparison of representation methods and distance measures for time series data

TLDR
An extensive experimental study re-implements eight time series representations and nine similarity measures and their variants, tests their effectiveness on 38 time series data sets from a wide variety of application domains, and presents an overview of these techniques together with comparative experimental findings on their effectiveness.
Abstract
The previous decade has brought a remarkable increase in interest in applications that deal with querying and mining time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justification, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. To provide a comprehensive validation, we conducted an extensive experimental study, re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.


Citations
Journal Article

Time-series clustering - A decade review

TL;DR: This review presents four main components of time-series clustering, surveys the trend of improvements in the efficiency, quality and complexity of time-series clustering approaches during the last decade, and points to new paths for future work.
Journal Article

The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances

TL;DR: This work implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets, indicating that only nine of these algorithms are significantly more accurate than both benchmarks.
Proceedings Article

k-Shape: Efficient and Accurate Clustering of Time Series

TL;DR: k-Shape uses a normalized version of the cross-correlation measure to consider the shapes of time series while comparing them, and develops a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters.
Proceedings Article

Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets.

TL;DR: This work proposes a fast shapelet discovery algorithm that outperforms the current state-of-the-art by two or three orders of magnitude, while producing models with accuracy that is not perceptibly different.
Journal Article

Using dynamic time warping distances as features for improved time series classification

TL;DR: This paper presents a simple technique for time series classification that exploits DTW’s strength on this task: instead of directly using DTW as a distance measure to find nearest neighbors, the technique uses DTW to create new features, which are then given to a standard machine learning method.
References
Book

Data Mining: Concepts and Techniques

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Book

Pattern classification and scene analysis

TL;DR: This book provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Proceedings Article

A study of cross-validation and bootstrap for accuracy estimation and model selection

TL;DR: The results indicate that for real-world datasets similar to the authors', the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
Book

Introduction to Data Mining

TL;DR: This book discusses data mining through the lens of cluster analysis, which examines the relationships between data, clusters, and algorithms, and some of the techniques used to solve these problems.