Author

Abdullah Mueen

Bio: Abdullah Mueen is an academic researcher at the University of New Mexico. His research focuses on cluster analysis and dynamic time warping. He has an h-index of 31 and has co-authored 86 publications receiving 5,042 citations. His previous affiliations include the University of California, Riverside and the Bangladesh University of Engineering and Technology.


Papers
Proceedings ArticleDOI
12 Aug 2012
TL;DR: This work shows that a combination of four novel ideas makes it possible to search and mine truly massive time series for the first time, and that in large datasets exact search under DTW can be much faster than the current state-of-the-art Euclidean distance search algorithms.
Abstract: Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
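The speedups in this line of work come from pruning: cheap lower bounds filter out most candidates so the expensive DTW computation runs rarely. Below is a minimal sketch of that style of search, using the LB_Keogh lower bound to prune a banded (Sakoe-Chiba) DTW over z-normalized, equal-length subsequences. This is an illustration of the general idea, not the authors' UCR Suite implementation; the function names and the window parameter r are ours.

```python
import numpy as np

def znorm(x):
    """Z-normalize, the standard preprocessing for subsequence comparison."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-8)

def lb_keogh(q, c, r):
    """LB_Keogh: a cheap lower bound on banded DTW(q, c) with window r."""
    total = 0.0
    for i in range(len(q)):
        lo, hi = max(0, i - r), min(len(c), i + r + 1)
        upper, lower = c[lo:hi].max(), c[lo:hi].min()
        if q[i] > upper:
            total += (q[i] - upper) ** 2
        elif q[i] < lower:
            total += (q[i] - lower) ** 2
    return total

def dtw(a, b, r):
    """Squared DTW distance restricted to a Sakoe-Chiba band of half-width r."""
    n = len(a)
    D = np.full((n + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, n]

def nearest_subsequence(series, query, r=5):
    """1-NN search: prune candidates with LB_Keogh, confirm with banded DTW."""
    series = np.asarray(series, dtype=float)
    query = znorm(query)
    m, best, best_loc = len(query), np.inf, -1
    for start in range(len(series) - m + 1):
        cand = znorm(series[start:start + m])
        if lb_keogh(query, cand, r) >= best:
            continue  # lower bound already exceeds best-so-far: skip full DTW
        d = dtw(query, cand, r)
        if d < best:
            best, best_loc = d, start
    return best_loc, best
```

Because LB_Keogh never exceeds the banded DTW distance, the pruning is admissible: the search returns exactly the same answer as brute force, just with far fewer DTW calls.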

969 citations

Journal ArticleDOI
TL;DR: An extensive experimental study re-implements eight different time series representations and nine similarity measures (and their variants), tests their effectiveness on 38 time series data sets from a wide variety of application domains, and presents comparative experimental findings on their effectiveness.
Abstract: The previous decade has brought a remarkable increase in interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.
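Comparisons of this kind typically score each similarity measure by the accuracy of a 1-nearest-neighbor classifier that uses it, since 1-NN accuracy isolates the measure from any classifier tuning. A minimal sketch of that protocol follows; the dataset arrays and the euclidean baseline are placeholders, not the paper's exact experimental harness.

```python
import numpy as np

def euclidean(a, b):
    """Squared Euclidean distance, one baseline measure in such comparisons."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum((a - b) ** 2))

def one_nn_accuracy(train_X, train_y, test_X, test_y, dist):
    """Score a similarity measure by 1-NN classification accuracy."""
    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = int(np.argmin([dist(x, t) for t in train_X]))
        correct += int(train_y[nearest] == y)
    return correct / len(test_y)

# Example: acc = one_nn_accuracy(train_X, train_y, test_X, test_y, euclidean)
```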

747 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: A novel scalable algorithm for time series subsequence all-pairs-similarity-search that computes the answers to the time series motif and time series discord problems as a side-effect, incidentally providing the fastest known algorithm for both of these extensively studied problems.
Abstract: The all-pairs-similarity-search (or similarity join) problem has been extensively studied for text and a handful of other datatypes. However, surprisingly little progress has been made on similarity joins for time series subsequences. The lack of progress probably stems from the daunting nature of the problem. For even modest-sized datasets the obvious nested-loop algorithm can take months, and the typical speed-up techniques in this domain (i.e., indexing, lower-bounding, triangular-inequality pruning and early abandoning) at best produce one or two orders of magnitude speedup. In this work we introduce a novel scalable algorithm for time series subsequence all-pairs-similarity-search. For exceptionally large datasets, the algorithm can be trivially cast as an anytime algorithm and produce high-quality approximate solutions in reasonable time. The exact similarity join algorithm computes the answer to the time series motif and time series discord problem as a side-effect, and our algorithm incidentally provides the fastest known algorithm for both these extensively-studied problems. We demonstrate the utility of our ideas for two time series data mining problems: motif discovery and novelty discovery.
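The structure this similarity join produces is now widely known as the matrix profile: for every subsequence, the distance to its nearest non-overlapping neighbor. The sketch below is a deliberately naive version that only illustrates the semantics; the paper's actual algorithm is dramatically faster, and this is not its implementation.

```python
import numpy as np

def znorm(x):
    return (x - x.mean()) / (x.std() + 1e-8)

def matrix_profile(series, m):
    """For each length-m subsequence, the z-normalized Euclidean distance
    to its nearest neighbor, excluding trivially overlapping matches."""
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1
    subs = np.array([znorm(series[i:i + m]) for i in range(n)])
    profile = np.full(n, np.inf)
    for i in range(n):
        for j in range(n):
            if abs(i - j) < m:
                continue  # overlapping subsequences are trivial matches
            profile[i] = min(profile[i], np.linalg.norm(subs[i] - subs[j]))
    return profile

# The minimum of the profile locates the motif pair and the maximum locates
# the discord, exactly the "side-effect" answers the abstract mentions:
# mp = matrix_profile(ts, 64); motif_at, discord_at = mp.argmin(), mp.argmax()
```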

452 citations

Proceedings ArticleDOI
01 Jan 2009
TL;DR: For the first time, a tractable exact algorithm to find time series motifs is presented, and it is shown to be fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection, and summarization.
Abstract: Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since the formalization of time series motifs in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
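The core pruning idea behind exact motif discovery of this kind is a triangle-inequality lower bound computed from distances to a reference series: since |d(a, ref) - d(b, ref)| <= d(a, b), pairs whose reference distances differ by more than the best motif found so far can be skipped without computing their true distance. Below is a minimal sketch of that idea for motif pairs over a collection of equal-length series; the paper's algorithm (often called MK in later work) uses multiple reference points and further optimizations, so treat this as an illustration with names of our own choosing.

```python
import numpy as np

def motif_pair(series_list):
    """Find the closest pair among equal-length series, pruning with a
    triangle-inequality lower bound from distances to one reference series."""
    X = [np.asarray(s, dtype=float) for s in series_list]
    ref = X[0]
    d_ref = np.array([np.linalg.norm(s - ref) for s in X])
    order = np.argsort(d_ref)            # scan pairs in reference-distance order
    best, pair = np.inf, (None, None)
    for offset in range(1, len(order)):
        improvement_possible = False
        for k in range(len(order) - offset):
            i, j = order[k], order[k + offset]
            if d_ref[j] - d_ref[i] >= best:
                continue  # |d(i,ref) - d(j,ref)| <= d(i,j): prune this pair
            improvement_possible = True
            d = np.linalg.norm(X[i] - X[j])
            if d < best:
                best, pair = d, (int(i), int(j))
        if not improvement_possible:
            break  # gaps only grow with offset; no later pair can win
    return pair, best
```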

412 citations

Proceedings ArticleDOI
21 Aug 2011
TL;DR: This work introduces a novel algorithm that finds shapelets an order of magnitude faster than current methods, and shows for the first time an augmented shapelet representation that distinguishes the data based on conjunctions or disjunctions of shapelets.
Abstract: Time series shapelets are small, local patterns in a time series that are highly predictive of a class and are thus very useful features for building classifiers and for certain visualization and summarization tasks. While shapelets were introduced only recently, they have already seen significant adoption and extension in the community. Despite their immense potential as a data mining primitive, there are two important limitations of shapelets. First, their expressiveness is limited to simple binary presence/absence questions. Second, even though shapelets are computed offline, the time taken to compute them is significant. In this work, we address the latter problem by introducing a novel algorithm that finds shapelets in less time than current methods by an order of magnitude. Our algorithm is based on intelligent caching and reuse of computations, and the admissible pruning of the search space. Because our algorithm is so fast, it creates an opportunity to consider more expressive shapelet queries. In particular, we show for the first time an augmented shapelet representation that distinguishes the data based on conjunctions or disjunctions of shapelets. We call our novel representation Logical-Shapelets. We demonstrate the efficiency of our approach on the classic benchmark datasets used for these problems, and show several case studies where logical shapelets significantly outperform the original shapelet representation and other time series classification techniques. We demonstrate the utility of our ideas in domains as diverse as gesture recognition, robotics, and biometrics.
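The shapelet primitive the paper builds on is simple: the distance from a shapelet to a time series is the minimum distance over all subsequences of the series, and a class is predicted by thresholding that distance. Below is a minimal sketch of that primitive plus a conjunctive test in the spirit of the paper's logical shapelets; the thresholds and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Distance from a shapelet to a series: the best match over all
    subsequences of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    m = len(shapelet)
    return min(float(np.linalg.norm(series[i:i + m] - shapelet))
               for i in range(len(series) - m + 1))

def conjunction_rule(series, shapelets, thresholds):
    """A logical-shapelet style AND test: every shapelet must appear
    somewhere in the series within its distance threshold."""
    return all(shapelet_distance(series, s) <= t
               for s, t in zip(shapelets, thresholds))
```

A plain shapelet answers only a single presence/absence question; combining several such tests with AND/OR is what lets the augmented representation separate classes that no single pattern can.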

298 citations


Cited by
Reference EntryDOI
15 Oct 2004

2,118 citations

Book
16 Apr 2013
Abstract (table of contents): Why is Nonparametric Regression Important?; How to Construct Nonparametric Regression Estimates; Lower Bounds; Partitioning Estimates; Kernel Estimates; k-NN Estimates; Splitting the Sample; Cross Validation; Uniform Laws of Large Numbers; Least Squares Estimates I: Consistency; Least Squares Estimates II: Rate of Convergence; Least Squares Estimates III: Complexity Regularization; Consistency of Data-Dependent Partitioning Estimates; Univariate Least Squares Spline Estimates; Multivariate Least Squares Spline Estimates; Neural Networks Estimates; Radial Basis Function Networks; Orthogonal Series Estimates; Advanced Techniques from Empirical Process Theory; Penalized Least Squares Estimates I: Consistency; Penalized Least Squares Estimates II: Rate of Convergence; Dimension Reduction Techniques; Strong Consistency of Local Averaging Estimates; Semi-Recursive Estimates; Recursive Estimates; Censored Observations; Dependent Observations

1,931 citations

Journal ArticleDOI
TL;DR: This article presents the most exhaustive study of DNNs for TSC to date, training 8730 deep learning models on 97 time series datasets, and provides an open-source deep learning framework to the TSC community.
Abstract: Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising, as deep learning has seen very successful applications in recent years. DNNs have indeed revolutionized the field of computer vision, especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
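One of the strongest baselines in studies of this kind is a fully convolutional network: stacked 1D convolutions followed by global average pooling, so the model handles series of any length. Below is a minimal PyTorch sketch in that spirit; the layer sizes and the SimpleFCN name are our illustrative choices, not the paper's benchmarked implementations.

```python
import torch
import torch.nn as nn

class SimpleFCN(nn.Module):
    """Stacked 1D convolutions + global average pooling, in the spirit of
    the fully convolutional baselines such studies benchmark."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 128, kernel_size=7, padding=3),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, padding=2),
            nn.BatchNorm1d(256), nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.BatchNorm1d(128), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),       # global average pooling over time
        )
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                  # x: (batch, 1, series_length)
        return self.classifier(self.features(x).squeeze(-1))

# Usage: logits = SimpleFCN(n_classes=5)(torch.randn(8, 1, 128))
```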

1,833 citations

Book
02 Jan 1991

1,377 citations