Book Chapter

Detecting leaders from correlated time series

TL;DR: This article studies the problem of discovering leaders among a set of time series by analyzing lead-lag relations, and proposes an efficient algorithm that tracks lagged correlations and computes the leaders incrementally while still achieving good accuracy.
Abstract: Analyzing the relationships of time series is an important problem for many applications, including climate monitoring, stock investment, and traffic control. Existing research mainly focuses on studying the relationship between a pair of time series. In this paper, we study the problem of discovering leaders among a set of time series by analyzing lead-lag relations. A time series is considered to be one of the leaders if its rise or fall impacts the behavior of many other time series. At each time point, we compute the lagged correlation between each pair of time series and model them in a graph. Then, the leadership rank is computed from the graph, which brings order to the time series. Based on the leadership ranking, the leaders of the time series are extracted. However, the problem poses great challenges as time goes by, since the dynamic nature of time series results in highly evolving relationships between them. We propose an efficient algorithm which is able to track the lagged correlations and compute the leaders incrementally, while still achieving good accuracy. Our experiments on real climate science data and stock data show that our algorithm is able to compute time series leaders efficiently in a real-time manner, and that the detected leaders demonstrate high predictive power on the events of general time series entities, which can enlighten both climate monitoring and financial risk control.
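
A minimal sketch of the pipeline the abstract describes, assuming equal-length series stored as rows of a NumPy array; the lag window, correlation threshold, and damping factor are illustrative parameters rather than the paper's settings, and the ranking step is a plain PageRank-style power iteration over the lead-lag graph rather than the paper's incremental algorithm:

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation of x shifted `lag` steps ahead of y."""
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

def leadership_scores(series, max_lag=5, thresh=0.7, d=0.85, iters=50):
    """Score each series by a PageRank-style rank over a lead-lag graph.

    An edge j -> i is added when series i, shifted forward by some
    positive lag, correlates strongly with series j (i.e. i leads j),
    so rank mass flows from followers to leaders.
    """
    n = len(series)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and max(lagged_corr(series[i], series[j], lag)
                              for lag in range(1, max_lag + 1)) >= thresh:
                A[j, i] = 1.0
    deg = A.sum(axis=1, keepdims=True)        # out-degree of each follower
    P = np.divide(A, deg, out=np.full_like(A, 1.0 / n), where=deg > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):                    # damped power iteration
        r = (1 - d) / n + d * (P.T @ r)
    return r                                  # higher score = stronger leader
```

The incremental tracking of lagged correlations, which the abstract identifies as the hard part, is not reproduced here; this sketch recomputes the graph from scratch at each time point.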
Citations
Proceedings Article
27 May 2015
TL;DR: k-Shape, as discussed by the authors, uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them, and develops a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters.
Abstract: The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we present k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. To demonstrate the robustness of k-Shape, we perform an extensive experimental evaluation of our approach against partitional, hierarchical, and spectral clustering methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable approaches in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable (and hence impractical) combinations, with one exception that achieves similar accuracy results. However, unlike k-Shape, this combination requires tuning of its distance measure and is two orders of magnitude slower than k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and highly efficient clustering approach for time series with broad applications.

433 citations
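
The distance sketched below is a direct transcription of the normalized cross-correlation measure this abstract describes (cross-correlation divided by the product of the norms, maximized over all shifts); k-Shape additionally z-normalizes series beforehand, which is left to the caller here:

```python
import numpy as np

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation.

    0 means identical shapes (up to shift and scale); taking the
    maximum over all shifts makes the measure shift-invariant.
    """
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    ncc = np.correlate(x, y, mode="full") / denom
    return 1.0 - ncc.max()
```

For example, `sbd(s, np.roll(s, 10))` is close to 0 for a smooth periodic series `s`, where Euclidean distance would be large.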

Journal Article
TL;DR: This paper provides a comprehensive review of energy forecasting models from the literature, highlighting the strengths, shortcomings, and purpose of numerous data-mining-based approaches.

169 citations

Journal Article
01 Jun 2017
TL;DR: SBD, k-Shape, and k-MS emerge as domain-independent, highly accurate, and efficient methods for time-series comparison and clustering with broad applications, evaluated against combinations of the most competitive distance measures.
Abstract: The proliferation and ubiquity of temporal data across many disciplines has generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data-mining methods, not only due to its exploratory power but also because it is often a preprocessing step or subroutine for other techniques. In this article, we present k-Shape and k-MultiShapes (k-MS), two novel algorithms for time-series clustering. k-Shape and k-MS rely on a scalable iterative refinement procedure. As their distance measure, k-Shape and k-MS use shape-based distance (SBD), a normalized version of the cross-correlation measure, to consider the shapes of time series while comparing them. Based on the properties of SBD, we develop two new methods, namely ShapeExtraction (SE) and MultiShapesExtraction (MSE), to compute cluster centroids that are used in every iteration to update the assignment of time series to clusters. k-Shape relies on SE to compute a single centroid per cluster based on all time series in each cluster. In contrast, k-MS relies on MSE to compute multiple centroids per cluster to account for the proximity and spatial distribution of time series in each cluster. To demonstrate the robustness of SBD, k-Shape, and k-MS, we perform an extensive experimental evaluation on 85 datasets against state-of-the-art distance measures and clustering methods for time series using rigorous statistical analysis. SBD, our efficient and parameter-free distance measure, achieves similar accuracy to Dynamic Time Warping (DTW), a highly accurate but computationally expensive distance measure that requires parameter tuning. For clustering, we compare k-Shape and k-MS against scalable and non-scalable partitional, hierarchical, spectral, density-based, and shapelet-based methods, with combinations of the most competitive distance measures. k-Shape outperforms all scalable methods in terms of accuracy. Furthermore, k-Shape also outperforms all non-scalable approaches, with one exception, namely k-medoids with DTW, which achieves similar accuracy. However, unlike k-Shape, this approach requires tuning of its distance measure and is significantly slower than k-Shape. k-MS performs similarly to k-Shape in comparison to rival methods, but k-MS is significantly more accurate than k-Shape. Beyond clustering, we demonstrate the effectiveness of k-Shape to reduce the search space of one-nearest-neighbor classifiers for time series. Overall, SBD, k-Shape, and k-MS emerge as domain-independent, highly accurate, and efficient methods for time-series comparison and clustering with broad applications.

154 citations
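
A condensed sketch of the iterative refinement loop, reusing the `sbd` helper above; the refinement step here aligns each member to the current centroid by its best cross-correlation shift and averages, which is a simplification of the ShapeExtraction (SE) method the abstract describes, not the paper's exact centroid computation:

```python
import numpy as np

def align(ref, x):
    """Shift x to the lag that maximizes its cross-correlation with ref
    (wrap-around shift, a simplification of the paper's zero-padding)."""
    cc = np.correlate(ref, x, mode="full")
    return np.roll(x, cc.argmax() - (len(x) - 1))

def kshape_like(series, k, iters=20, seed=0):
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = series[rng.choice(len(series), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid under shape-based distance.
        labels = np.array([min(range(k), key=lambda c: sbd(x, centroids[c]))
                           for x in series])
        # Refinement step: average shift-aligned members (simplified SE).
        for c in range(k):
            members = series[labels == c]
            if len(members):
                centroids[c] = np.mean([align(centroids[c], m) for m in members],
                                       axis=0)
    return labels, centroids
```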

Proceedings Article
01 Oct 2013
TL;DR: This work addresses the problem of improving consumption forecasting by exploiting the statistical relations between consumption series, applying various machine learning techniques at both the household and district scales (hundreds of houses).
Abstract: The recent development of smart meters has allowed the analysis of household electricity consumption in real time. Predicting electricity consumption at such very low scales should help to increase the efficiency of distribution networks and energy pricing. However, this is by no means a trivial task since household-level consumption is much more irregular than at the transmission or distribution levels. In this work, we address the problem of improving consumption forecasting by using the statistical relations between consumption series. This is done both at the household and district scales (hundreds of houses), using various machine learning techniques, such as support vector machine for regression (SVR) and multilayer perceptron (MLP). First, we determine which algorithm is best adapted to each scale, then, we try to find leaders among the time series, to help short-term forecasting. We also improve the forecasting for district consumption by clustering houses according to their consumption profiles.

137 citations
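
As a concrete illustration of the household-scale setup, a minimal sketch using scikit-learn's SVR on lagged consumption values; the file name, lag window, and hyperparameters are hypothetical placeholders, not the paper's tuned configuration:

```python
import numpy as np
from sklearn.svm import SVR

def make_lagged(series, n_lags=48):
    """Build (previous n_lags readings, next reading) training pairs."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

consumption = np.loadtxt("household.csv")      # hypothetical smart-meter data
X, y = make_lagged(consumption)
split = int(0.8 * len(X))                      # train on the first 80%
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X[:split], y[:split])
pred = model.predict(X[split:])                # one-step-ahead forecasts
```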

Proceedings Article
Chen Luo, Jian-Guang Lou, Qingwei Lin, Qiang Fu, Rui Ding, Dongmei Zhang, Zhe Wang
24 Aug 2014
TL;DR: An approach to evaluate the correlation between time series data and event data is proposed, capable of discovering three important aspects of event-timeseries correlation in the context of incident diagnosis: existence of correlation, temporal order, and monotonic effect.
Abstract: As online services have become more and more popular, incident diagnosis has emerged as a critical task in minimizing service downtime and ensuring the high quality of the services provided. For most online services, incident diagnosis is mainly conducted by analyzing a large amount of telemetry data collected from the services at runtime. Time series data and event sequence data are two major types of telemetry data. Techniques of correlation analysis are important tools that are widely used by engineers for data-driven incident diagnosis. Despite their importance, there has been little previous work addressing the correlation between these two types of heterogeneous data for incident diagnosis: continuous time series data and temporal event data. In this paper, we propose an approach to evaluate the correlation between time series data and event data. Our approach is capable of discovering three important aspects of event-timeseries correlation in the context of incident diagnosis: existence of correlation, temporal order, and monotonic effect. Our experimental results on simulated data sets and two real data sets demonstrate the effectiveness of the algorithm.

92 citations
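
A simplified illustration of the "existence of correlation" check, using a plain Welch t-test on fixed windows around each event in place of the paper's actual statistical machinery; the window length and significance level are illustrative:

```python
import numpy as np
from scipy import stats

def event_effect(ts, event_times, w=30, alpha=0.05):
    """Compare the series just before vs. just after each event.

    A significant difference suggests the events and the series are
    correlated with an event -> series temporal order; the sign of the
    t-statistic hints at a monotonic (increase/decrease) effect.
    """
    before = np.concatenate([ts[t - w:t] for t in event_times if t >= w])
    after = np.concatenate([ts[t:t + w] for t in event_times if t + w <= len(ts)])
    t_stat, p = stats.ttest_ind(after, before, equal_var=False)  # Welch's t-test
    return {"correlated": p < alpha,
            "effect": "increase" if t_stat > 0 else "decrease",
            "p_value": p}
```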

References
Book
01 Jan 1970
TL;DR: This book is a complete revision of a classic, seminal, and authoritative work that has been the model for most books on the topic written since 1970, focusing on practical techniques throughout rather than a rigorous mathematical treatment of the subject.
Abstract: From the Publisher: This is a complete revision of a classic, seminal, and authoritative book that has been the model for most books on the topic written since 1970. It focuses on practical techniques throughout, rather than a rigorous mathematical treatment of the subject. It explores the building of stochastic (statistical) models for time series and their use in important areas of application —forecasting, model specification, estimation, and checking, transfer function modeling of dynamic relationships, modeling the effects of intervention events, and process control. Features sections on: recently developed methods for model specification, such as canonical correlation analysis and the use of model selection criteria; results on testing for unit root nonstationarity in ARIMA processes; the state space representation of ARMA models and its use for likelihood estimation and forecasting; score test for model checking; and deterministic components and structural components in time series models and their estimation based on regression-time series model methods.

19,748 citations
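
The Box-Jenkins workflow the blurb outlines (specify, estimate, check, forecast) is available in statsmodels; a minimal sketch on a toy series, with the (p, d, q) order fixed by hand rather than chosen by the model-selection criteria the book discusses:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Toy nonstationary series: a random walk.
y = np.cumsum(np.random.default_rng(1).normal(size=200))

fit = ARIMA(y, order=(1, 1, 1)).fit()   # ARIMA(p=1, d=1, q=1)
print(fit.summary())                    # estimates and diagnostic statistics
forecast = fit.forecast(steps=10)       # out-of-sample forecasts
```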

Journal Article
TL;DR: This article shows that the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation, from which measures of causal lag and causal strength can then be constructed.
Abstract: There occurs on some occasions a difficulty in deciding the direction of causality between two related variables and also whether or not feedback is occurring. Testable definitions of causality and feedback are proposed and illustrated by use of simple two-variable models. The important problem of apparent instantaneous causality is discussed and it is suggested that the problem often arises due to slowness in recording information or because a sufficiently wide class of possible causal variables has not been used. It can be shown that the cross spectrum between two variables can be decomposed into two parts, each relating to a single causal arm of a feedback situation. Measures of causal lag and causal strength can then be constructed. A generalisation of this result with the partial cross spectrum is suggested.

16,349 citations
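
The testable definition of causality proposed here underlies what statsmodels exposes as the Granger causality test; a minimal sketch on synthetic data where one series lags the other by two steps:

```python
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = np.roll(x, 2) + 0.5 * rng.normal(size=300)  # y follows x with lag 2

# Column order matters: the test asks whether the second column
# Granger-causes the first.
results = grangercausalitytests(np.column_stack([y, x]), maxlag=4)
```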

Journal Article
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext, and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

14,696 citations
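
The link-structure ranking introduced with this prototype (PageRank) is what the leadership ranking above adapts to time series; a minimal sketch with networkx on a small hypothetical link graph:

```python
import networkx as nx

# Hypothetical 4-page web; an edge u -> v is a hyperlink from page u to v.
G = nx.DiGraph([(1, 0), (2, 0), (3, 0), (0, 1), (2, 1), (3, 2)])

scores = nx.pagerank(G, alpha=0.85)   # alpha is the damping factor
print(max(scores, key=scores.get))    # 0: the most linked-to page ranks highest
```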
