Proceedings ArticleDOI

Mining the stock market (extended abstract): which measure is best?

TLDR
The approach is to cluster the stocks according to various measures and compare the results to the "ground-truth" clustering based on the Standard and Poor 500 Index; the experiments reveal several interesting facts about the similarity measures used for stock-market data.
Abstract
In recent years, there has been a lot of interest in the database community in mining time series data. Surprisingly, little work has been done on verifying which measures are most suitable for mining a given class of data sets. Such work is of crucial importance, since it enables us to identify similarity measures that are useful in a given context and for which efficient algorithms therefore merit further investigation. Moreover, an accurate evaluation of the performance of even existing algorithms is not possible without a good understanding of the data sets occurring in practice. In this work we attempt to fill this gap by studying similarity measures for clustering of similar stocks (which, of course, is an interesting problem on its own). Our approach is to cluster the stocks according to various measures (including several novel ones) and compare the results to the "ground-truth" clustering based on the Standard and Poor 500 Index. Our experiments reveal several interesting facts about the similarity measures used for stock-market data.
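The evaluation loop described in the abstract (cluster under a measure, then score the result against a reference grouping) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not name the measures, the clustering algorithm, or the agreement score, so correlation of returns, Euclidean distance on raw prices, average-linkage agglomerative clustering, and the Rand index are stand-ins chosen here. The prices are synthetic, and the two "sector" labels are a hypothetical substitute for the S&P 500 grouping.

```python
import math
import random
from itertools import combinations

def returns(prices):
    """Day-to-day relative price changes."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def correlation_distance(x, y):
    """1 - Pearson correlation; close to 0 for strongly correlated series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerative(series, k, dist):
    """Average-linkage agglomerative clustering; returns one label per series."""
    n = len(series)
    d = {(i, j): dist(series[i], series[j]) for i, j in combinations(range(n), 2)}

    def linkage(a, b):
        total = sum(d[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid

    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (same/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Synthetic data (assumption): two sectors, each driven by a shared daily factor
# plus small stock-specific noise, with very different starting price levels.
random.seed(0)
DAYS = 250
truth = [0, 0, 0, 1, 1, 1]  # hypothetical reference grouping
start_prices = [10.0, 100.0, 500.0, 12.0, 95.0, 480.0]
factors = [[random.gauss(0, 0.02) for _ in range(DAYS)] for _ in range(2)]

prices = []
for sector, p0 in zip(truth, start_prices):
    series = [p0]
    for f in factors[sector]:
        series.append(series[-1] * (1 + f + random.gauss(0, 0.002)))
    prices.append(series)

corr_labels = agglomerative([returns(p) for p in prices], 2, correlation_distance)
price_labels = agglomerative(prices, 2, euclidean_distance)

print("correlation of returns:", rand_index(truth, corr_labels))
print("Euclidean on raw prices:", rand_index(truth, price_labels))
```

Scoring each measure against the same reference labels is what makes the measures comparable; in this toy setup, a measure that is sensitive to absolute price level may group stocks by price rather than by sector.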

Citations
Journal ArticleDOI

Experiencing SAX: a novel symbolic representation of time series

TL;DR: The utility of a new symbolic representation of time series is demonstrated; it allows dimensionality/numerosity reduction and also allows distance measures to be defined on the symbolic representation that lower-bound the corresponding distance measures defined on the original series.
Journal ArticleDOI

A review on time series data mining

TL;DR: The primary objective of this paper is to serve as a glossary that gives interested researchers an overall picture of current time series data mining development and helps them identify potential research directions for further investigation.
Journal ArticleDOI

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

TL;DR: The most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real-world, highly diverse datasets, supports the claim that there is a need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.
Proceedings ArticleDOI

Towards parameter-free data mining

TL;DR: This work shows that recent results in bioinformatics and computational theory hold great promise for a parameter-free data-mining paradigm, and shows that this approach is competitive or superior to the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.
Journal ArticleDOI

Selecting the right objective measure for association analysis

TL;DR: This paper describes several key properties one should examine in order to select the right measure for a given application and presents an algorithm for selecting a small set of patterns so that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.
References
Book

Information Retrieval: Data Structures and Algorithms

TL;DR: For programmers and students interested in parsing text and automated indexing, this is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.
Book ChapterDOI

Efficient Similarity Search In Sequence Databases

TL;DR: An indexing method for time sequences that uses R*-trees to index the sequences and efficiently answer similarity queries; experimental results show that the method is superior to search based on sequential scanning.
Proceedings ArticleDOI

Fast subsequence matching in time-series databases

TL;DR: An efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.
Proceedings ArticleDOI

Fast and effective text mining using linear-time document clustering

TL;DR: An unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase, and a refinement to center adjustment, “vector average damping,” that further improves cluster quality.
Proceedings Article

Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

TL;DR: A new model of similarity of time sequences is introduced that captures the intuitive notion that two sequences should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences that are similar.