Mining the stock market (extended abstract): which measure is best?

doi:10.1145/347090.347189

Proceedings Article

- pp 487-496

TLDR

The approach is to cluster stocks according to various similarity measures and compare the results to the "ground-truth" clustering based on the Standard and Poor 500 Index; the experiments reveal several interesting facts about the similarity measures used for stock-market data.

Abstract:

In recent years, there has been a lot of interest in the database community in mining time series data. Surprisingly, little work has been done on verifying which measures are most suitable for mining of a given class of data sets. Such work is of crucial importance, since it enables us to identify similarity measures which are useful in a given context and therefore for which efficient algorithms should be further investigated. Moreover, an accurate evaluation of the performance of even existing algorithms is not possible without a good understanding of the data sets occurring in practice. In this work we attempt to fill this gap by studying similarity measures for clustering of similar stocks (which, of course, is an interesting problem on its own). Our approach is to cluster the stocks according to various measures (including several novel ones) and compare the results to the "ground-truth" clustering based on the Standard and Poor 500 Index. Our experiments reveal several interesting facts about the similarity measures used for stock-market data.
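
The evaluation loop described in the abstract (cluster under a candidate measure, then score the result against a reference grouping) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the measures, the clustering algorithm, or the agreement score, so Euclidean distance on normalized returns, single-link agglomerative clustering, and the Rand index are stand-ins chosen here. The price series and sector labels are invented toy data, not S&P 500 data.

```python
import math
from itertools import combinations


def returns(prices):
    """Simple returns: relative change from each price to the next."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def normalize(xs):
    """Shift to zero mean and scale to unit L2 norm."""
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centered)) or 1.0
    return [c / norm for c in centered]


def euclidean(xs, ys):
    """Candidate similarity measure (an assumption, not the paper's choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


def single_link_clusters(vectors, k, dist=euclidean):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist(vectors[i], vectors[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


# Invented toy prices; the sector names are a hypothetical reference grouping.
prices = [
    [100, 102, 101, 105, 107],     # "tech" stock A
    [50, 51, 50.5, 52.5, 53.5],    # "tech" stock B (same returns as A)
    [80, 79, 81, 78, 77],          # "energy" stock A
    [40, 39.5, 40.5, 39, 38.5],    # "energy" stock B (same returns as A)
]
ground_truth = ["tech", "tech", "energy", "energy"]

features = [normalize(returns(p)) for p in prices]
predicted = single_link_clusters(features, k=2)
score = rand_index(predicted, ground_truth)
```

Swapping a different function in for `dist` and comparing the resulting `score` values is one way to rank candidate measures against the same reference grouping.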

Citations

Journal Article

Experiencing SAX: a novel symbolic representation of time series

Jessica Lin, +3 more

- 01 Oct 2007 -

Data Mining and Knowledge Discovery

TL;DR: The utility of the new symbolic representation of time series is demonstrated; it allows dimensionality/numerosity reduction and allows distance measures to be defined on the symbolic representation that lower-bound the corresponding distance measures defined on the original series.

Journal Article

A review on time series data mining

Tak-chung Fu

- 01 Feb 2011 -

Engineering Applications of Artificial I...

TL;DR: The primary objective of this paper is to serve as a glossary that gives interested researchers an overall picture of current time series data mining development and helps them identify potential research directions for further investigation.

Journal Article

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration

Eamonn Keogh, +1 more

- 01 Oct 2003 -

Data Mining and Knowledge Discovery

TL;DR: The most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real-world, highly diverse datasets, supports the claim that there is a need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.

Proceedings Article

Towards parameter-free data mining

Eamonn Keogh, +2 more

TL;DR: This work shows that recent results in bioinformatics and computational theory hold great promise for a parameter-free data-mining paradigm, and shows that this approach is competitive or superior to the state-of-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.

Journal Article

Selecting the right objective measure for association analysis

Pang-Ning Tan, +2 more

TL;DR: This paper describes several key properties one should examine in order to select the right measure for a given application and presents an algorithm for selecting a small set of patterns so that domain experts can find the measure that best fits their requirements by ranking this small set of patterns.

References

Book

Information Retrieval: Data Structures and Algorithms

William B. Frakes, +1 more

TL;DR: For programmers and students interested in parsing text and automated indexing, it is the first collection in book form of the basic data structures and algorithms that are critical to the storage and retrieval of documents.

Book Chapter

Efficient Similarity Search In Sequence Databases

Rakesh Agrawal, +2 more

TL;DR: An indexing method for time sequences that uses R*-trees to index the sequences and efficiently answer similarity queries; experimental results show that the method is superior to search based on sequential scanning.

Proceedings Article

Fast subsequence matching in time-series databases

Christos Faloutsos, +2 more

TL;DR: An efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance.

Proceedings Article

Fast and effective text mining using linear-time document clustering

Bjornar Larsen, +1 more

TL;DR: An unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase, and a refinement to center adjustment, “vector average damping,” that further improves cluster quality.

Proceedings Article

Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

Rakesh Agrawal, +3 more

TL;DR: A new model of similarity of time sequences is introduced that captures the intuitive notion that two sequences should be considered similar if they have enough non-overlapping time-ordered pairs of subsequences that are similar.

Related Papers (5)

Efficient Similarity Search In Sequence Databases

Clustering of time series data-a survey

Fast subsequence matching in time-series databases

Data Mining: Concepts and Techniques

Finding Groups in Data: An Introduction to Cluster Analysis