Experiencing SAX: a novel symbolic representation of time series
Citations
Querying and mining of time series data: experimental comparison of representations and distance measures
A review on time series data mining
Time-series clustering - A decade review
Time series classification from scratch with deep neural networks: A strong baseline
The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances
References
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
Using dynamic time warping to find patterns in time series
A symbolic representation of time series, with implications for streaming algorithms
Fast subsequence matching in time-series databases
Frequently Asked Questions (15)
Q2. What future work have the authors mentioned in the paper "Experiencing SAX: a novel symbolic representation of time series"?
A host of future directions suggest themselves. There is an enormous wealth of useful definitions, algorithms and data structures in the bioinformatics literature that can be exploited by their representation (Apostolico et al.). It may be possible to create a lower bounding approximation of Dynamic Time Warping (Berndt and Clifford 1994) by slightly modifying the classic string edit distance. Finally, there may be utility in extending their work to multidimensional and streaming time series (Vlachos et al. 2002).
Q3. Why did the authors exclude IMPACTS and SDA from the rest of the classification experiments?
Since both IMPACTS and SDA perform poorly compared to Euclidean distance and SAX, the authors exclude them from the rest of the classification experiments.
Q4. How do the authors compare the objective function of k-means?
Since the k-means algorithm seeks to optimize its objective function by minimizing the sum of squared intra-cluster errors, the authors compare and plot the objective function at each iteration, after projecting the data back to its original dimension (for a fair comparison of objective functions).
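The objective mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `kmeans_objective` and the sample points are assumptions, and the step of projecting a reduced representation back to the original dimension is not shown.

```python
def kmeans_objective(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


# Hypothetical data: two tight clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]  # index of the center assigned to each point

objective = kmeans_objective(points, centers, labels)
```

Evaluating this value after every iteration, always on data of the same dimensionality, gives curves that can be compared across representations.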
Q5. What are some examples of tools that are not defined for real-valued sequences?
Some simple examples of “tools” that are not defined for real-valued sequences but are defined for symbolic approaches include hashing, Markov models, suffix trees, decision trees, etc.
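As a small illustration of the hashing point, symbolic words can be used directly as dictionary keys, which has no exact analogue for raw real-valued sequences. The words below are made-up examples, not data from the paper.

```python
from collections import Counter

# Hypothetical symbolic words, e.g. one per sliding window of a time series.
words = ["abba", "acbd", "abba", "ddca", "abba", "acbd"]

# Counter is dictionary-based, so every word is hashed on insertion and lookup.
counts = Counter(words)
most_common_word, frequency = counts.most_common(1)[0]
```

The same counts could feed a pattern-frequency display or a Markov model over words.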
Q6. What is a good way to compare and contrast similarity measures?
Comparing hierarchical clusterings is a very good way to compare and contrast similarity measures, since a dendrogram of size N summarizes O(N²) distance calculations (Keogh and Kasetty 2002).
Q7. How can one guarantee retrieving the full answer set?
It is only by using a lower bounding technique that one can guarantee retrieving the full answer set, with no false dismissals (Faloutsos et al. 1994).
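A minimal sketch of why a lower bound rules out false dismissals, using the PAA lower bound on Euclidean distance as the cheap filter. The function names, the toy database, and the threshold are assumptions for illustration; a real system would precompute and index the reduced representations rather than recompute them per query.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa(series, w):
    """Mean of each of w equal-length frames (assumes len(series) % w == 0)."""
    frame = len(series) // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]


def paa_lower_bound(q_paa, c_paa, n):
    """Never exceeds the Euclidean distance between the two original series."""
    w = len(q_paa)
    return math.sqrt(n / w) * euclidean(q_paa, c_paa)


def range_query(query, database, epsilon, w):
    """Indices of all series within epsilon of the query (filter, then refine)."""
    n = len(query)
    q_paa = paa(query, w)
    hits = []
    for idx, series in enumerate(database):
        # Filter: if even the lower bound exceeds epsilon, the true distance
        # must too, so skipping this series can never lose a real answer.
        if paa_lower_bound(q_paa, paa(series, w), n) > epsilon:
            continue
        # Refine: confirm the surviving candidate with the exact distance.
        if euclidean(query, series) <= epsilon:
            hits.append(idx)
    return hits


# Hypothetical toy database of length-8 series.
database = [
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, -1.0, -1.0],
    [0.1, 0.0, 1.1, 0.9, 0.0, 0.1, -1.0, -0.9],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
]
query = [0.0, 0.1, 1.0, 1.0, 0.1, 0.0, -1.0, -1.0]

filtered = range_query(query, database, epsilon=0.5, w=4)
brute_force = [i for i, s in enumerate(database) if euclidean(query, s) <= 0.5]
```

The filtered result matches an exhaustive scan; a measure that could overestimate the true distance would not offer that guarantee.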
Q8. Why does hierarchical clustering have limited utility for data mining?
Although hierarchical clustering is a good sanity check for any proposed distance measure, it has limited utility for data mining because of its poor scalability.
Q9. What is the key observation that allowed the authors to prove lower bounds?
The key observation that allowed the authors to prove lower bounds is to concentrate on proving that the symbolic distance measure lower bounds the PAA distance measure.
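The chain of bounds can be checked numerically with a small sketch. It follows the SAX construction as it is commonly described (equiprobable breakpoints under a standard normal, PAA averaging, and a MINDIST that counts only the gap between non-adjacent symbol regions); none of these details appear in this FAQ, so treat the function names and sample series as assumptions. Z-normalization of the input is skipped for brevity.

```python
import math
from statistics import NormalDist


def breakpoints(a):
    """a - 1 cut points splitting N(0, 1) into a equiprobable regions."""
    return [NormalDist().inv_cdf(i / a) for i in range(1, a)]


def paa(series, w):
    """Mean of each of w equal-length frames (assumes len(series) % w == 0)."""
    frame = len(series) // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]


def to_symbols(paa_values, cuts):
    """Region index of each PAA value (0 is the lowest region)."""
    return [sum(1 for c in cuts if v >= c) for v in paa_values]


def symbol_dist(r, c, cuts):
    """Gap between two symbol regions; zero for equal or adjacent symbols."""
    if abs(r - c) <= 1:
        return 0.0
    return cuts[max(r, c) - 1] - cuts[min(r, c)]


def paa_dist(q_paa, c_paa, n):
    w = len(q_paa)
    return math.sqrt(n / w) * math.sqrt(
        sum((x - y) ** 2 for x, y in zip(q_paa, c_paa)))


def mindist(q_sym, c_sym, n, cuts):
    w = len(q_sym)
    return math.sqrt(n / w) * math.sqrt(
        sum(symbol_dist(r, c, cuts) ** 2 for r, c in zip(q_sym, c_sym)))


# Hypothetical series of length n = 8, reduced to w = 4 symbols, alphabet a = 4.
q = [-1.2, -1.0, -0.2, 0.0, 0.4, 0.6, 1.1, 1.3]
c = [1.0, 1.2, 0.5, 0.3, -0.3, -0.1, -1.4, -1.0]
n, w, a = len(q), 4, 4

cuts = breakpoints(a)
q_paa, c_paa = paa(q, w), paa(c, w)
q_sym, c_sym = to_symbols(q_paa, cuts), to_symbols(c_paa, cuts)

d_symbolic = mindist(q_sym, c_sym, n, cuts)
d_paa = paa_dist(q_paa, c_paa, n)
d_euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(q, c)))
```

For these inputs the symbolic distance is no larger than the PAA distance, which in turn is no larger than the Euclidean distance. Each symbol's gap can only understate the difference between the PAA values it encodes, which is why proving the symbolic-to-PAA step is enough.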
Q10. How many bits are needed for each value of the original time series?
The compression ratio (last column of next table) is calculated as (w × ⌈log2 a⌉) / (n × 32), because the SAX representation needs only ⌈log2 a⌉ bits for each of the w symbols in a word, while the original time series needs 4 bytes (32 bits) for each of its n values.
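A worked instance of this ratio, with parameter values chosen only for illustration (they are not taken from the paper's table):

```python
import math


def sax_compression_ratio(n, w, a, bits_per_value=32):
    """Bits used by a SAX word divided by bits used by the raw series."""
    bits_per_symbol = math.ceil(math.log2(a))
    return (w * bits_per_symbol) / (n * bits_per_value)


# Hypothetical setting: series length 256, word length 8, alphabet size 10.
# ceil(log2(10)) = 4 bits per symbol, so the word takes 8 * 4 = 32 bits,
# against 256 * 32 = 8192 bits for the raw values.
ratio = sax_compression_ratio(n=256, w=8, a=10)
```

Note that the ceiling makes alphabet sizes 9 through 16 cost the same 4 bits per symbol.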
Q11. What are the reasons why SDA and IMPACTS perform poorly?
The reasons that SDA and IMPACTS perform poorly, the authors observe, are that neither symbolic representation is very descriptive of the general shape of the time series, and that the lack of dimensionality reduction can further distort the results if the data is noisy.
Q12. What is the frequency of occurrence for each pattern?
Each string is regarded as a pattern, and the frequency of occurrence for each pattern is encoded by the thickness of the branch: the thicker the branch, the more frequent the corresponding pattern.
Q13. What is the most commonly used data mining clustering algorithm?
The most commonly used data mining clustering algorithm is k-means (Fayyad et al. 1998), so for completeness the authors will consider it here.
Q14. What is the difference between SAX and other real-valued approaches?
The SAX representation can afford to have higher dimensionality than the other real-valued approaches, while using less or the same amount of space.
Q15. What is the difference between SAX and the other representations?
Note that since SAX is a symbolic representation, the symbols of its alphabet can be stored as bits rather than doubles, which results in a considerable amount of space-saving.
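One way to picture the space saving is to pack the symbols of a word into a single integer, using only as many bits per symbol as the alphabet requires. This is a sketch under assumed names (`pack_word`, `unpack_word`) and an assumed 64-bit double; it is not a storage layout described in the paper.

```python
def pack_word(symbols, a):
    """Pack symbol indices (0 .. a-1) into one integer, first symbol highest."""
    bits = max(1, (a - 1).bit_length())  # bits needed per symbol
    packed = 0
    for s in symbols:
        packed = (packed << bits) | s
    return packed, bits


def unpack_word(packed, bits, w):
    """Recover the w symbol indices from a packed integer."""
    mask = (1 << bits) - 1
    return [(packed >> (bits * (w - 1 - i))) & mask for i in range(w)]


# Hypothetical word of 8 symbols over an alphabet of size 4 (2 bits each).
symbols = [0, 3, 2, 1, 1, 0, 3, 2]
packed, bits = pack_word(symbols, a=4)
restored = unpack_word(packed, bits, len(symbols))

sax_bits = bits * len(symbols)   # storage for the packed symbols
double_bits = 64 * len(symbols)  # storage for the same count of 64-bit doubles
```

Here the packed word needs 16 bits where eight doubles would need 512, which leaves room for a longer word in the same space.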