Home
/
Authors
/
Xiaoyue Wang

Author

Xiaoyue Wang

Bio: Xiaoyue Wang is an academic researcher from University of California, Riverside. The author has contributed to research in topics: Statistical classification & Automatic image annotation. The author has an hindex of 8, co-authored 14 publications receiving 2339 citations.

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Querying and mining of time series data: experimental comparison of representations and distance measures

[...]

Hui Ding¹, Goce Trajcevski¹, Peter Scheuermann¹, Xiaoyue Wang², Eamonn Keogh² - Show less +1 more•Institutions (2)

Northwestern University¹, University of California, Riverside²

01 Aug 2008

TL;DR: An extensive set of time series experiments are conducted re-implementing 8 different representation methods and 9 similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains to provide a unified validation of some of the existing achievements.

...read moreread less

Abstract: The last decade has witnessed a tremendous growths of interests in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

...read moreread less

1,387 citations

Journal Article•DOI•

Experimental comparison of representation methods and distance measures for time series data

[...]

Xiaoyue Wang¹, Abdullah Mueen¹, Hui Ding², Goce Trajcevski², Peter Scheuermann², Eamonn Keogh¹ - Show less +2 more•Institutions (2)

University of California, Riverside¹, Northwestern University²

01 Mar 2013-Data Mining and Knowledge Discovery

TL;DR: An extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains gives an overview of these different techniques and presents comparative experimental findings regarding their effectiveness.

...read moreread less

Abstract: The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.

...read moreread less

747 citations

Proceedings Article•

A Complexity-Invariant Distance Measure for Time Series.

[...]

Gustavo E. A. P. A. Batista¹, Xiaoyue Wang², Eamonn Keogh²•Institutions (2)

University of São Paulo¹, University of California, Riverside²

01 Jan 2011

TL;DR: This work introduces the first complexity-invariant distance measure for time series, and shows that it generally produces significant improvements in classification accuracy, and it is shown that this improvement does not compromise efficiency, since it can be lower bound and use a modification of triangular inequality.

...read moreread less

Abstract: The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While there is a plethora of classification algorithms that can be applied to time series, all of the current empirical evidence suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping. In this work we make a surprising claim. There is an invariance that the community has missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where complex objects are incorrectly assigned to a simpler class. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series classification experiments ever attempted, and show that complexity-invariant distance measures can produce improvements in accuracy in the vast majority of cases.

...read moreread less

310 citations

Posted Content•

Experimental Comparison of Representation Methods and Distance Measures for Time Series Data

[...]

Xiaoyue Wang, Hui Ding, Goce Trajcevski, Peter Scheuermann, Eamonn Keogh - Show less +1 more

09 Dec 2010-arXiv: Artificial Intelligence

TL;DR: In this article, the authors present a comparative experimental study of time series representations and similarity measures and their performance on thirty-eight time series data sets from a wide variety of application domains.

...read moreread less

Abstract: The previous decade has brought a remarkable increase of the interest in applications that deal with querying and mining of time series data. Many of the research efforts in this context have focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive experimental study re-implementing eight different time series representations and nine similarity measures and their variants, and testing their effectiveness on thirty-eight time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some of the existing achievements, our experiments also indicate that, in some cases, certain claims in the literature may be unduly optimistic.

...read moreread less

82 citations

Journal Article•DOI•

An efficient and effective similarity measure to enable data mining of petroglyphs

[...]

Qiang Zhu¹, Xiaoyue Wang¹, Eamonn Keogh¹, Sang-Hee Lee¹•Institutions (1)

University of California, Riverside¹

01 Jul 2011-Data Mining and Knowledge Discovery

TL;DR: This work introduces a novel distance measure and algorithms which allow efficient and effective data mining of large collections of rock art and identifies the reasons for this.

...read moreread less

Abstract: Rock art is an archaeological term for human-made markings on stone, including carved markings, known as petroglyphs, and painted markings, known as pictographs. It is believed that there are millions of petroglyphs in North America alone, and the study of this valued cultural resource has implications even beyond anthropology and history. Surprisingly, although image processing, information retrieval and data mining have had a large impact on many human endeavors, they have had essentially zero impact on the study of rock art. In this work we identify the reasons for this, and introduce a novel distance measure and algorithms which allow efficient and effective data mining of large collections of rock art.

...read moreread less

34 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

A review on time series data mining

[...]

Tak-chung Fu¹•Institutions (1)

Hong Kong Polytechnic University¹

01 Feb 2011-Engineering Applications of Artificial Intelligence

TL;DR: The primary objective of this paper is to serve as a glossary for interested researchers to have an overall picture on the current time series data mining development and identify their potential research direction to further investigation.

...read moreread less

1,358 citations

Book•

Outlier Analysis

[...]

Charu C. Aggarwal

11 Jan 2013

TL;DR: Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit.

...read moreread less

Abstract: With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysisis a comprehensive exposition, as understood by data mining experts, statisticians and computer scientists. The book has been organized carefully, and emphasis was placed on simplifying the content, so that students and practitioners can also benefit. Chapters will typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as, text, categorical, mixed-attribute, time-series, streaming, discrete sequence, spatial and network data; and key applications of these methods as applied to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis are covered.

...read moreread less

1,278 citations

Journal Article•DOI•

Time-series clustering - A decade review

[...]

Saeed Aghabozorgi¹, Ali Seyed Shirkhorshidi¹, Teh Ying Wah¹•Institutions (1)

Information Technology University¹

01 Oct 2015-Information Systems

TL;DR: This review will expose four main components of time-series clustering and is aimed to represent an updated investigation on the trend of improvements in efficiency, quality and complexity of clustering time- series approaches during the last decade and enlighten new paths for future works.

...read moreread less

1,235 citations

Journal Article•DOI•

The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances

[...]

Anthony J. Bagnall¹, Jason Lines¹, Aaron Bostrom¹, James Large¹, Eamonn Keogh² - Show less +1 more•Institutions (2)

University of East Anglia¹, University of California, Riverside²

01 May 2017-Data Mining and Knowledge Discovery

TL;DR: This work implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets, indicating that only nine of these algorithms are significantly more accurate than both benchmarks.

...read moreread less

Abstract: In the last 5 years there have been a large number of new time series classification algorithms proposed in the literature. These algorithms have been evaluated on subsets of the 47 data sets in the University of California, Riverside time series classification archive. The archive has recently been expanded to 85 data sets, over half of which have been donated by researchers at the University of East Anglia. Aspects of previous evaluations have made comparisons between algorithms difficult. For example, several different programming languages have been used, experiments involved a single train/test split and some used normalised data whilst others did not. The relaunch of the archive provides a timely opportunity to thoroughly evaluate algorithms on a larger number of datasets. We have implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets. We use these results to test several hypotheses relating to whether the algorithms are significantly more accurate than the benchmarks and each other. Our results indicate that only nine of these algorithms are significantly more accurate than both benchmarks and that one classifier, the collective of transformation ensembles, is significantly more accurate than all of the others. All of our experiments and results are reproducible: we release all of our code, results and experimental details and we hope these experiments form the basis for more robust testing of new algorithms in the future.

...read moreread less

1,070 citations

Proceedings Article•DOI•

Patterns of temporal variation in online media

[...]

Jaewon Yang¹, Jure Leskovec¹•Institutions (1)

Stanford University¹

09 Feb 2011

TL;DR: This work develops the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with the authors' similarity measure and presents a simple model that reliably predicts the shape of attention by using information about only a small number of participants.

...read moreread less

Abstract: Online content exhibits rich temporal dynamics, and diverse realtime user generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention remain largely unexplored.We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets.We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on theWeb and broaden the understanding of the dynamics of human attention.

...read moreread less

1,041 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse