Home
/
Authors
/
Michail Vlachos

Author

Michail Vlachos

Other affiliations: University of California, University of California, Riverside, AT&T ...read more

Bio: Michail Vlachos is an academic researcher from IBM. The author has contributed to research in topics: Cluster analysis & Euclidean distance. The author has an hindex of 30, co-authored 115 publications receiving 5400 citations. Previous affiliations of Michail Vlachos include University of California & University of California, Riverside.

Papers published on a yearly basis

2022
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

[...]

Michail Vlachos¹, George Kollios², Dimitrios Gunopulos¹•Institutions (2)

University of California, Riverside¹, Boston University²

26 Feb 2002

TL;DR: This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.

...read moreread less

Abstract: We investigate techniques for analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, that has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.

...read moreread less

1,504 citations

Proceedings Article•DOI•

Indexing multi-dimensional time-series with support for multiple distance measures

[...]

Michail Vlachos¹, Marios Hadjieleftheriou¹, Dimitrios Gunopulos¹, Eamonn Keogh¹•Institutions (1)

University of California, Riverside¹

24 Aug 2003

TL;DR: The experimental results demonstrate that the index motivated by the need for a single index structure that can support multiple distance measures can help speed-up the computation of expensive similarity measures such as the LCSS and the DTW.

...read moreread less

Abstract: Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for a single index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of trajectory similarities. Trajectory datasets are very common in environmental applications, mobility experiments, video surveillance and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based on the Longest Common Subsequence (LCSS) model, that offers enhanced robustness, particularly for noisy data, which are encountered very often in real world applications. However, our index is able to accommodate other distance measures as well, including the ubiquitous Euclidean distance, and the increasingly popular Dynamic Time Warping (DTW). While other researchers have advocated one or other of these similarity measures, a major contribution of our work is the ability to support all these measures without the need to restructure the index. Our framework guarantees no false dismissals and can also be tailored to provide much faster response time at the expense of slightly reduced precision/recall. The experimental results demonstrate that our index can help speed-up the computation of expensive similarity measures such as the LCSS and the DTW.

...read moreread less

419 citations

Proceedings Article•DOI•

[...]

Michail Vlachos¹, Christopher Meek¹, Zografoula Vagena¹, Dimitrios Gunopulos¹•Institutions (1)

University of California, Riverside¹

13 Jun 2004

TL;DR: This work presents several methods for mining knowledge from the query logs of the MSN search engine, and proposes a simple but effective method for identification of bursts (long or short-term).

...read moreread less

Abstract: We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time-series. Finally we propose a simple but effective method for identification of bursts (long or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform 'query-by-burst' on the database of time-series. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.

...read moreread less

287 citations

Proceedings Article•DOI•

On Periodicity Detection and Structural Periodic Similarity

[...]

Michail Vlachos¹, Philip S. Yu, Vittorio Castelli¹•Institutions (1)

IBM¹

09 Jan 2005

TL;DR: Non-parametric methods for accurate periodicity detection and new periodic distance measures for time-series sequences are presented and introduced to assist in detecting, monitoring and visualizing structural periodic changes.

...read moreread less

Abstract: This work motivates the need for more flexible structural similarity measures between time-series sequences, which are based on the extraction of important periodic features. Specifically, we present non-parametric methods for accurate periodicity detection and we introduce new periodic distance measures for time-series sequences. The goal of these tools and techniques are to assist in detecting, monitoring and visualizing structural periodic changes. It is our belief that these methods can be directly applicable in the manufacturing industry for preventive maintenance and in the medical sciences for accurate classification and anomaly detection.

...read moreread less

237 citations

Proceedings Article•DOI•

LB_Keogh supports exact indexing of shapes under rotation invariance with arbitrary representations and distance measures

[...]

Eamonn Keogh¹, Li Wei¹, Xiaopeng Xi¹, Sang-Hee Lee¹, Michail Vlachos¹ - Show less +1 more•Institutions (1)

University of California, Riverside¹

01 Sep 2006

TL;DR: This work can take current approaches and make them four orders of magnitude faster, without false dismissals, and is used with any of the dozens of existing shape representations and with all the most popular distance measures including Euclidean distance, Dynamic Time Warping and Longest Common Subsequence.

...read moreread less

Abstract: The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of discrimination ability, or in the distance measure, at the expense of efficiency. In this work we show that we can take the slow but accurate approaches and dramatically speed them up. On real world problems our technique can take current approaches and make them four orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures including Euclidean distance, Dynamic Time Warping and Longest Common Subsequence.

...read moreread less

223 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

Collapse

Cited by

PDF

Open Access

More filters

Book•

Data Mining: Concepts and Techniques

[...]

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

...read moreread less

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

...read moreread less

23,600 citations

Journal Article•DOI•

Anomaly detection: A survey

[...]

Varun Chandola¹, Arindam Banerjee¹, Vipin Kumar¹•Institutions (1)

University of Minnesota¹

30 Jul 2009-ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

...read moreread less

Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

...read moreread less

9,627 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

Journal Article•DOI•

Clustering of time series data-a survey

[...]

T. Warren Liao¹•Institutions (1)

Louisiana State University¹

01 Nov 2005-Pattern Recognition

TL;DR: This paper surveys and summarizes previous works that investigated the clustering of time series data in various application domains, including general-purpose clustering algorithms commonly used in time series clustering studies.

...read moreread less

2,336 citations

Matrix Factorization Techniques for Recommender Systems

[...]

Patrick Seemann

01 Jan 2014

2,080 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse