Home
/
Authors
/
Vassilis J. Tsotras

Author

Vassilis J. Tsotras

Other affiliations: New York University, University of California

Bio: Vassilis J. Tsotras is an academic researcher from University of California, Riverside. The author has contributed to research in topics: XML & Search engine indexing. The author has an hindex of 43, co-authored 200 publications receiving 6315 citations. Previous affiliations of Vassilis J. Tsotras include New York University & University of California.

Topics: XML, Search engine indexing, Efficient XML Interchange, Big data, XML schema ...read more

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1993
1990

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

On indexing mobile objects

[...]

George Kollios¹, Dimitrios Gunopulos¹, Vassilis J. Tsotras¹•Institutions (1)

University of California, Riverside¹

01 May 1999

TL;DR: A lower bound on the number of I/O’s needed to answer the d-dimensional problem is given and a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time is given.

...read moreread less

Abstract: We show how to index mobile objects in one and two dimensions using efficient dynamic external memory data structures. The problem is motivated by real life applications in traffic monitoring, intelligent navigation and mobile communications domains. For the l-dimensional case, we give (i) a dynamic, external memory algorithm with guaranteed worst case performance and linear space and (ii) a practical approximation algorithm also in the dynamic, external memory setting, which has linear space and expected logarithmic query time. We also give an algorithm with guaranteed logarithmic query time for a restricted version of the problem. We present extensions of our techniques to two dimensions. In addition we give a lower bound on the number of I/O’s needed to answer the d-dimensional problem. Initial experimental results and comparisons to traditional indexing approaches are also included.

...read moreread less

413 citations

Journal Article•DOI•

Comparison of access methods for time-evolving data

[...]

Betty Salzberg¹, Vassilis J. Tsotras²•Institutions (2)

Northeastern University¹, University of California, Riverside²

01 Jun 1999-ACM Computing Surveys

TL;DR: This paper compares different indexing techniques proposed for supporting efficient access to temporal data based on a collection of important performance criteria, including the space consumed, update processing, and query time for representative queries.

...read moreread less

Abstract: This paper compares different indexing techniques proposed for supporting efficient access to temporal data. The comparison is based on a collection of important performance criteria, including the space consumed, update processing, and query time for representative queries. The comparison is based on worst-case analysis, hence no assumptions on data distribution or query frequencies are made. When a number of methods have the same asymptotic worst-case behavior, features in the methods that affect average case behavior are discussed. Additional criteria examined are the pagination of an index, the ability to cluster related data together, and the ability to efficiently separate old from current data (so that larger archival storage media such as write-once optical disks can be used). The purpose of the paper is to identify the difficult problems in accessing temporal data and describe how the different methods aim to solve them. A general lower bound for answering basic temporal queries is also introduced.

...read moreread less

364 citations

Book Chapter•DOI•

Efficient structural joins on indexed XML documents

[...]

Shu-Yao Chien¹, Zografoula Vagena², Donghui Zhang², Vassilis J. Tsotras², Carlo Zaniolo¹ - Show less +1 more•Institutions (2)

University of California, Los Angeles¹, University of California, Riverside²

20 Aug 2002

TL;DR: This paper proposes efficient structural join algorithms in the presence of tag indices using B+- trees and an enhancement based on sibling pointers that further improves performance, and presents a structural join algorithm that utilizes R-trees.

...read moreread less

Abstract: Queries on XML documents typically combine selections on element contents, and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent-child and ancestor-descendant relationships. Efficient support for structural joins is thus the key to efficient implementations of XML queries. Recently proposed node numbering schemes enable the capturing of the XML document structure using traditional indices (such as B+-trees or R-trees). This paper proposes efficient structural join algorithms in the presence of tag indices. We first concentrate on using B+- trees and show how to expedite a structural join by avoiding collections of elements that do not participate in the join. We then introduce an enhancement (based on sibling pointers) that further improves performance. Such sibling pointers are easily implemented and dynamically maintainable. We also present a structural join algorithm that utilizes R-trees. An extensive experimental comparison shows that the B+-tree structural joins are more robust. Furthermore, they provide drastic improvement gains over the current state of the art.

...read moreread less

312 citations

Journal Article•DOI•

Approximating multi-dimensional aggregate range queries over real attributes

[...]

Dimitrios Gunopulos¹, George Kollios, Vassilis J. Tsotras¹, Carlotta Domeniconi¹•Institutions (1)

University of California, Riverside¹

16 May 2000

TL;DR: A new histogram technique is presented that is designed to approximate the density of multi-dimensional datasets with real attributes, and finds buckets of variable size, and allows the buckets to overlap to lead to a faster and more compact approximation of the data distribution.

...read moreread less

Abstract: Finding approximate answers to multi-dimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper we consider the following problem: given a table of d attributes whose domain is the real numbers, and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query.We present a new histogram technique that is designed to approximate the density of multi-dimensional datasets with real attributes. Our technique finds buckets of variable size, and allows the buckets to overlap. Overlapping buckets allow more efficient approximation of the density. The size of the cells is based on the local density of the data. This technique leads to a faster and more compact approximation of the data distribution. We also show how to generalize kernel density estimators, and how to apply them on the multi-dimensional query approximation problem.Finally, we compare the accuracy of the proposed techniques with existing techniques using real and synthetic datasets.

...read moreread less

194 citations

Journal Article•DOI•

AsterixDB: a scalable, open source BDMS

[...]

Sattam Alsubaiee¹, Yasser Altowim¹, Hotham Altwaijry¹, Alexander Behm², Vinayak Borkar¹, Yingyi Bu¹, Michael J. Carey¹, Inci Cetindil¹, Madhusudan Cheelangi³, Khurram Faraaz⁴, Eugenia Gabrielova¹, Raman Grover¹, Zachary Heilbron¹, Young-Seok Kim¹, Chen Li¹, Guangqiang Li, Ji Mahn Ok¹, Nicola Onose, Pouria Pirzadeh¹, Vassilis J. Tsotras⁵, Rares Vernica⁶, Jian Wen⁷, Till Westmann⁷ - Show less +19 more•Institutions (7)

University of California, Irvine¹, Cloudera², Google³, IBM⁴, University of California, Riverside⁵, Hewlett-Packard⁶, Oracle Corporation⁷

01 Oct 2014

TL;DR: AsterixDB as mentioned in this paper is a full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem.

...read moreread less

Abstract: AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store.Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.

...read moreread less

185 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•

다중혈관 관상동맥 환자에서 y-문합을 이용하여 양쪽 내흉동맥만을 사용한 우회술의 조기 성적

[...]

성기익, 이영탁, 박계현, 전태국, 박표원, 한일용, 장윤희 - Show less +3 more

01 Mar 2003-The Korean Journal of Thoracic and Cardiovascular Surgery

28,685 citations

Journal Article•DOI•

Statistics for Spatial Data.

[...]

Andrew B. Lawson¹, Noel A Cressie•Institutions (1)

University of Dundee¹

01 Mar 1993-The Statistician

6,278 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

Journal Article•

When is nearest neighbor meaningful

[...]

Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft

01 Jan 1999-Lecture Notes in Computer Science

TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance of the farthest data point.

...read moreread less

Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!.

...read moreread less

1,992 citations

Proceedings Article•DOI•

[...]

Michail Vlachos¹, George Kollios², Dimitrios Gunopulos¹•Institutions (2)

University of California, Riverside¹, Boston University²

26 Feb 2002

TL;DR: This work formalizes non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences.

...read moreread less

Abstract: We investigate techniques for analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, that has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and time warping distance functions (for real and synthetic data) and show the superiority of our approach, especially in the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.

...read moreread less

1,504 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse