Author

Vincent Oria

Other affiliations: University of Alberta
Bio: Vincent Oria is an academic researcher at the New Jersey Institute of Technology. The author has contributed to research on topics including query optimization and image retrieval. The author has an h-index of 17 and has co-authored 98 publications receiving 2,092 citations. Previous affiliations of Vincent Oria include the University of Alberta.


Papers
Proceedings Article
14 Jun 2005
TL;DR: Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW and ERP, and it is on average 50% more accurate than LCSS.
Abstract: An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR), which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW and ERP, and it is on average 50% more accurate than LCSS. We also develop three pruning techniques to improve the retrieval efficiency of EDR and show that these techniques can be combined effectively in a search, increasing the pruning power significantly. The experimental results confirm the superior efficiency of the combined methods.
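The abstract does not spell out how EDR is computed. As a rough illustration only, the sketch below uses the commonly cited dynamic-programming form of an edit distance over real-valued 2D points: two points "match" when both coordinates differ by at most a threshold, a match costs 0, and a substitution, insertion, or deletion costs 1. The function name `edr`, the parameter `eps`, and the omission of any trajectory normalization step are assumptions of this sketch, not details taken from the listing above.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def edr(r: Sequence[Point], s: Sequence[Point], eps: float) -> int:
    """Edit-distance-style count of operations needed to align r with s.

    Assumption: two points match when |dx| <= eps and |dy| <= eps.
    """
    n, m = len(r), len(s)
    # prev[j] holds the distance between the first i-1 points of r
    # and the first j points of s; row 0 is "insert j points".
    prev: List[int] = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # column 0 is "delete i points"
        for j in range(1, m + 1):
            (rx, ry), (sx, sy) = r[i - 1], s[j - 1]
            matched = abs(rx - sx) <= eps and abs(ry - sy) <= eps
            subcost = 0 if matched else 1
            curr[j] = min(
                prev[j - 1] + subcost,  # match or substitute
                prev[j] + 1,            # skip a point of r
                curr[j - 1] + 1,        # skip a point of s
            )
        prev = curr
    return prev[m]


if __name__ == "__main__":
    clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    noisy = [(0.0, 0.0), (100.0, 100.0), (1.0, 1.0), (2.0, 2.0)]
    print(edr(clean, noisy, eps=0.25))
```

Because every mismatch costs exactly 1 regardless of its magnitude, a single outlier such as `(100.0, 100.0)` contributes one unit to the distance rather than a value proportional to how far off it is, which is one way to read the robustness claim in the abstract.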

1,225 citations

01 Jan 2000
TL;DR: A general multimedia query language, called MOQL, based on ODMG's Object Query Language (OQL), which includes constructs to capture the temporal and spatial relationships in multimedia data as well as functions for query presentation.
Abstract: We describe a general multimedia query language, called MOQL, based on ODMG's Object Query Language (OQL). In contrast to previous multimedia query languages that are either designed for one particular medium (e.g., images) or specialized for a particular application (e.g., medical imaging), MOQL is general in its treatment of multiple media and different applications. The language includes constructs to capture the temporal and spatial relationships in multimedia data as well as functions for query presentation. We illustrate the language features by query examples. The language is implemented for a multimedia database built on top of ObjectStore.

76 citations

Proceedings Article
15 Oct 2004
TL;DR: This paper proposes a novel representation of trajectories, called movement pattern strings, which convert the trajectories into symbolic representations, and defines a modified frequency distance for frequency vectors obtained from movement pattern strings to reduce the dimensionality and the computation cost.
Abstract: Searching moving object trajectories of video databases has been applied to many fields, such as video data analysis, content-based video retrieval, and video scene classification. In this paper, we propose a novel representation of trajectories, called movement pattern strings, which convert the trajectories into symbolic representations. Movement pattern strings encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories. In order to improve the retrieval efficiency, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings to reduce the dimensionality and the computation cost. The experimental results show that using movement pattern strings is almost as effective as using raw trajectories. In addition, the cost of retrieving similar trajectories can be greatly reduced when the modified frequency distance is used as a filter.

69 citations

Journal Article
01 Oct 2011
TL;DR: This paper proposes T-PARINET, an access method to efficiently retrieve the trajectories of objects moving in networks, which significantly outperforms the reference R-tree-based access methods for in-network trajectory databases.
Abstract: Indexing moving objects (MO) has been a hot topic in the field of moving objects databases for many years. An impressive number of access methods have been proposed to optimize the processing of MO-related queries. Several methods have focused on spatio-temporal range queries, which represent the foundation of MO trajectory queries. Surprisingly, only a few of them consider that the objects' movements are constrained. This is an important aspect for several reasons, ranging from better capturing the relationship between the trajectory and the network space to more accurate trajectory representation with lower storage requirements. In this paper, we propose T-PARINET, an access method to efficiently retrieve the trajectories of objects moving in networks. T-PARINET is designed for continuous indexing of trajectory data flows. The cornerstone of T-PARINET is PARINET, an efficient index for historical trajectory data. The structure of PARINET is based on a combination of graph partitioning and a set of composite B+-tree local indexes. Because the network can be modeled using graphs, the partitioning of the trajectory data makes use of graph partitioning theory and can be tuned for a given query load and a given data distribution in the network space. The tuning process is built on a good-quality cost model that is supplied with PARINET. The advantage of having a cost model is twofold: it allows a better integration of the index into the query optimizer of any DBMS, and it permits tuning the index structure for better performance. The tuning process can be performed before the index creation in the case of historical data or online in the case of indexing data flows. In fact, massive online updates can degrade the index quality, which can be measured by the cost model. We propose a specific maintenance process that results in T-PARINET. We study different types of queries and provide an optimized configuration for several scenarios. T-PARINET can easily be integrated into any RDBMS, which is an essential asset particularly for industrial or commercial applications. The experimental evaluation under an off-the-shelf DBMS shows that our method is robust. It also significantly outperforms the reference R-tree-based access methods for in-network trajectory databases.

63 citations

Journal Article
TL;DR: An extended data model and a network partitioning algorithm into long paths to increase the compression rates for the same error bound are proposed and integrated with the state-of-the-art Douglas-Peucker compression algorithm to obtain a new technique to compress road network trajectory data with deterministic error bounds.
Abstract: With the proliferation of wireless communication devices integrating GPS technology, trajectory datasets are becoming more and more available. The problems concerning the transmission and the storage of such data have become prominent with the continuous increase in volume of these data. A few works in the field of moving object databases deal with spatio-temporal compression. However, these works only consider the case of objects moving freely in the space. In this paper, we tackle the problem of compressing trajectory data in road networks with deterministic error bounds. We analyze the limitations of the existing methods and data models for road network trajectory compression. Then, we propose an extended data model and a network partitioning algorithm into long paths to increase the compression rates for the same error bound. We integrate these proposals with the state-of-the-art Douglas-Peucker compression algorithm to obtain a new technique to compress road network trajectory data with deterministic error bounds. The extensive experimental results confirm the appropriateness of the proposed approach that exhibits compression rates close to the ideal ones with respect to the employed Douglas-Peucker compression algorithm.
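For readers unfamiliar with the Douglas-Peucker algorithm mentioned above, here is a minimal sketch of its classic form for a free-moving 2D polyline. It is not the road-network variant the paper proposes: the extended data model and the partitioning into long paths are not reproduced, and the names `douglas_peucker`, `point_segment_distance`, and `tolerance` are chosen for this illustration. The error bound here is simply the maximum distance from a dropped point to the retained segment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closed segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Keep endpoints; recursively keep the farthest point if it exceeds tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    split, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            split, dmax = i, d
    if dmax <= tolerance:
        # Every interior point is within the bound, so drop them all.
        return [first, last]
    left = douglas_peucker(points[: split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
    print(douglas_peucker(track, tolerance=0.1))
```

Since a point is discarded only when it lies within `tolerance` of the segment that replaces it, the simplification error is bounded deterministically, which is the property the abstract relies on.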

58 citations


Cited by
Journal Article
TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
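The effect described in this abstract can be observed with a small simulation. The sketch below draws points uniformly at random from the unit hypercube (an assumption of this illustration, not a workload from the paper) and reports the relative contrast, defined here as (farthest distance - nearest distance) / nearest distance, for one random query point. The function name `relative_contrast` and its parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(d_max - d_min) / d_min from one random query to n_points random points."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

Under this uniform setup the contrast shrinks as `dim` grows, meaning the nearest and farthest neighbors become hard to tell apart. As the abstract notes, this need not hold for every workload, for example data with strong cluster structure.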

1,992 citations

Journal Article
TL;DR: The class of point access methods, which are used to search sets of points in two or more dimensions, is presented, followed by a discussion of theoretical and experimental results concerning the relative performance of various approaches.
Abstract: Search operations in databases require special support at the physical level. This is true for conventional databases as well as spatial databases, where typical search operations include the point query (find all objects that contain a given search point) and the region query (find all objects that overlap a given search region). More than ten years of spatial database research have resulted in a great variety of multidimensional access methods to support such operations. We give an overview of that work. After a brief survey of spatial data management in general, we first present the class of point access methods, which are used to search sets of points in two or more dimensions. The second part of the paper is devoted to spatial access methods to handle extended objects, such as rectangles or polyhedra. We conclude with a discussion of theoretical and experimental results concerning the relative performance of various approaches.

1,758 citations

Journal Article
01 Aug 2008
TL;DR: An extensive set of time series experiments is conducted, re-implementing 8 different representation methods and 9 similarity measures and their variants and testing their effectiveness on 38 time series data sets from a wide variety of application domains, to provide a unified validation of some of the existing achievements.
Abstract: The last decade has witnessed a tremendous growth of interest in applications that deal with querying and mining of time series data. Numerous representation methods for dimensionality reduction and similarity measures geared towards time series have been introduced. Each individual work introducing a particular method has made specific claims and, aside from the occasional theoretical justifications, provided quantitative experimental observations. However, for the most part, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed methods over some of the previously introduced ones. In order to provide a comprehensive validation, we conducted an extensive set of time series experiments re-implementing 8 different representation methods and 9 similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this paper, we give an overview of these different techniques and present our comparative experimental findings regarding their effectiveness. Our experiments have provided both a unified validation of some of the existing achievements, and in some cases, suggested that certain claims in the literature may be unduly optimistic.

1,387 citations

Proceedings Article
11 Jun 2007
TL;DR: A new partition-and-group framework for clustering trajectories is proposed, which partitions a trajectory into a set of line segments, and then, groups similar line segments together into a cluster, and a trajectory clustering algorithm TRACLUS is developed, which discovers common sub-trajectories from real trajectory data.
Abstract: Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common sub-trajectories. Discovering common sub-trajectories is very useful in many applications, especially if we have regions of special interest for analysis. In this paper, we propose a new partition-and-group framework for clustering trajectories, which partitions a trajectory into a set of line segments, and then, groups similar line segments together into a cluster. The primary advantage of this framework is to discover common sub-trajectories from a trajectory database. Based on this partition-and-group framework, we develop a trajectory clustering algorithm TRACLUS. Our algorithm consists of two phases: partitioning and grouping. For the first phase, we present a formal trajectory partitioning algorithm using the minimum description length (MDL) principle. For the second phase, we present a density-based line-segment clustering algorithm. Experimental results demonstrate that TRACLUS correctly discovers common sub-trajectories from real trajectory data.

1,387 citations

Journal Article
TL;DR: The primary objective of this paper is to serve as a glossary that gives interested researchers an overall picture of current time series data mining development and helps them identify potential research directions for further investigation.

1,358 citations