Author
Nikos Mamoulis
Other affiliations: University of Hong Kong, Max Planck Society, University of California, Riverside ...read more
Bio: Nikos Mamoulis is an academic researcher from University of Ioannina. The author has contributed to research in topics: Joins & Spatial query. The author has an hindex of 56, co-authored 282 publications receiving 11121 citations. Previous affiliations of Nikos Mamoulis include University of Hong Kong & Max Planck Society.
Papers published on a yearly basis
Papers
More filters
••
18 Dec 2006TL;DR: This work formally defines the problem of mining collocation episodes and proposes two scaleable algorithms for its efficient solution and empirically evaluates the performance of the proposed methods using synthetically generated data that emulate real-world object movements.
Abstract: Given a collection of trajectories of moving objects with different types (e.g., pumas, deers, vultures, etc.), we introduce the problem of discovering collocation episodes in them (e.g., if a puma is moving near a deer, then a vulture is also going to move close to the same deer with high probability within the next 3 minutes). Collocation episodes catch the inter-movement regularities among different types of objects. We formally define the problem of mining collocation episodes and propose two scaleable algorithms for its efficient solution. We empirically evaluate the performance of the proposed methods using synthetically generated data that emulate real-world object movements.
61 citations
••
01 Nov 2013
TL;DR: In this article, the authors address probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model and propose a sampling approach which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database.
Abstract: Nearest neighbor (NN) queries in trajectory databases have received significant attention in the past, due to their applications in spatio-temporal data analysis More recent work has considered the realistic case where the trajectories are uncertain; however, only simple uncertainty models have been proposed, which do not allow for accurate probabilistic search In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model We study three nearest neighbor query semantics that take as input a query state or trajectory q and a time interval, and theoretically evaluate their runtime complexity Furthermore we propose a sampling approach which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database This sampling approach can be used in Monte-Carlo based approximation solutions We include an extensive experimental study to support our theoretical results
61 citations
••
05 Apr 2005
TL;DR: Algorithms and optimization techniques for RNN queries are proposed by utilizing some characteristics of networks to solve reverse nearest neighbor queries in large graphs.
Abstract: A reverse nearest neighbor query returns the data objects that have a query point as their nearest neighbor. Although such queries have been studied quite extensively in Euclidean spaces, there is no previous work in the context of large graphs. In this paper, we propose algorithms and optimization techniques for RNN queries by utilizing some characteristics of networks.
60 citations
••
TL;DR: This paper presents an optimal distributed algorithm to adapt the placement of a single operator in high communication cost networks, such as a wireless sensor network, and is the first optimal and distributed algorithms to solve the 1-median (Fermat node) problem.
58 citations
••
06 Nov 2017TL;DR: This work model the fault-tolerant subspace clustering problem as a search problem on graphs and presents an algorithm, GraphRec, based on the concept of α-ß-core, which is extremely fast compared to the state-of-the-art.
Abstract: Fault-tolerant group recommendation systems based on subspace clustering successfully alleviate high-dimensionality and sparsity problems. However, the cost of recommendation grows exponentially with the size of dataset. To address this issue, we model the fault-tolerant subspace clustering problem as a search problem on graphs and present an algorithm, GraphRec, based on the concept of α-s-core. Moreover, we propose two variants of our approach that use indexes to improve query latency. Our experiments on different datasets demonstrate that our methods are extremely fast compared to the state-of-the-art.
50 citations
Cited by
More filters
01 Aug 2000
TL;DR: Assessment of medical technology in the context of commercialization with Bioentrepreneur course, which addresses many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.
4,833 citations
01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.
2,591 citations
•
TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance of the farthest data point.
Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!.
1,992 citations