
Showing papers by "Nikos Mamoulis published in 2014"


Proceedings ArticleDOI
18 Jun 2014
TL;DR: This paper shows how the density-based clustering paradigm can be extended to apply to places visited by users of a geo-social network, and designs two quantitative measures, called social entropy and community score, to evaluate the quality of the discovered clusters.
Abstract: Spatial clustering deals with the unsupervised grouping of places into clusters and finds important applications in urban planning and marketing. Current spatial clustering models disregard information about the people who are related to the clustered places. In this paper, we show how the density-based clustering paradigm can be extended to apply to places visited by users of a geo-social network. Our model considers both spatial information and the social relationships between users who visit the clustered places. After formally defining the model and the distance measure it relies on, we present efficient algorithms for its implementation, based on spatial indexing. We evaluate the effectiveness of our model via a case study on real data; in addition, we design two quantitative measures, called social entropy and community score, to evaluate the quality of the discovered clusters. The results show that geo-social clusters have special properties and cannot be found by applying simple spatial clustering approaches. The efficiency of our index-based implementation is also evaluated experimentally.
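
The abstract names the technique but gives no pseudocode; the sketch below is a minimal illustration of density-based clustering over places under a combined geo-social distance. The weighted blend of Euclidean and Jaccard distances (parameter alpha), the place representation, and the plain O(n^2) DBSCAN loop are all assumptions for illustration, not the paper's actual distance measure or index-based algorithms.

```python
import math

def geo_social_distance(p1, p2, alpha=0.5):
    """Blend spatial distance with a social distance between visitor sets.
    Places are dicts: {'xy': (x, y), 'visitors': set_of_user_ids}."""
    (x1, y1), (x2, y2) = p1['xy'], p2['xy']
    spatial = math.hypot(x1 - x2, y1 - y2)
    union = len(p1['visitors'] | p2['visitors']) or 1
    social = 1.0 - len(p1['visitors'] & p2['visitors']) / union  # Jaccard distance
    return alpha * spatial + (1 - alpha) * social

def dbscan(places, eps, min_pts, alpha=0.5):
    """Plain DBSCAN over the combined distance; no spatial index."""
    labels = {i: None for i in range(len(places))}   # None = unvisited
    cluster = 0
    def neighbors(i):
        return [j for j in range(len(places)) if j != i and
                geo_social_distance(places[i], places[j], alpha) <= eps]
    for i in range(len(places)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) + 1 < min_pts:
            labels[i] = -1               # noise (may become a border point later)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] in (None, -1):
                labels[j] = cluster
                nbrs = neighbors(j)
                if len(nbrs) + 1 >= min_pts:    # expand only from core points
                    queue.extend(n for n in nbrs if labels[n] in (None, -1))
    return labels
```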

72 citations


Proceedings ArticleDOI
24 Aug 2014
TL;DR: A framework, based on possible-worlds semantics, that computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data.
Abstract: This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework based on possible-worlds semantics; when applied to an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.
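
As a rough illustration of the possible-worlds idea described above, the sketch below samples worlds from uncertain objects, clusters each sample, and returns the medoid clustering. The Gaussian uncertainty model, the k-means step, and the pair-counting clustering distance are illustrative assumptions; the paper's framework additionally derives the probabilistic distance guarantees, which are not reproduced here.

```python
import random

def sample_world(uncertain_points):
    """Draw one possible world: each object is (mean_x, mean_y, std)."""
    return [(random.gauss(mx, s), random.gauss(my, s))
            for mx, my, s in uncertain_points]

def kmeans(points, k, iters=20):
    centers = random.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 +
                                              (p[1] - centers[c][1]) ** 2)
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels

def pair_distance(a, b):
    """Disagreement on co-clustered pairs (a Rand-index-style distance)."""
    n = len(a)
    return sum((a[i] == a[j]) != (b[i] == b[j])
               for i in range(n) for j in range(i + 1, n))

def representative_clustering(uncertain_points, k, n_worlds=50):
    """Cluster many sampled worlds; return the medoid clustering, i.e. the
    one with the smallest total distance to all the others."""
    clusterings = [kmeans(sample_world(uncertain_points), k)
                   for _ in range(n_worlds)]
    return min(clusterings,
               key=lambda c: sum(pair_distance(c, o) for o in clusterings))
```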

35 citations


Journal ArticleDOI
01 Jan 2014
TL;DR: This work proposes an indexing technique, paired with an on-line reverse top-k search algorithm, that is efficient and has manageable storage requirements even when applied on very large graphs.
Abstract: With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that search or analyze the data. Among various proximity measures, random walk with restart (RWR) is widely adopted because of its ability to consider the global structure of the whole network. Although RWR-based similarity search has been well studied before, there is no prior work on reverse top-k proximity search in graphs based on RWR. We discuss the applicability of this query and show that its direct evaluation using existing methods on RWR-based similarity search has very high computational and storage demands. To address this issue, we propose an indexing technique, paired with an on-line reverse top-k search algorithm. Our experiments show that our technique is efficient and has manageable storage requirements even when applied on very large graphs.
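
Random walk with restart itself is standard, so a small sketch may help fix ideas: RWR proximities computed by power iteration, plus the brute-force reverse top-k baseline whose cost motivates the paper's index. The restart probability and tolerance below are common defaults, not values from the paper.

```python
def rwr(adj, source, restart=0.15, tol=1e-10, max_iters=1000):
    """RWR proximity of every node w.r.t. `source` by power iteration.
    `adj` maps each node to a list of its out-neighbors."""
    nodes = list(adj)
    p = {v: 0.0 for v in nodes}
    p[source] = 1.0
    for _ in range(max_iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            if not out:                     # dangling node: restart fully
                nxt[source] += (1 - restart) * p[u]
            else:
                share = (1 - restart) * p[u] / len(out)
                for v in out:
                    nxt[v] += share
        nxt[source] += restart              # restart mass returns to source
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p

def reverse_topk(adj, query, k):
    """Brute-force reverse top-k: nodes having `query` in their RWR top-k.
    This is the expensive baseline the paper's index is designed to avoid."""
    result = []
    for u in adj:
        prox = rwr(adj, u)
        topk = sorted((v for v in prox if v != u),
                      key=prox.get, reverse=True)[:k]
        if query in topk:
            result.append(u)
    return result
```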

35 citations


Journal ArticleDOI
01 Aug 2014
TL;DR: This paper proposes an extension for RDF stores that supports efficient spatial data management, including an effective encoding scheme for entities having spatial locations, the introduction of on-the-fly spatial filters and spatial join algorithms, and several optimizations that minimize the overhead of geometry and dictionary accesses.
Abstract: The RDF data model has recently been extended to support representation and querying of spatial information (i.e., locations and geometries), which is associated with RDF entities. Still, there are limited efforts towards extending RDF stores to efficiently support spatial queries, such as range selections (e.g., find entities within a given range) and spatial joins (e.g., find pairs of entities whose locations are close to each other). In this paper, we propose an extension for RDF stores that supports efficient spatial data management. Our contributions include an effective encoding scheme for entities having spatial locations, the introduction of on-the-fly spatial filters and spatial join algorithms, and several optimizations that minimize the overhead of geometry and dictionary accesses. We implemented the proposed techniques as an extension to the open-source RDF-3X engine and we experimentally evaluated them using real RDF knowledge bases. The results show that our system offers robust performance for spatial queries, while introducing little overhead to the original query engine.
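
The "effective encoding scheme" suggests embedding a spatial approximation of each entity inside its dictionary ID, so that range predicates can be filtered on IDs alone before any geometry or dictionary access. The grid resolution, bit layout, and helper names below are illustrative guesses at that idea, not the paper's or RDF-3X's actual format.

```python
GRID_BITS = 8                          # 2^8 x 2^8 grid over the space (assumption)
WORLD = (-180.0, -90.0, 180.0, 90.0)   # lon/lat bounding box of the data space

def cell_of(lon, lat):
    """Map a point to a grid cell (cx, cy)."""
    x0, y0, x1, y1 = WORLD
    n = 1 << GRID_BITS
    cx = min(int((lon - x0) / (x1 - x0) * n), n - 1)
    cy = min(int((lat - y0) / (y1 - y0) * n), n - 1)
    return cx, cy

def encode_id(serial, lon, lat):
    """Pack the grid cell into the high bits of the entity ID, keeping a
    per-cell serial number in the low 32 bits."""
    cx, cy = cell_of(lon, lat)
    return (cx << (GRID_BITS + 32)) | (cy << 32) | serial

def id_in_range(eid, lon_lo, lat_lo, lon_hi, lat_hi):
    """On-the-fly spatial filter: test the cell encoded in the ID against a
    query rectangle. Cells on the boundary still need an exact geometry check."""
    cx = eid >> (GRID_BITS + 32)
    cy = (eid >> 32) & ((1 << GRID_BITS) - 1)
    clo, chi = cell_of(lon_lo, lat_lo), cell_of(lon_hi, lat_hi)
    return clo[0] <= cx <= chi[0] and clo[1] <= cy <= chi[1]
```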

24 citations


Journal ArticleDOI
Hao Wang, Yilun Cai, Yin Yang, Shiming Zhang, Nikos Mamoulis
TL;DR: In this paper, the problem of finding objects with durable quality over time in historical time series databases was studied, and efficient and scalable algorithms for the DTop-k and DkNN queries were proposed.
Abstract: This paper studies the problem of finding objects with durable quality over time in historical time series databases. For example, a sociologist may be interested in the top 10 web search terms during the period of some historical event; the police may look for vehicles that moved close to a suspect for 70 percent of the time during a certain period; and so on. Durable top-k (DTop-k) and nearest neighbor (DkNN) queries can be viewed as natural extensions of the standard snapshot top-k and NN queries to timestamped sequences of values or locations. Although their snapshot counterparts have been studied extensively, to our knowledge there is little prior work that addresses this new class of durable queries. Existing methods for DTop-k processing either apply trivial solutions or rely on domain-specific properties. Motivated by this, we propose efficient and scalable algorithms for the DTop-k and DkNN queries, based on novel indexing and query evaluation techniques. Our experiments show that the proposed algorithms outperform previous and baseline solutions by a wide margin.
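
To make the DTop-k semantics concrete, the sketch below is the obvious per-timestamp baseline: count how often each object appears in the snapshot top-k and keep those above a durability threshold tau. The paper's contribution is index-based algorithms that avoid exactly this full scan; the data layout here is an assumption.

```python
import heapq

def durable_topk(series, k, tau):
    """series: {object_id: [value at t0, value at t1, ...]} (equal lengths).
    Returns objects in the snapshot top-k for >= tau fraction of timestamps."""
    ids = list(series)
    n_ts = len(series[ids[0]])
    hits = {o: 0 for o in ids}
    for t in range(n_ts):
        snapshot = [(series[o][t], o) for o in ids]
        for _, o in heapq.nlargest(k, snapshot):   # snapshot top-k at time t
            hits[o] += 1
    return [o for o in ids if hits[o] / n_ts >= tau]

# Example: objects durably in the top-2 for >= 60% of four timestamps
data = {'a': [5, 6, 7, 8], 'b': [9, 1, 9, 9], 'c': [1, 8, 1, 1]}
print(durable_topk(data, k=2, tau=0.6))    # ['a', 'b']
```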

24 citations


Proceedings ArticleDOI
19 May 2014
TL;DR: This tutorial provides a comprehensive overview of the different challenges involved in managing uncertain spatial and spatio-temporal data and presents state-of-the-art techniques for addressing them.
Abstract: Location-related data has a tremendous impact in many applications of high societal relevance, and its growing volume from heterogeneous sources is a prime example of Big Data [1]. An inherent property of any spatio-temporal dataset is uncertainty due to various sources of imprecision. This tutorial provides a comprehensive overview of the different challenges involved in managing uncertain spatial and spatio-temporal data and presents state-of-the-art techniques for addressing them.

21 citations


Proceedings ArticleDOI
03 Jul 2014
TL;DR: This paper proposes two methods to improve the performance of the state-of-the-art social network-based recommender system (SNRS), which is based on a probabilistic model; the first method classifies the correlations between pairs of users' ratings.
Abstract: With the rapid expansion of online social networks, social network-based recommendation has become a meaningful and effective way of suggesting new items or activities to users. In this paper, we propose two methods to improve the performance of the state-of-the-art social network-based recommender system (SNRS), which is based on a probabilistic model. Our first method classifies the correlations between pairs of users' ratings. The second makes the system robust to sparse data, i.e., cases where the target user has few immediate friends and few common ratings with them. Our experimental study demonstrates that our techniques significantly improve the accuracy of SNRS.
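
The abstract only names the two improvements; as a hedged illustration of the first (classifying correlations between pairs of users' ratings), the snippet below buckets user pairs by the Pearson correlation of their co-rated items. The thresholds and the three classes are assumptions, not SNRS's actual scheme.

```python
from math import sqrt

def pearson(ratings_u, ratings_v):
    """Pearson correlation over items rated by both users (dicts item->rating)."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return None                      # too sparse to correlate
    xs = [ratings_u[i] for i in common]
    ys = [ratings_v[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)

def classify_pair(ratings_u, ratings_v, pos=0.3, neg=-0.3):
    """Bucket a user pair as 'agree', 'disagree', or 'unrelated'."""
    r = pearson(ratings_u, ratings_v)
    if r is None:
        return 'unrelated'
    return 'agree' if r >= pos else 'disagree' if r <= neg else 'unrelated'
```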

20 citations


Book ChapterDOI
21 Apr 2014
TL;DR: This paper proposes an efficient solution to the problem of geo-social skyline queries by showing how the RWR distance can be bounded efficiently and effectively in order to identify true hits and true drops early, and shows that the presented pruning techniques vastly reduce the number of objects for which a more exact social distance has to be computed.
Abstract: By leveraging the capabilities of modern GPS-equipped mobile devices that provide social-networking services, the interest in developing advanced services that combine location-based services with social networking is growing drastically. Based on geo-social networks, which couple personal location information with personal social context, such services are facilitated by geo-social queries that extract useful information combining the social relationships and current locations of users. In this paper, we tackle the problem of geo-social skyline queries, a problem that has not been addressed so far. Given a set of persons D connected in a social network SN with information about their current locations, a geo-social skyline query reports, for a given user U ∈ D and a given location P (not necessarily the location of the user), the Pareto-optimal set of persons who are close to P and closely connected to U in SN. We measure the social connectivity between users using the widely adopted but computationally expensive Random Walk with Restart (RWR) method to obtain the social distance between users in the social network. We propose an efficient solution by showing how the RWR distance can be bounded efficiently and effectively in order to identify true hits and true drops early. Our experimental evaluation shows that the presented pruning techniques vastly reduce the number of objects for which a more exact social distance has to be computed, using only our proposed bounds.
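
The sketch below captures only the query semantics: a plain Pareto-skyline over pairs of (spatial distance to P, social distance to U), with the social distances assumed precomputed. The paper's actual contribution, bounding the expensive RWR distance to prune candidates before any exact computation, is not reproduced here.

```python
import math

def geo_social_skyline(candidates, p):
    """candidates: list of (person_id, (x, y), social_dist), where social_dist
    is e.g. an RWR-based distance to the query user, assumed precomputed.
    Returns the Pareto-optimal set under (distance to p, social_dist)."""
    scored = [(pid, math.hypot(x - p[0], y - p[1]), sd)
              for pid, (x, y), sd in candidates]
    skyline = []
    for pid, d, s in scored:
        # dominated if someone is at least as good in both criteria
        # and strictly better in one
        if any(d2 <= d and s2 <= s and (d2 < d or s2 < s)
               for _, d2, s2 in scored):
            continue
        skyline.append(pid)
    return skyline
```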

18 citations


Book ChapterDOI
21 Apr 2014
TL;DR: This work proposes two types of RNN queries based on a well-established model for uncertain spatio-temporal data founded on stochastic processes, namely the Markov model, and is the first to consider RNN queries on uncertain trajectory databases in accordance with the possible worlds semantics.
Abstract: Reverse nearest neighbor (RNN) queries in spatial and spatio-temporal databases have received significant attention in the database research community over the last decade. An RNN query finds the objects having a given query object as their nearest neighbor. RNN queries find applications in data mining, marketing analysis, and decision making. Most previous research on RNN queries over trajectory databases assumes that the data are certain. In realistic scenarios, however, trajectories are inherently uncertain due to measurement errors or time-discretized sampling. In this paper, we study RNN queries in databases of uncertain trajectories. We propose two types of RNN queries based on a well-established model for uncertain spatio-temporal data founded on stochastic processes, namely the Markov model. To the best of our knowledge, our work is the first to consider RNN queries on uncertain trajectory databases in accordance with the possible worlds semantics. We include an extensive experimental evaluation on both real and synthetic data sets to verify our theoretical results.
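
The Markov model for uncertain trajectories is standard: a per-object transition matrix propagates a discrete location distribution over time, and query probabilities are derived from those distributions. The sketch below shows this propagation on a toy 1-D state space; the paper's RNN probability definitions build on such distributions but are not reproduced here.

```python
def propagate(initial, transition, steps):
    """Forward propagation of a discrete location distribution.
    initial: {state: prob}; transition: {state: {state: prob}}."""
    dist = dict(initial)
    history = [dist]
    for _ in range(steps):
        nxt = {}
        for s, p in dist.items():
            for s2, q in transition[s].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
        history.append(dist)
    return history                       # distribution at t = 0 .. steps

def prob_within(dist, positions, query, radius):
    """P(object within `radius` of `query`) under one timestamp's distribution.
    positions maps each discrete state to a 1-D coordinate."""
    return sum(p for s, p in dist.items()
               if abs(positions[s] - query) <= radius)

# Tiny 3-state chain on a line (states at coordinates 0, 1, 2)
T = {0: {0: .5, 1: .5}, 1: {0: .25, 1: .5, 2: .25}, 2: {1: .5, 2: .5}}
hist = propagate({0: 1.0}, T, steps=3)
print(prob_within(hist[3], {0: 0.0, 1: 1.0, 2: 2.0}, query=2.0, radius=1.0))
```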

17 citations


Journal ArticleDOI
TL;DR: This paper investigates the effective and efficient retrieval of concise and informative OS snippets (denoted as size-l OSs) with an optimal dynamic programming algorithm, two greedy algorithms, and preprocessing heuristics, and proposes and investigates the effectiveness of two types of size-l OSs, namely size-l OS(t)s and size-l OS(a)s, which consist of l tuple nodes and l attribute nodes, respectively.
Abstract: The Object Summary (OS) is a recently proposed tree structure, which summarizes all data held in a relational database about a data subject. An OS can potentially be very large in size and therefore unfriendly for users who wish to view synoptic information about the data subject. In this paper, we investigate the effective and efficient retrieval of concise and informative OS snippets (denoted as size-l OSs). We propose and investigate the effectiveness of two types of size-l OSs, namely size-l OS(t)s and size-l OS(a)s, which consist of l tuple nodes and l attribute nodes, respectively. For computing size-l OSs, we propose an optimal dynamic programming algorithm, two greedy algorithms, and preprocessing heuristics. By collecting feedback from real users (e.g., from DBLP authors), we assess the relative usability of the two different types of snippets, the choice of the size-l parameter, as well as the effectiveness of the snippets with respect to user expectations. In addition, via thorough evaluation on real databases, we test the speed and effectiveness of our techniques.
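
The abstract mentions an optimal dynamic programming algorithm and greedy algorithms; the sketch below is a hedged version of the greedy idea only: grow the snippet from the OS root by repeatedly adding the highest-importance node whose parent is already chosen, which keeps the snippet a connected subtree. The importance scores and the toy OS are assumptions.

```python
import heapq

def size_l_snippet(tree, importance, root, l):
    """tree: {node: [children]}; importance: {node: score}.
    Greedily pick l nodes forming a subtree rooted at `root`, favoring
    high-importance nodes (a parent must be chosen before its children)."""
    chosen = {root}
    # max-heap (negated scores) of nodes whose parent is already chosen
    frontier = [(-importance[c], c) for c in tree.get(root, [])]
    heapq.heapify(frontier)
    while frontier and len(chosen) < l:
        _, node = heapq.heappop(frontier)
        chosen.add(node)
        for c in tree.get(node, []):
            heapq.heappush(frontier, (-importance[c], c))
    return chosen

# Toy OS: an author node with papers and a venue as descendants
tree = {'author': ['p1', 'p2'], 'p1': ['venue1'], 'p2': []}
imp = {'author': 1.0, 'p1': .9, 'p2': .4, 'venue1': .7}
print(size_l_snippet(tree, imp, 'author', l=3))   # {'author', 'p1', 'venue1'}
```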

16 citations


Proceedings ArticleDOI
03 Jul 2014
TL;DR: A collective topic model based on probabilistic latent semantic analysis (PLSA) is proposed, in which authorship, published venues, and citation relations are used to quantify paper importance; experiments indicate that this model is superior in milestone paper discovery when compared to a previous model that considers only papers.
Abstract: Prior work forms the foundation for future work in academic research. However, the increasingly large number of publications makes it difficult for researchers to effectively discover the most important previous works for the topic of their research. In this paper, we study the automatic discovery of the core papers for a research area. We propose a collective topic model on three types of objects: papers, authors, and published venues. We model each of these objects as a bag of citations. Based on probabilistic latent semantic analysis (PLSA), authorship, published venues, and citation relations are used for quantifying paper importance. Our method supports milestone paper discovery for different cases of input objects. Experiments on the ACL Anthology Network (AAN) indicate that our model is superior in milestone paper discovery when compared to a previous model that considers only papers.
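
PLSA itself is standard; the sketch below runs vanilla EM on a count matrix, matching the "bags of citations" view in the abstract (rows can equally be papers, authors, or venues, which is the collective aspect). The importance score the paper derives from the fitted model is not reproduced; the dimensions and iteration count are illustrative.

```python
import numpy as np

def plsa(counts, n_topics, iters=50, seed=0):
    """Vanilla PLSA via EM. counts: (n_docs, n_citations) count matrix.
    Returns P(topic|doc) and P(citation|topic)."""
    rng = np.random.default_rng(seed)
    n_docs, n_cits = counts.shape
    p_z_d = rng.random((n_docs, n_topics))          # P(z|d)
    p_w_z = rng.random((n_topics, n_cits))          # P(w|z)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: P(z|d,w) ∝ P(z|d) P(w|z), for every (d, w) pair
        post = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        expected = counts[:, None, :] * post           # shape (d, z, w)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```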