
Showing papers by "Nikos Mamoulis" published in 2017


Posted Content
TL;DR: A network embedding is a representation of a large graph in a low-dimensional space, where vertices are modeled as vectors to preserve the proximity between vertices in the original graph.
Abstract: A network embedding is a representation of a large graph in a low-dimensional space, where vertices are modeled as vectors. The objective of a good embedding is to preserve the proximity between vertices in the original graph. This way, typical search and mining methods can be applied in the embedded space with the help of off-the-shelf multidimensional indexing approaches. Existing network embedding techniques focus on homogeneous networks, where all vertices are considered to belong to a single class.

102 citations


Proceedings ArticleDOI
03 Apr 2017
TL;DR: A novel aspect of package-to-group recommendations, that of fairness, is focused on and two definitions of fairness are explored, showing that for either definition the problem of finding the most fair package is NP-hard.
Abstract: Recommending packages of items to groups of users has several applications, including recommending vacation packages to groups of tourists, entertainment packages to groups of friends, or sets of courses to groups of students. In this paper, we focus on a novel aspect of package-to-group recommendations, that of fairness. Specifically, when we recommend a package to a group of people, we ask that this recommendation is fair in the sense that every group member is satisfied by a sufficient number of items in the package. We explore two definitions of fairness and show that for either definition the problem of finding the most fair package is NP-hard. We exploit the fact that our problem can be modeled as a coverage problem, and we propose greedy algorithms that find approximate solutions within reasonable time. In addition, we study two extensions of the problem, where we impose category or spatial constraints on the items to be included in the recommended packages. We evaluate the appropriateness of the fairness models and the performance of the proposed algorithms using real data from Yelp, and a user study.
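The coverage view of the problem can be illustrated with a small greedy sketch. This is not the paper's algorithm: the function names, the `likes` input (user to set of liked items), and the threshold parameter `m` (how many liked items a user needs in order to count as satisfied) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_fair_package(likes: Dict[str, Set[str]], items: List[str],
                        k: int, m: int = 1) -> List[str]:
    """Greedily pick k items so that many users like at least m of them.

    Hypothetical helper: `likes` maps each user to the items they like.
    """
    package: List[str] = []
    hits = {user: 0 for user in likes}  # liked items already in the package
    for _ in range(min(k, len(items))):
        best_item, best_gain = None, -1
        for item in items:
            if item in package:
                continue
            # Gain = users still below the threshold m who like this item.
            gain = sum(1 for user, liked in likes.items()
                       if hits[user] < m and item in liked)
            if gain > best_gain:  # ties keep the earliest item
                best_item, best_gain = item, gain
        package.append(best_item)
        for user, liked in likes.items():
            if best_item in liked:
                hits[user] += 1
    return package


def satisfied_users(likes: Dict[str, Set[str]], package: List[str],
                    m: int = 1) -> Set[str]:
    """Users who like at least m items of the package."""
    chosen = set(package)
    return {user for user, liked in likes.items() if len(liked & chosen) >= m}


likes = {"ann": {"a", "b"}, "bob": {"b", "c"}, "cat": {"d"}}
package = greedy_fair_package(likes, ["a", "b", "c", "d"], k=2)
print(package, satisfied_users(likes, package))
```

With `m = 1` this reduces to the classic greedy heuristic for maximum coverage; the category and spatial constraints mentioned in the abstract are not modeled here.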

90 citations


Proceedings ArticleDOI
09 May 2017
TL;DR: FEXIPRO, a Fast and EXact Inner PROduct retrieval framework based on sequential scan, prunes item vectors using partial inner products after an SVD transformation of P, fast upper bounds computed from an integer approximation of P, and a lossless transformation that makes all values of P positive.
Abstract: Recommender systems have many successful applications in e-commerce and social media, including Amazon, Netflix, and Yelp. Matrix Factorization (MF) is one of the most popular recommendation approaches; the original user-product rating matrix R with millions of rows and columns is decomposed into a user matrix Q and an item matrix P, such that the product Q^T P approximates R. Each column q (p) of Q (P) holds the latent factors of the corresponding user (item), and q^T p is a prediction of the rating to item p by user q. Recommender systems based on MF suggest to a user q the items with the top-k scores in q^T P. For this problem, we propose a Fast and EXact Inner PROduct retrieval (FEXIPRO) framework, based on sequential scan, which includes three elements. First, FEXIPRO applies an SVD transformation to P, after which the first several dimensions capture a large percentage of the inner products. This enables us to prune item vectors by only computing their partial inner products with q. Second, we construct an integer approximation version of P, which can be used to compute fast upper bounds for the inner products that can prune item vectors. Finally, we apply a lossless transformation to P, such that the resulting matrix has only positive values, allowing for the inner products to be monotonically increasing with dimensionality. Experiments on real data demonstrate that our framework outperforms alternative approaches typically by an order of magnitude.
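The idea of pruning with partial inner products can be sketched as follows. This is a simplified illustration, not FEXIPRO itself: it omits the SVD, integer-approximation, and positivity transformations, and it bounds the contribution of the remaining dimensions with the Cauchy-Schwarz inequality, which is an assumption of this sketch. The function names and the parameter `w` (number of leading dimensions checked first) are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def tail_norm(v: Sequence[float], w: int) -> float:
    """Euclidean norm of the dimensions after the first w."""
    return math.sqrt(sum(x * x for x in v[w:]))


def top_k_inner_product(q: Sequence[float], items: List[Sequence[float]],
                        k: int, w: int) -> Tuple[List[Tuple[float, int]], int]:
    """Exact top-k items by inner product with q, via a pruned sequential scan.

    Returns the (score, item index) pairs in descending score order and the
    number of items pruned without computing their full inner product.
    """
    q_tail = tail_norm(q, w)
    item_tails = [tail_norm(p, w) for p in items]  # precomputable offline
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    pruned = 0
    for idx, p in enumerate(items):
        partial = sum(a * b for a, b in zip(q[:w], p[:w]))
        if len(heap) == k:
            # Cauchy-Schwarz: the remaining dimensions add at most this much.
            upper = partial + q_tail * item_tails[idx]
            if upper <= heap[0][0]:
                pruned += 1
                continue
        score = partial + sum(a * b for a, b in zip(q[w:], p[w:]))
        if len(heap) < k:
            heapq.heappush(heap, (score, idx))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, idx))
    return sorted(heap, reverse=True), pruned


q = [1.0, 2.0, 0.5, 0.1]
items = [[3.0, 1.0, 0.2, 0.1], [0.1, 0.1, 0.1, 0.1], [2.0, 2.0, 0.0, 0.0],
         [-1.0, -1.0, 0.5, 0.5], [0.5, 3.0, 0.1, 0.0]]
result, pruned = top_k_inner_product(q, items, k=2, w=2)
print([idx for _, idx in result], "pruned:", pruned)
```

The bound is only useful when the leading `w` dimensions carry most of the inner product, which is what the SVD step described in the abstract is meant to achieve.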

72 citations


Journal ArticleDOI
TL;DR: Four intuitive techniques, based on combinations of location suppression and trajectory splitting, are devised, and it is shown that they can prevent privacy breaches while keeping published data accurate for aggregate query answering and frequent subsets data mining.
Abstract: We study the problem of preserving user privacy in the publication of location sequences. Consider a database of trajectories, corresponding to movements of people, captured by their transactions when they use credit cards, RFID debit cards, or NFC (http://en.wikipedia.org/wiki/Near_field_communication) compliant devices. We show that, if such trajectories are published exactly (by only hiding the identities of persons that followed them), one can use partial trajectory knowledge as a quasi-identifier for the remaining locations in the sequence. We devise four intuitive techniques, based on combinations of location suppression and trajectory splitting, and we show that they can prevent privacy breaches while keeping published data accurate for aggregate query answering and frequent subsets data mining.

66 citations


Proceedings ArticleDOI
06 Nov 2017
TL;DR: This work models the fault-tolerant subspace clustering problem as a search problem on graphs and presents an algorithm, GraphRec, based on the concept of α-β-core, which is extremely fast compared to the state-of-the-art.
Abstract: Fault-tolerant group recommendation systems based on subspace clustering successfully alleviate high-dimensionality and sparsity problems. However, the cost of recommendation grows exponentially with the size of the dataset. To address this issue, we model the fault-tolerant subspace clustering problem as a search problem on graphs and present an algorithm, GraphRec, based on the concept of α-β-core. Moreover, we propose two variants of our approach that use indexes to improve query latency. Our experiments on different datasets demonstrate that our methods are extremely fast compared to the state-of-the-art.

50 citations


Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper defines the Extended Characteristic Set (ECS), a schema abstraction that classifies triples based on the properties of their subjects and objects, and implements axonDB, an RDF storage and querying engine based on ECS indexing.
Abstract: SPARQL query execution in state-of-the-art RDF engines depends on, and is often limited by, the underlying storage and indexing schemes. Typically, these systems exhaustively store permutations of the standard three-column triples table. However, even though RDF can give birth to datasets with loosely defined schemas, it is common for an emerging structure to appear in the data. In this paper, we introduce a novel indexing scheme for RDF data that takes advantage of the inherent structure of triples. To this end, we define the Extended Characteristic Set (ECS), a schema abstraction that classifies triples based on the properties of their subjects and objects, and we discuss methods and algorithms for the identification and extraction of ECSs. We show how these can be used to assist query processing, and we implement axonDB, an RDF storage and querying engine based on ECS indexing. We perform an experimental evaluation on real-world and synthetic datasets and observe that axonDB outperforms the competition by a few orders of magnitude.

37 citations


Journal ArticleDOI
01 Aug 2017
TL;DR: This paper proposes two optimizations of FS that greatly reduce its cost, making it competitive to the state-of-the-art single-threaded PS algorithm while achieving a lower memory footprint and demonstrates the efficiency and scalability of the parallelization framework.
Abstract: The interval join is a basic operation that finds application in temporal, spatial, and uncertain databases. Although a number of centralized and distributed algorithms have been proposed for the efficient evaluation of interval joins, classic plane sweep approaches have not been considered at their full potential. A recent piece of related work proposes an optimized approach based on plane sweep (PS) for modern hardware, showing that it greatly outperforms previous work. However, this approach depends on the development of a complex data structure and its parallelization has not been adequately studied. In this paper, we explore the applicability of a largely ignored forward scan (FS) based plane sweep algorithm, which is extremely simple to implement. We propose two optimizations of FS that greatly reduce its cost, making it competitive to the state-of-the-art single-threaded PS algorithm while achieving a lower memory footprint. In addition, we show the drawbacks of a previously proposed hash-based partitioning approach for parallel join processing and suggest a domain-based partitioning approach that does not produce duplicate results. Within our approach we propose a novel breakdown of the partition join jobs into a small number of independent mini-join jobs with varying cost and manage to avoid redundant comparisons. Finally, we show how these mini-joins can be scheduled in multiple CPU cores and propose an adaptive domain partitioning, aiming at load balancing. We include an experimental study that demonstrates the efficiency of our optimized FS and the scalability of our parallelization framework.
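A basic forward scan plane sweep join is short enough to sketch in full. The version below is the plain algorithm only, without the two optimizations or the parallelization proposed in the paper; the function name and the closed-interval `(start, end)` tuple representation are assumptions of this sketch.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (start, end), an assumed encoding


def forward_scan_join(r_input: List[Interval],
                      s_input: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return all pairs (r, s) of overlapping intervals using forward scan."""
    R = sorted(r_input)  # sort both inputs by start point
    S = sorted(s_input)
    out: List[Tuple[Interval, Interval]] = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # r starts first: scan forward in S while intervals start before r ends.
            r = R[i]
            jj = j
            while jj < len(S) and S[jj][0] <= r[1]:
                out.append((r, S[jj]))
                jj += 1
            i += 1
        else:
            # s starts first: symmetric forward scan in R.
            s = S[j]
            ii = i
            while ii < len(R) and R[ii][0] <= s[1]:
                out.append((R[ii], s))
                ii += 1
            j += 1
    return out


R = [(1, 5), (6, 8), (10, 12)]
S = [(4, 7), (8, 9), (13, 14)]
print(forward_scan_join(R, S))
```

Each pair is reported exactly once, by whichever of its two intervals starts first, so no duplicate elimination or auxiliary data structure is needed beyond the two sorted lists.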

33 citations


Book ChapterDOI
01 Jan 2017
TL;DR: Experimental results on two real datasets show that the proposed Hierarchical Bayesian Model (HBGG) outperforms the state-of-the-art group recommenders, especially on cold-start user groups.
Abstract: Location-based social networks such as Foursquare and Plancast have gained increasing popularity. On those sites, users can organize and participate in group activities; hence, recommending venues to a group is of practical importance. In this paper, we study the problem of recommending venues to groups of users and propose a Hierarchical Bayesian Model (HBGG) for this purpose. First, a generative group geographical topic model (GG) which exploits group membership, group mobility regions and group preferences is proposed. Second, we integrate social structure into one-class collaborative filtering as social-based collaborative filtering (SOCF) to leverage social wisdom. Through the shared latent group features, HBGG connects the group geographical model with the SOCF framework for group recommendation. Experimental results on two real datasets show that our methods outperform the state-of-the-art group recommenders, especially on cold-start user groups.

30 citations


Journal ArticleDOI
TL;DR: Zhang et al. design a location recommendation framework that combines results from various recommenders that consider different factors, and estimate, for each individual user, the underlying influence of each factor.
Abstract: Location recommendation is an important feature of social network applications and location-based services. Most existing studies focus on developing one single method or model for all users. By analyzing data from two real location-based social networks (Foursquare and Gowalla), in this paper we reveal that the decisions of users on place visits depend on multiple factors, and different users may be affected differently by these factors. We design a location recommendation framework that combines results from various recommenders that consider different factors. Our framework estimates, for each individual user, the underlying influence of each factor to her. Based on the estimation, we aggregate suggestions from different recommenders to derive personalized recommendations. Experiments on Foursquare and Gowalla show that our proposed method outperforms the state-of-the-art methods on location recommendation.

26 citations


Journal ArticleDOI
TL;DR: This article proposes a partner-aware activity recommendation model, which integrates this hypothesis into conventional recommendation approaches, and builds upon the users’ historical attendance preferences, their social context, and geographic information to help improve the effectiveness of recommending activities to users.
Abstract: Recommending social activities, such as watching movies or having dinner, is a common function found in social networks or e-commerce sites. Besides certain websites which manage activity-related locations (e.g., foursquare.com), many items on product sale platforms (e.g., groupon.com) can naturally be mapped to social activities. For example, movie tickets can be thought of as activity items, which can be mapped as a social activity of “watch a movie.” Traditional recommender systems estimate the degree of interest for a target user on candidate items (or activities), and accordingly, recommend the top-k activity items to the user. However, these systems ignore an important social characteristic of recommended activities: people usually tend to participate in those activities with friends. This article considers this fact for improving the effectiveness of recommendation in two directions. First, we study the problem of activity-partner recommendation; i.e., for each recommended activity item, find a suitable partner for the user. This (i) saves the user’s time for finding activity partners, (ii) increases the likelihood that the activity item will be selected by the user, and (iii) improves the effectiveness of recommender systems to users overall and enkindles their social enthusiasm. Our partner recommender is built upon the users’ historical attendance preferences, their social context, and geographic information. Moreover, we explore how to leverage the partner recommendation to help improve the effectiveness of recommending activities to users. Assuming that users tend to select the activities for which they can find suitable partners, we propose a partner-aware activity recommendation model, which integrates this hypothesis into conventional recommendation approaches. Finally, the recommended items not only match users’ interests, but also have high chances to be selected by the users, because the users can find suitable partners to attend the corresponding activities together. We conduct experiments on real data to evaluate the effectiveness of activity-partner recommendation and partner-aware activity recommendation. The results verify that (i) suggesting partners greatly improves the likelihood that a recommended activity item is to be selected by the target user and (ii) considering the existence of suitable partners in the ranking of recommended items improves the accuracy of recommendation significantly.

10 citations


Journal ArticleDOI
01 Oct 2017
TL;DR: This paper argues that the effective thematic ranking of OSs should gracefully combine IR-style properties, authoritative ranking and affinity, and proposes an algorithm that computes the join efficiently, taking advantage of appropriate count statistics, and compares it with baseline approaches.
Abstract: An Object Summary (OS) is a tree structure of tuples that summarizes the context of a particular Data Subject (DS) tuple. The OS has been used as a model of keyword search in relational databases, where, given a set of keywords, the objective is to identify the DS tuples relevant to the keywords and their corresponding OSs. However, a query may return a large number of OSs, which raises the issue of effectively and efficiently ranking them in order to present only the most important ones to the user. In this paper, we propose a model that ranks OSs containing a set of identifying keywords (e.g., Chen) according to their relevance to a set of thematic keywords (e.g., Mining). We argue that the effective thematic ranking of OSs should gracefully combine IR-style properties, authoritative ranking and affinity. Our ranking problem is modeled and solved as a top-k group-by join; we propose an algorithm that computes the join efficiently, taking advantage of appropriate count statistics, and compare it with baseline approaches. An experimental evaluation on the DBLP and TPC-H databases verifies the effectiveness and efficiency of our proposal.

Book ChapterDOI
21 Aug 2017
TL;DR: This paper proposes an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history, and extends two popular query recommendation approaches to the location-aware setting, which provides recommendations that are semantically relevant to the original query and their results are spatially close to the query issuer.
Abstract: Query recommendation is a popular add-on feature of search engines, which provides related and helpful reformulations of a keyword query. Due to the dropping prices of smartphones and the increasing coverage and bandwidth of mobile networks, a large percentage of search engine queries are issued from mobile devices. This makes it possible to provide better query recommendations by considering the physical locations of the query issuers. However, limited research has been done on location-aware query recommendation for search engines. In this paper, we propose an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history. Based on this, we extend two popular query recommendation approaches to our location-aware setting, which provides recommendations that are semantically relevant to the original query and their results are spatially close to the query issuer. In addition, we extend the bookmark coloring algorithm for graph proximity search to support our proposed approaches online, with a spatial partitioning based approximation that accelerates the computation of our proposed spatial proximity. We conduct experiments using a real query log, which show that our query recommendation approaches significantly outperform previous work in terms of quality, and they can be efficiently applied online.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper studies the novel problem of parallel and distributed processing of spatial preference queries using keywords, where the input data is stored in a distributed way and proposes parallel algorithms that solve the problem in the MapReduce framework.
Abstract: Advanced queries that combine spatial constraints with textual relevance to retrieve objects of interest have attracted increased attention recently due to the ever-increasing rate of user-generated spatio-textual data. Motivated by this trend, in this paper, we study the novel problem of parallel and distributed processing of spatial preference queries using keywords, where the input data is stored in a distributed way. Given a set of keywords, a set of spatial data objects and a set of spatial feature objects that are additionally annotated with textual descriptions, the spatial preference query using keywords retrieves the top-k spatial data objects ranked according to the textual relevance of feature objects in their vicinity. This query type is processing-intensive, especially for large datasets, since any data objects may belong to the result set while the spatial range defines the score, and the k data objects with the highest score need to be retrieved. Our solution has two notable features: (a) we propose a deliberate re-partitioning mechanism of input data to servers, which allows parallelized processing, thus establishing the foundations for a scalable query processing algorithm, and (b) we boost the query processing performance in each partition by introducing an early termination mechanism that delivers the correct result by only examining few data objects. Capitalizing on this, we implement parallel algorithms that solve the problem in the MapReduce framework. Our experimental study using both real and synthetic data in a cluster of sixteen physical machines demonstrates the efficiency of our solution.

Proceedings ArticleDOI
Yuqiu Qian, Hui Li, Nikos Mamoulis, Yu Liu, David W. Cheung
01 Jan 2017
TL;DR: This paper proposes a filter-and-refinement framework, which prunes the search space while traversing the graph in search of the reverse k-ranks query results, together with an optimized algorithm and an index that apply to this framework and boost its performance.
Abstract: Given a collection of objects, the reverse k-ranks query takes as input a query object q in the set and returns the top-k objects that rank q higher compared to where other objects rank q. This query has been studied in the vector space; however, there is no previous work in the context of graphs. In this paper, we propose a filter-and-refinement framework, which prunes the search space while traversing the graph in search of the reverse k-ranks query results. We present an optimized algorithm and an index that apply to this framework and boost its performance. The proposed techniques are evaluated on real data; the experimental results show that our solutions scale well, rendering the query applicable for searching large graphs.
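The query definition can be made concrete with a naive baseline that computes every node's ranking of q from scratch. This is not the paper's filter-and-refinement framework; it is the exhaustive approach such a framework is meant to avoid. The unweighted, undirected adjacency-list graph, the hop-count distance, the tie handling, and all function names are assumptions of this sketch.

```python
import heapq
from collections import deque
from typing import Dict, List


def bfs_distances(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank_of(graph: Dict[str, List[str]], u: str, q: str) -> float:
    """Rank of q from u's viewpoint: 1 + number of nodes strictly closer to u."""
    dist = bfs_distances(graph, u)
    if q not in dist:
        return float("inf")  # q is unreachable from u
    return 1 + sum(1 for v, d in dist.items() if v != u and d < dist[q])


def reverse_k_ranks(graph: Dict[str, List[str]], q: str, k: int) -> List[str]:
    """The k nodes that rank q best; ties are broken by node name."""
    candidates = [u for u in graph if u != q]
    return heapq.nsmallest(k, candidates,
                           key=lambda u: (rank_of(graph, u, q), u))


graph = {"q": ["a"], "a": ["q", "b", "c", "d"], "b": ["a"],
         "c": ["a"], "d": ["a", "e"], "e": ["d"]}
print(reverse_k_ranks(graph, "q", 2))
```

The baseline runs one full traversal per node, which is what makes pruning during graph traversal attractive on large graphs.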

Book ChapterDOI
21 Aug 2017
TL;DR: An extension of the R-tree index, called TAR-tree, is proposed that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation.
Abstract: Consider a group of users who would like to meet at a place in order to participate in an activity together (e.g., meet at a restaurant to dine). Such meeting point queries have been studied in the context of spatial databases, where typically the suggested points are the ones that minimize an aggregate traveling distance. Recently, meeting point queries have been enriched to take as input, besides the locations of users, also some preference criteria (e.g., expressed by some keywords). However, in many applications, a group of users may require a meeting point recommendation without explicitly specifying any preferences. Motivated by this, we study this scenario of group recommendation for such passive users. We use topic modeling to infer the preferences of the group on the different points of interest and combine these preferences with the aggregate spatial distance of the group members to the candidate points for recommendation in a unified search model. Then, we propose an extension of the R-tree index, called TAR-tree, that indexes the topic vectors of the places together with their spatial locations, in order to facilitate efficient group recommendation. We propose and compare three variants of the TAR-tree and a compression technique for the index that improves its performance. The proposed techniques are evaluated on real data; the results demonstrate the efficiency and effectiveness of our methods.

Journal ArticleDOI
TL;DR: The objective of the work is to consider the personal preferences of users in review recommendation, by selecting a personalized top reviews set (PTRS), which includes reviews of which the content is related to the aspects important to the user.

Journal ArticleDOI
TL;DR: This paper advances the state-of-the-art by combining existing approaches to a hybrid nearest neighbor-based method while also proposing an alternative, more efficient spatial range-based approach and investigating the continuous counterpart of distance-to-points trajectory search.
Abstract: Trajectory data capture the traveling history of moving objects such as people or vehicles. With the proliferation of GPS and tracking technologies, huge volumes of trajectories are rapidly generated and collected. As a result, applications such as route recommendation and traveling behavior mining call for efficient trajectory retrieval. In this paper, we first focus on distance-to-points trajectory search; given a collection of trajectories and a set of query points, the goal is to retrieve the top-k trajectories that pass as close as possible to all query points. We advance the state-of-the-art by combining existing approaches to a hybrid nearest neighbor-based method while also proposing an alternative, more efficient spatial range-based approach. Second, we investigate the continuous counterpart of distance-to-points trajectory search where the query is long-standing and the set of returned trajectories needs to be maintained whenever updates occur to the query and/or the data. Third, we propose and study two practical variants of distance-to-points trajectory search, which take into account the temporal characteristics of the searched trajectories. Through an extensive experimental analysis with real trajectory data, we show that our range-based approach outperforms previous methods by at least one order of magnitude for the snapshot and up to several times for the continuous version of the queries.
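A brute-force reading of the distance-to-points query helps fix the semantics. The sketch below is a linear-scan baseline, not the nearest neighbor-based or range-based methods of the paper. It assumes that trajectories are lists of 2D sample points and that "as close as possible to all query points" is scored as the sum, over query points, of the Euclidean distance to the nearest trajectory point; both the scoring function and the names are assumptions.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance_to_points(trajectory: List[Point], query_points: List[Point]) -> float:
    """Sum over query points of the distance to the closest trajectory point."""
    return sum(min(math.dist(p, q) for p in trajectory) for q in query_points)


def top_k_trajectories(trajectories: Dict[str, List[Point]],
                       query_points: List[Point], k: int) -> List[str]:
    """Names of the k trajectories with the smallest aggregate distance."""
    return heapq.nsmallest(
        k, trajectories,
        key=lambda name: distance_to_points(trajectories[name], query_points))


trajectories = {
    "t1": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    "t2": [(0.0, 5.0), (1.0, 5.0)],
    "t3": [(0.0, 1.0), (2.0, 1.0)],
}
print(top_k_trajectories(trajectories, [(0.0, 0.0), (2.0, 0.0)], k=2))
```

This scan touches every point of every trajectory per query, which is the cost that index-based search is designed to reduce.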

Proceedings Article
01 Jan 2017

Posted Content
TL;DR: T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values; the attribute relationship information is also used to guide task allocation to workers.
Abstract: Crowdsourcing employs human workers to solve computer-hard problems, such as data cleaning, entity resolution, and sentiment analysis. When crowdsourcing tabular data, e.g., the attribute values of an entity set, a worker's answers on the different attributes (e.g., the nationality and age of a celebrity star) are often treated independently. This assumption is not always true and can lead to suboptimal crowdsourcing performance. In this paper, we present the T-Crowd system, which takes into consideration the intricate relationships among tasks, in order to converge faster to their true values. Particularly, T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. The attribute relationship information is also used to guide task allocation to workers. Finally, T-Crowd seamlessly supports categorical and continuous attributes, which are the two main datatypes found in typical databases. Our extensive experiments on real and synthetic datasets show that T-Crowd outperforms state-of-the-art methods in terms of truth inference and reducing the cost of crowdsourcing.