Showing papers by "Nikos Mamoulis" published in 2018


Proceedings ArticleDOI
16 Apr 2018
TL;DR: This paper proves that answering SPM queries is computationally intractable and proposes two algorithms for their evaluation, which experiments show to be highly effective and efficient.
Abstract: In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph where vertices represent spatial objects, and edges denote distance relationships between them. The SPM query returns the instances that satisfy P. An example of P can be "a house within 10-minute walk from a school, which is at least 2km away from a hospital". The SPM query can benefit users such as house buyers, urban planners, and archaeologists. We prove that answering such queries is computationally intractable, and propose two efficient algorithms for their evaluation. Extensive experimental evaluation and case studies on four real datasets show that our proposed solutions are highly effective and efficient.
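The pattern definition in the abstract above can be illustrated with a minimal brute-force sketch. This is not one of the paper's two algorithms; it simply enumerates candidate combinations and checks every edge. The function name `match_pattern`, the tuple layouts, the use of Euclidean distance (in place of walking time), and the sample data are all assumptions made for illustration.

```python
from itertools import product
from math import dist, inf

def match_pattern(objects, vertices, edges):
    """Brute-force spatial pattern matching (illustrative only).

    objects:  list of (name, (x, y), set_of_keywords)
    vertices: dict mapping pattern vertex id -> required keyword
    edges:    list of (u, v, lo, hi): lo <= distance(u, v) <= hi
    """
    ids = list(vertices)
    # Candidate objects for each pattern vertex, selected by keyword.
    candidates = [[o for o in objects if vertices[v] in o[2]] for v in ids]
    matches = []
    for combo in product(*candidates):
        # Each pattern vertex must map to a distinct object.
        if len({o[0] for o in combo}) < len(combo):
            continue
        assign = dict(zip(ids, combo))
        if all(lo <= dist(assign[u][1], assign[v][1]) <= hi
               for u, v, lo, hi in edges):
            matches.append(tuple(o[0] for o in combo))
    return matches

# Hypothetical data: coordinates in km.
objects = [
    ("h1", (0.0, 0.0), {"house"}),
    ("h2", (5.0, 0.0), {"house"}),
    ("s1", (0.5, 0.0), {"school"}),
    ("p1", (3.0, 0.0), {"hospital"}),
]
vertices = {"house": "house", "school": "school", "hospital": "hospital"}
edges = [
    ("house", "school", 0.0, 1.0),     # house close to a school
    ("school", "hospital", 2.0, inf),  # school at least 2 km from a hospital
]
matches = match_pattern(objects, vertices, edges)
print(matches)
```

The enumeration grows as the product of the candidate-list sizes, which gives a feel for why the abstract treats efficient evaluation as a separate contribution.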

26 citations


Journal ArticleDOI
TL;DR: This paper shows how the density-based clustering paradigm can be extended to apply on places which are visited by users of a geo-social network, and considers spatio-temporal information and the social relationships between users who visit the clustered places.
Abstract: Spatial clustering deals with the unsupervised grouping of places into clusters and finds important applications in urban planning and marketing. Current spatial clustering models disregard information about who is related to the clustered places and when. In this paper, we show how the density-based clustering paradigm can be extended to apply on places which are visited by users of a geo-social network. Our model considers spatio-temporal information and the social relationships between users who visit the clustered places. After formally defining the model and the distance measure it relies on, we provide alternatives to our model and the distance measure. We evaluate the effectiveness of our model via a case study on real data; in addition, we design two quantitative measures, called social entropy and community score, to evaluate the quality of the discovered clusters. The results show that temporal-geo-social clusters have special properties and cannot be found by applying simple spatial clustering approaches and other alternatives.

20 citations


Journal ArticleDOI
TL;DR: This article examines two information sources: a knowledge base (or KB), such as YAGO and Freebase; and a click log, which contains the URLs accessed by a query user, and studies how to use these sources to find new entities useful for query recommendation.
Abstract: Query recommendation, which suggests related queries to search engine users, has attracted a lot of attention in recent years. Most of the existing solutions, which perform analysis of users’ search history (or query logs), are often insufficient for long-tail queries that rarely appear in query logs. To handle such queries, we study the use of entities found in queries to provide recommendations. Specifically, we extract entities from a query, and use these entities to explore new ones by consulting an information source. The discovered entities are then used to suggest new queries to the user. In this article, we examine two information sources: (1) a knowledge base (or KB), such as YAGO and Freebase; and (2) a click log, which contains the URLs accessed by a query user. We study how to use these sources to find new entities useful for query recommendation. We further study a hybrid framework that integrates different query recommendation methods effectively. As shown in the experiments, our proposed approaches provide better recommendations than existing solutions for long-tail queries. In addition, our query recommendation process takes less than 100ms to complete. Thus, our solution is suitable for providing online query recommendation services for search engines.

20 citations


Journal ArticleDOI
TL;DR: Experimental results on real datasets demonstrate the effectiveness of the work in recommending high-quality investment opinions and profitable portfolios.

17 citations


Proceedings ArticleDOI
16 Apr 2018
TL;DR: T-Crowd is presented: a crowdsourcing system that considers attribute relationships, seamlessly supports categorical and continuous attributes, and outperforms state-of-the-art methods, improving the quality of truth inference.
Abstract: We study the effective use of crowdsourcing in filling missing values in a given relation (e.g., a table containing different attributes of celebrity stars, such as nationality and age). A task given to a worker typically consists of questions about the missing attribute values (e.g., what is the age of Jet Li?). Existing work often treats related attributes independently, leading to suboptimal performance. We present T-Crowd: a crowdsourcing system that considers attribute relationships. T-Crowd integrates each worker's answers on different attributes to effectively learn his/her trustworthiness and the true data values. Our solution seamlessly supports categorical and continuous attributes. Our experiments on real datasets show that T-Crowd outperforms state-of-the-art methods, improving the quality of truth inference.

14 citations


Posted Content
TL;DR: This paper introduces network flow motifs, a novel type of motifs that model significant flow transfer among a set of vertices within a constrained time window and designs an algorithm for identifying flow motif instances in a large graph.
Abstract: Many real-world phenomena are best represented as interaction networks with dynamic structures (e.g., transaction networks, social networks, traffic networks). Interaction networks capture flow of data which is transferred between their vertices along a timeline. Analyzing such networks is crucial toward comprehending processes in them. A typical analysis task is the finding of motifs, which are small subgraph patterns that repeat themselves in the network. In this paper, we introduce network flow motifs, a novel type of motifs that model significant flow transfer among a set of vertices within a constrained time window. We design an algorithm for identifying flow motif instances in a large graph. Our algorithm can be easily adapted to find the top-k instances of maximal flow. In addition, we design a dynamic programming module that finds the instance with the maximum flow. We evaluate the performance of the algorithm on three real datasets and identify flow motifs which are significant for these graphs. Our results show that our algorithm is scalable and that the real networks indeed include interesting motifs, which appear much more frequently than in randomly generated networks having similar characteristics.

12 citations


Proceedings ArticleDOI
16 Apr 2018
TL;DR: This paper proposes SpaceKey, a system for retrieving and visualizing spatial objects returned by SGK queries, and supports a novel query, called SPM query, which is defined by a spatial pattern, a graph whose vertices contain keywords and whose edges are associated with distance constraints.
Abstract: Spatial objects associated with keywords are prevalent in applications such as Google Maps and Twitter. Recently, the topic of spatial keyword queries has received plenty of attention. Spatial Group Keyword (SGK) search is a popular class of queries; their goal is to find a set of objects which are close to each other and are associated with a set of input keywords. In this paper, we propose SpaceKey, a system for retrieving and visualizing spatial objects returned by SGK queries. In addition to existing SGK query types, SpaceKey supports a novel query, called SPM query. An SPM query is defined by a spatial pattern, a graph whose vertices contain keywords and whose edges are associated with distance constraints. The results are sets of objects that match the pattern. SpaceKey allows users to perform comparison analysis between different SGK query types. We plan to make SpaceKey an open-source web-based platform, and design API functions for software developers to plug other SGK query algorithms into our system.

8 citations


Proceedings Article
01 Jan 2018
TL;DR: The state-of-the-art algorithm for interval joins is extended to evaluate ICSJ at the cost of only scanning the sorted interval endpoints, enabling an efficient evaluation of an interval count semi-join operation.
Abstract: Interval joins find applications in several domains, including temporal and spatial databases, uncertain data management, and streaming data processing. In this paper, we study the evaluation of an interval count semi-join (ICSJ) operation that can be used for selecting or ranking intervals based on the number of join pairs they appear in. We extend the state-of-the-art algorithm for interval joins to evaluate ICSJ at the cost of only scanning the sorted interval endpoints.
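To make the operation concrete, here is a minimal sketch of counting, for each interval in one collection, how many intervals of another collection it overlaps, using only sorted endpoints. It is not the paper's algorithm (which extends an interval-join method with an endpoint scan); this version uses binary search instead. The function name `interval_count_semijoin` and the closed-interval overlap semantics are assumptions.

```python
from bisect import bisect_left, bisect_right

def interval_count_semijoin(R, S):
    """For each closed interval (start, end) in R, count overlapping intervals in S.

    An interval s does NOT overlap r exactly when s ends before r starts
    or s starts after r ends, so both cases are counted on the sorted
    endpoint lists and subtracted from |S|.
    """
    starts = sorted(s_start for s_start, _ in S)
    ends = sorted(s_end for _, s_end in S)
    counts = []
    for r_start, r_end in R:
        ended_before = bisect_left(ends, r_start)                   # s.end < r.start
        starting_after = len(starts) - bisect_right(starts, r_end)  # s.start > r.end
        counts.append(len(S) - ended_before - starting_after)
    return counts

# Hypothetical input intervals.
R = [(1, 5), (6, 7), (10, 12)]
S = [(0, 2), (4, 6), (8, 9)]
counts = interval_count_semijoin(R, S)
print(counts)
```

The counts can then be used to select intervals (for example, those with a count above a threshold) or to rank them, as the abstract describes.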

7 citations


Journal ArticleDOI
TL;DR: This paper formulates the P2G problem and proposes probabilistic models that capture the preference of a group toward a package, incorporating factors such as user impact and package viability, and investigates the issue of recommendation fairness.
Abstract: The success of recommender systems has made them the focus of a massive research effort in both industry and academia. Recent work has generalized recommendations to suggest packages of items to single users, or single items to groups of users. However, to the best of our knowledge, the interesting problem of recommending a package to a group of users (P2G) has not been studied to date. This is a problem with several practical applications, such as recommending vacation packages to tourist groups, entertainment packages to groups of friends or sets of courses to groups of students. In this paper, we formulate the P2G problem, and we propose probabilistic models that capture the preference of a group toward a package, incorporating factors such as user impact and package viability. We also investigate the issue of recommendation fairness. This is a novel consideration that arises in our setting, where we require that no user is consistently slighted by the item selection in the package. In addition, we study a special case of the P2G problem, where the recommended items are places and the recommendation should consider the current locations of the users in the group. We present aggregation algorithms for finding the best packages and compare our suggested models with baseline approaches stemming from previous work. The results show that our models find packages of high quality which consider all special requirements of P2G recommendation.

7 citations


Proceedings Article
01 Jan 2018
TL;DR: In this paper, the authors introduce network flow motifs, a novel type of motifs that model significant flow transfer among a set of vertices within a constrained time window, and design an algorithm for identifying flow motif instances in a large graph.
Abstract: Many real-world phenomena are best represented as interaction networks with dynamic structures (e.g., transaction networks, social networks, traffic networks). Interaction networks capture flow of data which is transferred between their vertices along a timeline. Analyzing such networks is crucial toward comprehending processes in them. A typical analysis task is the finding of motifs, which are small subgraph patterns that repeat themselves in the network. In this paper, we introduce network flow motifs, a novel type of motifs that model significant flow transfer among a set of vertices within a constrained time window. We design an algorithm for identifying flow motif instances in a large graph. Our algorithm can be easily adapted to find the top-k instances of maximal flow. In addition, we design a dynamic programming module that finds the instance with the maximum flow. We evaluate the performance of the algorithm on three real datasets and identify flow motifs which are significant for these graphs. Our results show that our algorithm is scalable and that the real networks indeed include interesting motifs, which appear much more frequently than in randomly generated networks having similar characteristics.

6 citations


Proceedings ArticleDOI
02 Feb 2018
TL;DR: This paper proposes a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of nonnegative least squares subproblems in each iteration for U and V.
Abstract: Nonnegative matrix factorization (NMF) has been successfully applied in different fields, such as text mining, image processing, and video analysis. NMF is the problem of determining two nonnegative low rank matrices U and V, for a given input matrix M, such that M ≈ UV^T. There is an increasing interest in parallel and distributed NMF algorithms, due to the high cost of centralized NMF on large matrices. In this paper, we propose a distributed sketched alternating nonnegative least squares (DSANLS) framework for NMF, which utilizes a matrix sketching technique to reduce the size of nonnegative least squares subproblems in each iteration for U and V. We design and analyze two different random matrix generation techniques and two subproblem solvers. Our theoretical analysis shows that DSANLS converges to the stationary point of the original NMF problem and it greatly reduces the computational cost in each subproblem as well as the communication cost within the cluster. DSANLS is implemented using MPI for communication, and tested on both dense and sparse real datasets. The results demonstrate the efficiency and scalability of our framework, compared to the state-of-the-art distributed NMF MPI implementation.
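The alternating scheme named in the abstract can be written out as follows. The abstract does not give the formulation, so the Frobenius-norm objective, the dimensions m, n, k, d, and the right-multiplication by a random sketching matrix S are assumptions that follow the usual presentation of alternating nonnegative least squares and matrix sketching.

```latex
% Assumed setting: M \in \mathbb{R}^{m \times n},
% U \in \mathbb{R}^{m \times k}, V \in \mathbb{R}^{n \times k}, k \ll \min(m, n).
\begin{align*}
  % NMF objective
  &\min_{U \ge 0,\; V \ge 0} \; \lVert M - U V^{\top} \rVert_F^2 \\
  % One alternating step: fix V and solve a nonnegative least squares problem for U
  &U \leftarrow \arg\min_{U \ge 0} \; \lVert M - U V^{\top} \rVert_F^2 \\
  % Sketched version of the same step (assumed form):
  % S \in \mathbb{R}^{n \times d} is random with d \ll n
  &U \leftarrow \arg\min_{U \ge 0} \; \lVert M S - U \,(V^{\top} S) \rVert_F^2
\end{align*}
```

Under these assumptions the sketched subproblem works with the m × d matrix MS and the k × d matrix V^T S instead of their n-column counterparts, which is where the reduction in subproblem size would come from; the update for V is symmetric.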

Journal ArticleDOI
TL;DR: This paper proposes an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history, and extends popular query recommendation and auto-completion approaches to the authors' location-aware setting, which suggest query reformulations that are semantically relevant to the original query and give results that are spatially close to the query issuer.
Abstract: Query reformulation, including query recommendation and query auto-completion, is a popular add-on feature of search engines, which provide related and helpful reformulations of a keyword query. Due to the dropping prices of smartphones and the increasing coverage and bandwidth of mobile networks, a large percentage of search engine queries are issued from mobile devices. This makes it possible to improve the quality of query recommendation and auto-completion by considering the physical locations of the query issuers. However, limited research has been done on location-aware query reformulation for search engines. In this paper, we propose an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history. Based on this, we extend popular query recommendation and auto-completion approaches to our location-aware setting, which suggest query reformulations that are semantically relevant to the original query and give results that are spatially close to the query issuer. In addition, we extend the bookmark coloring algorithm for graph proximity search to support our proposed query recommendation approaches online, and we adapt an A* search algorithm to support our query auto-completion approach. We also propose a spatial partitioning based approximation that accelerates the computation of our proposed spatial proximity. We conduct experiments using a real query log, which show that our proposed approaches significantly outperform previous work in terms of quality, and they can be efficiently applied online.

Patent
29 Nov 2018
TL;DR: This patent presents a KSP algorithm-based resource description framework (RDF) query method, configured to employ the KSP algorithm to search for a semantic position of a query keyword in an RDF graph.
Abstract: A KSP algorithm-based resource description framework query method, configured to employ the KSP algorithm to search for a semantic position of a query keyword in an RDF graph. The query method is user-friendly because a user does not need to master a specialized query language and simply needs to input a query keyword. The query method returns a subtree containing all inputted query keywords and near a query position.