
Showing papers by "Christian S. Jensen published in 2020"


Journal ArticleDOI
01 Jul 2020
TL;DR: This work considers the problem of learning an index for two-dimensional spatial data, introduces a rank-space-based ordering technique to establish an ordering of point data and group the points into blocks for index learning, and proposes a recursive strategy that partitions a large point set and learns an index for each partition.
Abstract: Machine learning, especially deep learning, is used increasingly to enable better solutions for data management tasks previously solved by other means, including database indexing. A recent study shows that a neural network can not only learn to predict the disk address of the data value associated with a one-dimensional search key but can also outperform B-tree-based indexing, thus promising to speed up a broad range of database queries that rely on B-trees for efficient data access. We consider the problem of learning an index for two-dimensional spatial data. A direct application of a neural network is unattractive because there is no obvious ordering of spatial point data. Instead, we introduce a rank-space-based ordering technique to establish an ordering of point data and group the points into blocks for index learning. To enable scalability, we propose a recursive strategy that partitions a large point set and learns indices for each partition. Experiments on real and synthetic data sets with more than 100 million points show that our learned indices are highly effective and efficient. Query processing using our indices is more than an order of magnitude faster than the use of R-trees or a recently proposed learned index.
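The rank-space ordering idea can be illustrated with a small sketch (a hypothetical simplification, not the authors' implementation): each point's coordinates are replaced by their ranks, the ranks are interleaved into a Z-order key, the sorted points are grouped into fixed-size blocks, and a simple linear model is fit from key to block id.

```python
import bisect

def rank_space_order(points):
    """Map 2D points to rank space (each coordinate replaced by its rank),
    then interleave the bits of the two ranks (a Z-order curve in rank
    space) to obtain a 1D ordering key."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    def key(p):
        rx = bisect.bisect_left(xs, p[0])
        ry = bisect.bisect_left(ys, p[1])
        z = 0
        for i in range(32):          # interleave bits of rx and ry
            z |= ((rx >> i) & 1) << (2 * i)
            z |= ((ry >> i) & 1) << (2 * i + 1)
        return z
    return sorted(points, key=key), key

def learn_block_index(ordered, block_size, key):
    """Group the ordered points into fixed-size blocks and fit a simple
    linear model key -> block id (closed-form least squares)."""
    ks = [float(key(p)) for p in ordered]
    bs = [i // block_size for i in range(len(ordered))]
    n = len(ks)
    mk, mb = sum(ks) / n, sum(bs) / n
    cov = sum((k - mk) * (b - mb) for k, b in zip(ks, bs))
    var = sum((k - mk) ** 2 for k in ks) or 1.0
    slope = cov / var
    intercept = mb - slope * mk
    return lambda p: max(0, min(bs[-1], round(slope * key(p) + intercept)))
```

A query then evaluates the learned model to find the candidate block and searches only within it, falling back to neighbouring blocks when the prediction is off.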

67 citations


Proceedings ArticleDOI
20 Apr 2020
TL;DR: A generic learning framework that employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions is proposed.
Abstract: Origin-destination (OD) matrices are used widely in transportation and logistics to record the travel cost (e.g., travel speed or greenhouse gas emission) between pairs of OD regions during different intervals within a day. We model a travel cost as a distribution because when traveling between a pair of OD regions, different vehicles may travel at different speeds even during the same interval, e.g., due to different driving styles or different waiting times at intersections. This yields stochastic OD matrices. We consider an increasingly pertinent setting where a set of vehicle trips is used for instantiating OD matrices. Since the trips may not cover all OD pairs for each interval, the resulting OD matrices are likely to be sparse. We then address the problem of forecasting complete, near future OD matrices from sparse, historical OD matrices. To solve this problem, we propose a generic learning framework that (i) employs matrix factorization and graph convolutional neural networks to contend with the data sparseness while capturing spatial correlations and that (ii) captures spatio-temporal dynamics via recurrent neural networks extended with graph convolutions. Empirical studies using two taxi trajectory data sets offer detailed insight into the properties of the framework and indicate that it is effective.
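A minimal sketch of how sparse, stochastic OD matrices might be instantiated from a set of trips; the trip tuple layout and interval length are illustrative assumptions, not the paper's data model.

```python
from collections import defaultdict

def build_stochastic_od(trips, interval_minutes=15):
    """Instantiate sparse, stochastic OD matrices from trips.
    Each trip is (origin_region, dest_region, depart_minute, speed).
    The cost of an OD pair in an interval is kept as the empirical
    distribution (list of observed speeds), not a single average."""
    od = defaultdict(list)   # (interval, origin, dest) -> observed speeds
    for o, d, t, speed in trips:
        od[(t // interval_minutes, o, d)].append(speed)
    return od

def sparsity(od, n_regions, n_intervals):
    """Fraction of OD cells with no observation -- the data-sparseness
    problem the forecasting framework must contend with."""
    total = n_regions * n_regions * n_intervals
    return 1.0 - len(od) / total
```

With few trips relative to the number of cells, sparsity quickly approaches 1, which is why the framework completes the matrices before forecasting.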

64 citations


Journal ArticleDOI
01 Jul 2020
TL;DR: This work presents a practical approach to transforming GPS trajectories into time-varying, uncertain edge weights that guarantee the first-in-first-out property and proposes time-dependent uncertain contraction hierarchies (TUCHs), a generic speed-up technique that supports a wide variety of stochastic route planning functionality in the paper’s setting.
Abstract: Data are increasingly available that enable detailed capture of travel costs associated with the movements of vehicles in road networks, notably travel time, and greenhouse gas emissions. In addition to varying across time, such costs are inherently uncertain, due to varying traffic volumes, weather conditions, different driving styles among drivers, etc. In this setting, we address the problem of enabling fast route planning with time-varying, uncertain edge weights. We initially present a practical approach to transforming GPS trajectories into time-varying, uncertain edge weights that guarantee the first-in-first-out property. Next, we propose time-dependent uncertain contraction hierarchies (TUCHs), a generic speed-up technique that supports a wide variety of stochastic route planning functionality in the paper’s setting. In particular, we propose query processing methods based on TUCH for two representative types of stochastic routing: non-dominated routing and probabilistic budget routing. Experimental studies with a substantial GPS data set offer insight into the design properties of the paper’s proposals and suggest that they are capable of enabling efficient stochastic routing.
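The first-in-first-out requirement on time-varying edge weights can be sketched as follows; the piecewise-linear travel-time representation and the sampling-based check are illustrative simplifications, not the paper's construction.

```python
import bisect

def arrival(depart, breakpoints):
    """Arrival time for a departure on an edge with a piecewise-linear,
    time-varying travel time. breakpoints is a sorted list of
    (time, cost) pairs; costs are interpolated linearly in between."""
    ts = [b[0] for b in breakpoints]
    cs = [b[1] for b in breakpoints]
    if depart <= ts[0]:
        return depart + cs[0]
    if depart >= ts[-1]:
        return depart + cs[-1]
    i = bisect.bisect_right(ts, depart) - 1
    frac = (depart - ts[i]) / (ts[i + 1] - ts[i])
    return depart + cs[i] + frac * (cs[i + 1] - cs[i])

def is_fifo(breakpoints, step=1.0):
    """First-in-first-out check: departing later must never yield an
    earlier arrival. Sampled at fixed steps over the breakpoint range."""
    t = breakpoints[0][0]
    prev = arrival(t, breakpoints)
    while t < breakpoints[-1][0]:
        t += step
        cur = arrival(t, breakpoints)
        if cur < prev - 1e-9:
            return False
        prev = cur
    return True
```

FIFO is violated exactly when the cost function falls with slope steeper than -1, so a weight-construction procedure can repair violations by flattening such segments.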

43 citations


Journal ArticleDOI
Chenjuan Guo1, Bin Yang1, Jilin Hu1, Christian S. Jensen1, Lu Chen1 
01 Sep 2020
TL;DR: This work provides means of learning contexts and their preferences, and applies these to enhance routing quality while ensuring efficiency, and proposes preference-based contraction hierarchies that are capable of speeding up both off-line learning and on-line routing.
Abstract: Vehicle routing is an important service that is used by both private individuals and commercial enterprises. Drivers may have different contexts that are characterized by different routing preferences. For example, during different times of day or weather conditions, drivers may make different routing decisions such as preferring or avoiding highways. The increasing availability of vehicle trajectory data yields an increasingly rich data foundation for context-aware, preference-based vehicle routing. We aim to improve routing quality by providing new, efficient routing techniques that identify and take contexts and their preferences into account. In particular, we first provide means of learning contexts and their preferences, and we apply these to enhance routing quality while ensuring efficiency. Our solution encompasses an off-line phase that exploits a contextual preference tensor to learn the relationships between contexts and routing preferences. Given a particular context for which trajectories exist, we learn a routing preference. Then, we transfer learned preferences from contexts with trajectories to similar contexts without trajectories. In the on-line phase, given a context, we identify the corresponding routing preference and use it for routing. To achieve efficiency, we propose preference-based contraction hierarchies that are capable of speeding up both off-line learning and on-line routing. Empirical studies with vehicle trajectory data offer insight into the properties of the proposed solution, indicating that it is capable of improving quality and is efficient.
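In its simplest form, transferring preferences from contexts with trajectories to similar contexts without them is a nearest-context lookup; the context feature vectors and the cosine similarity used here are illustrative assumptions, not the paper's tensor-based method.

```python
def cosine(a, b):
    """Cosine similarity between two context feature vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den if den else 0.0

def transfer_preference(context, learned):
    """Transfer a routing preference from the most similar context that
    has trajectory data. `learned` maps context feature vectors (tuples,
    e.g. encoding time-of-day and weather) to learned preferences."""
    best = max(learned, key=lambda c: cosine(context, c))
    return learned[best]
```

A new context, say light rain on a weekday morning, thereby inherits the preference learned for the closest observed context rather than falling back to a global default.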

32 citations


Journal ArticleDOI
01 May 2020
TL;DR: This work proposes a hybrid approach that combines convolution with estimation based on machine learning to account for dependencies among distributions in order to improve accuracy in probabilistic budget routing.
Abstract: Increasingly massive volumes of vehicle trajectory data hold the potential to enable higher-resolution traffic services than hitherto possible. We use trajectory data to create a high-resolution, uncertain road-network graph, where edges are associated with travel-time distributions. In this setting, we study probabilistic budget routing that aims to find the path with the highest probability of arriving at a destination within a given time budget. A key challenge is to compute accurately and efficiently the travel-time distribution of a path from the travel-time distributions of the edges in the path. Existing solutions that rely on convolution assume independence among the distributions to be convolved, but as distributions are often dependent, the result distributions exhibit poor accuracy. We propose a hybrid approach that combines convolution with estimation based on machine learning to account for dependencies among distributions in order to improve accuracy. Since the hybrid approach cannot rely on the independence assumption that enables effective pruning during routing, naive use of the hybrid approach is costly. To address the resulting efficiency challenge, we propose an anytime routing algorithm that is able to return a "good enough" path at any time and that eventually computes a high-quality path. Empirical studies involving a substantial real-world trajectory set offer insight into the design properties of the proposed solution, indicating that it is practical in real-world settings.
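The convolution baseline that the hybrid approach improves upon can be sketched directly; the discrete travel-time distributions below are illustrative.

```python
from collections import defaultdict

def convolve(d1, d2):
    """Convolve two discrete travel-time distributions under the
    independence assumption: the probability of a total cost t is the
    sum over all splits t1 + t2 = t of P1(t1) * P2(t2)."""
    out = defaultdict(float)
    for t1, p1 in d1.items():
        for t2, p2 in d2.items():
            out[t1 + t2] += p1 * p2
    return dict(out)

def budget_probability(dist, budget):
    """Probability of arriving within the given time budget."""
    return sum(p for t, p in dist.items() if t <= budget)
```

When edge travel times are correlated, e.g. congestion on one edge implying congestion on the next, this product form misallocates probability mass, which is the inaccuracy the learned estimator is meant to correct.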

29 citations


Journal ArticleDOI
TL;DR: The Relational Fusion Network (RFN) is introduced, a novel type of Graph Convolutional Network (GCN) designed specifically for road networks and proposed methods that outperform state-of-the-art GCN architectures by up to 21–40% on two machine learning tasks in road networks.
Abstract: The application of machine learning techniques in the setting of road networks holds the potential to facilitate many important intelligent transportation applications. Graph Convolutional Networks (GCNs) are neural networks that are capable of leveraging the structure of a network. However, many implicit assumptions of GCNs do not apply to road networks. We introduce the Relational Fusion Network (RFN), a novel type of Graph Convolutional Network (GCN) designed specifically for road networks. In particular, we propose methods that outperform state-of-the-art GCN architectures by up to 21-40% on two machine learning tasks in road networks. Furthermore, we show that state-of-the-art GCNs may fail to effectively leverage road network structure and may not generalize well to other road networks.
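For reference, a standard graph-convolution step of the kind RFN departs from might be sketched as follows; this is the generic GCN idea (mean aggregation with self-loops, linear map, ReLU), not the RFN architecture.

```python
def gcn_layer(adj, feats, weights):
    """One graph-convolution step: each node aggregates its neighbours'
    (and its own) features, normalised by neighbourhood size, then
    applies a linear map and ReLU. adj is an adjacency list; feats and
    weights are nested lists (nodes x in_dim, in_dim x out_dim)."""
    n, in_dim = len(feats), len(feats[0])
    out_dim = len(weights[0])
    agg = []
    for v in range(n):
        nbrs = adj[v] + [v]                      # add a self-loop
        agg.append([sum(feats[u][j] for u in nbrs) / len(nbrs)
                    for j in range(in_dim)])
    return [[max(0.0, sum(agg[v][i] * weights[i][j] for i in range(in_dim)))
             for j in range(out_dim)] for v in range(n)]
```

One implicit assumption visible here is that aggregating over all neighbours uniformly is meaningful; in road networks, where nodes have low and nearly uniform degree and edge relationships carry most of the signal, that assumption is questionable, which motivates RFN's relational fusion.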

25 citations


Journal ArticleDOI
TL;DR: A so-called "why-not" query is developed that is able to minimally modify the original query into a query that returns the expected, but missing, objects, in addition to other objects.
Abstract: With the proliferation of geo-textual objects on the web, extensive efforts have been devoted to improving the efficiency of top-$k$ spatial keyword queries in different settings. However, comparatively much less work has been reported on enhancing the quality and usability of such queries. In this context, we propose means of enhancing the usability of a top-$k$ group spatial keyword query, where a group of users aim to find $k$ objects that contain given query keywords and are nearest to the users. Specifically, when users receive the result of such a query, they may find that one or more objects that they expect to be in the result are in fact missing, and they may wonder why. To address this situation, we develop a so-called why-not query that is able to minimally modify the original query into a query that returns the expected, but missing, objects, in addition to other objects. Specifically, we formalize the why-not query in relation to the top-$k$ group spatial keyword query, called the Why-not Group Spatial Keyword Query ($\mathsf{WGSK}$), that is able to provide a group of users with a more satisfactory query result. We propose a three-phase framework for efficiently computing the $\mathsf{WGSK}$. The first phase substantially reduces the search space for the subsequent phases by retrieving a set of objects that may affect the ranking of the user-expected objects. The second phase provides an incremental sampling algorithm that generates candidate weightings of more promising queries. The third phase determines the penalty of each refined query and returns the query with minimal penalty, i.e., the minimally modified query. Extensive experiments with real and synthetic data offer evidence that the proposed solution excels over baselines with respect to both effectiveness and efficiency.
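A toy version of query refinement by re-weighting conveys the idea; the scoring function and the fixed candidate-weighting set below are illustrative assumptions, not the paper's incremental sampling algorithm.

```python
def score(obj, w):
    """Rank objects by a weighted sum of spatial distance and keyword
    dissimilarity (lower is better). obj = (distance, 1 - relevance)."""
    return w * obj[0] + (1 - w) * obj[1]

def why_not_weight(objects, missing, k, w0, candidates):
    """Among candidate weightings, pick the one closest to the original
    weighting w0 under which the missing object enters the top-k; the
    distance |w - w0| plays the role of the refinement penalty."""
    best = None
    for w in candidates:
        ranked = sorted(objects + [missing], key=lambda o: score(o, w))
        if missing in ranked[:k]:
            if best is None or abs(w - w0) < abs(best - w0):
                best = w
    return best
```

The returned weighting is the minimally modified query: the smallest shift in the distance/keyword trade-off that makes the expected object appear in the result.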

24 citations


Proceedings ArticleDOI
20 Apr 2020
TL;DR: For attribute-constrained co-located community (ACOC) search, this work develops four efficient approximation algorithms with guaranteed error bounds, in addition to an exact solution that works on relatively small graphs, and finds that the approximation algorithms are much faster than the exact solution and yet offer high accuracy.
Abstract: Networked data, notably social network data, often comes with a rich set of annotations, or attributes, such as documents (e.g., tweets) and locations (e.g., check-ins). Community search in such attributed networks has been studied intensively due to its many applications in friends recommendation, event organization, advertising, etc. We study the problem of attribute-constrained co-located community (ACOC) search, which returns a community that satisfies three properties: i) structural cohesiveness: the members in the community are densely connected; ii) spatial co-location: the members are close to each other; and iii) attribute constraint: a set of attributes are covered by the attributes associated with the members. The ACOC problem is shown to be NP-hard. We develop four efficient approximation algorithms with guaranteed error bounds in addition to an exact solution that works on relatively small graphs. Extensive experiments conducted with both real and synthetic data offer insight into the efficiency and effectiveness of the proposed methods, showing that they outperform three adapted state-of-the-art algorithms by an order of magnitude. We also find that the approximation algorithms are much faster than the exact solution and yet offer high accuracy.
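Structural cohesiveness is commonly formalised via the k-core; a minimal sketch of the standard peeling computation follows (the paper's exact cohesiveness model may differ).

```python
from collections import deque

def k_core(adj, k):
    """Structural cohesiveness check: iteratively peel nodes of degree
    less than k; the surviving nodes form the k-core, a common
    formalisation of a densely connected community."""
    deg = {v: len(ns) for v, ns in adj.items()}
    queue = deque(v for v, d in deg.items() if d < k)
    removed = set()
    while queue:
        v = queue.popleft()
        if v in removed:
            continue
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                if deg[u] < k:
                    queue.append(u)
    return set(adj) - removed
```

An ACOC search would additionally restrict the surviving nodes by spatial proximity and attribute coverage, which is what makes the combined problem NP-hard.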

18 citations


Journal ArticleDOI
TL;DR: In order to solve the influence minimization problem in large, real-world social networks, a robust sampling-based solution with a desirable theoretical bound is proposed, and extensive experiments using real social network datasets offer insight into the effectiveness and efficiency of the proposed solutions.
Abstract: An online social network can be used for the diffusion of malicious information like derogatory rumors, disinformation, hate speech, revenge pornography, etc. This motivates the study of influence minimization, which aims to prevent the spread of malicious information. Unlike previous influence minimization work, this study considers influence minimization in relation to a particular group of social network users, called targeted influence minimization. Thus, the objective is to protect a set of users, called target nodes, from malicious information originating from another set of users, called active nodes. This study also addresses two fundamental, but largely ignored, issues in different influence minimization problems: (i) the impact of a budget on the solution; (ii) robust sampling. To this end, two scenarios are investigated, namely unconstrained and constrained budgets. Given an unconstrained budget, we provide an optimal solution; given a constrained budget, we show that the problem is NP-hard and develop a greedy algorithm with a $(1-\frac {1}{e})$-approximation guarantee. More importantly, in order to solve the influence minimization problem in large, real-world social networks, we propose a robust sampling-based solution with a desirable theoretical bound. Extensive experiments using real social network datasets offer insight into the effectiveness and efficiency of the proposed solutions.
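A toy greedy blocker selection under a constrained budget illustrates the approach, using deterministic reachability as a stand-in for the stochastic influence model.

```python
def reachable_targets(adj, active, targets, blocked):
    """Number of target nodes reachable from the active nodes when the
    blocked nodes are removed -- a deterministic proxy for influence."""
    seen, stack = set(), [a for a in active if a not in blocked]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(u for u in adj.get(v, []) if u not in blocked)
    return len(seen & targets)

def greedy_block(adj, active, targets, budget):
    """Greedily pick up to `budget` blocker nodes, each step choosing the
    node whose removal most reduces the targets reached. Diminishing
    marginal gains of this kind are what underlie (1 - 1/e)-style
    guarantees for submodular objectives."""
    blocked = set()
    candidates = set(adj) - set(active) - targets
    for _ in range(budget):
        base = reachable_targets(adj, active, targets, blocked)
        best, gain = None, 0
        for c in candidates - blocked:
            g = base - reachable_targets(adj, active, targets, blocked | {c})
            if g > gain:
                best, gain = c, g
        if best is None:
            break
        blocked.add(best)
    return blocked
```

The sampling-based solution in the paper replaces the exact spread evaluation with estimates over sampled diffusion instances, which is what makes the approach scale to large networks.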

11 citations


Book ChapterDOI
24 Sep 2020
TL;DR: This work proposes a generalization of semantic place retrieval, namely semantic region (SR) retrieval, which aims to return multiple places that are spatially close to the query location such that each place is relevant to one or more query keywords.
Abstract: The top-k most relevant Semantic Place retrieval (kSP) query on spatial RDF data combines keyword-based and location-based retrieval. The query returns semantic places that are subgraphs rooted at a place entity with an associated location. The relevance of a semantic place to the query keywords is measured by a looseness score that aggregates the graph distances between the place (root) and the occurrences of the keywords in the nodes of the tree. We observe that kSP queries may retrieve semantic places that are spatially close to the query location, but with very low keyword relevance. When any single nearby place has low relevance, returning instead multiple relevant places may be helpful. Hence, we propose a generalization of semantic place retrieval, namely semantic region (SR) retrieval. An SR query aims to return multiple places that are spatially close to the query location such that each place is relevant to one or more query keywords. An algorithm and optimization techniques are proposed for the efficient processing of SR queries. Extensive empirical studies with two real datasets offer insight into the performance of the proposals.
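The looseness score can be sketched with a BFS from the place root; the sum-of-nearest-occurrence aggregation and tie handling here are illustrative assumptions.

```python
from collections import deque

def looseness(adj, labels, root, keywords):
    """Looseness of a semantic place: for each query keyword, the graph
    distance from the place (root) to its nearest occurrence, summed
    over keywords. Lower means more relevant; an uncovered keyword
    makes the place infinitely loose."""
    dist = {root: 0}
    order = deque([root])
    nearest = {}                      # keyword -> distance of nearest node
    while order:
        v = order.popleft()
        for kw in labels.get(v, []):
            nearest.setdefault(kw, dist[v])   # BFS visits nearest first
        for u in adj.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                order.append(u)
    total = 0
    for kw in keywords:
        if kw not in nearest:
            return float("inf")
        total += nearest[kw]
    return total
```

An SR query would then assemble several nearby places whose combined looseness over the query keywords is low, instead of insisting on one place covering everything.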

2 citations


Journal ArticleDOI
TL;DR: This special issue of the GeoInformatica journal covers recent advances in spatio-temporal data management and analytics in the context of smart city and urban computing; among its contributions is a pivot-based hierarchical indexing structure that integrates spatial and semantic information in a seamless way.
Abstract: This special issue of the GeoInformatica journal covers recent advances in spatio-temporal data management and analytics in the context of smart city and urban computing. It contains 11 articles that present solid research studies and innovative ideas in the area of spatio-temporal data management for smart city research. All of the 11 papers went through several rounds of rigorous reviews by the guest editors and invited reviewers. Geo-textual query processing has been receiving much attention in the area of spatio-temporal data management. The paper by Xinyu Chen et al., "S2R-tree: a pivot-based indexing structure for semantic-aware spatial keyword search," proposes a pivot-based hierarchical indexing structure to integrate spatial and semantic information in a seamless way. The proposed index is able to return accurate query results that take the semantic meaning of geo-textual objects into consideration. Another paper, by Zhongpu Chen et al., "ITISS: an efficient framework for querying big temporal data," proposes an in-memory, two-level index structure in Spark, which is easily understood and implemented, but without loss of effectiveness and efficiency. Additionally, the paper by Xiaozhao Song et al., "Collective spatial keyword search on activity trajectories," presents an effective and efficient collective spatial keyword query processing algorithm.

Posted Content
TL;DR: This paper proposes a Spatial-Temporal Graph Convolutional Sequential Learning algorithm that predicts the service requests across locations and time slots and develops a demand-aware route planning algorithm that considers both the spatial-temporal predictions and the supply-demand state.
Abstract: We consider a setting with an evolving set of requests for transportation from an origin to a destination before a deadline and a set of agents capable of servicing the requests. In this setting, an assignment authority is to assign agents to requests such that the average idle time of the agents is minimized. An example is the scheduling of taxis (agents) to meet incoming requests for trips while ensuring that the taxis are empty as little as possible. In this paper, we study the problem of spatial-temporal demand forecasting and competitive supply (SOUP). We address the problem in two steps. First, we build a granular model that provides spatial-temporal predictions of requests. Specifically, we propose a Spatial-Temporal Graph Convolutional Sequential Learning (ST-GCSL) algorithm that predicts the service requests across locations and time slots. Second, we provide means of routing agents to request origins while avoiding competition among the agents. In particular, we develop a demand-aware route planning (DROP) algorithm that considers both the spatial-temporal predictions and the supply-demand state. We report on extensive experiments with real-world and synthetic data that offer insight into the performance of the solution and show that it is capable of outperforming the state-of-the-art proposals.
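A minimal sketch of competition-avoiding assignment in the spirit of DROP; the data layout, distance callback, and fixed greedy priority order are illustrative assumptions, not the paper's algorithm.

```python
def demand_aware_assign(agents, demand, dist):
    """Greedy, competition-avoiding assignment: each agent is routed to
    the nearest cell with remaining predicted demand, so no more agents
    head to a cell than requests are forecast there. `demand` maps cell
    ids to predicted request counts; `dist(agent, cell)` is a travel
    cost callback."""
    remaining = dict(demand)
    plan = {}
    for a in agents:                 # agents processed in priority order
        open_cells = [c for c, r in remaining.items() if r > 0]
        if not open_cells:
            break
        c = min(open_cells, key=lambda cell: dist(a, cell))
        plan[a] = c
        remaining[c] -= 1
    return plan
```

Without the remaining-demand bookkeeping, every idle taxi would chase the single hottest predicted cell and most would arrive to find the requests already served, which is exactly the competition the supply-demand state is meant to prevent.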