
Showing papers by "Christian S. Jensen published in 2016"


Journal ArticleDOI
TL;DR: A novel Collective Travel Planning (CTP) query that finds the lowest-cost route connecting multiple sources and a destination, via at most $k$ meeting points, is proposed and investigated.
Abstract: Travel planning and recommendation are important aspects of transportation. We propose and investigate a novel Collective Travel Planning (CTP) query that finds the lowest-cost route connecting multiple sources and a destination, via at most $k$ meeting points. When multiple travelers target the same destination (e.g., a stadium or a theater), they may want to assemble at meeting points and then go together to the destination by public transport to reduce their global travel cost (e.g., energy, money, or greenhouse-gas emissions). This type of functionality holds the potential to bring significant benefits to society and the environment, such as reducing energy consumption and greenhouse-gas emissions, enabling smarter and greener transportation, and reducing traffic congestion. The CTP query is Max SNP-hard. To compute the query efficiently, we develop two algorithms: an exact algorithm and an approximation algorithm. The exact algorithm is capable of finding the optimal result for small values of $k$ (e.g., $k = 2$) in interactive time, while the approximation algorithm, which has a $5$-approximation ratio, is suitable for other situations. The performance of the CTP query is studied experimentally with real and synthetic spatial data.
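The exact algorithm is only practical for small $k$ because the number of meeting-point subsets grows exponentially. As a rough illustration of the problem (not the paper's algorithm), the following brute-force sketch enumerates all subsets of at most $k$ candidate meeting points; `walk_cost` and `transit_cost` are assumed, precomputed shortest-path cost tables.

```python
# Brute-force CTP sketch; exact but exponential in k. All names are
# illustrative assumptions, not the paper's data structures.
from itertools import combinations

def ctp_exact(sources, candidates, dest, walk_cost, transit_cost, k):
    """walk_cost[(a, b)]: individual travel cost from a to b;
    transit_cost[(m, d)]: shared public-transport cost from m to d."""
    best_cost, best_points = float("inf"), None
    for size in range(1, k + 1):
        for points in combinations(candidates, size):
            # Each traveler joins the meeting point cheapest for them.
            # Subsets with unused points are dominated by smaller subsets,
            # which are also enumerated, so the minimum remains exact.
            cost = sum(min(walk_cost[(s, m)] for m in points) for s in sources)
            cost += sum(transit_cost[(m, dest)] for m in points)
            if cost < best_cost:
                best_cost, best_points = cost, points
    return best_cost, best_points
```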

90 citations


Journal ArticleDOI
01 Nov 2016
TL;DR: This work proposes a new paradigm, the hybrid graph, that targets more accurate and more efficient path cost distribution estimation, and shows how the resulting method for computing cost distributions of paths can be integrated into existing routing algorithms.
Abstract: With the growing volumes of vehicle trajectory data, it becomes increasingly possible to capture time-varying and uncertain travel costs in a road network, including travel time and fuel consumption. The current paradigm represents a road network as a weighted graph; it blasts trajectories into small fragments that fit the underlying edges to assign weights to edges; and it then applies a routing algorithm to the resulting graph. We propose a new paradigm, the hybrid graph, that targets more accurate and more efficient path cost distribution estimation. The new paradigm avoids blasting trajectories into small fragments and instead assigns weights to paths rather than simply to the edges. We show how to compute path weights using trajectory data while taking into account the travel cost dependencies among the edges in the paths. Given a departure time and a query path, we show how to select an optimal set of weights with associated paths that cover the query path and such that the weights enable the most accurate joint cost distribution estimation for the query path. The cost distribution of the query path is then computed accurately using the joint distribution. Finally, we show how the resulting method for computing cost distributions of paths can be integrated into existing routing algorithms. Empirical studies with substantial trajectory data from two different cities offer insight into the design properties of the proposed method and confirm that the method is effective in real-world settings.
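For intuition, the sketch below estimates a query path's cost distribution by convolving the distributions of covering sub-paths under an independence assumption; this is the baseline the hybrid graph improves on, since the paper's method instead selects path weights that capture the cost dependencies among edges.

```python
def convolve(dist_a, dist_b):
    """Convolve two cost distributions given as {cost: probability}."""
    out = {}
    for ca, pa in dist_a.items():
        for cb, pb in dist_b.items():
            out[ca + cb] = out.get(ca + cb, 0.0) + pa * pb
    return out

def path_cost_distribution(sub_path_dists):
    """Combine distributions of sub-paths covering a query path,
    assuming (unrealistically) that their costs are independent."""
    result = {0: 1.0}
    for dist in sub_path_dists:
        result = convolve(result, dist)
    return result

# Example: two segments with uncertain travel times in seconds.
# path_cost_distribution([{30: 0.7, 60: 0.3}, {20: 0.5, 40: 0.5}])
# -> {50: 0.35, 70: 0.35, 80: 0.15, 100: 0.15}
```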

72 citations


Proceedings ArticleDOI
01 May 2016
TL;DR: This work proposes techniques capable of adapting an initial set of query keywords so that expected, but missing, objects enter the result along with other relevant objects, and develops a basic algorithm with a set of optimizations that sequentially examines a sequence of candidate keyword sets.
Abstract: Web objects, often associated with descriptive text documents, are increasingly being geo-tagged. A spatial keyword top-k query retrieves the best k such objects according to a scoring function that considers both spatial distance and textual similarity. However, it is in some cases difficult for users to identify the exact keywords that describe their query intent. After a user issues an initial query and gets back the result, the user may find that some expected objects are missing and may wonder why. Answering the resulting why-not questions can aid users in retrieving better results. However, no existing techniques are able to answer why-not questions by adapting the query keywords. We propose techniques capable of adapting an initial set of query keywords so that expected, but missing, objects enter the result along with other relevant objects. We develop a basic algorithm with a set of optimizations that sequentially examines a sequence of candidate keyword sets. In addition, we present an index-based bound-and-prune algorithm that is able to determine the best sample out of a set of candidates in just one pass of index traversal, thus speeding up the query processing. We also extend the proposed algorithms to handle multiple missing objects. Extensive experimental results offer insight into the efficiency of the proposed techniques in terms of running time and I/O cost.
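To make the setting concrete, here is a hedged sketch of a linear scoring function over spatial proximity and textual similarity, together with the naive candidate-set examination that the paper's optimizations and index-based bound-and-prune algorithm accelerate; all names and the exact scoring form are illustrative assumptions.

```python
import math

def score(obj, q_loc, q_keywords, alpha=0.5, max_dist=1.0):
    """Assumed form: weighted sum of spatial proximity and keyword overlap."""
    proximity = 1 - math.dist(obj["loc"], q_loc) / max_dist
    overlap = len(set(obj["keywords"]) & set(q_keywords))
    text_sim = overlap / len(q_keywords) if q_keywords else 0.0
    return alpha * proximity + (1 - alpha) * text_sim

def adapt_keywords(objects, missing, q_loc, candidate_sets, k):
    """Return the first candidate keyword set whose top-k admits `missing`;
    candidate_sets would be ordered by similarity to the original query."""
    for kws in candidate_sets:
        top_k = sorted(objects, key=lambda o: score(o, q_loc, kws),
                       reverse=True)[:k]
        if missing in top_k:
            return kws
    return None
```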

41 citations


Proceedings ArticleDOI
26 Jun 2016
TL;DR: The tutorial explores topics such as continuous queries on streaming geo-textual data, queries that retrieve attractive regions of geo-textual objects, and queries that extract properties, e.g., topics and top-$k$ frequent words, of the objects in regions.
Abstract: Over the past decade, we have moved from a predominantly desktop-based web to a predominantly mobile web, where users most often access the web from mobile devices such as smartphones. In addition, we are witnessing a proliferation of geo-located, textual web content. Motivated in part by these developments, the research community has been hard at work enabling the efficient computation of a variety of query functionality on geo-textual data, yielding a sizable body of literature on the querying of geo-textual data. With a focus on different types of keyword-based queries on geo-textual data, the tutorial also explores topics such as continuous queries on streaming geo-textual data, queries that retrieve attractive regions of geo-textual objects, and queries that extract properties, e.g., topics and top-$k$ frequent words, of the objects in regions. The tutorial is designed to offer an overview of the problems addressed in this body of literature and offers an overview of pertinent concepts and techniques. In addition, the tutorial suggests open problems and new research directions.

36 citations


Journal ArticleDOI
TL;DR: This work proposes dynamic network summarization to summarize dynamic networks with millions of nodes by only capturing the few most interesting nodes or edges over time, and proposes OSNet, an online summarization framework for dynamic networks.
Abstract: Information diffusion in social networks is often characterized by huge participating communities and viral cascades of high dynamicity. To observe, summarize, and understand the evolution of dynamic diffusion processes in an informative and insightful way is a challenge of high practical value. However, few existing studies aim to summarize networks for interesting dynamic patterns. Dynamic networks raise new challenges not found in static settings, including time sensitivity, online interestingness evaluation, and summary traceability, which render existing techniques inadequate. We propose dynamic network summarization to summarize dynamic networks with millions of nodes by only capturing the few most interesting nodes or edges over time. Based on the concepts of diffusion radius and scope, we define interestingness measures for dynamic networks, and we propose $\sf{OSNet}$, an online summarization framework for dynamic networks. Efficient algorithms are included in $\sf{OSNet}$. We report on extensive experiments with both synthetic and real-life data. The study offers insight into the effectiveness, efficiency, and design properties of $\sf{OSNet}$.
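As a small illustration of the two concepts the measures build on (not OSNet itself), the sketch below computes a cascade's scope (number of reached nodes) and diffusion radius (maximum hop distance from the seed) by BFS over the infected subgraph.

```python
from collections import deque

def radius_and_scope(adj, seed, infected):
    """adj: node -> iterable of neighbors; infected: set of reached nodes
    (including the seed). Returns (diffusion radius, scope)."""
    dist, queue = {seed: 0}, deque([seed])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in infected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()), len(dist)
```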

34 citations


Journal ArticleDOI
01 Aug 2016
TL;DR: This work proposes an infrastructure, Elite, that leverages peer-to-peer and parallel computing techniques to address key challenges posed by spatiotemporal data, and offers advanced functionality, including probabilistic simulations, for contending with the inaccuracy of the underlying data in query processing.
Abstract: As the volumes of spatiotemporal trajectory data continue to grow at a rapid pace, a new generation of data management techniques is needed in order to be able to utilize these data to provide a range of data-driven services, including geographic-type services. Key challenges posed by spatiotemporal data include the massive data volumes, the high velocity with which the data are captured, the need for interactive response times, and the inherent inaccuracy of the data. We propose an infrastructure, Elite, that leverages peer-to-peer and parallel computing techniques to address these challenges. The infrastructure offers efficient, parallel update and query processing by organizing the data into a layered index structure that is logically centralized, but physically distributed among computing nodes. The infrastructure is elastic with respect to storage, meaning that it adapts to fluctuations in the storage volume, and with respect to computation, meaning that the degree of parallelism can be adapted to best match the computational requirements. Further, the infrastructure offers advanced functionality, including probabilistic simulations, for contending with the inaccuracy of the underlying data in query processing. Extensive empirical studies offer insight into properties of the infrastructure and indicate that it meets its design goals, thus enabling the effective management of big spatiotemporal data.

34 citations


Journal ArticleDOI
TL;DR: This article demonstrates how it is possible to extend the relational database engine to achieve a full-fledged, industrial-strength implementation of sequenced temporal queries, which intuitively are queries that are evaluated at each time point.
Abstract: Many databases contain temporal, or time-referenced, data and use intervals to capture the temporal aspect. While SQL-based database management systems (DBMSs) are capable of supporting the management of interval data, the support they offer can be improved considerably. A range of proposed temporal data models and query languages offer ample evidence to this effect. Natural queries that are very difficult to formulate in SQL are easy to formulate in these temporal query languages. The increased focus on analytics over historical data, where queries are generally more complex, exacerbates the difficulties and thus the potential benefits of a temporal query language. Commercial DBMSs have recently started to offer limited temporal functionality in a step-by-step manner, focusing on the representation of intervals and neglecting the implementation of the query evaluation engine. This article demonstrates how it is possible to extend the relational database engine to achieve a full-fledged, industrial-strength implementation of sequenced temporal queries, which intuitively are queries that are evaluated at each time point. Our approach reduces temporal queries to nontemporal queries over data with adjusted intervals, and it leaves the processing of nontemporal queries unaffected. Specifically, the approach hinges on three concepts: interval adjustment, timestamp propagation, and attribute scaling. Interval adjustment is enabled by introducing two new relational operators, a temporal normalizer and a temporal aligner, and the latter two concepts are enabled by the replication of timestamp attributes and the use of so-called scaling functions. By providing a set of reduction rules, we can transform any temporal query, expressed in terms of temporal relational operators, to a query expressed in terms of relational operators and the two new operators. We prove that the size of a transformed query is linear in the number of temporal operators in the original query. An integration of the new operators and the transformation rules, along with query optimization rules, into the kernel of PostgreSQL is reported. Empirical studies with the resulting temporal DBMS are covered that offer insights into pertinent design properties of the article's proposal. The new system is available as open-source software.
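To give a flavor of interval adjustment, the standalone sketch below splits each tuple's interval at the boundaries of overlapping intervals in another relation, after which an ordinary nontemporal join over the adjusted intervals yields the sequenced result. The paper realizes this inside PostgreSQL via the temporal normalizer and aligner operators; this Python version only illustrates the idea.

```python
def align(r_tuples, s_tuples):
    """Split each (value, start, end) tuple of r at the boundaries of
    overlapping s intervals; intervals are half-open [start, end)."""
    out = []
    for val, start, end in r_tuples:
        cuts = {start, end}
        for _, s_start, s_end in s_tuples:
            if s_start < end and start < s_end:  # overlap test
                cuts.update(c for c in (s_start, s_end) if start < c < end)
        points = sorted(cuts)
        out.extend((val, a, b) for a, b in zip(points, points[1:]))
    return out

# align([("x", 1, 10)], [("y", 3, 6)])
# -> [("x", 1, 3), ("x", 3, 6), ("x", 6, 10)]
```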

33 citations


Proceedings ArticleDOI
31 Oct 2016
TL;DR: SPNET is believed to be the first in-memory index for network-constrained trajectory data; to exploit the main-memory setting, it uses efficient shortest-path compression of trajectories to achieve a compact index structure.
Abstract: With the decreasing cost and growing size of main memory, it is increasingly relevant to utilize main-memory indexing for efficient query processing. We propose SPNET, which we believe is the first in-memory index for network-constrained trajectory data. To exploit the main-memory setting, SPNET uses efficient shortest-path compression of trajectories to achieve a compact index structure. SPNET is capable of exploiting the parallel computing capabilities of modern machines and supports both intra- and inter-query parallelism. The former improves response time, and the latter improves throughput. By design, SPNET supports a wider range of query types than any single existing index. An experimental study in a real-world setting with 1.94 billion GPS records and nearly 4 million trajectories in a road network with 1.8 million edges indicates that SPNET typically offers performance improvements over the best existing indexes of 1.5 to 2 orders of magnitude.
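The compression idea can be sketched as follows (a simplification, not SPNET's actual encoding): maximal stretches of a trajectory's vertex path that already coincide with a shortest path are replaced by their two endpoints; decompression replays shortest paths between consecutive retained vertices, which assumes shortest paths are unique or canonical.

```python
def compress(path, dist, edge_w):
    """path: vertex list; dist(u, v): shortest-path cost oracle;
    edge_w[(u, v)]: edge weight. Returns the retained vertices."""
    kept, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1:
            w = sum(edge_w[(path[t], path[t + 1])] for t in range(i, j))
            if w == dist(path[i], path[j]):
                break  # path[i..j] already is a shortest path
            j -= 1
        kept.append(path[j])  # j == i + 1 keeps the single edge
        i = j
    return kept
```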

25 citations


Proceedings ArticleDOI
01 Jan 2016
TL;DR: This paper studies two query types for finding frequently visited Points of Interest (POIs) from symbolic indoor tracking data, and provides uncertainty analyses of the data in relation to the two kinds of queries.
Abstract: Indoor tracking data is being amassed due to the deployment of indoor positioning technologies. Analysing such data discloses useful insights that are otherwise hard to obtain. For example, by studying tracking data from an airport, we can identify the shops and restaurants that are most popular among passengers. In this paper, we study two query types for finding frequently visited Points of Interest (POIs) from symbolic indoor tracking data. The snapshot query finds those POIs that were most frequently visited at a given time point, whereas the interval query finds such POIs for a given time interval. A typical example of symbolic tracking is RFID-based tracking, where an object with an RFID tag is detected by an RFID reader when the object is in the reader’s detection range. A symbolic indoor tracking system deploys a limited number of proximity detection devices, like RFID readers, at preselected locations, covering only part of the host indoor space. Consequently, symbolic tracking data is inherently uncertain and only enables the discrete capture of the trajectories of indoor moving objects in terms of coarse regions. We provide uncertainty analyses of the data in relation to the two kinds of queries. The outcomes of the analyses enable us to design processing algorithms for both query types. An experimental evaluation with both real and synthetic data suggests that the framework and algorithms enable efficient and scalable query processing.

22 citations


Proceedings ArticleDOI
31 Oct 2016
TL;DR: This work designs an A*-based framework that utilizes the uncertain graph to obtain the most accurate cost distributions while finding the candidate paths, and proposes a three-stage dominance examination method that employs extreme values in each candidate path's cost distribution for early detection of dominated paths, thus reducing the need for expensive distribution convolutions.
Abstract: With the rapidly growing availability of vehicle trajectory data, travel costs such as travel time and fuel consumption can be captured accurately as distributions (e.g., travel time distributions) instead of deterministic values (e.g., average travel times). We study a new path finding problem in uncertain road networks, where paths have travel cost distributions. Given a source and a destination, we find optimal, non-dominated paths connecting the source and the destination, where the optimality is defined in terms of the stochastic dominance among cost distributions of paths. We first design an A*-based framework that utilizes the uncertain graph to obtain the most accurate cost distributions while finding the candidate paths. Next, we propose a three-stage dominance examination method that employs extreme values in each candidate path's cost distribution for early detection of dominated paths, thus reducing the need for expensive distribution convolutions. We conduct extensive experiments using real-world road network and trajectory data. The results show that our algorithm outperforms baseline algorithms by up to two orders of magnitude in terms of query response time while achieving the most accurate results.
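The two ingredients can be sketched as follows: a cheap extreme-value test that discards a path without any convolution, and a full first-order stochastic dominance check over cost distributions. The staging and names are simplified relative to the paper's three-stage method.

```python
def dominates_by_extremes(a, b):
    """True if a surely beats b: a's worst cost is no larger than b's
    best cost, so b can be pruned without a full CDF comparison."""
    return max(a) <= min(b)

def stochastically_dominates(a, b):
    """First-order dominance for costs (lower is better): a's CDF is
    everywhere >= b's, and strictly greater somewhere.
    a, b: {cost: probability}."""
    costs = sorted(set(a) | set(b))
    cdf_a = cdf_b = 0.0
    strictly = False
    for c in costs:
        cdf_a += a.get(c, 0.0)
        cdf_b += b.get(c, 0.0)
        if cdf_a < cdf_b:
            return False
        if cdf_a > cdf_b:
            strictly = True
    return strictly
```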

21 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: This work defines and offers solutions to why-not questions on MPRQ, and proposes a framework that consists of three efficient solutions, one that modifies the original query,One that modifying the why- not set, and one thatModifies both the original queries and theWhy-not set.
Abstract: Metric probabilistic range queries (MPRQ) have received substantial attention due to their utility in multimedia and text retrieval, decision making, etc. Existing MPRQ studies generally aim to improve query efficiency and resource usage. In contrast, we define and offer solutions to why-not questions on MPRQ. Given an original metric probabilistic range query and a why-not set W of uncertain objects that are absent from the query result, a why-not question on MPRQ explains why the uncertain objects in W do not appear in the query result, and provides refinements of the original query and/or W with the minimal penalty, so that the uncertain objects in W appear in the result of the refined query. Specifically, we propose a framework that consists of three efficient solutions, one that modifies the original query, one that modifies the why-not set, and one that modifies both the original query and the why-not set. Extensive experiments using both real and synthetic data sets offer insights into the properties of the proposed algorithms, and show that they are effective and efficient.

Journal ArticleDOI
01 Oct 2016
TL;DR: A read/write-optimized index is constructed that is capable of offering better overall performance than previous flash-aware indices and whose read and write costs can be balanced by tuning only the false-positive rate of the Bloom filters.
Abstract: Flash-memory-based solid-state drives (SSDs) are used widely for secondary storage. To be effective for SSDs, traditional indices have to be redesigned to cope with the special properties of flash memory, such as asymmetric read/write latencies (fast reads and slow writes) and out-of-place updates. Previous flash-optimized indices focus mainly on reducing random writes to SSDs, which is typically accomplished at the expense of a substantial number of extra reads. However, modern SSDs show a narrowing gap between read and write speeds, and read operations on SSDs increasingly affect the overall performance of indices on SSDs. As a consequence, how to optimize SSD-aware indices by reducing both write and read costs is a pertinent and open challenge. We propose a new tree index for SSDs that is able to reduce both writes and extra reads. In particular, we use an update buffer and overflow pages to reduce random writes, and we further exploit Bloom filters to reduce the extra reads to the overflow nodes in the tree. With this mechanism, we construct a read/write-optimized index that is capable of offering better overall performance than previous flash-aware indices. In addition, we present an analysis of the proposed index and show that the read and write costs of the operations on the index can be balanced by only tuning the false-positive rate of the Bloom filters. Our experimental results suggest that our proposal is efficient and represents an improvement over existing methods.
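A minimal sketch of the read-saving mechanism (the hash choice, sizing, and node layout below are illustrative assumptions): a per-node Bloom filter is consulted before the overflow pages, so lookups of keys that never overflowed skip the extra reads, and the filter's false-positive rate is the knob that trades reads against space.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=4):
        self.bits, self.m, self.k = 0, num_bits, num_hashes

    def _positions(self, key):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all(self.bits >> pos & 1 for pos in self._positions(key))

def lookup(node, key):
    """node.page, node.bloom, node.read_overflow_pages are hypothetical."""
    if key in node.page:                      # one read of the main node
        return node.page[key]
    if node.bloom.might_contain(key):         # false-positive rate is tunable
        return node.read_overflow_pages(key)  # extra reads only when needed
    return None                               # definitely not in overflow
```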

Journal ArticleDOI
01 Feb 2016
TL;DR: The key challenge addressed is that of scalably providing up-to-date results to queries when the query locations change continuously, which is achieved by the proposal of a new so-called safe-zone model.
Abstract: We provide techniques that enable a scalable so-called Volunteered Geographic Services system. This system targets the increasing populations of online mobile users, e.g., smartphone users, enabling such users to provide location-based services to each other, thus enabling "citizen reporter" or "citizen as a sensor" scenarios. More specifically, the system allows users to register as service volunteers, or micro-service providers, by accepting service descriptions and periodically updated locations from such volunteers; and the system allows users to subscribe to notifications of available, nearby relevant services by accepting subscriptions, formalized as continuous queries, that take service preferences and user locations as arguments and return relevant services. Services are ranked according to their relevance and distance to a query, and the highest ranked services are returned. The key challenge addressed is that of scalably providing up-to-date results to queries when the query locations change continuously. This is achieved by the proposal of a new so-called safe-zone model. With safe zones, query results are accompanied by safe zones with the property that a query result remains the same for all locations in its safe zone. Then, query users need only notify the system when they exit their current safe zone. Existing safe-zone models fall short in the paper's setting. The new model is enabled by (i) weighted and (ii) set-weighted imprecise Voronoi cells. The paper covers underlying concepts, properties, and algorithms, and it covers applications in VGS tracking and presents findings of empirical performance studies.

Journal ArticleDOI
01 Sep 2016
TL;DR: YASK is presented, a system capable of answering why-not questions posed in response to answers to spatial keyword top-k queries, and two explanation and query refinement models, namely preference adjustment and keyword adaptation, are implemented.
Abstract: With the proliferation of the mobile use of the web, spatial keyword query (SKQ) services are gaining in importance. However, state-of-the-art SKQ systems do not provide systematic functionality that allows users to ask why some known object is unexpectedly missing from a query result and do not provide an explanation for such missing objects. In this demonstration, we present a system called YASK, a whY-not question Answering engine for Spatial Keyword query services, that is capable of answering why-not questions posed in response to answers to spatial keyword top-k queries. Two explanation and query refinement models, namely preference adjustment and keyword adaptation, are implemented in YASK. The system provides users not only with the reasons why desired objects are missing from query results, but also with relevant refined queries that revive the expected but missing objects. This demonstration gives attendees hands-on experience with YASK through a map-based GUI in which attendees can issue spatial keyword queries, pose why-not questions, and visualize the results.

Proceedings ArticleDOI
24 Oct 2016
TL;DR: In this article, the authors define the top-k spatial textual clusters (k-STC) query, which returns the top-k clusters of spatial textual objects that are near a query location and relevant to query keywords.
Abstract: Spatial keyword queries retrieve spatial textual objects that are near a query location and are relevant to query keywords. The paper defines the top-k spatial textual clusters (k-STC) query that returns the top-k clusters that are located close to a given query location, contain relevant objects with regard to given query keywords, and have an object density that exceeds a given threshold. This query aims to support users who wish to explore nearby regions with many relevant objects. To compute this query, the paper proposes a basic and an advanced algorithm that rely on on-line density-based clustering. An empirical study offers insight into the performance properties of the proposed algorithms.
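For intuition, the sketch below runs a DBSCAN-style density clustering over the objects deemed relevant to the query keywords: an object is a core if at least min_pts relevant neighbors lie within eps. Ranking the clusters by distance to the query location, the early-stop condition, and the index-based pruning of the paper's algorithms are omitted.

```python
import math

def density_clusters(points, eps, min_pts):
    """points: list of (x, y) of keyword-relevant objects.
    Simplified DBSCAN: noise points are never revisited as borders."""
    unvisited, clusters = set(range(len(points))), []
    neighbors = lambda i: [j for j in range(len(points))
                           if math.dist(points[i], points[j]) <= eps]
    while unvisited:
        i = unvisited.pop()
        seed = neighbors(i)
        if len(seed) < min_pts:
            continue  # not a core point
        members, frontier = {i}, list(seed)
        while frontier:
            j = frontier.pop()
            if j in unvisited:
                unvisited.discard(j)
                members.add(j)
                nb = neighbors(j)
                if len(nb) >= min_pts:  # core point: keep expanding
                    frontier.extend(nb)
        clusters.append([points[j] for j in members])
    return clusters
```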

Journal ArticleDOI
28 Sep 2016
TL;DR: The academic impact of a researcher can be measured by the number of citations to their papers; of two papers from the same area, the one with more citations can generally be considered the more interesting, relevant, important, and/or impactful one.
Abstract: The academic impact of the content of a paper can be measured by the number of citations to the paper. In some areas, it is easier to get citations than in others. However, when comparing two papers from the same area, one with many citations and one with few, the former can generally be considered the more interesting, relevant, important, and/or impactful one. The academic impact of a researcher can then be measured by the number of citations to their papers.

Book ChapterDOI
16 Apr 2016
TL;DR: A new approach to measuring the similarity among indoor moving-object trajectories, based on spatial similarity and semantic pattern similarity, is proposed, and a hierarchical semantic pattern is constructed to capture the semantics of trajectories.
Abstract: In this paper, we propose a new approach to measuring the similarity among indoor moving-object trajectories. Particularly, we propose to measure indoor trajectory similarity based on spatial similarity and semantic pattern similarity. For spatial similarity, we propose to detect the critical points in trajectories and then use them to determine spatial similarity. This approach can lower the computational costs of similarity search. Moreover, it helps achieve a more effective measure of spatial similarity because it removes noisy points. For semantic pattern similarity, we propose to construct a hierarchical semantic pattern to capture the semantics of trajectories. This method makes it possible to capture the implicit semantic similarity among different semantic labels of locations, and enables more meaningful measures of semantic similarity among indoor trajectories. We conduct experiments on indoor trajectories, comparing our proposal with several popular methods. The results suggest that our proposal is effective and represents an improvement over existing methods.
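The label-level idea behind the semantic pattern can be illustrated with a Wu-Palmer-style similarity over a location-category hierarchy: two labels are the more similar, the deeper their lowest common ancestor. This is a hedged stand-in, not the paper's measure; its hierarchical semantic pattern and trajectory-level aggregation are richer.

```python
def depth(label, parent):
    """Hops from label up to the root; parent maps label -> its parent."""
    d = 0
    while label in parent:
        label, d = parent[label], d + 1
    return d

def label_similarity(a, b, parent):
    """Wu-Palmer style: 2 * depth(lca) / (depth(a) + depth(b)).
    Assumes both labels sit under one shared root."""
    ancestors, node = set(), a
    while True:
        ancestors.add(node)
        if node not in parent:
            break
        node = parent[node]
    node = b
    while node not in ancestors:  # climb until hitting a's ancestor chain
        node = parent[node]
    denom = depth(a, parent) + depth(b, parent)
    return 2 * depth(node, parent) / denom if denom else 1.0
```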

Journal ArticleDOI
30 Jun 2016
TL;DR: Although impact as measured by citations thus differs from excellence, citations are still used for the rating of journals, and in some countries the impact factor of a journal plays an important role when institutions assess the excellence of the journal.

Proceedings ArticleDOI
13 Jun 2016
TL;DR: Fog BAT, a system that combines GPS data with Bluetooth data, is presented, and it is shown how the data types are aligned to ensure that data from each sensor type relates to the exact same part of the road network and covers the same time period.
Abstract: Congestion is a major problem in many cities. In order to monitor and manage traffic, a number of different sensor types are used to collect traffic data. This includes GPS devices in the vehicles themselves as well as fixed Bluetooth sensors along the roads. Each sensor type has advantages and disadvantages. Where GPS has a wide coverage of the road network, Bluetooth sensors gather data from a much higher number of vehicles. In this paper, we present Fog BAT, a system that combines GPS data with Bluetooth data. The goal of the system is to retain the advantages of both. We show how the data types are aligned to ensure that data from each sensor type relates to the exact same part of the road network and covers the same time period. Using very large real-world data sets, we use the system to compare travel speeds based on each data type and to show how using both data types simultaneously can improve the accuracy of computed travel speeds and congestion levels.
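The alignment step can be pictured as mapping the observations of both sensor types to common (road-segment, time-bucket) cells before comparison or fusion; the 15-minute bucket size, field names, and count-weighted fusion below are assumptions for illustration, not Fog BAT's actual design.

```python
from collections import defaultdict

BUCKET = 15 * 60  # assumed 15-minute buckets, in seconds

def to_cells(observations):
    """observations: iterable of (segment_id, unix_timestamp, speed)."""
    cells = defaultdict(list)
    for seg, ts, speed in observations:
        cells[(seg, ts // BUCKET)].append(speed)
    return cells

def fused_speed(gps_cells, bt_cells, key):
    """Combine both sources in one cell, implicitly weighting each
    source by its number of observations."""
    samples = gps_cells.get(key, []) + bt_cells.get(key, [])
    return sum(samples) / len(samples) if samples else None
```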

Posted Content
TL;DR: The paper defines the top-k spatial textual clusters (k-STC) query, and proposes a basic and an advanced algorithm that rely on on-line density-based clustering.
Abstract: Keyword-based web queries with local intent retrieve web content that is relevant to supplied keywords and that represents points of interest near the query location. Two broad categories of such queries exist. The first encompasses queries that retrieve single spatial web objects that each satisfy the query arguments. Most proposals belong to this category. The second category, to which this paper's proposal belongs, encompasses queries that support exploratory user behavior and retrieve sets of objects that represent regions of space that may be of interest to the user. Specifically, the paper proposes a new type of query, namely the top-k spatial textual clusters (k-STC) query, which returns the top-k clusters that (i) are located closest to a given query location, (ii) contain the most relevant objects with regard to given query keywords, and (iii) have an object density that exceeds a given threshold. To compute this query, we propose a basic algorithm that relies on on-line density-based clustering and exploits an early stop condition. To improve the response time, we design an advanced approach that includes three techniques: (i) an object skipping rule, (ii) spatially gridded posting lists, and (iii) a fast range query algorithm. An empirical study on real data demonstrates that the paper's proposals offer scalability and are capable of excellent performance.

01 Dec 2016
TL;DR: An obvious advantage of GPS is that it makes it possible to compute travel times for most of the road network, including the most heavily trafficked roads.
Abstract: An obvious advantage of GPS is that it makes it possible to compute travel times for most of the road network, including the most heavily trafficked roads. A significant disadvantage of using GPS is that data is received from only a relatively small share of the vehicles compared to the other technologies. In this article, we briefly present how we handle very large amounts of GPS data from vehicles for computing travel times. We focus on computing travel times at intersections and along road stretches. GPS data is particularly interesting in these contexts because so-called trip data makes it possible to follow an individual vehicle very precisely when it, for example, makes a left turn at an intersection. This requires that the GPS data is collected at a high frequency, which holds for the vast majority of our data, where the typical frequency is one measurement per second. For reasons of space, the article does not include a comparison of travel times computed using GPS with travel times computed using, e.g., Bluetooth or induction loops. We instead refer to existing work (Andersen, Lahrmann, & Torp, 2011) (Borresen, Jensen, & Torp, 2016).

Proceedings ArticleDOI
13 Jun 2016
TL;DR: Crowd Rank Eval uses crowdsourcing for synthesizing results to top-k queries and is able to visualize the results and to compare them to the results obtained from ranking functions, thus offering insight into the ranking functions.
Abstract: We demonstrate Crowd Rank Eval, a novel framework for the evaluation of ranking functions for top-k spatial keyword queries. The framework enables researchers to study hypotheses regarding ranking functions. Crowd Rank Eval uses crowdsourcing for synthesizing results to top-k queries and is able to visualize the results and to compare them to the results obtained from ranking functions, thus offering insight into the ranking functions.