
Showing papers by "Christian S. Jensen published in 2013"


Journal ArticleDOI
01 Jan 2013
TL;DR: This work provides an all-around survey of 12 state-of-the-art geo-textual indices and proposes a benchmark that enables the comparison of spatial keyword query performance, thus uncovering new insights that may guide index selection as well as further research.
Abstract: Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.

323 citations


Journal ArticleDOI
01 Jul 2013
TL;DR: This work uses spatio-temporal hidden Markov models (STHMMs) to model correlations among different traffic time series and to predict travel costs from GPS tracking data from probe vehicles, and it provides algorithms that can learn the parameters of an STHMM while contending with the sparsity, spatio-temporal correlation, and heterogeneity of the time series.
Abstract: The monitoring of a system can yield a set of measurements that can be modeled as a collection of time series. These time series are often sparse, due to missing measurements, and spatio-temporally correlated, meaning that spatially close time series exhibit temporal correlation. The analysis of such time series offers insight into the underlying system and enables prediction of system behavior. While the techniques presented in the paper apply more generally, we consider the case of transportation systems and aim to predict travel cost from GPS tracking data from probe vehicles. Specifically, each road segment has an associated travel-cost time series, which is derived from GPS data. We use spatio-temporal hidden Markov models (STHMM) to model correlations among different traffic time series. We provide algorithms that are able to learn the parameters of an STHMM while contending with the sparsity, spatio-temporal correlation, and heterogeneity of the time series. Using the resulting STHMM, near-future travel costs in the transportation network, e.g., travel time or greenhouse gas emissions, can be inferred, enabling a variety of routing services, e.g., eco-routing. Empirical studies with a substantial GPS data set offer insight into the design properties of the proposed framework and algorithms, demonstrating the effectiveness and efficiency of travel cost inferencing.
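
The abstract leaves the model details unspecified; as a rough illustration of the idea, the following Python sketch fits a plain Gaussian HMM to a single segment's travel-time series and predicts the near-future cost. This is a simplified stand-in, not the paper's STHMM, which additionally couples states across spatially close segments; the library (hmmlearn), state count, and data are assumptions.

import numpy as np
from hmmlearn.hmm import GaussianHMM

# Hypothetical travel-time series (seconds) for one road segment.
travel_times = np.array([41.0, 43.5, 58.2, 61.0, 44.1, 40.3, 39.8, 42.0])
X = travel_times.reshape(-1, 1)

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
hmm.fit(X)

state = hmm.predict(X)[-1]                    # current hidden (traffic) state
next_state = np.argmax(hmm.transmat_[state])  # most likely next state
predicted_cost = hmm.means_[next_state][0]    # expected near-future travel time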

120 citations


Proceedings ArticleDOI
03 Jun 2013
TL;DR: A deployment algorithm is designed that identifies hard-to-distinguish reference positions and separates them into different, smaller radio maps by deploying Bluetooth hotspots at particular positions, thereby improving Wi-Fi based online position estimation in a divide-and-conquer manner.
Abstract: Reliable indoor positioning is an important foundation for emerging indoor location based services. Most existing indoor positioning proposals rely on a single wireless technology, e.g., Wi-Fi, Bluetooth, or RFID. A hybrid positioning system combines such technologies and achieves better positioning accuracy by exploiting the different capabilities of the different technologies. In a hybrid system based on Wi-Fi and Bluetooth, the former works as the main infrastructure to enable fingerprint based positioning, while the latter (via hotspot devices) partitions the indoor space as well as a large Wi-Fi radio map. As a result, the Wi-Fi based online position estimation is improved in a divide-and-conquer manner. We study three aspects of such a hybrid indoor positioning system. First, to avoid large positioning errors caused by similar reference positions that are hard to distinguish, we design a deployment algorithm that identifies and separates such positions into different smaller radio maps by deploying Bluetooth hotspots at particular positions. Second, we design methods that improve the partition switching that occurs when a user leaves the detection range of a Bluetooth hotspot. Third, we propose three architectural options for placement of the computation workload. We evaluate all proposals using both simulation and walkthrough experiments in two indoor environments of different sizes. The results show that our proposals are effective and efficient in achieving very good indoor positioning performance.
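
To make the divide-and-conquer idea concrete, here is a minimal Python sketch in which the strongest Bluetooth hotspot selects a small radio map and Wi-Fi fingerprint matching then runs within that map only. The data layout and names are hypothetical, and real systems use more robust matching than a single nearest neighbor.

import numpy as np

def estimate_position(bt_readings, wifi_rssi, radio_maps):
    # Divide: the strongest Bluetooth hotspot selects a small radio map,
    # i.e., a partition of the indoor space and of the full Wi-Fi map.
    hotspot = max(bt_readings, key=bt_readings.get)
    fingerprints = radio_maps[hotspot]  # list of (position, rssi_vector)
    # Conquer: nearest-neighbor fingerprint matching within that map only.
    best = min(fingerprints,
               key=lambda fp: np.linalg.norm(np.asarray(fp[1]) - np.asarray(wifi_rssi)))
    return best[0]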

89 citations


Proceedings ArticleDOI
03 Jun 2013
TL;DR: A novel filtering and lifting framework is proposed that augments a standard 2D spatial network model with elevation information extracted from massive aerial laser scan data and thus yields an accurate 3D model.
Abstract: The use of accurate 3D spatial network models can enable substantial improvements in vehicle routing. Notably, such models enable eco-routing, which reduces the environmental impact of transportation. We propose a novel filtering and lifting framework that augments a standard 2D spatial network model with elevation information extracted from massive aerial laser scan data and thus yields an accurate 3D model. We present a filtering technique that is capable of pruning irrelevant laser scan points in a single pass, but assumes that the 2D network fits in internal memory and that the points are appropriately sorted. We also provide an external-memory filtering technique that makes no such assumptions. During lifting, a triangulated irregular network (TIN) surface is constructed from the remaining points. The 2D network is projected onto the TIN, and a 3D network is constructed by means of interpolation. We report on a large-scale empirical study that offers insight into the accuracy, efficiency, and scalability properties of the framework.
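
As a rough sketch of the lifting step, SciPy's LinearNDInterpolator triangulates scattered points (a Delaunay-based TIN) and interpolates linearly on it, which suffices to assign elevations to 2D network vertices. This is an illustration under assumed data, not the paper's filtering or external-memory machinery.

import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical filtered laser scan points: planar coordinates and elevations.
rng = np.random.default_rng(0)
laser_xy = rng.random((1000, 2)) * 1000.0
laser_z = rng.random(1000) * 50.0

# Lifting: build a TIN over the points and project 2D network vertices onto it.
tin = LinearNDInterpolator(laser_xy, laser_z)
network_xy = np.array([[10.0, 20.0], [500.0, 640.0]])
network_xyz = np.column_stack([network_xy, tin(network_xy)])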

84 citations


Proceedings ArticleDOI
03 Jun 2013
TL;DR: The EcoTour system assigns eco-weights to a road network based on GPS and fuel consumption data collected from vehicles to enable eco-routing, and returns the shortest route, the fastest route, and the eco-route, along with statistics for the three routes.
Abstract: Reduction in greenhouse gas emissions from transportation is essential in combating global warming and climate change. Eco-routing enables drivers to use the most eco-friendly routes and is effective in reducing vehicle emissions. The EcoTour system assigns eco-weights to a road network based on GPS and fuel consumption data collected from vehicles to enable eco-routing. Given an arbitrary source-destination pair in Denmark, EcoTour returns the shortest route, the fastest route, and the eco-route, along with statistics for the three routes. EcoTour also serves as a testbed for exploring advanced solutions to a range of challenges related to eco-routing.
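
The three routes differ only in the edge weight used by the shortest-path computation. A minimal sketch with networkx, using a hypothetical graph whose edges carry a length, a travel time, and a learned eco-weight:

import networkx as nx

G = nx.DiGraph()
G.add_edge("a", "b", length=1200, time=95, eco=0.11)  # hypothetical weights
G.add_edge("a", "c", length=900, time=130, eco=0.07)
G.add_edge("b", "d", length=800, time=60, eco=0.09)
G.add_edge("c", "d", length=1100, time=85, eco=0.06)

shortest_route = nx.shortest_path(G, "a", "d", weight="length")
fastest_route = nx.shortest_path(G, "a", "d", weight="time")
eco_route = nx.shortest_path(G, "a", "d", weight="eco")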

73 citations


Journal ArticleDOI
16 Jul 2013
TL;DR: The opinions expressed by well-known database researchers on the future of the relational model and SQL during a panel at the International Workshop on Non-Conventional Data Access (NoCoDa 2012) are reported.
Abstract: We report the opinions expressed by well-known database researchers on the future of the relational model and SQL during a panel at the International Workshop on Non-Conventional Data Access (NoCoDa 2012), held in Florence, Italy in October 2012 in conjunction with the 31st International Conference on Conceptual Modeling. The panelists include: Paolo Atzeni (Università Roma Tre, Italy), Umeshwar Dayal (HP Labs, USA), Christian S. Jensen (Aarhus University, Denmark), and Sudha Ram (University of Arizona, USA). Quotations from movies are used as a playful though effective way to convey the dramatic changes that database technology and research are currently undergoing.

66 citations


Journal ArticleDOI
TL;DR: A group discovery framework is proposed that efficiently supports the online discovery of moving objects that travel together; it adopts a sampling-independent approach that makes no assumptions about when positions are sampled, gives no special importance to sampling points, and naturally supports the use of approximate trajectories.
Abstract: GPS-enabled devices are pervasive nowadays. Finding movement patterns in trajectory data streams is gaining in importance. We propose a group discovery framework that aims to efficiently support the online discovery of moving objects that travel together. The framework adopts a sampling-independent approach that makes no assumptions about when positions are sampled, gives no special importance to sampling points, and naturally supports the use of approximate trajectories. The framework's algorithms exploit state-of-the-art, density-based clustering (DBSCAN) to identify groups. The groups are scored based on their cardinality and duration, and the top-k groups are returned. To avoid returning similar subgroups in a result, notions of domination and similarity are introduced that enable the pruning of low-interest groups. Empirical studies on real and synthetic data sets offer insight into the effectiveness and efficiency of the proposed framework.
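
A minimal sketch of the clustering step, assuming scikit-learn's DBSCAN and hypothetical data: objects sharing a non-noise cluster label at a given time instant form a candidate group, which the framework then scores by cardinality and duration across time.

import numpy as np
from sklearn.cluster import DBSCAN

# Object positions interpolated at one common time instant (sampling
# independence means any instant can be used); values are hypothetical.
positions = np.array([[1.0, 1.1], [1.2, 0.9], [1.1, 1.0], [9.0, 9.2], [9.1, 9.0]])
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(positions)

groups = {}
for obj, label in enumerate(labels):
    if label != -1:  # -1 marks noise, i.e., objects in no group
        groups.setdefault(label, []).append(obj)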

52 citations


Proceedings ArticleDOI
18 Mar 2013
TL;DR: Challenges relevant to each context are identified, with the data to be subjected to context-aware querying and analysis, specifically including longitudinal analyses of social media archives, spatial keyword search, local intent search, and spatio-temporal intent search.
Abstract: Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.

50 citations


Proceedings ArticleDOI
03 Jun 2013
TL;DR: It is found that the availability of information about previous trips enables better prediction of route travel time and makes it possible to provide the users with more popular routes than does a conventional navigation service.
Abstract: Mobile location-based services constitute a very successful class of services that are used frequently by users with GPS-enabled mobile devices such as smartphones. This paper presents a study of how to exploit GPS trajectory data, which is available in increasing volumes, for the assessment of the quality of one kind of location-based service, namely routing services. Specifically, the paper presents a framework that enables the comparison of the routes provided by routing services with the actual driving behaviors of local drivers. Comparisons include route length, travel time, and also route popularity, which are enabled by common driving behaviors found in available trajectory data. The ability to evaluate the quality of routing services enables service providers to improve the quality of their services and enables users to identify the services that best serve their needs. The paper covers experiments with real vehicle trajectory data and an existing online navigation service. It is found that the availability of information about previous trips enables better prediction of route travel time and makes it possible to provide users with more popular routes than does a conventional navigation service.
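
One plausible instantiation of the popularity comparison, as a hedged Python sketch with hypothetical data: score a service route by the fraction of local trajectories that traverse each of its segments, averaged over the route. The paper's framework defines popularity over common driving behaviors; this is only an illustration.

def route_popularity(route_segments, trajectories):
    # trajectories: iterable of sets of road-segment ids traversed by local drivers
    def support(seg):
        return sum(1 for t in trajectories if seg in t) / len(trajectories)
    return sum(support(s) for s in route_segments) / len(route_segments)

trajectories = [{"e1", "e2", "e5"}, {"e1", "e2", "e3"}, {"e1", "e4"}]
print(route_popularity(["e1", "e2"], trajectories))  # 0.8333...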

48 citations


Journal ArticleDOI
01 Feb 2013
TL;DR: By carefully using concepts from the field of graphical models, this work is able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy.
Abstract: Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
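
The key idea can be shown in a few lines: replace the independence assumption P(a, b, c) = P(a)P(b)P(c) with a factorization into two-dimensional distributions along a chain, P(a, b, c) ≈ P(a, b) · P(c | b). The sketch below uses toy data and is not the paper's construction or estimation algorithm.

from collections import Counter

rows = [("x", 1, "p"), ("x", 1, "q"), ("y", 2, "p"), ("x", 2, "p")]  # toy relation
n = len(rows)
ab = Counter((a, b) for a, b, _ in rows)  # 2D synopsis over (a, b)
bc = Counter((b, c) for _, b, c in rows)  # 2D synopsis over (b, c)
b_ = Counter(b for _, b, _ in rows)       # 1D marginal over b

def selectivity(a, b, c):
    # P(a, b) * P(c | b); independence would instead use P(a) * P(b) * P(c).
    return (ab[(a, b)] / n) * (bc[(b, c)] / b_[b]) if b_[b] else 0.0

print(selectivity("x", 1, "p"))  # 0.25 under the chain model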

46 citations


Journal ArticleDOI
TL;DR: This work proposes two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client.
Abstract: Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy. We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. We present techniques that aim to compute the next safe zone efficiently, and we present two types of conservative safe zones that aim to reduce the communication cost. Empirical studies with real data suggest that the proposals are efficient. To understand the effectiveness of the proposed safe zones, we study analytically the expected area of a safe zone, which indicates on average for how long a safe zone remains valid, and we study the expected number of influence objects needed to define a safe zone, which gives an estimate of the average communication cost. The analytical modeling is validated through empirical studies.
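
A heavily simplified illustration of the safe-zone idea, for a pure k-nearest-neighbor query without text relevance (the paper's zones also account for text and are not circles): by the triangle inequality, the current top-k remains valid while the user moves less than half the distance gap between the k-th result and the first non-result, so a conservative circular safe zone can use that radius.

def conservative_safe_radius(sorted_dists, k):
    # sorted_dists: distances from the query point to all objects, ascending.
    # Moving by d changes each object distance by at most d, so the top-k is
    # stable while d < (d_(k+1) - d_k) / 2.
    return (sorted_dists[k] - sorted_dists[k - 1]) / 2.0

dists = [10.0, 25.0, 40.0, 90.0]           # hypothetical object distances
print(conservative_safe_radius(dists, 3))  # 25.0: safe to move up to 25 units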

Proceedings ArticleDOI
18 Mar 2013
TL;DR: This work demonstrates how iPark helps ordinary users annotate an existing digital map with two types of parking, on-street parking and parking zones, based on vehicular tracking data, thus enabling effective parking search applications.
Abstract: A wide variety of desktop and mobile Web applications involve geo-tagged content, e.g., photos and (micro-) blog postings. Such content, often called User Generated Geo-Content (UGGC), plays an increasingly important role in many applications. However, a great demand also exists for "core" UGGC where the geo-spatial aspect is not just a tag on other content, but is the primary content, e.g., a city street map with up-to-date road construction data. Along these lines, the iPark system aims to turn volumes of GPS data obtained from vehicles into information about the locations of parking spaces, thus enabling effective parking search applications. In particular, we demonstrate how iPark helps ordinary users annotate an existing digital map with two types of parking, on-street parking and parking zones, based on vehicular tracking data.

01 Jan 2013
TL;DR: Findings from a qualitative study of organisational requirements for indoor Wi-Fi positioning suggest, among other things, a need to support all case-specific user groups, provide software platform independence and low maintenance, and allow positioning of all user devices, regardless of platform and form factor.
Abstract: The past decade has witnessed substantial research on methods for indoor Wi-Fi positioning. While much effort has gone into achieving high positioning accuracy and easing fingerprint collection, it is our contention that the general problem is not sufficiently well understood, thus preventing deployments and their usage by applications from becoming more widespread. Based on our own and published experiences with indoor Wi-Fi positioning deployments, we hypothesize the following: Current indoor Wi-Fi positioning systems and their utilization in applications are hampered by the lack of understanding of the requirements present in real-world deployments. In this paper, we report findings from qualitatively studying organisational requirements for indoor Wi-Fi positioning. The studied cases and deployments cover both company and public-sector settings and the deployment and evaluation of several types of indoor Wi-Fi positioning systems over durations of up to several years. The findings suggest, among other things, a need to support all case-specific user groups, provide software platform independence and low maintenance, and allow positioning of all user devices, regardless of platform and form factor. Furthermore, the findings also vary significantly across organisations, for instance in terms of the need for coverage, which motivates the design of orthogonal solutions.

Proceedings ArticleDOI
03 Jun 2013
TL;DR: The paper presents a method for the identification of typical movement patterns among indoor moving objects; the method takes into account the specifics of spatial movement and, in particular, the specifics of tracking data that captures indoor movement.
Abstract: With the proliferation of mobile computing, positioning systems are becoming available that enable indoor location-based services. As a result, indoor tracking data is also becoming available. This paper puts focus on one use of such data, namely the identification of typical movement patterns among indoor moving objects. Specifically, the paper presents a method for the identification of movement patterns. Leveraging concepts from sequential pattern mining, the method takes into account the specifics of spatial movement and, in particular, the specifics of tracking data that captures indoor movement. For example, the paper's proposal supports spatial aggregation and utilizes the topology of indoor spaces to achieve better performance. The paper reports on empirical studies with real and synthetic data that offer insights into the functional and computational aspects of its proposal.
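
A minimal sketch of the mining step, assuming symbolic indoor trajectories (sequences of visited rooms): count contiguous subsequences of length m and keep those meeting a support threshold. The paper's method additionally uses spatial aggregation and indoor topology; names and data here are hypothetical.

from collections import Counter

trajectories = [["lobby", "hall", "room12"],
                ["lobby", "hall", "room14"],
                ["lobby", "hall", "room12"]]

def frequent_patterns(trajs, m, min_support):
    windows = (tuple(t[i:i + m]) for t in trajs for i in range(len(t) - m + 1))
    counts = Counter(windows)
    return {p: c for p, c in counts.items() if c >= min_support}

print(frequent_patterns(trajectories, 2, 2))
# {('lobby', 'hall'): 3, ('hall', 'room12'): 2}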

Journal ArticleDOI
01 Sep 2013
TL;DR: This work proposes techniques that enable efficient computation of lower and upper bounds of the shortest surface distance, which enable faster query processing by eliminating expensive distance computations.
Abstract: With the increasing availability of terrain data, e.g., from aerial laser scans, the management of such data is attracting increasing attention in both industry and academia. In particular, spatial queries, e.g., k-nearest neighbor and reverse nearest neighbor queries, in Euclidean and spatial network spaces are being extended to terrains. Such queries all rely on an important operation, that of finding shortest surface distances. However, shortest surface distance computation is very time consuming. We propose techniques that enable efficient computation of lower and upper bounds of the shortest surface distance, which enable faster query processing by eliminating expensive distance computations. Empirical studies show that our bounds are much tighter than the best-known bounds in many cases and that they enable speedups of up to 43 times for some well-known spatial queries.
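
The simplest such lower bound illustrates the pruning principle: the straight-line 3D distance never exceeds the shortest surface distance, so candidates whose straight-line distance already exceeds the best surface distance found so far can be discarded without an expensive surface computation. The paper's bounds are much tighter; this Python sketch only shows the mechanism, with hypothetical inputs.

import math

def euclidean_lower_bound(p, q):
    # p and q are (x, y, z) points on the terrain surface.
    return math.dist(p, q)

def prune(query, candidates, best_surface_dist):
    # Keep only candidates that could still beat the best surface distance.
    return [c for c in candidates
            if euclidean_lower_bound(query, c) <= best_surface_dist]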

Journal ArticleDOI
01 Aug 2013
TL;DR: A practical proposal for finding top-k PoI groups in response to a query is demonstrated and it is shown how problem parameter settings can be mapped to options that are meaningful to users.
Abstract: The notion of point-of-interest (PoI) has existed since paper road maps began to include markings of useful places such as gas stations, hotels, and tourist attractions. With the introduction of geopositioned mobile devices such as smartphones and mapping services such as Google Maps, the retrieval of PoIs relevant to a user's intent has become a problem of automated spatio-textual information retrieval. Over the last several years, substantial research has gone into the invention of functionality and efficient implementations for retrieving nearby PoIs. However, with a couple of exceptions, existing proposals retrieve results at single-PoI granularity. We assume that a mobile device user issues queries consisting of keywords and an automatically supplied geo-position, and we target the common case where the user wishes to find nearby groups of PoIs that are relevant to the keywords. Such groups are relevant to users who wish to conveniently explore several options before making a decision such as to purchase a specific product. Specifically, we demonstrate a practical proposal for finding top-k PoI groups in response to a query. We show how problem parameter settings can be mapped to options that are meaningful to users. Further, although this kind of functionality is prone to combinatorial explosion, we will demonstrate that the functionality can be supported efficiently in practical settings.
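
A toy sketch of the underlying scoring problem (not the demonstrated system): enumerate small candidate groups near the user, score each group by how far the user must travel and how spread out the group is, and return the best. A practical system must prune this combinatorial space aggressively rather than enumerate it.

import math
from itertools import combinations

def group_score(user, group):
    travel = max(math.dist(user, p) for p in group)  # farthest member from user
    diameter = max(math.dist(a, b) for a, b in combinations(group, 2))
    return travel + diameter  # lower is better; this weighting is an assumption

user = (0.0, 0.0)
pois = [(1.0, 1.0), (1.2, 1.1), (5.0, 5.0), (5.1, 5.2)]  # hypothetical matches
best = min(combinations(pois, 2), key=lambda g: group_score(user, g))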

Proceedings Article
01 Sep 2013
TL;DR: A new class of temporal expression, the named temporal expression, is introduced, together with methods for recognising and interpreting its members; the interpretation rules are observed to improve temporal annotation performance over existing corpora.
Abstract: This paper introduces a new class of temporal expression, the named temporal expression, and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally annotated corpora, for example Michaelmas or Vasant Panchami. Using Wikipedia and linked data, we automatically construct a resource of English named temporal expressions, and use it to extract training examples from a large corpus. These examples are then used to train and evaluate a named temporal expression recogniser. We also introduce and evaluate rules for automatically interpreting these expressions, and we observe that use of the rules improves temporal annotation performance over existing corpora.
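
A minimal sketch of the gazetteer-based interpretation idea, with an illustrative one-entry resource: map a named temporal expression to a calendar date relative to a reference year. The real resource is built automatically from Wikipedia and linked data, and the paper's rules also handle expressions whose dates move from year to year.

from datetime import date

# Hypothetical gazetteer entry; movable observances need rules, not fixed dates.
GAZETTEER = {"michaelmas": (9, 29)}

def interpret(expression, reference_year):
    month_day = GAZETTEER.get(expression.lower())
    return date(reference_year, *month_day) if month_day else None

print(interpret("Michaelmas", 2013))  # 2013-09-29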

Proceedings ArticleDOI
05 Nov 2013
TL;DR: A practical framework is proposed for computing the optimal segment query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model.
Abstract: Finding a location for a new facility such that the facility attracts the maximal number of customers is a challenging problem. Existing studies either model customers as static sites and thus do not consider customer movement, or they focus on theoretical aspects and do not provide solutions that are shown empirically to be scalable. Given a road network, a set of existing facilities, and a collection of customer route traversals, an optimal segment query returns the optimal road network segment(s) for a new facility. We propose a practical framework for computing this query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model. We propose two algorithms that adopt different approaches to computing the query. Empirical studies with real data sets demonstrate that the algorithms are capable of offering high performance in realistic settings.
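
The scoring step admits a compact sketch: each traversal's score is spread over the segments it covers (uniformly below, as one possible score distribution model), scores are aggregated per segment, and the segment with the highest aggregate is returned. The data and the uniform model are illustrative assumptions.

from collections import defaultdict

# Each traversal: (segments covered, traversal score); values are hypothetical.
routes = [(["s1", "s2", "s3"], 1.0), (["s2", "s3"], 1.0), (["s3", "s4"], 1.0)]

segment_score = defaultdict(float)
for segments, score in routes:
    for seg in segments:
        segment_score[seg] += score / len(segments)  # uniform distribution model

optimal = max(segment_score, key=segment_score.get)  # 's3' for this data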

Posted Content
TL;DR: This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights, given a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost.
Abstract: We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.
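
A stripped-down version of such an objective, as a least-squares sketch (not the paper's exact formulation, which also uses weighted PageRank and directional adjacency): fit edge weights that stay close to observed costs on covered edges while varying smoothly across adjacent edges.

import numpy as np

n_edges = 4
observed = {0: 30.0, 3: 50.0}        # edge index -> ground-truth cost (hypothetical)
adjacent = [(0, 1), (1, 2), (2, 3)]  # hypothetical edge adjacency
lam = 1.0                            # smoothness strength

# Minimize sum over observed edges of (w_e - y_e)^2
# plus lam * sum over adjacent pairs of (w_e - w_f)^2,
# by stacking both terms into one linear least-squares system.
rows, rhs = [], []
for e, y in observed.items():
    row = np.zeros(n_edges); row[e] = 1.0
    rows.append(row); rhs.append(y)
for e, f in adjacent:
    row = np.zeros(n_edges); row[e] = np.sqrt(lam); row[f] = -np.sqrt(lam)
    rows.append(row); rhs.append(0.0)

w, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
# w interpolates between the observed costs: [34, 38, 42, 46] for this data.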

Proceedings ArticleDOI
TL;DR: This work envisions a fully organic indoor positioning system, where all available sources of information are exploited in order to provide room-level accuracy with no active intervention of users.
Abstract: Indoor positioning systems based on fingerprinting techniques generally require costly initialization and maintenance by trained surveyors. Organic positioning systems aim to eliminate these deficiencies by managing their own accuracy and obtaining input from users and other sources. Such systems introduce new challenges, e.g., detection and filtering of erroneous user input, estimation of the positioning accuracy, and means of obtaining user input when necessary. We envision a fully organic indoor positioning system, where all available sources of information are exploited in order to provide room-level accuracy with no active intervention of users. For example, such systems can exploit pre-installed cameras to associate a user's location with a Wi-Fi fingerprint from the user's phone, and they can use a calendar to determine whether a user is in the room reported by the positioning system. Numerous possibilities for integration exist that may provide better indoor positioning.

Proceedings ArticleDOI
30 Aug 2013
TL;DR: The paper reviews key concepts related to spatial keyword querying and covers recent proposals by the author and his colleagues for spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently.
Abstract: The web is increasingly being used by mobile users, and it is increasingly possible to accurately geo-position mobile users. In addition, increasing volumes of geo-tagged web content are becoming available. Further, indications are that a substantial fraction of web keyword queries target local content. When combined, these observations suggest that spatial keyword querying is important and indeed gaining in importance. A prototypical spatial keyword query takes a user location and user-supplied keywords as parameters and returns web content that is spatially and textually relevant to these parameters. The paper reviews key concepts related to spatial keyword querying and covers recent proposals by the author and his colleagues for spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently.

Proceedings ArticleDOI
03 Jun 2013
TL;DR: An increasingly mobile and spatial Web is fast emerging; it enables Web queries with local intent, i.e., keyword-based queries issued by users who are looking for Web content near them, and it implies an increasing demand for query functionality that supports local intent.
Abstract: In step with the rapid proliferation of mobile devices with Internet access, the Web is increasingly being accessed by mobile-device users on the move. Further, it is increasingly possible to accurately geo-position mobile devices, and increasing volumes of geo-positioned content, e.g., Web pages, business directory entries, and microblog posts, are becoming available on the Web. In short, an increasingly mobile and spatial Web is fast emerging. This development enables Web queries with local intent, i.e., keyword-based queries issued by users who are looking for Web content near them. In addition, it implies an increasing demand for query functionality that supports local intent.