Journal ArticleDOI

Efficient retrieval of the top-k most relevant spatial web objects

01 Aug 2009-Vol. 2, Iss: 1, pp 337-348
TL;DR: A new indexing framework for location-aware top-k text retrieval that encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space.
Abstract: The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account. This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.
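The query described in the abstract ranks objects by both location proximity and text relevancy. A minimal sketch of the naive baseline (scan every object, score it, keep the k best) might look as follows. The weighting parameter `alpha`, the normalization constant `max_dist`, the term-overlap relevance measure, and all names here are illustrative assumptions, not the paper's actual ranking function or index-based algorithm.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialObject:
    name: str
    x: float
    y: float
    text: str


def text_relevance(query_terms, text):
    """Fraction of query terms that occur in the text (a stand-in measure)."""
    words = set(text.lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for t in query_terms if t in words) / len(query_terms)


def score(obj, qx, qy, query_terms, alpha, max_dist):
    """Lower is better: weighted sum of normalized distance and text mismatch."""
    dist = math.hypot(obj.x - qx, obj.y - qy)
    spatial = min(dist / max_dist, 1.0)  # clamp to [0, 1]
    textual = 1.0 - text_relevance(query_terms, obj.text)
    return alpha * spatial + (1.0 - alpha) * textual


def top_k_scan(objects, qx, qy, keywords, k, alpha=0.5, max_dist=10.0):
    """Naive baseline: score every object and keep the k best."""
    terms = [w.lower() for w in keywords]
    return heapq.nsmallest(
        k, objects, key=lambda o: score(o, qx, qy, terms, alpha, max_dist)
    )
```

Because this baseline touches every object, its cost grows linearly with the data set. An index that combines spatial and textual information is meant to avoid exactly that by pruning parts of the search space.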


Citations
Journal ArticleDOI
01 Sep 2010
TL;DR: A general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from GPS data is proposed and is capable of outperforming baseline methods and an extension of an existing proposal.
Abstract: With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data. We present techniques capable of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagates significance among the locations. In doing so, mutual reinforcement between location significance and user authority is exploited for determining significance, as are aspects such as the number of visits to a location, the durations of the visits, and the distances users travel to reach locations. Studies using up to 100 million GPS records from a confined spatio-temporal region demonstrate that the proposal is effective and is capable of outperforming baseline methods and an extension of an existing proposal.

339 citations


Cites background from "Efficient retrieval of the top-k mo..."

  • ...Another example is that the outcome can be combined with the so-called location-aware keyword query [6]....

    [...]

Journal ArticleDOI
01 Jan 2013
TL;DR: An all-around survey of 12 state-of-the-art geo-textual indices and proposes a benchmark that enables the comparison of the spatial keyword query performance, thus uncovering new insights that may guide index selection as well as further research.
Abstract: Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.

323 citations


Cites background or methods from "Efficient retrieval of the top-k mo..."

  • ...IR2-Tree [9] IR2 R-Tree bitmaps tightly combined √ △ IR-Tree [7, 20] IR R-Tree inverted file tightly combined △ √ △ IR-Tree [16] IRLi R-Tree inverted file tightly combined √...

    [...]

  • ...The construction of the CDIR-tree involves two parameters....

    [...]

  • ...The DIR-tree [7, 20] takes both spatial and textual information into account during the tree construction by optimizing for a combination of minimizing the areas of MBRs and maximizing the text similarities between the objects of the enclosing rectangles....

    [...]

  • ...To distinguish it from the IR-tree in references [7,20], we refer to it as the IRLi-tree....

    [...]

  • ...Several variants of the IR-tree exist, which optimize the IR-tree, including the DIR-tree, the CIR-tree, and the CDIR-tree....

    [...]

Proceedings ArticleDOI
12 Jun 2011
TL;DR: This paper defines the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances and designs exact and approximate solutions with provable approximation bounds to the problems.
Abstract: With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.

320 citations


Cites background or methods from "Efficient retrieval of the top-k mo..."

  • ...The IR-tree [8] is essentially an R-tree [12] extended with inverted files [16]....

    [...]

  • ...This development gives prominence to spatial keyword queries [5, 6, 8, 10]....

    [...]

  • ...This algorithm utilizes a spatial-keyword index such as the IR-tree [8] to prune the search space....

    [...]

  • ...Several recently proposed hybrid indexes [5, 8, 10, 14, 15] that tightly integrate spatial indexing (e....

    [...]

  • ...We use the IR-tree [8], covered in Section 3....

    [...]

Book ChapterDOI
24 Aug 2011
TL;DR: A novel index to improve the performance of top-k spatial keyword queries named Spatial Inverted Index (S2I), which maps each distinct term to a set of objects containing the term and can be retrieved efficiently in decreasing order of keyword relevance and spatial proximity.
Abstract: Given a spatial location and a set of keywords, a top-k spatial keyword query returns the k best spatio-textual objects ranked according to their proximity to the query location and relevance to the query keywords. There are many applications handling huge amounts of geotagged data, such as Twitter and Flickr, that can benefit from this query. Unfortunately, the state-of-the-art approaches incur a non-negligible processing cost that results in long response times. In this paper, we propose a novel index to improve the performance of top-k spatial keyword queries named Spatial Inverted Index (S2I). Our index maps each distinct term to a set of objects containing the term. The objects are stored differently according to the document frequency of the term and can be retrieved efficiently in decreasing order of keyword relevance and spatial proximity. Moreover, we present algorithms that exploit S2I to process top-k spatial keyword queries efficiently. Finally, we show through extensive experiments that our approach outperforms the state-of-the-art approaches in terms of update and query cost.
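The core mapping mentioned in this abstract, from each distinct term to the set of objects containing it, is the classic inverted index. The sketch below illustrates only that mapping under simple assumptions (whitespace tokenization, integer object ids, hypothetical function names). It does not model how S2I stores objects differently by document frequency or how it orders them by relevance and proximity.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted ids of the objects containing it."""
    index = defaultdict(set)
    for obj_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(obj_id)
    return {term: sorted(ids) for term, ids in index.items()}


def objects_with_all_terms(index, terms):
    """Intersect posting lists to find objects that contain every term."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))
```

For example, with descriptions `{1: "pizza pasta", 2: "pizza bar", 3: "sushi bar"}`, the posting list for `pizza` holds objects 1 and 2, and a query for both `pizza` and `bar` intersects the two lists to leave object 2.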

246 citations

Journal ArticleDOI
01 Sep 2010
TL;DR: Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.
Abstract: The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of such a query as being independent when ranking them. However, a relevant result object with nearby objects that are also relevant to the query is likely to be preferable over a relevant object without relevant nearby objects. The paper proposes the concept of prestige-based relevance to capture both the textual relevance of an object to a query and the effects of nearby objects. Based on this, a new type of query, the Location-aware top-k Prestige-based Text retrieval (LkPT) query, is proposed that retrieves the top-k spatial web objects ranked according to both prestige-based relevance and location proximity. We propose two algorithms that compute LkPT queries. Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.

175 citations


Cites background or methods from "Efficient retrieval of the top-k mo..."

  • ...We extend the IR-tree [8] index structure to organize spatial objects and capture the pre-computed information needed for upper bound estimation....

    [...]

  • ...Specifically, we organize the spatial objects by extending the external memory IR-tree [8]....

    [...]

  • ...Recent studies [7, 8, 10, 21, 22] on geographical retrieval address the problem of spatial keyword search....

    [...]

  • ...We follow existing work [8,20] and use a linear combination of the normalized factors for ranking an object o with respect to a query Q:...

    [...]

  • ...The extended IR-tree used in this paper and the original IR-tree [8] share a similar data structure....

    [...]

References
Book
01 Jan 1979
TL;DR: The second edition of a quarterly column as discussed by the authors provides a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979.
Abstract: This is the second edition of a quarterly column the purpose of which is to provide a continuing update to the list of problems (NP-complete and harder) presented by M. R. Garey and myself in our book "Computers and Intractability: A Guide to the Theory of NP-Completeness," W. H. Freeman & Co., San Francisco, 1979 (hereinafter referred to as "[G&J]"; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed. Readers having results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.), or open problems they would like publicized, should send them to David S. Johnson, Room 2C355, Bell Laboratories, Murray Hill, NJ 07974, including details, or at least sketches, of any new proofs (full papers are preferred). In the case of unpublished results, please state explicitly that you would like the results mentioned in the column. Comments and corrections are also welcome. For more details on the nature of the column and the form of desired submissions, see the December 1981 issue of this journal.

40,020 citations

Book
15 May 1999
TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
Abstract: From the Publisher: This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in volume of electronically stored information generally has led to substantial work on IR from the computer science perspective - this book provides an up-to-date student oriented treatment of the subject.

9,923 citations

Proceedings ArticleDOI
01 Jun 1984
TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given and it is concluded that it is useful for current database systems in spatial applications.
Abstract: In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multi-dimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.

7,336 citations


"Efficient retrieval of the top-k mo..." refers background or methods in this paper

  • ...The IR-tree is constructed by means of an insert operation that is adapted from the corresponding R-tree operation [13]....

    [...]

  • ...In this solution, an R-tree [13] indexes the points, potential nearest neighbors are maintained in a priority queue, and the tree is traversed according to a number of heuristics....

    [...]

  • ...It uses a standard implementation of the R-tree [13] with operations ChooseLeaf and Split....

    [...]

  • ...We incorporate document similarity into the standard Quadratic Split algorithm [13]....

    [...]

  • ...The R-tree [13] is arguably the dominant index for spatial queries, and the inverted file is the most efficient index for text information retrieval [33]....

    [...]

Proceedings ArticleDOI
01 May 1990
TL;DR: The R*-tree is designed which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory which clearly outperforms the existing R-tree variants.
Abstract: The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, queries and operations, we were able to design the R*-tree which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory. Using our standardized testbed in an exhaustive performance comparison, it turned out that the R*-tree clearly outperforms the existing R-tree variants: Guttman's linear and quadratic R-tree and Greene's variant of the R-tree. This superiority of the R*-tree holds for different types of queries and operations, such as map overlay, for both rectangles and multidimensional points in all experiments. From a practical point of view the R*-tree is very attractive because of the following two reasons: (1) it efficiently supports point and spatial data at the same time and (2) its implementation cost is only slightly higher than that of other R-trees.

4,686 citations


"Efficient retrieval of the top-k mo..." refers methods in this paper

  • ...The best approach according to their experiments is to build an R*-tree for each distinct keyword on the web pages containing the keyword....

    [...]

  • ...Hjaltason and Samet [15] propose an incremental nearest neighbor algorithm based on an R*-tree [4]....

    [...]

  • ...As a result, queries with multiple keywords need to access multiple R*-trees and to intersect the results....

    [...]

  • ...The R*-tree: an efficient and robust access method for points and rectangles....

    [...]

  • ...Another hybrid indexing structure [31] combines the R*-tree and bitmap indexing to process the m-closest keyword query that returns the spatially closest objects matching m keywords....

    [...]
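The excerpts above mention an incremental nearest neighbor algorithm that runs over an R*-tree. The following is a minimal best-first sketch in that spirit, assuming a hand-built hierarchy of bounding rectangles rather than a real R*-tree (no insertion, splitting, or balancing). The `Node` class and function names are hypothetical and do not come from any of the cited papers.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Minimum bounding rectangle as (xmin, ymin, xmax, ymax).
    mbr: tuple
    children: list = field(default_factory=list)  # child Nodes (inner node)
    points: list = field(default_factory=list)    # (x, y) tuples (leaf node)


def min_dist(mbr, q):
    """Smallest possible distance from point q to anything inside mbr."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def nearest_neighbors(root, q):
    """Yield (distance, point) pairs in increasing distance from q."""
    counter = itertools.count()  # tie-breaker so the heap never compares Nodes
    heap = [(min_dist(root.mbr, q), next(counter), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the node: queue its children and points by distance.
            for child in item.children:
                heapq.heappush(
                    heap, (min_dist(child.mbr, q), next(counter), child)
                )
            for p in item.points:
                d = math.hypot(p[0] - q[0], p[1] - q[1])
                heapq.heappush(heap, (d, next(counter), p))
        else:
            # A point reached the front of the queue: nothing can be closer.
            yield dist, item
```

Since `min_dist` never overestimates the distance to anything stored below a node, a point popped from the queue is guaranteed to be the next nearest. The generator can therefore be stopped after k results without visiting the rest of the tree.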