Optimal Route Queries with Arbitrary Order Constraints

Abstract
Given a set of spatial points DS, each of which is associated with categorical information, e.g., restaurant, pub, etc., the optimal route query finds the shortest path that starts from the query point (e.g., a home or hotel), and covers a user-specified set of categories (e.g., {pub, restaurant, museum}). The user may also specify partial order constraints between different categories, e.g., a restaurant must be visited before a pub. Previous work has focused on a special case where the query contains the total order of all categories to be visited (e.g., museum → restaurant → pub). For the general scenario without such a total order, the only known solution reduces the problem to multiple, total-order optimal route queries. As we show in this paper, this naive approach incurs a significant amount of repeated computations, and, thus, is not scalable to large data sets. Motivated by this, we propose novel solutions to the general optimal route query, based on two different methodologies, namely backward search and forward search. In addition, we discuss how the proposed methods can be adapted to answer a variant of the optimal route queries, in which the route only needs to cover a subset of the given categories. Extensive experiments, using both real and synthetic data sets, confirm that the proposed solutions are efficient and practical, and outperform existing methods by large margins.


IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. ?, NO. ?, ? 20?? 1
Jing Li, Yin Yang, Nikos Mamoulis
Index Terms—H.2.4.h Query processing, H.2.4.k Spatial databases
1 INTRODUCTION
Consider a tourist who has a free day to travel around Hong Kong. Without much knowledge about the city, s/he searches online maps to plan a trip. Usually, s/he has a fixed starting point, e.g., her/his hotel, and certain objectives in mind, such as visiting a museum, dining at a fine restaurant, and enjoying a few drinks at a local pub. Meanwhile, some destinations may need to be visited in a certain order. For instance, the trip should have a pub after a restaurant. The ideal route should cover all the destinations, satisfy all order constraints, and minimize the total travel length. Searching for such a route is captured by the optimal route query [4], [10], [13], which usually has a vast search space and, consequently, is too tedious to answer manually. Major online map providers have already shown interest in tools that assist such trip planning tasks. For example, Google City Tours (citytours.googlelabs.com) provides suggested tours for a given starting address. However, these tours are pre-defined, and cannot be customized according to the user's plans. Yahoo Travel (travel.yahoo.com) has a similar service that allows users to search and share trips, which, unfortunately, cannot answer optimal route queries either.
J. Li is with the Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong. E-mail: jli@cs.hku.hk
Y. Yang is with Advanced Digital Sciences Center, Singapore. E-mail: yin.yang@adsc.com.sg
N. Mamoulis is with the Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong. E-mail: nikos@cs.hku.hk

Figure 1 illustrates an example optimal route query on a dataset $DS$ with 6 locations $p_1$-$p_6$. Each location is associated with one category $C_p$, e.g., $p_1$, $p_2$ are museums; $p_3$, $p_4$ are pubs; and $p_5$, $p_6$ are restaurants. (If a location belongs to multiple categories, e.g., a restaurant and pub, we conceptually split it into multiple points with identical coordinates, each associated with a single category.) The query contains two parameters: a starting point $q$, and a directed acyclic graph $G_Q$ called the visit order graph. Each vertex in $G_Q$ corresponds to a category and each edge $\langle C, C' \rangle$ indicates that a point of category $C$ must be visited before another of category $C'$. In our example, $G_Q$ signifies that a restaurant must be visited before a pub. We follow a common assumption that each category appears at most once in $G_Q$ [4], [10], [13]. In addition, to represent the fact that $q$ must be the first point in the route, we create an artificial category $C_q$ containing a single point $q$, and add an edge from $C_q$ to every other vertex in $G_Q$ that has no in-edge. The result of the query is the shortest route that visits all categories in $G_Q$, while satisfying the visit order constraints. In our example, such a route is $q \to p_1 \to p_5 \to p_3$. In practice, the user may not have sufficient time to visit all the categories. In this situation, a reasonable compromise is to find a route that covers a subset of $l$ categories from $G_Q$, where $l$ is a user-specified parameter. We call this variant the size-$l$ optimal route query.
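As an illustration of the query definition above, the following minimal Python sketch represents the visit order graph as a set of category pairs, adds the artificial start category, and checks a category sequence against the constraints. The function names and the label `"Cq"` are assumptions made for this example, not notation from the paper.

```python
def add_start_category(categories, edges, start="Cq"):
    """Return the vertices and edges of G_Q extended with the artificial category C_q.

    An edge (a, b) means that category a must be visited before category b.
    C_q is connected to every category that has no in-edge.
    """
    has_in_edge = {b for _, b in edges}
    new_edges = set(edges) | {(start, c) for c in categories if c not in has_in_edge}
    return [start] + list(categories), new_edges


def satisfies(order, edges):
    """True if `order` visits each category at most once and respects every edge."""
    pos = {c: i for i, c in enumerate(order)}
    if len(pos) != len(order):  # a category appears twice
        return False
    return all(pos[a] < pos[b] for a, b in edges if a in pos and b in pos)


vertices, full_edges = add_start_category(
    ["museum", "restaurant", "pub"], {("restaurant", "pub")}
)
ok = satisfies(["Cq", "museum", "restaurant", "pub"], full_edges)
bad = satisfies(["Cq", "pub", "restaurant", "museum"], full_edges)
```

Here `ok` is true because the restaurant precedes the pub, while `bad` is false because the pub comes first.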
[Figure 1: query start point $q$; points $p_1$-$p_6$ (museums, restaurants, pubs); the greedy route and the optimal route; and the visit order graph $G_Q$ over $C_q = \{q\}$, museum, restaurant, and pub.]
Fig. 1. Example of optimal route query

A Greedy algorithm [13] to answer the optimal route query first finds the nearest neighbor of $q$ that is allowed to be visited right after $q$ according to $G_Q$. In the running example, Greedy chooses point $p_2$ (note that $p_4$ cannot be selected, since $G_Q$ requires that a pub is visited after a restaurant). Then, Greedy adds $p_2$ to the current route, and continues to compute the nearest allowable point according to $G_Q$ to be added to the route, which is $p_5$. After that, Greedy finds the nearest allowable point after $p_5$, i.e., $p_3$. Since all categories in $G_Q$ are visited, Greedy returns the route $q \to p_2 \to p_5 \to p_3$. Observe that this is longer than the optimal route $q \to p_1 \to p_5 \to p_3$. The reason is that although $p_2$ is closer to $q$ than $p_1$, the latter leads to a shorter sub-route that covers the remaining categories. In fact, the optimal route query is proven to be NP-hard [13], and heuristics-based algorithms such as Greedy cannot guarantee optimality of the result.
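The Greedy procedure described above can be sketched in a few lines of Python. The coordinates below are hypothetical (the paper does not list the positions used in Figure 1); they were chosen so that Greedy picks the museum nearest to the start point and ends up with a longer route than the optimal one. The function name `greedy_route` is likewise an assumption for this sketch.

```python
from math import dist

# Hypothetical coordinates, not taken from the paper.
Q = (0, 0)
POINTS = {"p1": (2, 0), "p2": (-1, 0), "p3": (5, 0),
          "p4": (0, 5), "p5": (4, 0), "p6": (0, 6)}
CATEGORY = {"p1": "museum", "p2": "museum", "p3": "pub",
            "p4": "pub", "p5": "restaurant", "p6": "restaurant"}
EDGES = {("restaurant", "pub")}  # (a, b): a must be visited before b


def greedy_route(q, points, category, edges):
    """Repeatedly move to the nearest point whose category is currently allowed."""
    remaining = set(category.values())
    route, length, current = [], 0.0, q
    while remaining:
        # A category is allowed once none of its predecessors is still unvisited.
        allowed = {c for c in remaining
                   if not any(a in remaining for a, b in edges if b == c)}
        nxt = min((p for p in points if category[p] in allowed),
                  key=lambda p: dist(current, points[p]))
        length += dist(current, points[nxt])
        current = points[nxt]
        route.append(nxt)
        remaining.discard(category[nxt])
    return route, length


greedy, greedy_len = greedy_route(Q, POINTS, CATEGORY, EDGES)
optimal_len = (dist(Q, POINTS["p1"]) + dist(POINTS["p1"], POINTS["p5"])
               + dist(POINTS["p5"], POINTS["p3"]))
```

With these coordinates, `greedy` is `["p2", "p5", "p3"]` with length 7, whereas the route through `p1` has length 5, mirroring the gap discussed in the text.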
Previous work on the optimal route query, e.g., [13], has mainly focused on a special case where $G_Q$ defines a total order of categories to be visited. A naïve approach for the general case, where $G_Q$ is a partial order, is to enumerate all total orders in $G_Q$ and process each of them individually. As we explain in Section 2.1, this method is inefficient as it incurs considerable repeated work. Motivated by this, we propose several efficient solutions to the general-case optimal route query. Specifically, we investigate two methodologies: backward search and forward search. The former computes the optimal route from the last point to the first, while the latter follows the first-to-last order of points. Furthermore, all proposed solutions extend naturally to size-$l$ optimal route processing. Extensive experiments, using large-scale real and synthetic datasets, confirm that the proposed methods are efficient and practical.
The rest of this paper is organized as follows. Section 2 surveys related work. Sections 3 and 4 present solutions for the optimal route query, following the backward search and forward search frameworks, respectively. Section 5 extends the proposed solutions to the size-$l$ optimal route query. Section 6 contains an extensive set of experiments. Finally, Section 7 concludes the paper.
2 RELATED WORK
Section 2.1 reviews existing solutions to the optimal
route query. Section 2.2 surveys other related queries that
operate on spatial data with categorical information.
2.1 Optimal Route Query Processing
Early work on optimal route computation focuses on greedy solutions. Chen et al. [4] use the same query definition as this paper, and propose two heuristics. The first, namely NNPSR, resembles the greedy approach described in Section 1; the second retrieves, for every category, the point nearest to the query start position $q$, and then connects these points to form a route. In addition, [4] also describes a simple combination of NNPSR and R-LORD [13], which answers a special case of the optimal route query with a total order of the categories to be visited. The hybrid solution first runs NNPSR to find a greedy route; then, it extracts the category of each point on the greedy route, and runs R-LORD with this category sequence as input. None of the solutions in [4] guarantees the quality of the results; these methods usually return sub-optimal routes according to the experiments in [4]. Li et al. [10] study a variant of the optimal route query that specifies both a start point $q_{start}$ and an end position $q_{end}$, but no order constraint between the data categories. This is equivalent to a visit order graph $G_Q$ that contains two artificial categories $C_{start} = \{q_{start}\}$ and $C_{end} = \{q_{end}\}$, and two edges $\langle C_{start}, C \rangle$ and $\langle C, C_{end} \rangle$ for each category $C$ in the dataset. The solutions of [10] report approximate query results; in contrast, this paper focuses on efficient, exact methods for the general optimal route problem.
Sharifzadeh et al. [13] propose R-LORD, the first exact solution for optimal route queries with a total order. In the example of Figure 1, suppose that $G_Q$ specifies the total order $q \to$ museum $\to$ restaurant $\to$ pub; then, R-LORD is directly applicable. Specifically, let $r^*$ be the optimal route; an important observation made in [13] is that any suffix $r$ of $r^*$ is also the shortest among all routes that (i) start at the first point of $r$, and (ii) visit the same categories as $r$, in the same order. In our example, the best answer to the query is $r^* = q \to p_1 \to p_5 \to p_3$. Its length-2 suffix $p_5 \to p_3$ is the shortest route that starts at $p_5$ and visits a restaurant followed by a pub. Similarly, its length-3 suffix $p_1 \to p_5 \to p_3$ is the shortest path that originates at $p_1$ and follows the category sequence museum $\to$ restaurant $\to$ pub. This fact enables dynamic programming, which gradually fills an optimal suffix table. In particular, R-LORD first uses a greedy algorithm to compute a route that satisfies the query, as well as its length $\theta$. Then, the method computes length-1 optimal suffixes, which are points from the last category in the visit order that are within $\theta$-distance to the query start position $q$. In our example, R-LORD obtains pubs $p_3$ and $p_4$, and stores them in the optimal suffix table shown in Table 1. Next, R-LORD retrieves points from the second-to-last category that are no farther than $\theta$ from $q$, i.e., restaurants $p_5$ and $p_6$, and prepends them to the optimal length-1 suffixes to form optimal length-2 suffixes $p_5 \to p_3$ and $p_6 \to p_4$. Note that $p_5 \to p_4$ and $p_6 \to p_3$ are discarded, as they have the same starting points and category sequences as their shorter counterparts $p_5 \to p_3$ and $p_6 \to p_4$, respectively. In the third step, R-LORD retrieves museums $p_1$, $p_2$, combines them with the optimal length-2 suffixes, and obtains optimal length-3 suffixes $p_1 \to p_5 \to p_3$ and $p_2 \to p_5 \to p_3$. Finally, R-LORD connects them with $q$, and selects the shortest one, $q \to p_1 \to p_5 \to p_3$, as the answer to the query.
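The suffix-table construction just described can be sketched as a small backward dynamic program. This is a simplified illustration rather than R-LORD itself: it omits the $\theta$-based filtering and the R-tree access, and it uses hypothetical coordinates and the assumed function name `total_order_route`.

```python
from math import dist

# Hypothetical coordinates, not taken from the paper.
Q = (0, 0)
POINTS = {"p1": (2, 0), "p2": (-1, 0), "p3": (5, 0),
          "p4": (0, 5), "p5": (4, 0), "p6": (0, 6)}
CATEGORY = {"p1": "museum", "p2": "museum", "p3": "pub",
            "p4": "pub", "p5": "restaurant", "p6": "restaurant"}


def total_order_route(q, points, category, order):
    """Fill the optimal suffix table from the last category back to the first."""
    # table[p] = (length, route) of the best suffix that starts at point p
    table = {p: (0.0, [p]) for p in points if category[p] == order[-1]}
    history = [table]
    for cat in reversed(order[:-1]):
        step = {}
        for p in (x for x in points if category[x] == cat):
            # Keep only the shortest suffix per start point.
            step[p] = min((dist(points[p], points[s]) + length, [p] + route)
                          for s, (length, route) in table.items())
        table = step
        history.append(table)
    best = min((dist(q, points[p]) + length, route)
               for p, (length, route) in table.items())
    return best, history


(best_len, best_route), suffix_tables = total_order_route(
    Q, POINTS, CATEGORY, ["museum", "restaurant", "pub"]
)
```

On this data the three entries of `suffix_tables` contain the same suffixes as Table 1, and `best_route` is `["p1", "p5", "p3"]`.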
TABLE 1
Optimal suffix table used in R-LORD
Suffix length   Start point   Optimal suffix
1               $p_3$         $p_3$
1               $p_4$         $p_4$
2               $p_5$         $p_5 \to p_3$
2               $p_6$         $p_6 \to p_4$
3               $p_1$         $p_1 \to p_5 \to p_3$
3               $p_2$         $p_2 \to p_5 \to p_3$

During the computation of the optimal suffix table, R-LORD uses a pruning technique to eliminate sub-routes that cannot participate in the optimal solution. Figure 2 illustrates this technique, which we call elliptic pruning. Suppose that at step $i$, R-LORD has computed an optimal sub-route $r$ of length $i$. Let $p_r$ be the first point of $r$, $length(r)$ be the total length of $r$, and $\theta$ be the length of the greedy route. Then, at step $i + 1$, R-LORD connects $r$ only to points whose total distance to $q$ and $p_r$ is no larger than $\theta - length(r)$. Thus, the range for points allowed to connect to $r$ is an ellipse with foci $q$ and $p_r$ and major axis of length $\theta - length(r)$. For example, in Figure 2(a), point $p_1$ is not connected to sub-route $r$, as the former falls outside the latter's ellipse. This is true even when the combination of $p_1$ and $r$ leads to an optimal sub-route of length $i + 1$. Elliptic pruning therefore reduces the number of stored optimal sub-routes, which improves both memory consumption and CPU time. Furthermore, to minimize I/O costs, R-LORD computes the minimum bounding rectangle (MBR) of all ellipses generated from length-$i$ optimal sub-routes, as shown in Figure 2(b), and uses this MBR as a range query to retrieve points from the R-tree [9] that indexes the category to be examined during the $(i + 1)$-th step.

[Figure 2: (a) with one ellipse, showing $q$, $p_r$, a sub-route $r$ of length $i$, and points $p_1$, $p_2$, one of which cannot connect to $r$ at step $i+1$; (b) with multiple ellipses and their MBR.]
Fig. 2. Elliptic pruning in R-LORD
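The elliptic pruning test reduces to a single inequality on two distances. A minimal Python sketch follows; the function name and the sample values are assumptions made for illustration.

```python
from math import dist


def survives_elliptic_pruning(q, p, p_r, length_r, theta):
    """True if p lies in the ellipse with foci q and p_r and major axis theta - length_r.

    Equivalently, the route q -> p -> r is no longer than the known bound theta.
    """
    return dist(q, p) + dist(p, p_r) <= theta - length_r


# Hypothetical example: sub-route r starts at (4, 0), has length 1, and theta = 7.
q, p_r, length_r, theta = (0, 0), (4, 0), 1.0, 7.0
inside = survives_elliptic_pruning(q, (2, 0), p_r, length_r, theta)
outside = survives_elliptic_pruning(q, (0, 6), p_r, length_r, theta)
```

The point `(2, 0)` passes the test because its two distances sum to 4, which is below the budget of 6; the point `(0, 6)` fails because its distance to `q` alone already uses the whole budget.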
PLUB [11] decomposes a general optimal route query into multiple total-order queries and processes them individually, e.g., using R-LORD. For instance, the query in Figure 1 is decomposed into three total-order queries: museum $\to$ restaurant $\to$ pub, restaurant $\to$ museum $\to$ pub, and restaurant $\to$ pub $\to$ museum. This incurs significant amounts of repeated computation for longer sequences. For example, assume that in the query of Figure 1 there is an additional category (e.g., mall) that does not have any order constraints with other categories. The decomposition of this new query involves multiple total orders that share a common suffix, such as mall $\to$ museum $\to$ restaurant $\to$ pub and museum $\to$ mall $\to$ restaurant $\to$ pub. Consequently, the processing of both orders involves the computation of optimal sub-routes that start at a restaurant and are followed by a pub. This problem is amplified as the number of categories in $G_Q$ increases, since the number of total orders that share a common suffix increases exponentially.
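To see how quickly the decomposition grows, the total orders compatible with a visit order graph can be enumerated by brute force. The sketch below is only an illustration of the counting argument (it is not how PLUB is implemented), and the function name `total_orders` is an assumption.

```python
from itertools import permutations


def total_orders(categories, edges):
    """All permutations of `categories` that respect every (before, after) edge."""
    result = []
    for perm in permutations(categories):
        pos = {c: i for i, c in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in edges):
            result.append(perm)
    return result


edges = {("restaurant", "pub")}
three = total_orders(["museum", "restaurant", "pub"], edges)
four = total_orders(["mall", "museum", "restaurant", "pub"], edges)
# Orders of the four-category query that end with restaurant -> pub.
shared_suffix = [o for o in four if o[-2:] == ("restaurant", "pub")]
```

The three-category query yields 3 total orders and the four-category query yields 12; `shared_suffix` holds the two orders named in the text, both of which would recompute the same restaurant-to-pub sub-routes.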
Finally, Chen et al. [6] study the k Best Connected Trajectories (k-BCT) query, which resembles the optimal route query in that a k-BCT query consists of a set of (ordered or unordered) spatial locations, and each of its results should cover all locations in the query set. However, unlike the optimal route query, which constructs routes on the fly, k-BCT retrieves from a database the k existing trajectories with the lowest aggregate distance to the query points. The focus of [6] is clearly different from our work, and its methods do not apply to the optimal route query.
2.2 Spatial Search with Categorical Information
Besides the optimal route query, categorical information has been used to identify locations with good surrounding facilities. Yiu et al. [15] study the spatial preference query, which contains a list of desired categories. Data points are then ranked by their total distances to the nearest points of these categories, and those with the top-k best scores are returned to the user. Martinenghi and Tagliasacchi [12] introduce the proximity rank join operator, which searches for clusters of points that cover all categories specified by the user and are close to a given point and to each other.
Another class of related work concerns spatial keyword search in collections of documents, which are associated with spatial locations (e.g., derived from the content of the document [1]). The query contains both a spatial component (e.g., nearest neighbor search) and a set of keywords. A keyword set is similar to a category in that they are both non-spatial properties that can be used to select a set of points (i.e., document locations). However, the number of different keyword sets is significantly larger than the number of categories and, thus, the former require specialized data structures (e.g., inverted lists) and search techniques (e.g., inverted list intersection) to select relevant points. To accelerate spatial keyword search, a common approach is to combine a spatial index, e.g., an R-tree, with inverted lists or signature techniques, to form a composite index [8], [16], [7], [5]. The relevance of a document to a query is calculated by combining textual relevance with spatial distance; the top-k objects with the highest overall scores are returned to the user [7]. Besides simple similarity retrieval, the mCK query [17] identifies clusters of points with minimum diameters that match all query keywords. The top-k prestige query [3] retrieves points based on prestige scores, which originate from matching keywords and flow to nearby points. Finally, the continuous top-k spatial keyword query [14] returns a validity region to the user; as long as the query point stays in the validity region, the query results remain the same.

3 BACKWARD SEARCH SOLUTIONS
In this section, we present the first methodology for
answering optimal route queries. Similar to R-LORD
[13], the backward search methodology computes the
optimal routes in reverse order of its points. Before
explaining the methods that fit this framework in detail,
we first present an important property of the general
sub-route query, as follows.
Lemma 1 (Suffix Optimality). Given a query $\langle q, G_Q \rangle$ and its optimal solution $r^*$, let $r \subseteq r^*$ be any suffix of $r^*$, $p$ be the start point of $r$, and $V$ be the set of categories covered by $r$. Meanwhile, let $G \subseteq G_Q$ be the sub-graph of $G_Q$ that contains the set of categories $V$ and all edges between these categories in $G_Q$. Then, $r$ is the optimal solution for query $\langle p, G \rangle$.
Proof (By contradiction): Suppose that there is a better solution $r'$ than $r$ for the query $\langle p, G \rangle$, i.e., $length(r') < length(r)$. Since $r$ and $r'$ have the same starting point $p$, we can replace the suffix $r$ with $r'$ in $r^*$, and obtain a new route $r'^*$ such that $length(r'^*) = length(r^*) - length(r) + length(r') < length(r^*)$. Meanwhile, since $r'$ is a valid solution to the query $\langle p, G \rangle$, $r'$ covers the same category set $V$ as $r$, and satisfies the visit orders in $G \subseteq G_Q$. Because $G$ contains all visit orders about $V$, replacing suffix $r$ with $r'$ in $r^*$ does not violate $G_Q$. Hence, $r'^*$ also satisfies $G_Q$. This means that $r'^*$ is a better solution to query $\langle q, G_Q \rangle$ than $r^*$, which contradicts the optimality of $r^*$.
Consider again the example in Figure 1, where the optimal solution for the query $\langle q, G_Q \rangle$ is $r^* = q \to p_1 \to p_5 \to p_3$. The length-2 suffix of the optimal route is $r_2 = p_5 \to p_3$, which starts at point $p_5$, and covers two categories $V_2 = \{$restaurant, pub$\}$. Clearly, $r_2$ is the shortest route that starts at $p_5$ and covers $V_2$, since $p_3$ is the nearest pub with respect to $p_5$. Likewise, the length-3 suffix $r_3 = p_1 \to p_5 \to p_3$ of $r^*$ is the optimal route that (i) starts at $p_1$, (ii) covers category set $V_3 = \{$museum, restaurant, pub$\}$, and (iii) satisfies the constraint that a restaurant must be visited before a pub. In general, all suffixes of the query result are also optimal routes for their respective starting points and categories visited, and the idea of backward search is to enumerate all such possible suffixes. The suffix-optimality result in [13] is a special case of Lemma 1, with the limitation that a total order exists for all categories in $G_Q$.
Based on Lemma 1, we develop two algorithms SBS
and BBS, presented in Sections 3.1 and 3.2 respectively.
SBS directly extends R-LORD to the general optimal
route problem, while BBS improves the performance
of SBS through batch processing. Table 2 summarizes
frequently used notations throughout the paper.
3.1 Simple Backward Search
Algorithm 1 illustrates the simple backward search (SBS) method. Initially, SBS computes an upper bound $\theta$ of the optimal route length, using a greedy algorithm (lines 1-2), e.g., the one described in Section 1. Then, SBS retrieves the set $CS$ of candidate points that may be part of the optimal route (line 3), which are those that (i) belong to any category contained in the visit order graph $G_Q$, and (ii) fall within distance $\theta$ of the query start point $q$. This can be performed efficiently, e.g., by executing a circular range query on each R-tree that indexes a category of points relevant to the query. In the example of Figure 1, SBS obtains all points $p_1$-$p_6$. Note that this is different from R-LORD [13], which only loads points belonging to the last category of the total-ordered query in the initial step, e.g., pubs $p_3$, $p_4$. In our setting, there is neither a total order nor the concept of a last category.

TABLE 2
List of common symbols
Symbol                        Meaning
$DS$, $N$                     Dataset and its cardinality
$q$, $G_Q$                    Query start point and visit order graph
$m$                           Total number of categories in $G_Q$
$dist(p_1, p_2)$              Euclidean distance between points $p_1$ and $p_2$
$mindist(M_1, M_2)$           Minimum distance between MBRs $M_1$, $M_2$
$length(r)$                   Length of route $r$
$minlen(R)$                   Minimum length among the set $R$ of routes
$p \to r$ ($r \to p$)         A route that first visits point $p$ (follows sub-route $r$) and then follows sub-route $r$ (visits point $p$)
$\theta$                      Length of a known route that satisfies the query
$CS$                          Set of points that may appear in the optimal route
p,V                           Shortest route that starts at point $p$ and visits all categories in set $V$
P,V                           Set of shortest routes that start at a point $p \in P$ and visit all categories in set $V$
$C_p$, $C_P$                  Category of a point $p$ and that of a set $P$ of points having the same category, respectively
Algorithm 1 Simple backward search algorithm
SBS($q$, $G_Q$) // SBS stands for simple backward search
// Input: $q$, $G_Q$: query start point and visit order graph, respectively
// Output: the optimal route that satisfies the query
1: Use a greedy algorithm to obtain a route $r_g$
2: Initialize threshold $\theta$ to $length(r_g)$
3: Retrieve the set $CS$ of points within $\theta$ distance to $q$, whose categories appear in $G_Q$
4: Initialize route set $R_1$ to empty
5: for each point $p$ in $CS$ that can be the last point according to $G_Q$ do
6:     Add $p$ to $R_1$
7: for $i = 1$ to $m - 1$ do call $R_{i+1}$ = BSJoin($q$, $G_Q$, $\theta$, $R_i$, $CS$)
8: Select from $R_m$ the sub-route $r$ that minimizes $length(q \to r)$
9: Return $q \to r$ as the query result
After loading all candidate points, SBS continues to compute the optimal route sets $R_1$-$R_m$ (lines 4-7). In particular, route set $R_i$ ($1 \le i \le m$) contains all possible length-$i$ suffixes of the query solution. According to Lemma 1, these suffixes must be the optimal routes for their respective start point and the set of categories covered. Table 3 lists all routes contained in $R_1$-$R_m$ in our running example. Specifically, $R_1$ consists of 4 single-point routes: museums $p_1$, $p_2$, and pubs $p_3$, $p_4$. Restaurants are not included in $R_1$, since they must be visited before a pub and, thus, cannot be valid length-1 suffixes of the query solution. Route sets $R_2$-$R_m$ are computed through backward joins, to be explained soon. Continuing the example, $R_2$ contains all optimal suffixes that cover two categories. Again, a route covering {museum, restaurant} cannot be a suffix of the query result, since it would place a pub before a restaurant, violating $G_Q$. Similarly, $R_3$ contains all suffixes containing all three query categories. After obtaining $R_m$ (i.e., $R_3$ in the example), SBS connects the query point $q$ with the start point of each route in $R_m$, and selects the route with the shortest total length (lines 8-9) as the query answer. Here we use the notation $p \to r$ to denote a route that starts at $p$ and follows the sub-route $r$, e.g., when $r = p_2 \to p_3$, $p_1 \to r = p_1 \to p_2 \to p_3$.
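TABLE 3
Optimal suffix table used in SBS
$R_i$   Start point   Categories covered            Optimal suffix
1       $p_1$         {museum}                      $p_1$
1       $p_2$         {museum}                      $p_2$
1       $p_3$         {pub}                         $p_3$
1       $p_4$         {pub}                         $p_4$
2       $p_1$         {museum, pub}                 $p_1 \to p_3$
2       $p_2$         {museum, pub}                 $p_2 \to p_4$
2       $p_3$         {museum, pub}                 $p_3 \to p_1$
2       $p_4$         {museum, pub}                 $p_4 \to p_2$
2       $p_5$         {restaurant, pub}             $p_5 \to p_3$
2       $p_6$         {restaurant, pub}             $p_6 \to p_4$
3       $p_1$         {museum, restaurant, pub}     $p_1 \to p_5 \to p_3$
3       $p_2$         {museum, restaurant, pub}     $p_2 \to p_5 \to p_3$
3       $p_5$         {museum, restaurant, pub}     $p_5 \to p_3 \to p_1$
3       $p_6$         {museum, restaurant, pub}     $p_6 \to p_3 \to p_1$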
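The "Categories covered" column of Table 3 follows a simple rule: a category set can be covered by a suffix only if, whenever it contains a category, it also contains every category that must come after it. A short Python sketch of this rule is given below; the function name `valid_suffix_sets` is an assumption made for illustration.

```python
from itertools import combinations


def valid_suffix_sets(categories, edges):
    """Category sets that a suffix of a legal route may cover.

    edges holds (before, after) pairs. A set is valid if, for every edge whose
    `before` category is in the set, the `after` category is in the set as well.
    """
    result = []
    for k in range(1, len(categories) + 1):
        for subset in combinations(categories, k):
            chosen = set(subset)
            if all(b in chosen for a, b in edges if a in chosen):
                result.append(chosen)
    return result


suffix_sets = valid_suffix_sets(
    ["museum", "restaurant", "pub"], {("restaurant", "pub")}
)
```

For the running example this yields {museum}, {pub}, {museum, pub}, {restaurant, pub}, and the full set, which are exactly the category sets that appear in Table 3; {restaurant} and {museum, restaurant} are rejected.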
It remains to clarify the backward join module BSJoin, shown in Algorithm 2 (also used in our other methods described later). Besides the query parameters, the main inputs are (i) a set of points $P$, which is the entire candidate set $CS$ in SBS, and (ii) a set of routes $R$ ($R_i$ in SBS), each of which is optimal for the combination of its start point and categories covered. The join results ($R_{i+1}$ in SBS) consist of routes of the form $p \to r$, i.e., start point $p$ followed by sub-route $r$, where $p \in P$ and $r \in R$. Note that in SBS, the computation of $R_{i+1}$ ($1 \le i < m$) only involves $R_i$ and $CS$, meaning that after obtaining $R_{i+1}$, $R_1$-$R_i$ can be safely discarded to conserve memory.
Algorithm 2 Algorithm for backward join
BSJoin($q$, $G_Q$, $\theta$, $R$, $P$)
// Input: $q$, $G_Q$: query start point and visit order graph, respectively
//        $\theta$: length of a known route that satisfies the query
//        $R$, $P$: a set of routes and points, respectively
// Output: backward join results of $P$ and $R$
1: Initialize route set $R'$ to empty
2: Partition $R$ based on the set of categories they cover
3: for each point $p \in P$ do
4:     for each partition $R_V$ of $R$ covering the same category set $V$ do
5:         if connecting $p$ with a route in $R_V$ satisfies $G_Q$ then
6:             Find the route $r \in R_V$ that minimizes the length of $p \to r$ among all routes in $R_V$
7:             if $dist(q, p) + length(p \to r) < \theta$ then add $p \to r$ to $R'$
8: Return $R'$
BSJoin selects join results based on three criteria. The first concerns the visit order constraints $G_Q$ (line 5). Specifically, the route $p \to r$ itself must satisfy $G_Q$ and not contain any duplicate categories. Meanwhile, since $p \to r$ is expected to be the suffix of a solution to the query, $G_Q$ must allow all categories not covered by $p \to r$ to be visited before $p \to r$. In the example of Figure 1, BSJoin eliminates all join outputs that either have a pub before a restaurant (which directly violates $G_Q$) or contain a restaurant but not a pub (which cannot be suffixes of legal routes). Second, according to Lemma 1, $p \to r$ should be optimal among all routes that start at $p$ and cover the same categories as $p \to r$ (line 6). Finally, $p \to r$ must survive elliptic pruning [13] (line 7), described in Section 2.1. Unlike R-LORD [13], which uses the MBR of the ellipses to prune, BSJoin directly applies elliptic pruning, which is more efficient according to our experiments. The reason is that the MBR usually covers a significantly larger area than the ellipses (e.g., in Figure 2(b)), leading to poor pruning effectiveness; moreover, computing the MBR itself consumes considerable CPU time, sometimes defeating the purpose of pruning. The complexity of SBS is given by the following lemma.
Lemma 2. The SBS algorithm finds the optimal solution of the query using $O(N \cdot 2^m \cdot m)$ memory, and $O(N^2 \cdot 2^m)$ time.
Proof: During the $i$-th backward join (line 7 of Algorithm 1), SBS maintains in memory the set $R_i$ of optimal sub-routes of length $i$. Each sub-route has length $i$, and there are at most $N \cdot \binom{m}{i}$ routes in $R_i$, where $N$ is the total number of points in the dataset and $\binom{m}{i}$ is the number of category combinations of length $i$, from a total of $m$ categories. Because SBS also needs to compute $R_{i+1}$, the total memory consumption at this step is $O\left(N \cdot \binom{m}{i} \cdot i + N \cdot \binom{m}{i+1} \cdot (i + 1)\right)$. At the $(i + 1)$-th backward join, SBS releases the memory occupied by $R_i$, since it no longer affects subsequent optimal sub-route computations. Therefore, the peak memory usage of SBS is
$$O\left(\max_{i=1}^{m-1} \left(N \cdot \binom{m}{i} \cdot i + N \cdot \binom{m}{i+1} \cdot (i + 1)\right)\right) = O(2^m \cdot N \cdot m).$$
Backward joins dominate the runtime cost. In particular, at step $i$, $1 \le i \le m$, SBS joins $O\left(N \cdot \binom{m}{i}\right)$ sub-routes in $R_i$ with $O(N)$ points in the candidate set $CS$. The time taken at this step is $O\left(N^2 \cdot \binom{m}{i}\right)$. Summing up all $m$ steps, we obtain the time complexity of SBS:
$$O\left(\sum_{i=1}^{m} N^2 \cdot \binom{m}{i}\right) = O(N^2 \cdot 2^m).$$
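The two closed forms in the proof rely on standard binomial identities that the text leaves implicit. A short worked derivation, using only the symbols $N$, $m$, and $i$ from Lemma 2, is:

```latex
\begin{align*}
i\binom{m}{i} &= m\binom{m-1}{i-1} \le m\,2^{m-1}, \\
N\binom{m}{i}\,i + N\binom{m}{i+1}\,(i+1) &\le N m\,2^{m-1} + N m\,2^{m-1} = N m\,2^{m}, \\
\sum_{i=1}^{m} N^{2}\binom{m}{i} &= N^{2}\,(2^{m}-1) = O(N^{2}\cdot 2^{m}).
\end{align*}
```

The first line bounds each memory term, the second gives the peak memory bound for any step $i$, and the third gives the total join time.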
SBS is easy to implement and it achieves the same
worst-case time complexity as more complex algorithms
described later. The main drawback of SBS is that its
effectiveness relies heavily on the bound θ provided by
the greedy algorithm. When θ is loose (i.e., it is much
longer than the optimal length), SBS retrieves a large
number of candidate points, and joins them all with
the current sub-route set at every step. Moreover, the
backward join in SBS is performed in a nested-loop
fashion, which applies elliptic pruning on individual
results. Consequently, SBS can be rather inefficient for
large datasets with a skewed distribution.
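To make Algorithms 1 and 2 concrete, the following Python sketch runs SBS on a small in-memory dataset. It is a simplified illustration under several assumptions: the coordinates are hypothetical, the greedy route is supplied as a precomputed input, all dataset categories appear in the query, R-tree access is replaced by a linear scan, and the names `can_be_suffix`, `bs_join`, and `sbs` are not from the paper. If pruning removes every candidate, the sketch falls back to the greedy route.

```python
from math import dist

# Hypothetical coordinates, not taken from the paper.
Q = (0, 0)
POINTS = {"p1": (2, 0), "p2": (-1, 0), "p3": (5, 0),
          "p4": (0, 5), "p5": (4, 0), "p6": (0, 6)}
CATEGORY = {"p1": "museum", "p2": "museum", "p3": "pub",
            "p4": "pub", "p5": "restaurant", "p6": "restaurant"}
EDGES = {("restaurant", "pub")}  # (a, b): a must be visited before b


def can_be_suffix(seq, edges):
    """Check a category sequence against G_Q (Algorithm 2, line 5)."""
    pos = {c: i for i, c in enumerate(seq)}
    if len(pos) != len(seq):  # duplicate category
        return False
    # Every successor of a covered category must be covered and come later.
    return all(b in pos and pos[a] < pos[b] for a, b in edges if a in pos)


def bs_join(q, points, category, edges, theta, routes):
    """Backward join: prepend each point to each compatible route.

    routes maps (start point, covered categories) -> (length, points, categories).
    """
    joined = {}
    for p, xy in points.items():
        for (start, cats), (length, pts, seq) in routes.items():
            new_seq = [category[p]] + seq
            if not can_be_suffix(new_seq, edges):
                continue
            new_len = dist(xy, points[start]) + length
            if dist(q, xy) + new_len >= theta:  # elliptic pruning (line 7)
                continue
            key = (p, cats | {category[p]})
            # Keep only the best route per start point and category set (line 6).
            if key not in joined or new_len < joined[key][0]:
                joined[key] = (new_len, [p] + pts, new_seq)
    return joined


def sbs(q, points, category, edges, greedy_route, theta):
    """Simple backward search; returns (length, route without q)."""
    m = len(set(category.values()))
    cs = {p: xy for p, xy in points.items() if dist(q, xy) <= theta}
    routes = {(p, frozenset([category[p]])): (0.0, [p], [category[p]])
              for p in cs if can_be_suffix([category[p]], edges)}
    for _ in range(m - 1):
        routes = bs_join(q, cs, category, edges, theta, routes)
    return min(((dist(q, points[pts[0]]) + length, pts)
                for length, pts, _ in routes.values()),
               default=(theta, greedy_route))


# The greedy route q -> p2 -> p5 -> p3 has length 7 on this data.
best_len, best_route = sbs(Q, POINTS, CATEGORY, EDGES, ["p2", "p5", "p3"], 7.0)
```

On this data the search returns the route `["p1", "p5", "p3"]` with total length 5. The nested loops in `bs_join` correspond to the nested-loop join whose cost the paragraph above identifies as the main weakness of SBS.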
3.2 Batch Backward Search
The batch backward search (BBS) method, shown in Algorithm 3, improves SBS by employing batch processing in the backward join operations. Specifically, both the candidate set $CS$ and the route set $R_i$ ($1 \le i \le m$) are partitioned into clusters before participating in a backward join (lines 2 and 4). The partitioning of $CS$ first groups points by their category; then, within each group, the points are further partitioned into clusters based on their spatial proximity. The partitioning of route set $R_i$
References
Proceedings ArticleDOI

R-trees: a dynamic index structure for spatial searching

TL;DR: A dynamic index structure called an R-tree is described which meets this need, and algorithms for searching and updating it are given and it is concluded that it is useful for current database systems in spatial applications.
Journal ArticleDOI

Efficient retrieval of the top-k most relevant spatial web objects

TL;DR: A new indexing framework for location-aware top-k text retrieval that encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space.
Proceedings ArticleDOI

Keyword Search on Spatial Databases

TL;DR: This work presents an efficient method to answer top-k spatial keyword queries using an indexing structure called IR2-Tree (Information Retrieval R-Tree), which combines an R-Tree with superimposed text signatures.
Proceedings ArticleDOI

Efficient query processing in geographic web search engines

TL;DR: This paper proposes several algorithms for efficient query processing in geographic search engines, integrates them into an existing web search query processor, and evaluates them on large sets of real data and query traces.
Book ChapterDOI

On trip planning queries in spatial databases

TL;DR: This paper provides a number of approximation algorithms with approximation ratios that depend on either the number of categories, the maximum number of points per category or both, and gives an experimental evaluation of the proposed algorithms using both synthetic and real datasets.
Frequently Asked Questions (15)
Q1. What are the contributions in "Optimal route queries with arbitrary order constraints"?

As the authors show in this paper, this naïve approach incurs a significant amount of repeated computations, and, thus, is not scalable to large datasets. Motivated by this, the authors propose novel solutions to the general optimal route query, based on two different methodologies, namely backward search and forward search. In addition, the authors discuss how the proposed methods can be adapted to answer a variant of the optimal route queries, in which the route only needs to cover a subset of the given categories.

In the future, the authors plan to study alternative definitions of the optimal route query that have temporal constraints (e.g., have lunch at a specified period) or maximize the number of categories to be visited given a total travel length budget.

Since backward search methods prune based solely on the greedy bound, they tend to maintain and process a large number of useless sub-routes. 

The main reason is that SBS prunes with only the bound θ obtained by the greedy algorithm, whereas PLUB tightens the bound whenever a total-ordered subquery returns a better result.

During the computation of the optimal suffix table, R-LORD uses a pruning technique to eliminate sub-routes that cannot participate in the optimal solution.

The greedy algorithm is more likely to identify a good route whose length is close to the optimal one, meaning that the candidate set CS becomes smaller, leading to decreased join costs.

Extensive experiments, using large-scale real and synthetic datasets, confirm that the proposed methods are efficient and practical. 

PLUB [11] decomposes a general optimal route query to multiple total-order queries and processes them individually, e.g., using R-LORD. 
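
To make the decomposition concrete, the following is a minimal Python sketch that enumerates every total order consistent with a set of partial order constraints. It is an illustration only, not PLUB's actual procedure: the function name `total_orders` and the representation of constraints as `(a, b)` pairs (meaning a is visited before b) are assumptions made for this example. In such a scheme, each yielded order would then be handed to a total-order solver.

```python
from itertools import permutations

def total_orders(categories, before):
    """Yield every total order of `categories` consistent with `before`.

    `before` is a set of (a, b) pairs meaning category a must be
    visited before category b (hypothetical representation).
    """
    for perm in permutations(categories):
        # Position of each category in this candidate order.
        pos = {c: i for i, c in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in before):
            yield perm

# Example from the abstract: a restaurant must be visited before a pub.
orders = list(total_orders(["museum", "restaurant", "pub"],
                           {("restaurant", "pub")}))
```

With a single constraint over three categories, three of the six permutations survive. With few constraints the number of total orders grows factorially in the number of categories, which is consistent with the scalability concern raised for this approach.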

As m grows, the accuracy of the greedy method worsens, and, consequently, the performance gap between SBS and PLUB gradually closes. 

The batch backward search (BBS) method, shown in Algorithm 3, improves SBS by employing batch processing in the backward join operations. 

In fact, the optimal route query is proven to be NP-hard [13], and heuristics-based algorithms such as Greedy cannot guarantee optimality of the result. 
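
As an illustration of why a greedy heuristic yields only an upper bound, here is a small Python sketch of one common greedy strategy: repeatedly move to the nearest point whose category is still needed and whose predecessors have been visited. This is an assumption for illustration; the paper's Greedy algorithm may differ in its details. The function name `greedy_route`, the `(x, y, category)` point tuples, and the use of Euclidean distance are all hypothetical choices.

```python
import math

def greedy_route(start, points, categories, before):
    """Nearest-feasible-neighbour heuristic (illustrative assumption).

    points: list of (x, y, category); before: set of (a, b) pairs,
    meaning a must be visited before b. Returns (length, route) or None.
    """
    remaining, visited = set(categories), set()
    last, length, route = start, 0.0, []
    while remaining:
        # A point is feasible if its category is still needed and all
        # categories required before it have already been visited.
        feasible = [
            p for p in points
            if p[2] in remaining
            and all(a in visited for a, b in before if b == p[2])
        ]
        if not feasible:
            return None
        nxt = min(feasible, key=lambda p: math.dist(last, p[:2]))
        length += math.dist(last, nxt[:2])
        route.append(nxt)
        last = nxt[:2]
        visited.add(nxt[2])
        remaining.discard(nxt[2])
    return length, route

# Two categories, no order constraints: greedy walks right first.
pts = [(1.0, 0.0, "a"), (-1.5, 0.0, "b"), (-2.0, 0.0, "a")]
result = greedy_route((0.0, 0.0), pts, ["a", "b"], set())
```

In this toy instance the greedy route has length 3.5, whereas visiting b at (-1.5, 0) and then a at (-2, 0) has length 2.0. The greedy length is still useful as an upper bound on the optimal length.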

A simple idea for pruning is to backtrack whenever the length of the current prefix reaches or exceeds the upper bound θ, since subsequent searches based on this prefix cannot possibly lead to the optimal solution to the query. 
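
The pruning rule above can be sketched as a depth-first search in Python. This is a simplified illustration under stated assumptions, not the paper's forward search algorithm: it works on individual points rather than clusters, uses Euclidean distance, and additionally tightens θ whenever a complete route is found (an assumption added here). The names `forward_search` and `extend` and the `(x, y, category)` tuples are hypothetical.

```python
import math

def forward_search(start, points, categories, before, theta):
    """Depth-first forward search with upper-bound pruning (sketch).

    points: list of (x, y, category); before: set of (a, b) pairs,
    meaning a must precede b; theta: length of a known feasible route.
    Returns (length, route) of the best route shorter than theta, or None.
    """
    best = None

    def extend(last, visited, route, length):
        nonlocal theta, best
        if length >= theta:
            # Prefix reaches or exceeds the bound: backtrack.
            return
        if len(visited) == len(categories):
            # Complete route shorter than the bound: tighten the bound.
            theta, best = length, (length, list(route))
            return
        for x, y, cat in points:
            if cat in visited or cat not in categories:
                continue
            # Skip cat if a category required before it is still unvisited.
            if any(b == cat and a not in visited for a, b in before):
                continue
            route.append((x, y, cat))
            extend((x, y), visited | {cat}, route,
                   length + math.dist(last, (x, y)))
            route.pop()

    extend(start, frozenset(), [], 0.0)
    return best

pts = [(0.0, 1.0, "museum"), (2.0, 0.0, "restaurant"),
       (1.0, 0.0, "pub"), (5.0, 0.0, "restaurant")]
cats = {"museum", "restaurant", "pub"}
found = forward_search((0.0, 0.0), pts, cats, {("restaurant", "pub")}, 100.0)
```

Because the check uses `>=`, a route exactly as long as the initial θ is not reported. In that case the route that produced θ is already a best answer.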

Such sub-routes are pruned in SBS and BBS for the original optimal route query, since they can only lead to complete routes (i.e., those covering all 3 categories) that visit a pub before a restaurant, violating GQ. 

The order of new clusters to be added to the current prefix is based on their MBRs’ minimum distances to the MBR of the current last cluster P (line 7). 
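
The minimum distance between two MBRs used for this ordering can be computed as in the following Python sketch. It assumes axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` tuples and Euclidean distance; the function names `mbr_min_dist` and `order_clusters` are hypothetical and not taken from the paper.

```python
import math

def mbr_min_dist(a, b):
    """Minimum Euclidean distance between two axis-aligned rectangles.

    Each MBR is (xmin, ymin, xmax, ymax); overlapping MBRs have distance 0.
    """
    # Gap along each axis, or 0 if the projections overlap.
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)

def order_clusters(current, candidates):
    """Sort candidate MBRs by minimum distance to the current MBR."""
    return sorted(candidates, key=lambda m: mbr_min_dist(current, m))

ordered = order_clusters((0, 0, 1, 1),
                         [(4, 5, 6, 6), (2, 0, 3, 1), (0, 0, 1, 1)])
```

Visiting nearer clusters first tends to find short complete routes early, which helps any bound-based pruning take effect sooner.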

Sections 3 and 4 present solutions for the optimal route query, following the backward search and forward search frameworks, respectively.