What are the future works mentioned in the paper "Optimal route queries with arbitrary order constraints" ?

In the future, the authors plan to study alternative definitions of the optimal route query, that have temporal constraints ( e. g., have lunch at a specified period ) or maximize the number of categories to be visited given a total travel length budget.

What is the main reason why backward search methods prune?

Since backward search methods prune based solely on the greedy bound, they tend to maintain and process a large number of useless sub-routes.

What is the main reason for SBS to prune?

The main reason is that SBS prunes with only the bound θ obtained by the greedy algorithm, whereas PLUB tightens the bound whenever a total-ordered subquery returns a better result.

What is the effect of the greedy algorithm on the performance of the optimal route query?

The greedy algorithm is more likely to identify a good route whose length is close to the optimal one, meaning that the candidate set CS becomes smaller, leading to decreased join costs.

How many experiments have confirmed the proposed methods are efficient and practical?

Extensive experiments, using large-scale real and synthetic datasets, confirm that the proposed methods are efficient and practical.

What is the main reason for the performance gap between SBS and PLUB?

As m grows, the accuracy of the greedy method worsens, and, consequently, the performance gap between SBS and PLUB gradually closes.

What is the way to prune a prefix?

A simple idea for pruning is to backtrack whenever the length of the current prefix reaches or exceeds the upper bound θ, since subsequent searches based on this prefix cannot possibly lead to the optimal solution to the query.

What is the way to prune sub-routes?

Such sub-routes are pruned in SBS and BBS for the original optimal route query, since they can only lead to complete routes (i.e., those covering all 3 categories) that visit a pub before a restaurant, violating GQ.

What is the order of the new clusters to be added to the current prefix?

The order of new clusters to be added to the current prefix is based on their MBRs’ minimum distances to the MBR of the current last cluster P (line 7).

(Open Access) Optimal Route Queries with Arbitrary Order Constraints (2013) | Jing Li

Q: What are the contributions in "Optimal route queries with arbitrary order constraints" ?

As the authors show in this paper, this naı̈ve approach incurs a significant amount of repeated computations, and, thus, is not scalable to large datasets. Motivated by this, the authors propose novel solutions to the general optimal route query, based on two different methodologies, namely backward search and forward search. In addition, the authors discuss how the proposed methods can be adapted to answer a variant of the optimal route queries, in which the route only needs to cover a subset of the given categories.

Q: What is the way to solve the optimal suffix table?

During the computation of the optimal suffix table, R-LORD uses a pruning technique to eliminate sub-routes that cannot participate in the optimal solution.

Q: What is the main drawback of the backward join method?

The batch backward search (BBS) method, shown in Algorithm 3, improves SBS by employing batch processing in the backward join operations.

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. ?, NO. ?, ? 20?? 1

Optimal Route Queries with Arbitrary Order

Constraints

Jing Li, Yin Yang, Nikos Mamoulis

Abstract—Given a set of spatial points DS, each of which is associated with categorical information, e.g., restaurant, pub, etc., the

optimal route query ﬁnds the shortest path that starts from the query point (e.g., a home or hotel), and covers a user-speciﬁed set of

categories (e.g., {pub, restaurant, museum}). The user may also specify partial order constraints between different categories, e.g.,

a restaurant must be visited before a pub. Previous work has focused on a special case where the query contains the total order of

all categories to be visited (e.g., museum → restaurant → pub). For the general scenario without such a total order, the only known

solution reduces the problem to multiple, total-order optimal route queries. As we show in this paper, this naïve approach incurs a

signiﬁcant amount of repeated computations, and, thus, is not scalable to large datasets. Motivated by this, we propose novel solutions

to the general optimal route query, based on two different methodologies, namely backward search and forward search. In addition,

we discuss how the proposed methods can be adapted to answer a variant of the optimal route queries, in which the route only needs

to cover a subset of the given categories. Extensive experiments, using both real and synthetic datasets, conﬁrm that the proposed

solutions are efﬁcient and practical, and outperform existing methods by large margins.

Index Terms—H.2.4.h Query processing, H.2.4.k Spatial databases

1 INTRODUCTION

Consider a tourist who will have a free day to travel

around Hong Kong. Without much knowledge about the

city, s/he searches online maps to plan for a trip. Usually,

s/he has a ﬁxed starting point, e.g., her/his hotel, and

certain objectives in mind, such as visiting a museum,

dining at a ﬁne restaurant, and enjoying a few drinks at

a local pub. Meanwhile, some destinations may need to

be visited in a certain order. For instance, the trip should

have a pub after a restaurant. The ideal route should

cover all the destinations, satisfy all order constraints,

and minimize the total travel length. Searching for such

a route is captured by the optimal route query [4],

[10], [13], which usually has a vast search space, and,

consequently, is too tedious to be done manually. Cur-

rently, major online map providers have already shown

interest in tools that assist such trip planning tasks.

For example, Google City Tours (citytours.googlelabs.com)

provides suggested tours for a given starting address.

However, these tours are pre-deﬁned, and cannot be

customized according to the user’s plans. Yahoo Travel

(travel.yahoo.com) has a similar service that allows users

to search and share trips, which, unfortunately, cannot

answer optimal route queries either.

Figure 1 illustrates an example optimal route query

• J. Li is with the Department of Computer Science, University of Hong

Kong, Pokfulam Road, Hong Kong.

E-mail: jli@cs.hku.hk

• Y. Yang is with Advanced Digital Sciences Center, Singapore.

E-mail: yin.yang@adsc.com.sg

• N. Mamoulis is with the Department of Computer Science, University of

Hong Kong, Pokfulam Road, Hong Kong.

E-mail: nikos@cs.hku.hk

on a dataset DS with 6 locations p_1-p_6. Each location is associated with one category C_p; in the example, two of the locations are museums, two are pubs, and two are restaurants. (If a location belongs to multiple categories, e.g., a restaurant and pub, we conceptually split it into multiple points with identical coordinates, each associated to a single category.) The query contains two parameters: a starting point q, and a directed acyclic graph G_Q called the visit order graph. Each vertex in G_Q corresponds to a category and each edge ⟨C, C′⟩ indicates that a point of category C must be visited before another of category C′. In our example, G_Q signifies that a restaurant must be visited before a pub. We follow a common assumption that each category appears at most once in G_Q [4], [10], [13]. In addition, to represent the fact that q must be the first point in the route, we create an artificial category C_q containing a single point q, and add an edge connecting C_q to every other vertex in G_Q without an in-edge. The result of the query is the shortest route that visits all categories in G_Q, while satisfying the visit order constraints.

In our example, such a route is q → p

→ p

. In

practice, the user may not have sufficient time to visit all the categories. In this situation, a reasonable compromise is to find a route that covers a subset of l categories from G_Q, where l is a user-specified parameter. We call this variant the size-l optimal route query.

A Greedy algorithm[13] to answer the optimal route

query ﬁrst ﬁnds the nearest neighbor of q that is allowed

to be visited right after q according to G

. In the running

example, Greedy chooses point p

(note that p

cannot

be selected, since G

requires that a pub is visited after

a restaurant). Then, Greedy adds p

to the current route,

and continues to compute the nearest allowable point

according to G

to be added to the route, which is p


Fig. 1. Example of optimal route query (legend: museums, restaurants, pubs, query start point, greedy route, optimal route; inset: visit order graph G_Q)

After that, Greedy ﬁnds the nearest allowable point after

, i.e., p

. Since all categories in G

are visited, Greedy

returns the route q → p

→ p

. Observe that this

is longer than the optimal route q → p

→ p

. The

reason is that although p

is closer to q than p

, the latter

leads to a shorter sub-route that covers the remaining

categories. In fact, the optimal route query is proven to

be NP-hard [13], and heuristics-based algorithms such

as Greedy cannot guarantee optimality of the result.
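The Greedy procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [13]: the function name `greedy_route`, the dictionary layout of the points, and the coordinates in the usage note are all hypothetical (the coordinates of Figure 1 are not available here), and every category present in the data is assumed to be part of the query.

```python
import math

def greedy_route(q, points, order_edges):
    """Nearest-allowable-point heuristic (hypothetical helper).

    q           -- (x, y) start position
    points      -- dict: name -> (x, y, category)
    order_edges -- set of (before, after) category pairs, i.e. edges of G_Q
    Returns (list of point names in visit order, total length).
    """
    categories = {cat for _, _, cat in points.values()}
    visited = set()
    route, pos, total = [], q, 0.0
    while visited != categories:
        # A category may come next only if all its predecessors were visited.
        allowed = {c for c in categories - visited
                   if all(a in visited for a, b in order_edges if b == c)}
        # Pick the nearest point among the allowed categories.
        name = min((n for n, (_, _, c) in points.items() if c in allowed),
                   key=lambda n: math.dist(pos, points[n][:2]))
        x, y, cat = points[name]
        total += math.dist(pos, (x, y))
        route.append(name)
        pos = (x, y)
        visited.add(cat)
    return route, total
```

With made-up points `r1 = (1, 0)` and `r2 = (-2, 0)` (restaurants) and `b1 = (-3, 0)` (pub), a start at the origin, and the constraint restaurant before pub, the sketch picks `r1` first and ends with length 5, whereas the route through `r2` has length 3. This mirrors the observation that the nearest first choice need not lead to the shortest overall route.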

Previous work on the optimal route query, e.g., [13],

has mainly focused on a special case where G_Q defines a total order of categories to be visited. A naïve approach for the general case, where G_Q is a partial order, is to enumerate all total orders in G_Q and process

each of them individually. As we explain in Section

2.1, this method is inefﬁcient as it incurs considerable

repeated work. Motivated by this, we propose sev-

eral efﬁcient solutions to the general-case optimal route

query. Speciﬁcally, we investigate two methodologies:

backward search and forward search. The former com-

putes the optimal route from the last point to the ﬁrst,

while the latter follows the ﬁrst-to-last order of points.

Furthermore, all proposed solutions extend naturally to

size-l optimal route processing. Extensive experiments,

using large-scale real and synthetic datasets, conﬁrm that

the proposed methods are efﬁcient and practical.

The rest of this paper is organized as follows. Section

2 surveys related work. Sections 3 and 4 present solutions

for the optimal route query, following the backward

search and forward search frameworks, respectively.

Section 5 extends the proposed solutions to the size-

l optimal query. Section 6 contains an extensive set of

experiments. Finally, Section 7 concludes the paper.

2 RELATED WORK

Section 2.1 reviews existing solutions to the optimal

route query. Section 2.2 surveys other related queries that

operate on spatial data with categorical information.

2.1 Optimal Route Query Processing

Early work on optimal route computation focuses on

greedy solutions. Chen et al. [4] use the same query

deﬁnition as this paper, and propose two heuristics. The

ﬁrst, namely NNPSR, resembles the greedy approach

described in Section 1; the second retrieves the nearest

point of the query start position q in every category,

and then connects them to form a route. In addition, [4]

also describe a simple combination of NNPSR and R-

LORD [13], which answers a special case of the optimal

route query with a total order of the categories to be

visited. The hybrid solution ﬁrst runs NNPSR to ﬁnd

a greedy route; then, it extracts the category of each point on the greedy route, and runs R-LORD with this category sequence as input. None of the solutions in [4] guarantees the quality of the results; these methods usually return sub-optimal routes according to the

experiments in [4]. Li et al. [10] study a variant of the optimal route query that specifies both a start point q_start and an end position q_end, but no order constraint between the data categories. This is equivalent to a visit order graph G_Q that contains two artificial categories C_start = {q_start} and C_end = {q_end}, and two edges ⟨C_start, C⟩ and ⟨C, C_end⟩ for each category C in the dataset. The solutions of [10] report approximate query results; on the other hand, this paper focuses on efficient, exact methods for the general optimal route problem.

Sharifzadeh et al. [13] propose R-LORD, the ﬁrst exact

solution for optimal route queries with a total order. In

the example of Figure 1, suppose that G_Q specifies total order q → museum → restaurant → pub; then, R-LORD is directly applicable. Specifically, let r* be the optimal route; an important observation made in [13] is that any suffix r of r* is also the shortest among all routes that (i) start at the first point of r, and (ii) visit the same categories as r, in the same order. In our example, the

best answer to the query is r

∗

= q → p

→ p

. Its

length-2 sufﬁx p

→ p

is the shortest route that starts at

and visits a restaurant followed by a pub. Similarly,

its length-3 sufﬁx p

→ p

is the shortest path

that originates at p

and follows the category sequence

museum → restaurant → pub. This fact enables dynamic

programming, which gradually ﬁlls an optimal sufﬁx table.

In particular, R-LORD ﬁrst uses a greedy algorithm to

compute a route that satisﬁes the query, as well as its

length θ. Then, the method computes length-1 optimal

sufﬁxes, which are points from the last category in the

visit order that are within θ-distance to the query start

position q. In our example, R-LORD obtains pubs p

and

, and stores them in the optimal sufﬁx table shown in

Table 1. Next, R-LORD retrieves points from the second-

to-last category that are no farther than θ from q, i.e.,

restaurants p

and p

, and prepends them to the opti-

mal length-1 sufﬁxes to form optimal length-2 sufﬁxes

→ p

and p

→ p

. Note that p

→ p

and p

→ p

are discarded, as they have the same starting points

and category sequences as their shorter counterparts

→ p

and p

→ p

, respectively. In the third step, R-

LORD retrieves museums p

, p

, combines them with the

optimal length-2 sufﬁxes, and obtains optimal length-3

sufﬁxes p

→ p

and p

→ p

. Finally, R-

LORD connects them with q, and selects the shortest one

q → p

→ p

as the answer to the query.
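The suffix-table idea for a total order can be illustrated with a short dynamic program. The sketch below is an assumption-laden simplification rather than R-LORD itself: it omits the greedy bound θ, the range queries, and the R-tree, and the name `total_order_route` and the input layout are hypothetical. It keeps exactly one optimal suffix per start point, which is the role of the optimal suffix table.

```python
import math

def total_order_route(q, points_by_cat, order):
    """Shortest route from q visiting one point per category, in `order`.

    points_by_cat -- dict: category -> list of (x, y) points
    order         -- list of categories, first to last
    Returns (length, route), where route is a list of (x, y) points.
    """
    # Length-1 suffixes: every point of the last category.
    best = {p: (0.0, [p]) for p in points_by_cat[order[-1]]}
    # Walk the categories backwards, prepending one point at a time.
    for cat in reversed(order[:-1]):
        nxt = {}
        for p in points_by_cat[cat]:
            # Keep only the shortest suffix that starts at p.
            nxt[p] = min((math.dist(p, s) + l, [p] + r)
                         for s, (l, r) in best.items())
        best = nxt
    # Finally connect q to the start of every surviving suffix.
    return min((math.dist(q, p) + l, r) for p, (l, r) in best.items())
```

Because each step only needs the suffixes from the previous step, the table for shorter suffixes could be discarded as the computation proceeds.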

During the computation of the optimal sufﬁx table, R-

LORD uses a pruning technique to eliminate sub-routes


TABLE 1. Optimal suffix table used in R-LORD (columns: Suffix Length, Start Point, Optimal Suffix)

that cannot participate in the optimal solution. Figure 2 illustrates this technique, which we call elliptic pruning. Suppose that at step i, R-LORD has computed an optimal sub-route r of length i. Let p be the first point of r, length(r) be the total length of r, and θ be the length of the greedy route. Then, at step i + 1, R-LORD connects r only to points whose total distance to q and p is no larger than θ − length(r). Thus, the range for points allowed to connect to r is an ellipse with foci q and p and major diameter θ − length(r). For example, in Figure 2(a), the point lying outside the ellipse of sub-route r is not connected to r. This is true even when combining that point with r would lead to an optimal sub-route of length i + 1. Elliptic pruning therefore reduces the number of stored optimal sub-routes and improves both memory consumption and CPU time. Furthermore, to minimize I/O costs, R-LORD computes the minimum bounding rectangle (MBR) of all ellipses generated from length-i optimal sub-routes, as shown in Figure 2(b), and uses this MBR as a range query to retrieve points from the R-tree [9] that indexes the category to be examined during the (i + 1)-th step.
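The elliptic pruning test reduces to a single inequality on distances. The following sketch shows that check in isolation; the function name `survives_elliptic_pruning` and the tuple representation of points are assumptions made for illustration, and the non-strict comparison follows the wording "no larger than" used above.

```python
import math

def survives_elliptic_pruning(q, p, sub_route, theta):
    """Return True if point p may still be prepended to sub_route.

    q         -- (x, y) query start point
    p         -- (x, y) candidate point
    sub_route -- list of (x, y) points, first to last
    theta     -- length of a known route that satisfies the query
    """
    head = sub_route[0]
    length_r = sum(math.dist(a, b) for a, b in zip(sub_route, sub_route[1:]))
    # p lies inside the ellipse with foci q and head exactly when the sum of
    # its distances to both foci does not exceed theta - length(r).
    return math.dist(q, p) + math.dist(p, head) <= theta - length_r
```

For instance, with q at the origin, a sub-route from (4, 0) to (5, 0) and θ = 6, the point (2, 0) passes the test while (2, 3) does not.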

Fig. 2. Elliptic pruning in R-LORD: (a) with one ellipse; (b) with multiple ellipses

PLUB [11] decomposes a general optimal route query

to multiple total-order queries and processes them in-

dividually, e.g., using R-LORD. For instance, the query

in Figure 1 is decomposed into three total-order queries:

museum → restaurant → pub, restaurant → museum

→ pub, and restaurant → pub → museum. This incurs

signiﬁcant amounts of repeated computations for longer

sequences. For example, assume that in the query of Fig-

ure 1 there is an additional category (e.g., mall) that does

not have any order constraints with other categories.

The decomposition of this new query involves multiple

total orders that share a common sufﬁx, such as mall →

museum → restaurant → pub and museum → mall →

restaurant → pub. Consequently, the processing of both

orders involves the computation of optimal sub-routes

that start at a restaurant and are followed by a pub. This problem is amplified, as the number of categories in G_Q increases, since the number of total orders that share a common suffix increases exponentially.
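To make the decomposition concrete, the total orders compatible with a visit order graph can be enumerated by brute force. This is only an illustrative sketch, not the procedure of PLUB [11]; the function name `total_orders` is hypothetical, and filtering all permutations is chosen for brevity rather than efficiency.

```python
from itertools import permutations

def total_orders(categories, order_edges):
    """All orderings of `categories` that respect every (before, after) edge."""
    result = []
    for perm in permutations(categories):
        rank = {c: i for i, c in enumerate(perm)}
        if all(rank[a] < rank[b] for a, b in order_edges):
            result.append(perm)
    return result

edges = {("restaurant", "pub")}
three = total_orders(["museum", "restaurant", "pub"], edges)
four = total_orders(["mall", "museum", "restaurant", "pub"], edges)
# Orders of the four-category query that end with restaurant -> pub.
shared = [o for o in four if o[-2:] == ("restaurant", "pub")]
```

For the query of Figure 1 this yields the three total orders listed above. Adding the unconstrained category mall raises the count to twelve, two of which share the suffix restaurant → pub, so a per-order method would recompute the same restaurant-to-pub sub-routes.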

Finally, Chen et al. [6] study the k Best Connected

Trajectories (k-BCT) query, which resembles the optimal

route query in that a k-BCT query consists of a set of (ordered or un-ordered) spatial locations, and each of its results should cover all locations in the query

set. However, unlike the optimal route query which

constructs routes on the ﬂy, k-BCT retrieves k existing

trajectories from a database with the lowest aggregate

distance to the query points. The focus of [6] is clearly

different from our work, and its methods do not apply

to the optimal route query.

2.2 Spatial Search with Categorical Information

Besides the optimal route query, categorical information

has been used to identify locations with good surround-

ing facilities. Yiu et al. [15] study the spatial prefer-

ence query, which contains a list of desired categories.

Data points are then ranked by their total distances

to nearest points of these categories and those with

top-k best scores are returned to the user. Martinenghi

and Tagliasacchi [12] introduce the proximity rank join

operator, which searches for clusters of points that cover

all categories speciﬁed by the user and are close to a

given point and to each other.

Another class of related work concerns spatial key-

word search in collections of documents, which are

associated to spatial locations (e.g., derived from the

content of the document [1]). The query contains both

a spatial component (e.g., nearest neighbor search) and

a set of keywords. A keyword set is similar to a category

in that they are both non-spatial properties that can

be used to select a set of points (i.e., document loca-

tions). However, the number of different keyword sets

is signiﬁcantly larger than the number of categories and,

thus, the former require specialized data structures (e.g.,

inverted lists) and search techniques (e.g., inverted list

intersection) to select relevant points. To accelerate spa-

tial keyword search, a common approach is to combine a

spatial index, e.g., R-tree with inverted lists or signature

techniques, to form a composite index [8], [16], [7], [5].

The relevance of a document to a query is calculated

by combining textual relevance with spatial distance; the

top-k objects with the highest overall scores are returned

to the user [7]. Besides simple similarity retrieval, the

mCK query [17] identiﬁes clusters of points with mini-

mum diameters that match all query keywords. The top-

k prestige query [3] retrieves points based on prestige

scores, which originate from matching keywords and

ﬂows to nearby points. Finally, the continuous top-k

spatial keyword query [14] returns a validity region to

the user; as long as the query point stays in the validity

region, the query results remain the same.


3 BACKWARD SEARCH SOLUTIONS

In this section, we present the ﬁrst methodology for

answering optimal route queries. Similar to R-LORD

[13], the backward search methodology computes the

optimal routes in reverse order of its points. Before

explaining the methods that ﬁt this framework in detail,

we ﬁrst present an important property of the general

sub-route query, as follows.

Lemma 1 (Suffix Optimality). Given a query ⟨q, G_Q⟩ and its optimal solution r*, let r ⊆ r* be any suffix of r*, p be the start point of r, and V be the set of categories covered by r. Meanwhile, let G ⊆ G_Q be the sub-graph of G_Q that contains the set of categories V and all edges between these categories in G_Q. Then, r is the optimal solution for query ⟨p, G⟩.

Proof (By contradiction): Suppose that there is a better solution r′ than r for the query ⟨p, G⟩, i.e., length(r′) < length(r). Since r and r′ have the same starting point p, we can replace the suffix r with r′ in r*, and obtain a new route r′* such that length(r′*) = length(r*) − length(r) + length(r′) < length(r*). Meanwhile, since r′ is a valid solution to the query ⟨p, G⟩, r′ covers the same category set V as r, and satisfies the visit orders in G ⊂ G_Q. Because G contains all visit orders about V, replacing suffix r with r′ in r* does not violate G_Q. Hence, r′* also satisfies G_Q. This means that r′* is a better solution to query ⟨q, G_Q⟩ than r*, which contradicts the optimality of r*.

Consider again the example in Figure 1, where the

optimal solution for the query ⟨q, G

⟩ is r

∗

= q →

→ p

. The length-2 sufﬁx of the optimal route

is r

= p

→ p

, which starts at point p

, and covers

two categories V

= {restaurant, pub}. Clearly, r is the

shortest route that starts at p

and covers V

, since p

the nearest pub with respect to p

. Likewise, the length-3

sufﬁx r

= p

→ p

of r

∗

is the optimal route that

(i) starts at p

, (ii) covers category set V

= {museum,

restaurant, pub}, and (iii) satisﬁes the constraint that a

restaurant must be visited before a pub. In general, all suffixes of the query result are also optimal routes for their respective starting point and categories visited, and the idea of backward search is to enumerate all possible such suffixes. The suffix-optimality result in [13] is a special case of Lemma 1, with the limitation that a total order exists for all categories in G_Q.

Based on Lemma 1, we develop two algorithms SBS

and BBS, presented in Sections 3.1 and 3.2 respectively.

SBS directly extends R-LORD to the general optimal

route problem, while BBS improves the performance

of SBS through batch processing. Table 2 summarizes

frequently used notations throughout the paper.

3.1 Simple Backward Search

Algorithm 1 illustrates the simple backward search (SBS)

method. Initially, SBS computes an upper bound θ of the

optimal route length, using a greedy algorithm (lines

1-2), e.g., the one described in Section 1. Then, SBS

TABLE 2
List of common symbols

Symbol | Meaning
DS, N | Dataset and its cardinality
q, G_Q | Query start point and visit order graph
m | Total number of categories in G_Q
dist(p_1, p_2) | Euclidean distance between points p_1 and p_2
mindist(M_1, M_2) | Minimum distance between MBRs M_1, M_2
length(r) | Length of route r
minlen(R) | Minimum length among the set R of routes
p → r (r → p) | A route that first visits point p (follows sub-route r) and then follows sub-route r (visits point p)
θ | Length of a known route that satisfies the query
CS | Set of points that may appear in the optimal route
Ω_{p,V} | Shortest route that starts at point p and visits all categories in set V
Ω_{P,V} | Set of shortest routes that start at a point p ∈ P and visit all categories in set V
C_p, C_P | Category of a point p and that of a set P of points having the same category, respectively

retrieves the set CS of candidate points that may be part of the optimal route (line 3), which are those that (i) belong to any category contained in the visit order graph G_Q, and (ii) fall within distance θ to the query start point q. This can be performed efficiently, e.g., by executing a circular range query on each R-tree that indexes a category of points relevant to the query. In the example of Figure 1, SBS obtains all points p_1-p_6. Note that this is different from R-LORD [13], which only loads points belonging to the last category of the total-ordered query in the initial step, e.g., the pubs. In our setting, there is neither a total order nor the concept of a last category.

Algorithm 1 Simple backward search algorithm
SBS(q, G_Q) // SBS stands for simple backward search
// Input: q, G_Q: query start point and visit order graph respectively
// Output: the optimal route that satisfies the query
1: Use a greedy algorithm to obtain a route r
2: Initialize threshold θ to length(r)
3: Retrieve the set CS of points within θ distance to q, whose categories appear in G_Q
4: Initialize route set R_1 to empty
5: for each point p in CS that can be the last point according to G_Q
6:    Add ⟨p⟩ to R_1
7: for i = 1 to m − 1 do call R_{i+1} = BSJoin(q, G_Q, θ, R_i, CS)
8: Select from R_m sub-route r* that minimizes length(q → r*)
9: Return q → r* as the query result

After loading all candidate points, SBS continues to compute the optimal route sets R_1-R_m (lines 4-7). In particular, route set R_i (1 ≤ i ≤ m) contains all possible length-i suffixes of the query solution. According to Lemma 1, these suffixes must be the optimal routes for their respective start point and the set of categories covered. Table 3 lists all routes contained in R_1-R_3 in our running example. Specifically, R_1 consists of 4 single-point routes: the two museums and the two pubs. Restaurants are not included in R_1, since they must be visited before a pub and, thus, cannot be valid length-1 suffixes of the query solution. Route sets R_2-R_3 are computed through backward joins, to be explained soon. Continuing the example, R_2 contains all optimal suffixes that cover two categories. Again, a route covering {museum, restaurant} cannot be a suffix of the query result, since it would place a pub before a restaurant, violating G_Q. Similarly, R_3 contains all suffixes containing all three query categories. After obtaining R_m (i.e., R_3 in the example), SBS connects the query point q with the start point of each route in R_m, and selects the route with

shortest total length (lines 8-9) as the query answer. Here

we use the notation p → r to denote a route that starts

at p and follows the sub-route r, e.g., when r = p

→ p

→ r = p

→ p

TABLE 3. Optimal suffix table used in SBS (columns: Start point, Categories Covered, Optimal Suffix; category sets covered: {museum}, {pub}, {museum, pub}, {restaurant, pub}, {museum, restaurant, pub})

It remains to clarify the backward join module BSJoin, shown in Algorithm 2 (also used in our other methods described later). Besides the query parameters, the main inputs are (i) a set of points P, which is the entire candidate set CS in SBS and (ii) a set of routes R (R_i in SBS), each of which is optimal for the combination of its start point and categories covered. The join results (R_{i+1} in SBS) consist of routes of the form p → r, i.e., start point p followed by sub-route r, where p ∈ P and r ∈ R. Note that in SBS, the computation of R_{i+1} (1 ≤ i < m) only involves R_i and CS, meaning that after obtaining R_{i+1}, R_1-R_i can be safely discarded to conserve memory.

Algorithm 2 Algorithm for backward join
BSJoin(q, G_Q, θ, R, P)
// Input: q, G_Q: query start point and visit order graph respectively
// θ: length of a known route that satisfies the query
// R, P: a set of routes and points respectively
// Output: backward join results of P and R
1: Initialize route set R′ to empty
2: Partition R based on the set of categories they cover
3: for each point p ∈ P do
4:    for each partition R_V of R covering the same category set V do
5:       if connecting p with a route in R_V satisfies G_Q then
6:          Find the route r ∈ R_V that minimizes the length of p → r among all routes in R_V
7:          if dist(q, p) + length(p → r) < θ then add p → r to R′
8: Return R′

BSJoin selects join results based on three criteria. The first concerns the visit order constraints G_Q (line 5). Specifically, the route p → r itself must satisfy G_Q and not contain any duplicate categories. Meanwhile, since p → r is expected to be the suffix of a solution to the query, G_Q must allow all categories not covered by p → r to be visited before p → r. In the example of Figure 1, BSJoin eliminates all join outputs that either have a pub before a restaurant (which directly violates G_Q), or contain a restaurant but not a pub (which cannot be suffixes of legal routes). Second, according to Lemma 1, p → r should be optimal among all routes that start at p and cover the same categories as p → r (line 6). Finally, p → r must survive elliptic pruning [13] (line 7), described in Section 2.1. Unlike R-LORD [13], which uses the MBR of the ellipses to prune, BSJoin directly applies elliptic pruning, which is more efficient according to our experiments. The reason is that the MBR usually covers a significantly larger area than the ellipses (e.g., in Figure 2(b)), leading to poor pruning effectiveness; moreover, computing the MBR itself consumes considerable CPU time, sometimes defeating the purpose of pruning. The complexity of SBS is given by the following lemma.

Lemma 2. The SBS algorithm finds the optimal solution of the query using $O(N \cdot 2^m \cdot m)$ memory, and $O(N^2 \cdot 2^m)$ time.

Proof: During the i-th backward join (line 7 of Algorithm 2), SBS maintains in memory the set R_i of optimal sub-routes of length i. Each sub-route has length i, and there are at most $N \cdot \binom{m}{i}$ routes in R_i, where N is the total number of points in the dataset and $\binom{m}{i}$ is the number of category combinations of length i, from a total of m categories. Because SBS also needs to compute R_{i+1}, the total memory consumption at this step is $O(N \cdot \binom{m}{i} \cdot i + N \cdot \binom{m}{i+1} \cdot (i+1))$. At the (i+1)-th backward join, SBS releases the memory occupied by R_i, since it no longer affects subsequent optimal sub-route computations. Therefore, the peak memory usage of SBS is $O(\max_{i=1}^{m-1} (N \cdot \binom{m}{i} \cdot i + N \cdot \binom{m}{i+1} \cdot (i+1))) = O(2^m \cdot N \cdot m)$. Backward joins dominate the runtime cost. In particular, at step i, 1 ≤ i ≤ m, SBS joins $O(N \cdot \binom{m}{i})$ sub-routes in R_i with O(N) points in the candidate set CS. The time taken at this step is $O(N^2 \cdot \binom{m}{i})$. Summing up all m steps, we obtain the time complexity of SBS: $\sum_{i=1}^{m} N^2 \cdot \binom{m}{i} = O(N^2 \cdot 2^m)$.
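The two bounds in Lemma 2 follow from elementary facts about binomial coefficients. The short derivation below spells out the steps; it is a worked restatement under the same notation (N points, m categories), not additional material from the paper.

```latex
\begin{align*}
\max_{1 \le i \le m-1}\Big( N\binom{m}{i}\, i + N\binom{m}{i+1}(i+1) \Big)
  &\le N \cdot m \cdot \max_{1 \le i \le m-1}\Big( \binom{m}{i} + \binom{m}{i+1} \Big)
   \le N \cdot m \cdot 2^{m}, \\
\sum_{i=1}^{m} N^{2} \binom{m}{i}
  &= N^{2}\,\big(2^{m} - 1\big) = O\big(N^{2} \cdot 2^{m}\big).
\end{align*}
```

The first line uses $i \le m$ and $i+1 \le m$, together with the fact that two distinct binomial coefficients of $m$ sum to at most $\sum_{j=0}^{m}\binom{m}{j} = 2^{m}$. The second line uses the same identity with the $j = 0$ term removed.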

SBS is easy to implement and it achieves the same

worst-case time complexity as more complex algorithms

described later. The main drawback of SBS is that its

effectiveness relies heavily on the bound θ provided by

the greedy algorithm. When θ is loose (i.e., it is much

longer than the optimal length), SBS retrieves a large

number of candidate points, and joins them all with

the current sub-route set at every step. Moreover, the

backward join in SBS is performed in a nested-loop

fashion, which applies elliptic pruning on individual

results. Consequently, SBS can be rather inefﬁcient for

large datasets with a skewed distribution.
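Putting Algorithms 1 and 2 together, a compact sketch of SBS might look as follows. It is a simplified illustration under several assumptions: points are `(x, y, category)` tuples, the bound `theta` is supplied by the caller (for example from a greedy route), the candidate set is found by a linear scan instead of R-tree range queries, and the names `route_len`, `valid_suffix`, `bs_join` and `sbs` are hypothetical. The strict comparison against θ follows line 7 of Algorithm 2, so the function returns `None` when no route is strictly shorter than the known one.

```python
import math

def route_len(route):
    """Total Euclidean length of a list of (x, y, category) points."""
    return sum(math.dist(a[:2], b[:2]) for a, b in zip(route, route[1:]))

def valid_suffix(cats, edges):
    """Can the category sequence `cats` (first to last) end a legal route?"""
    covered = set(cats)
    if len(covered) != len(cats):
        return False                      # duplicate category
    pos = {c: i for i, c in enumerate(cats)}
    for a, b in edges:                    # a must be visited before b
        if a in covered and b in covered and pos[a] > pos[b]:
            return False                  # order violated inside the suffix
        if a in covered and b not in covered:
            return False                  # b would end up before a
    return True

def bs_join(q, edges, theta, routes, points):
    """Backward join: prepend each point to each compatible sub-route."""
    best = {}  # (start point, categories covered) -> (length, route)
    for p in points:
        for r in routes:
            cand = [p] + r
            if not valid_suffix([x[2] for x in cand], edges):
                continue
            length = route_len(cand)
            if math.dist(q, p[:2]) + length >= theta:
                continue                  # pruned by the bound theta
            key = (p, frozenset(x[2] for x in cand))
            if key not in best or length < best[key][0]:
                best[key] = (length, cand)
    return [r for _, r in best.values()]

def sbs(q, points, edges, theta):
    """Simple backward search; returns the best sub-route or None."""
    m = len({p[2] for p in points})
    cs = [p for p in points if math.dist(q, p[:2]) <= theta]
    routes = [[p] for p in cs if valid_suffix([p[2]], edges)]
    for _ in range(m - 1):
        routes = bs_join(q, edges, theta, routes, cs)
    return min(routes,
               key=lambda r: math.dist(q, r[0][:2]) + route_len(r),
               default=None)
```

The dictionary keyed by start point and covered category set plays the role of the optimal suffix table: for each key only the shortest sub-route survives, which is what Lemma 1 permits. A caller would fall back to the known route of length `theta` whenever `sbs` returns `None`.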

3.2 Batch Backward Search

The batch backward search (BBS) method, shown in Al-

gorithm 3, improves SBS by employing batch processing

in the backward join operations. Speciﬁcally, both the

candidate set CS and the route set R_i (1 ≤ i ≤ m)

are partitioned into clusters before participating in a

backward join (lines 2 and 4). The partitioning of CS ﬁrst

groups points by their category, and then for each group,

the points are further partitioned into clusters based on

their spatial proximity. The partitioning of route set R
