Proceedings Article

Efficiently querying moving objects with pre-defined paths in a distributed environment

Cyrus Shahabi¹, Mohammad R. Kolahdouzan¹, Snehal Thakkar¹, José Luis Ambite¹, Craig A. Knoblock¹

University of Southern California¹

09 Nov 2001, pp. 34-40

TL;DR: A novel spatio-temporal filter, called the deviation filter, is proposed; it exploits both the spatial and temporal characteristics of the sources to improve selectivity and, in turn, query response times.


Abstract: Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. We also report on our experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan.


Summary


1. INTRODUCTION

The explosive growth of the Internet has made a wealth of networked information available.
One solution to reduce the query processing time of moving objects with predefined paths and schedules is to precompute the required information and materialize it using a moving object data model such as the 3D Trajectory [3] model.
Different sources may contain overlapping information or only fragments of desired data.
The authors try to overcome the high computational cost of spatial filters by either performing a pre-computation step and then applying a less expensive function/filter, or delaying the spatial filter until the size of the spatial data (e.g., railroad vector data) has been reduced significantly.
Section 5 reports on their experimental observations.

2. PROBLEM DEFINITION

The director of the movie needs to make sure that no train passes by that location while they are shooting the movie.
This requires having information about the schedules of the trains running on the nearby railroads.
The following is the list of the simplified version of the sources available on the web and in their local databases, which can be used to provide necessary information for the director of the movie.
Hence, the function Ψ constructs all paths between any possible start and end points (complexity $O(N_r^2)$) while Φ does the same operation on all possible station pairs (complexity $O(N_s^2)$).
The sources $S_{stations}$ and $S_{railroads}$ contain spatial information while $S_{schedules}$ has temporal content.

3. CENTRALIZED ENVIRONMENT

The authors integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ off-line using the following relational algebra expression to generate a 3D-trajectory source.
One can answer the queries described in Section 2 through an intersect operation on the 3D-trajectory source between each train trajectory and $(x, y, [t_a, t_b])$, where $(x, y)$ is the query point and $[t_a, t_b]$ is the given time interval.
The relational algebra expression for the intersect operation is given in Equation 2.
The cost of building the materialized view is usually very high as the sources usually contain large sets of data.
In addition, the limitations of the mediator server and/or regulations imposed by remote sources may restrict us from materializing the data locally.

4. DISTRIBUTED ENVIRONMENT

The authors describe four alternative approaches to integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ in a distributed environment.
The coordinates of the stations are not exploited to improve the selectivity of the temporal selection predicate.
The authors propose alternative algorithms to determine the threshold for the path deviation approach.
The authors use Equation 7 to compute the threshold: $P_{DT} = \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right)$ (7). This equation computes the maximum, over all pairs of starting and ending points in $S_{railroads}$, of the ratio between the actual length of the railroad connecting the pair and the Euclidean distance between them.

4.2.2 Adaptive Threshold

The authors propose two approaches to adaptively select the value of the threshold.
Subsequently, different path deviation predicates with different thresholds are applied to different regions.
Unlike the approach discussed in Section 4.1, the spatial predicate is performed before the join operation between $S_{spp}$ and the result of the selection on $S_{schedules}$.
This has the advantage that the spatial expression $\sigma_{Q'_S}$ can be applied earlier to $S_{spp}$.
The authors first perform a spatial selection on $S_{spp}$ to find all railroad paths that intersect with the given point $(x, y)$.

5.1 Implementation

The configuration for the experiments consisted of two SUN servers connected through a 100 Mbps LAN.
Both systems run Informix Universal Server 9.2 with ESRI Spatial datablade.
Random railroad segments connect each station with its 3 to 16 closest stations to provide different connectivity for stations.
Different values of the start time of the given time interval, $t_a$, were randomly selected.

5.2 Experimental Results

Figure 4 depicts the results of their experiments for the four alternative query plans discussed in Section 4.
The Y-axis shows the average query response time in seconds.
This approach assumes complete control over the remote server hosting $S_{schedules}$, because both $S_{spp}$ and the results of the selection predicate over $S_{spp}$ must be stored at the remote server.
This is because this filter has good selectivity only for short time intervals.
Note that although neither $P_{TF}$ nor $T_{FSJ}$ uses the expensive $\sigma_{Q_S}$ (because of $P_{TF}$'s precomputation step and $T_{FSJ}$'s delayed spatial operation), the overhead of the temporal data transfer over the network is so high that it cancels out the low-complexity benefits of $\sigma_{Q'_S}$.


Figures (5)

Figure 1: An example of railroad networks

Figure 2: Alternative query plans to integrate sources in a distributed environment.

Figure 3: Graphical representation of the path deviation approach

Table 1: Comparison of the Ψ and Φ functions.

Figure 4: Comparison between the four query plans


Efficiently Querying Moving Objects with Pre-defined Paths in a Distributed Environment∗

Cyrus Shahabi, Mohammad R. Kolahdouzan, Snehal Thakkar, Jose Luis Ambite† and Craig A. Knoblock†

Department of Computer Science and †Information Sciences Institute

University of Southern California

Los Angeles, California 90089

[shahabi, kolahdoz, snehalth]@usc.edu [ambite, knoblock]@isi.edu

ABSTRACT

Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. We also report on our experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan.

1. INTRODUCTION

The explosive growth of the Internet has made a wealth of networked information available. Much of this information is geographical, spatial, temporal, or pertains to objects that have a spatial or temporal nature. The sources of this information are heterogeneous: traditional databases with spatial extensions, geographical information systems (GIS) software packages, mapping and imagery web sites, web sites with spatial information, etc. An increasing number of web sites have information of a geospatial or temporal character. For example, detailed satellite images can be obtained from sites such as www.terraserver.com; maps from www.mapquest.com; train schedules from www.amtrak.com; geolocated points of interest such as train stations from www.usgs.gov; geographical features such as railroad networks from www.nima.mil; etc. The number of sources, the quality, and detail of the information available are continually growing all around the globe. In this paper, we focus our attention on how to efficiently query moving objects such as trains in a distributed environment such as the one mentioned above.

∗ This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and ITR-0082826, NASA/JPL contract nr. 961518, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash/equipment gifts from NCR, IBM, Intel and SUN.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Recently there has been a growing interest in moving object databases that manage spatial objects whose position changes over time [2, 3, 4, 5, 6, 7, 8]. Example applications are those that query the locations of trains, cars and planes for a given time interval. The main challenge investigated by these studies is how to model the large spatio-temporal data needed to track the position of any object at any given time (either in the past, future or now).

In this paper, we consider an environment where the content of the moving object database does not need to be modified to reflect the movement of the objects. We term this environment “moving objects with predefined paths and schedules.” An example application is to query the location of trains moving on a railroad network. By storing the schedules of trains’ departures and arrivals, the locations of the stations and the vector data corresponding to the railroad network, we have enough information to query the location of any moving object (i.e., train) at any given time. Note that the database still needs to be modified (e.g., when a schedule changes), but it does not need to be updated (and/or appended) as the objects move around the network within the provided schedule. The challenge, however, is that queries such as finding the location of a train in a given time interval are time consuming, because of the expensive functions (such as the shortest path function) that need to be performed on large vector data, as well as the temporal intersections that need to be applied on large sets of time intervals.

One solution to reduce the query processing time of moving objects with predefined paths and schedules is to precompute the required information and materialize it using a moving object data model such as the 3D Trajectory [3] model. This is a feasible approach if we assume that the schedule, railroad and station information is all local and under our full control. However, in our assumed distributed environment, the sources of information that we would like to access are autonomous and dynamic. That is, we do not have administrative control over them, and cannot modify their structure or write data to them. The sources can change their information without warning. Different sources may contain overlapping information or only fragments of desired data.

Therefore, we propose alternative distributed query plans to realize the integration of spatial and temporal information (e.g., network of the railroads and schedules of the trains) from distributed, heterogeneous sources. We start by investigating traditional filter+semi-join plans that either apply the temporal filter first and then perform the spatial semi-join, or vice versa. However, we show that there are two main drawbacks with pure filter+semi-join plans. First, spatial filters (e.g., identifying all railroad segments that overlap with a given point) are computationally complex, resulting in long local or remote query processing time. Second, temporal filters (identifying all intervals that overlap with a given interval) usually have bad selectivity due to the large range of intervals covered by each instance in the temporal source (e.g., schedule table). That is, many schedules usually intersect with any query interval. Thus, temporal filters cannot effectively reduce the amount of data transferred over the network. We try to overcome the first obstacle by either performing a pre-computation step and then applying a less expensive function/filter, or delaying the spatial filter until we reduce the size of the spatial data (e.g., railroad vector data) significantly. We address the second obstacle by proposing a spatio-temporal filter (termed deviation filter), instead of temporal-only filters, which can also exploit the spatial characteristics of the data to improve the selectivity. Finally, we report on our experiments in evaluating and comparing the performances of our different moving object query plans. Although one version of the pre-computation approach outperforms all the other plans, the pre-computation approach may not be feasible due to limited control over remote sources. Therefore, we conclude that our deviation based approach, which is only marginally worse than the pre-computation approach, is the superior plan.

The remainder of this paper is organized as follows. We formally define the problem in Section 2. Sections 3 and 4 discuss the solutions to the problem for the centralized and distributed environments, respectively. Section 5 reports on our experimental observations. Finally, we discuss the conclusion and future work in Section 6.

2. PROBLEM DEFINITION

To better understand the problem, consider a scenario in which a movie is being produced at a certain location on a certain date for several hours. The director of the movie needs to make sure that no train passes by that location while they are shooting the movie. This requires having information about the schedules of the trains running on the nearby railroads.

The following is a simplified list of the sources available on the web and in our local databases, which can be used to provide the necessary information for the director of the movie.

• Names of the train stations and their geographical locations (from Silicon Mapping Solutions):

$S_{stations}$ = Stations(Station-Name, Station-Point)

where Station-Name is a character string that contains a unique name for each station and Station-Point represents the latitude and longitude of the station.

• Up-to-date schedule information of the trains (from the train company web site):

$S_{schedules}$ = Schedules(Train-ID, Departing-Station-Name, Departing-Time, Arriving-Station-Name, Arriving-Time)

• The vector data containing the railroads’ paths (from the NIMA gazetteer):

$S_{railroads}$ = Railroads(Railroad-ID, Railroad-Path, Starting-Point, Ending-Point)

where Railroad-Path is a 2D line with Starting-Point and Ending-Point as its first and last vertices.

As a real-world example, the above sources of information for the United States contain around 1000 stations, 400,000 schedules, and 170,000 line segments for the railroad network. By appropriate spatio-temporal integration of the above sources, we are interested in processing the following types of queries:

• Q1: Find the position of a train for a given time instant.

• Q2: Given a geographical point and a time interval, find all the trains that pass through the point during the given time interval.

For the remainder of this paper, to simplify the discussion, we only focus on the more general query, i.e., Q2. We now formally define the sources of information and predicates of Q2 as:

• Names of the stations: $S_s = \{(s_i) \mid 1 \le i \le N_s\}$, where $N_s$ is the total number of stations.

• Locations of the stations: $S_{stations} = \{(s_i, (x_i, y_i)) \mid 1 \le i \le N_s\}$, where $(x_i, y_i)$ specify the geographical location of the station $s_i$.

• Schedules of the trains: $S_{schedules} = \{V_k = (Train_k, s_i, s_j, [t_i, t_j]) \mid 1 \le i, j \le N_s;\ 1 \le k \le M\}$, where $M$ is the cardinality of $S_{schedules}$. Note that $M \gg N_s$.

• Railroad network: $S_{railroads} = \{(\langle X_i, Y_i \rangle \ldots \langle X_j, Y_j \rangle) \mid 1 \le i, j \le N_r;\ (x, y) \subset (X, Y)\}$, where $N_r$ is the number of railroads in the railroad network (in the order of …), and $(X, Y)$ specifies the geographical location of the starting and ending points of each railroad segment. Note that not all the start and end points are stations. However, all stations are either a start or an end point or both. Hence $N_r \gg N_s$.





Figure 1: An example of railroad networks

• $Q_T = Intersect([t_i, t_j], [t_a, t_b])$, representing the temporal predicate of the query, finds schedules that intersect with a given time interval $[t_a, t_b]$.

• $Q_S = Intersect((x, y), \Psi(S_{railroads}))$, where $\Psi(S_{railroads})$ is a function that calculates the shortest railroad paths between all possible start and end point pairs $(\langle X_i, Y_i \rangle, \langle X_j, Y_j \rangle)$.

• $Q'_S = Intersect((x, y), \Phi(S_{railroads}, S_{stations}))$, where $\Phi(S_{railroads}, S_{stations})$ is a function that calculates the shortest railroad paths between all station pairs $(\langle x_i, y_i \rangle, \langle x_j, y_j \rangle)$. Since the set of the station coordinates, $(x, y)$, is a small subset of the starting and ending points in the railroad network, $(X, Y)$, the Φ function is computationally less expensive than Ψ.
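As an illustration of the temporal predicate, the following Python sketch tests whether a schedule interval overlaps a query interval, treating both as closed intervals. The function names, the tuple layout and the sample schedules are assumptions made for this example and do not come from the paper.

```python
def intervals_intersect(t_dep, t_arr, t_a, t_b):
    """Return True if closed intervals [t_dep, t_arr] and [t_a, t_b] overlap."""
    return t_dep <= t_b and t_a <= t_arr


def temporal_filter(schedules, t_a, t_b):
    """Keep the ids of schedules that are active at some instant of [t_a, t_b]."""
    return [train for train, t_dep, t_arr in schedules
            if intervals_intersect(t_dep, t_arr, t_a, t_b)]


# Hypothetical schedules: (train id, departure time, arrival time), in hours.
schedules = [("T1", 8.0, 12.0), ("T2", 9.5, 16.0), ("T3", 13.0, 14.0)]

# Two of the three schedules overlap a one-hour query window.
selected = temporal_filter(schedules, 10.0, 11.0)
```

Even in this toy case a short query window retains most schedules, which hints at why a purely temporal filter tends to have poor selectivity when schedule intervals are long.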

With $Q_S$ and $Q'_S$, it is not sufficient to only find those line segments that intersect with $(x, y)$; we must subsequently find all the paths that include those segments. Hence, the function Ψ constructs all paths between any possible start and end points (complexity $O(N_r^2)$) while Φ does the same operation on all possible station pairs (complexity $O(N_s^2)$). During this path construction, there may be cases where there is more than one path between two points. In these cases, both Ψ and Φ choose the shortest path, or actually never construct the other paths.

To illustrate the difference between the functions Ψ and Φ, consider the following example:

Example 2.1: Figure 1 illustrates a railroad network with 7 starting and ending points, out of which 4 are stations. Table 1 shows the results of applying the Ψ and Φ functions to this network. Note that both functions only compute the shortest path between the pair of points among all possible paths. For example, where there are two possible paths between a pair of stations, only the shorter path is computed, while the other is discarded.
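The difference between Ψ and Φ can be sketched in Python on a small made-up network (not the network of Figure 1). The helper names, the segment tuples and the use of Dijkstra's algorithm are assumptions of this sketch; the paper does not prescribe a particular shortest-path algorithm.

```python
import heapq
from itertools import combinations


def build_graph(segments):
    """Build an undirected adjacency list from (segment id, a, b, length)."""
    graph = {}
    for seg_id, a, b, length in segments:
        graph.setdefault(a, []).append((b, length, seg_id))
        graph.setdefault(b, []).append((a, length, seg_id))
    return graph


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns the segment ids of the shortest path."""
    best = {src: 0.0}
    heap = [(0.0, src, [])]
    while heap:
        dist, node, segs = heapq.heappop(heap)
        if node == dst:
            return segs
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, length, seg_id in graph.get(node, []):
            nd = dist + length
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt, segs + [seg_id]))
    return None


def all_shortest_paths(graph, points):
    """Shortest path for every unordered pair drawn from `points`."""
    paths = {}
    for a, b in combinations(sorted(points), 2):
        segs = shortest_path(graph, a, b)
        if segs is not None:
            paths[(a, b)] = segs
    return paths


# Hypothetical network: S* are stations, P* are other segment end points.
segments = [("L1", "S1", "P1", 2.0), ("L2", "P1", "S2", 2.0),
            ("L3", "S1", "S2", 5.0), ("L4", "S2", "P2", 1.0),
            ("L5", "P2", "S3", 1.0)]
graph = build_graph(segments)
psi = all_shortest_paths(graph, set(graph))           # all end-point pairs
phi = all_shortest_paths(graph, {"S1", "S2", "S3"})   # station pairs only
```

Here Ψ produces a path for each of the 10 end-point pairs, while Φ produces only 3, and the longer direct segment `L3` between `S1` and `S2` is never returned.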

The sources $S_{stations}$ and $S_{railroads}$ contain spatial information while $S_{schedules}$ has temporal content. $Q_T$, $Q_S$ and $Q'_S$ represent the temporal and spatial predicates of the query Q2. Note that there are two alternative ways to evaluate the spatial part of the query, using either $Q_S$ or $Q'_S$. Figure 2 depicts possible query plans to integrate the sources in a distributed environment.

We assume that the trains travel through the shortest path between two stations, as is true in the real-world application.



Figure 2: Alternative query plans to integrate sources in a distributed environment. (a) Joining stations and schedules first. (b) Joining stations and railroads first.

3. CENTRALIZED ENVIRONMENT

In this section, we briefly describe the solution to the problem described in Section 2 using a centralized environment. We integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ off-line using the following relational algebra expression to generate a 3D-trajectory source.

$$3D\_Trajectory(Train\_ID, Trajectory) := (S_{schedules} \bowtie S_{stations}) \bowtie S_{railroads} \qquad (1)$$

We model each train as a moving point in 2D or equivalently a line in 3D with time as its third dimension. With this model, we can answer the queries described in Section 2 through an intersect operation on the 3D-trajectory source between each train trajectory and $(x, y, [t_a, t_b])$, where $(x, y)$ is the query point and $[t_a, t_b]$ is the given time interval. The relational algebra expression for the intersect operation is given in Equation 2. This approach has been used in the moving objects literature.

$$\sigma_{Intersect((x, y, [t_a, t_b]), Trajectory)}(3D\_Trajectory) \qquad (2)$$
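A minimal Python sketch of such an intersect test is given below. It assumes a trajectory is stored as a list of `(x, y, t)` vertices and that the train moves at constant speed along each piece of the polyline; both the representation and the function names are assumptions of this example rather than the data model of [3].

```python
from math import hypot


def passing_times(trajectory, point, tol=1e-9):
    """Times at which a polyline of (x, y, t) vertices passes through point."""
    px, py = point
    times = []
    for (x1, y1, t1), (x2, y2, t2) in zip(trajectory, trajectory[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0:
            continue  # the train is standing still on this piece
        # Parameter of the orthogonal projection of the point onto the piece.
        u = ((px - x1) * dx + (py - y1) * dy) / seg_len2
        if not 0 <= u <= 1:
            continue
        # The point is on the piece only if it coincides with its projection.
        if hypot(x1 + u * dx - px, y1 + u * dy - py) <= tol:
            times.append(t1 + u * (t2 - t1))
    return times


def intersects(trajectory, point, t_a, t_b):
    """True if the train is at `point` at some instant of [t_a, t_b]."""
    return any(t_a <= t <= t_b for t in passing_times(trajectory, point))


# Hypothetical trajectory: east for 10 time units, then north for 10 more.
trajectory = [(0, 0, 0), (10, 0, 10), (10, 10, 20)]
hit = intersects(trajectory, (10, 5), 14, 16)
```

In this sketch the selection of Equation 2 would amount to applying `intersects` to every stored trajectory.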

There are disadvantages associated with the centralized approach. The cost of building the materialized view is usually very high as the sources usually contain large sets of data. Furthermore, update and insert operations to the original sources lead to expensive updates of the 3D trajectory source. In addition, the limitations of the mediator server and/or regulations imposed by remote sources may restrict us from materializing the data locally.

4. DISTRIBUTED ENVIRONMENT

In this section, we describe four alternative approaches to integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ in a distributed environment. As depicted in Figure 2, the total number of query plans is more than four. However, we ignore those plans that are unrealistic due to either large data transmission over the network or expensive local/remote computations. In the following sections, we only briefly mention the unrealistic plans and the reasons we ignored them.

Table 1: Comparison of the Ψ and Φ functions (columns: result of the Ψ function, result of the Φ function).

4.1 Temporal Filter and Spatial Semi-join ($T_{FSJ}$)

With this method, depicted in Figure 2a, we join the helper source, $S_{stations}$, with $S_{schedules}$. This is required to generate a common attribute (i.e., station coordinates) for the further join operation with $S_{railroads}$. However, the coordinates of the stations are not exploited to improve the selectivity of the temporal selection predicate.

As shown in Figure 2a, $S_{schedules}$ is first filtered and joined with the helper source to generate intermediate results. Alternative approaches can then be considered to join the intermediate results with $S_{railroads}$. One approach is to perform $\sigma_{Q_S}$ on $S_{railroads}$ and transfer the results to the server that holds $S_{schedules}$. The advantage of this method is that the result of $\sigma_{Q_S}$ usually constitutes a small set of data that can be quickly transmitted over the network. On the other hand, $\sigma_{Q_S}$ is an expensive operation, rendering this method impractical. We do not consider this method for our experiments as it is always outperformed by other methods.

The considered approach, termed $T_{FSJ}$, avoids the expensive complexity of $\sigma_{Q_S}$ by transferring the results of the temporal query to $S_{railroads}$. The advantage of this method is that we can perform a simpler spatial expression, $\sigma_{Q'_S}$, after $S_{railroads}$ and the intermediate results are joined. This reduces the complexity of the query plan. However, due to the bad selectivity of the temporal predicate (especially for longer time intervals), a large amount of data needs to be transferred over the network, which may result in longer query processing time. We consider this query plan in our experiments as a comparison point. In Section 4.2, we describe a new approach which exploits the coordinates of the helper source to improve the selectivity factor of the temporal predicate.

With this query execution plan, our query can be performed in the following steps.

• Temporal Selection: First, we perform a selection on $S_{schedules}$ using the departure and arrival times and the given time interval $[t_a, t_b]$ to find all schedules that are active during the given time interval. Next, we join the selected schedules with $S_{stations}$ using station-name to include the coordinates of the stations in the selected schedules.

• Spatial Semi-join: Next, we perform a spatial semi-join between the selected schedules and $S_{railroads}$ using the coordinates of the stations and railroads.

• Spatial Filter: The results are then filtered using the spatial selection expression $\sigma_{Q'_S}$ to select railroads that intersect with the given point $(x, y)$.

• Final Refinement: Finally, we calculate the estimated time instant $t$ at which each train reaches the given point and exclude all trains for which $t$ has no intersection with the interval $[t_a, t_b]$.

The relational algebra expression for our query using this query execution plan is:

$$FR(\sigma_{Q'_S}((\sigma_{Q_T}(S_{schedules}) \bowtie S_{stations}) \ltimes S_{railroads})) \qquad (3)$$
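The four steps can be sketched on in-memory data as follows. This is a single-process illustration under stated assumptions, not the distributed implementation evaluated in the paper: each railroad is assumed to be a polyline running from one station to another, trains are assumed to move at constant speed along it, and all names (`tfsj`, `fraction_along`, the tuple layouts) are hypothetical.

```python
from math import hypot


def on_segment(p, a, b, tol=1e-9):
    """True if point p lies on the straight segment a-b (within tol)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > tol:
        return False
    dot = (p[0] - a[0]) * (b[0] - a[0]) + (p[1] - a[1]) * (b[1] - a[1])
    return 0 <= dot <= (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2


def fraction_along(path, p):
    """Fraction of the path length covered on reaching p, or None if off-path."""
    total = sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(path, path[1:]))
    walked = 0.0
    for a, b in zip(path, path[1:]):
        if on_segment(p, a, b):
            return (walked + hypot(p[0] - a[0], p[1] - a[1])) / total
        walked += hypot(b[0] - a[0], b[1] - a[1])
    return None


def tfsj(schedules, stations, railroads, point, t_a, t_b):
    # Temporal selection, then join with the helper source for coordinates.
    active = [(train, stations[dep], t_dep, stations[arr], t_arr)
              for train, dep, t_dep, arr, t_arr in schedules
              if t_dep <= t_b and t_a <= t_arr]
    result = []
    for train, p_dep, t_dep, p_arr, t_arr in active:
        for path in railroads:
            # Spatial semi-join on end-point coordinates (either direction).
            if (path[0], path[-1]) == (p_arr, p_dep):
                path = path[::-1]
            if (path[0], path[-1]) != (p_dep, p_arr):
                continue
            frac = fraction_along(path, point)      # spatial filter
            if frac is None:
                continue
            t = t_dep + frac * (t_arr - t_dep)      # final refinement
            if t_a <= t <= t_b:
                result.append((train, t))
    return result


# Hypothetical sources.
stations = {"A": (0, 0), "B": (10, 0), "C": (0, 10)}
railroads = [[(0, 0), (5, 0), (10, 0)], [(0, 0), (0, 10)]]
schedules = [("T1", "A", 0, "B", 10), ("T2", "B", 20, "A", 30),
             ("T3", "A", 0, "C", 10)]
trains = tfsj(schedules, stations, railroads, (5, 0), 4, 6)
```

Note that `T3` survives the temporal selection and would be shipped to the railroad server even though its path never comes near the query point, which is the selectivity problem this plan suffers from.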

4.2 Deviation Based Approach (DA)

We propose a different approach, termed Deviation Based Approach, which is similar to $T_{FSJ}$ except that it exploits the coordinates of the stations to perform a spatio-temporal selection predicate resulting in better selectivity. This approach reduces the number of the candidate schedules by filtering out the schedules that correspond to the station pairs connected through a path far from the query point $(x, y)$.

The core idea behind this approach is as follows. First, given the coordinates of two stations, we can always compute the shortest distance (straight line) between them. However, in the real world, the railroad path between the two stations is not a straight line. Hence, we estimate how much the real path deviates from the straight line in the very worst case and define a term called Path Deviation to measure the deviation. Extending the 2D area of this deviation with the time dimension, we obtain a 3D capsule-shaped object. Meanwhile, the intermediate spatio-temporal tuples (station coordinates joined with schedule time intervals) can be conceptualized as straight lines in 3D space joining points $(x_i, y_i, t_i)$ and $(x_j, y_j, t_j)$. Consequently, the deviation filter only selects those lines that intersect with the 3D capsule-shaped object. This results in a much more effective filter as compared to a pure temporal filter. Meanwhile, since this filter works on the small set of point data (station coordinates) as opposed to the large set of line data (railroad vectors), it is not as computationally complex as the pure spatial filter.

Definition 4.1: Path Deviation is defined as the ratio of the sum of the Euclidean distances between the start point $P_S = (X_S, Y_S)$ and end point $P_E = (X_E, Y_E)$ with a given point $(x, y)$, over the Euclidean distance between the start and end points of a railroad segment.

$$PD(P_S, P_E, (x, y)) = \frac{Distance(P_S, (x, y)) + Distance(P_E, (x, y))}{Distance(P_S, P_E)} \qquad (4)$$
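As a worked illustration, path deviation can be computed directly from three coordinate pairs. The function name and the sample points below are assumptions of this sketch; `math.dist` (Python 3.8+) returns the Euclidean distance.

```python
from math import dist


def path_deviation(p_s, p_e, q):
    """Sum of the distances from q to both end points, over their separation."""
    return (dist(p_s, q) + dist(p_e, q)) / dist(p_s, p_e)


# A query point on the straight line between the end points gives exactly 1.
on_line = path_deviation((0, 0), (10, 0), (5, 0))

# A query point far off the line gives a larger ratio: (13 + 13) / 10.
off_line = path_deviation((0, 0), (10, 0), (5, 12))
```

The ratio is never below 1 (by the triangle inequality) and grows as the query point moves away from the straight line joining the two end points.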

Figure 3: Graphical representation of the path deviation approach

Based on Definition 4.1, we can compute the path deviation for each schedule that is active during the given time interval $[t_a, t_b]$, and exclude all schedules that have a path deviation greater than a pre-defined threshold. The path deviation threshold, $P_{DT}$, can be pre-determined or adaptively computed at query time. We present several algorithms to compute the path deviation threshold and prove that this approach does not result in any false drops (see Section 4.2.1).

The difference between the execution plan for this approach and the approach discussed in Section 4.1 is that the path deviation selection predicate is applied to the results of the temporal selection before transmission to the server which hosts $S_{railroads}$.

The path deviation selection expression ($\sigma_{PD}$) for our query is given below.

$$\sigma_{PD} = [PD(P_S, P_E, (x, y)) \le P_{DT}] \qquad (5)$$

The graphical representation of this approach is shown in Figure 3. Applying the path deviation selection predicate is similar to creating a capsule-shaped object around the given point $(x, y)$. The length of the object depends on the given time interval $[t_a, t_b]$, while the width of the object depends on the threshold. We select the schedules for which a straight line between the starting and ending stations intersects with the capsule-shaped object.

The relational algebra expression of our query using this query execution plan is:

$$FR(\sigma_{Q'_S}((\sigma_{PD}(\sigma_{Q_T}(S_{schedules}) \bowtie S_{stations})) \ltimes S_{railroads})) \qquad (6)$$

Since the path deviation predicate provides better selectivity as compared to the temporal selection predicate, this approach results in less data transfer between the servers and better query response time.
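The effect of adding the path deviation predicate to the temporal selection can be sketched as follows. The tuple layout, the function name and the threshold values are assumptions of this example; in the paper the threshold is derived from the railroad data (Section 4.2.1).

```python
from math import dist


def deviation_filter(tuples, point, t_a, t_b, threshold):
    """Keep schedules that are active in [t_a, t_b] and pass the PD test."""
    kept = []
    for train, p_s, t_dep, p_e, t_arr in tuples:
        active = t_dep <= t_b and t_a <= t_arr
        pd = (dist(p_s, point) + dist(p_e, point)) / dist(p_s, p_e)
        if active and pd <= threshold:
            kept.append(train)
    return kept


# Hypothetical schedules already joined with station coordinates:
# (train id, departure point, departure time, arrival point, arrival time).
tuples = [("T1", (0, 0), 0, (10, 0), 10),
          ("T2", (100, 100), 0, (110, 100), 10),
          ("T3", (0, 0), 50, (10, 0), 60)]

# With a tight threshold only the nearby active schedule is kept.
near = deviation_filter(tuples, (5, 1), 4, 6, threshold=1.2)

# With a very loose threshold the filter degenerates to the temporal one.
loose = deviation_filter(tuples, (5, 1), 4, 6, threshold=100.0)
```

Only the schedules in `near` would have to be sent to the server hosting the railroad data, compared with those in `loose` for a temporal-only filter.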

4.2.1 Threshold Selection

In this section, we propose alternative algorithms to determine the threshold for the path deviation approach. Our first method computes a constant value for the threshold. We use Equation 7 to compute the threshold:

$$P_{DT} = \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right) \qquad (7)$$

This equation computes the maximum, over all pairs of starting and ending points in $S_{railroads}$, of the ratio between the actual length of the railroad connecting the pair and the Euclidean distance between them. We prove that this equation guarantees no false drops, i.e., no candidate trains are excluded when using this value for the path deviation threshold.
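A constant threshold of this kind can be computed in one pass over the railroad paths. The sketch below assumes each railroad is a polyline given as a list of `(x, y)` vertices whose first and last vertices differ; the function names and sample data are hypothetical.

```python
from math import dist


def path_length(path):
    """Length of a polyline given as a list of (x, y) vertices."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def deviation_threshold(railroads):
    """Largest ratio of railroad length to straight-line end-point distance."""
    return max(path_length(p) / dist(p[0], p[-1]) for p in railroads)


# Hypothetical network: one straight track and one that detours via (3, 4).
railroads = [[(0, 0), (10, 0)], [(0, 0), (3, 4), (6, 0)]]
p_dt = deviation_threshold(railroads)   # (5 + 5) / 6 for the detour track
```

A single winding track determines the threshold for the whole network, which is the weakness of a constant threshold discussed later in this section.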

Theorem 4.2: For the Path Deviation Threshold $P_{DT}$ given by Equation 7, the selection $\sigma_{PD}$ results in no false drops.

Proof: We substitute $P_{DT}$ in Equation 5 with the value computed by Equation 7.

$$\frac{Distance(P_S, (x, y)) + Distance(P_E, (x, y))}{Distance(P_S, P_E)} \le \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right) \qquad (8)$$

$Distance(P_S, (x, y)) + Distance(P_E, (x, y))$ is the length of the shortest path, $l$, between the starting and ending points that passes through the given point $(x, y)$. Hence, if $(x, y)$ is on a railroad path linking the starting and ending points, the length of the railroad must be greater than or equal to $l$, which in turn implies that Equation 8 is satisfied. This means that none of the railroads that intersect with $(x, y)$ are excluded by the path deviation selection expression $\sigma_{PD}$.
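The argument can also be written as a chain of inequalities. This is only a restatement of the proof for a railroad from $P_S$ to $P_E$ that passes through $(x, y)$; the first line is the step the proof relies on, namely that each of the two sub-paths of the railroad is at least as long as the straight line between its end points.

```latex
\begin{align*}
l &= Distance(P_S, (x, y)) + Distance(P_E, (x, y)) \le length(railroad), \\
PD(P_S, P_E, (x, y)) &= \frac{l}{Distance(P_S, P_E)}
  \le \frac{length(railroad)}{Distance(P_S, P_E)}
  \le \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right) = P_{DT}.
\end{align*}
```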

The only disadvantage of a constant value for the threshold is that a large value of the path deviation for some railroads results in a large value of the overall threshold for the entire network. For example, in the real world this scenario happens if a railroad track is not a straight line between two stations (the shortest distance) due to some natural limitation (e.g., the existence of a lake in the way). The larger the value of the threshold, the worse the selectivity of the path deviation selection predicate. This means that guaranteeing no false drops in the path deviation approach (i.e., satisfying Equation 8) may result in selecting and transferring more schedules.
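The effect of the threshold on selectivity can be illustrated with a small sketch. It assumes the predicate is evaluated on start/end station pairs against a query point; the station pairs, the query point, and the two threshold values are made up for illustration and are not taken from the experiments.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def deviation_ratio(start, end, point):
    """Detour through `point` relative to the straight-line distance."""
    return (distance(start, point) + distance(end, point)) / distance(start, end)


def candidates(station_pairs, point, threshold):
    """Names of the start/end pairs that pass the path deviation predicate."""
    return [
        name
        for name, (s, e) in station_pairs.items()
        if deviation_ratio(s, e, point) <= threshold
    ]


# Hypothetical station pairs, increasingly far from the query point.
station_pairs = {
    "A": ((0.0, 0.0), (10.0, 0.0)),    # ratio about 1.17
    "B": ((0.0, 3.0), (10.0, 3.0)),    # ratio 1.0, query point lies on the segment
    "C": ((0.0, 10.0), (10.0, 10.0)),  # ratio about 1.72
    "D": ((0.0, 30.0), (10.0, 30.0)),  # ratio about 5.49
}
query_point = (5.0, 3.0)

tight = candidates(station_pairs, query_point, 1.1)  # ["B"]
loose = candidates(station_pairs, query_point, 2.0)  # ["A", "B", "C"]
```

With the larger threshold, three pairs survive the filter instead of one, so more schedules would have to be transferred.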

4.2.2 Adaptive Threshold

We propose two approaches to adaptively select the value of the threshold.

1. With the first approach, we divide the spatial data (e.g., $S_{railroads}$) into distinct geographical regions and calculate a different threshold value for each region using Equation 7. In the real world, a flat area such as a desert would receive a less conservative threshold, while an area with several mountains would receive a more conservative one. Subsequently, different path deviation predicates with different thresholds are applied to different regions. This results in better selectivity for the path deviation selection predicate, as the high value of the threshold for a particular region does not affect the regions with lower values of the threshold.
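The region-based approach can be sketched as follows. This is a minimal illustration that assumes each railroad record already carries a region label; how regions are drawn, the field names, and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def regional_thresholds(railroads):
    """Maximum length-to-distance ratio computed separately for each region."""
    thresholds = defaultdict(float)
    for r in railroads:
        ratio = r["length"] / distance(r["start"], r["end"])
        thresholds[r["region"]] = max(thresholds[r["region"]], ratio)
    return dict(thresholds)


# Hypothetical railroads tagged with an assumed "region" field.
railroads = [
    {"region": "desert", "start": (0.0, 0.0), "end": (10.0, 0.0), "length": 10.0},
    {"region": "desert", "start": (0.0, 0.0), "end": (3.0, 4.0), "length": 5.5},
    {"region": "mountains", "start": (0.0, 0.0), "end": (0.0, 8.0), "length": 16.0},
]

per_region = regional_thresholds(railroads)
global_threshold = max(per_region.values())
```

With these made-up values, the desert region gets a threshold of about 1.1, while a single network-wide constant would be 2.0 because of the one mountain track.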


Frequently Asked Questions (19)

Q1. What are the contributions mentioned in the paper "Efficiently querying moving objects with pre-defined paths in a distributed environment"?

The authors consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. The authors show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, the authors propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. The authors also report on their experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan.

Q2. What have the authors stated for future works in "Efficiently querying moving objects with pre-defined paths in a distributed environment"?

The authors demonstrated that the complexity of the spatial selection operation and bad selectivity of the temporal selection operation render pure filter+semi-join query plans inefficient. The authors plan to extend this study in three ways. Finally, the authors want to incorporate their deviation based query plan into the WorldInfo Assistant [ 1 ] application and show its usefulness for efficient query evaluation.

Q3. Why is the TF SJ query plan used?

Due to the bad selectivity of the temporal predicate (especially for longer time intervals), a large amount of data needs to be transferred over the network, which may result in longer query processing time.

Q4. What is the advantage of this method?

The advantage of this method is that the authors can perform a simpler spatial expression, $\sigma_{Q'_S}$, after $S_{railroads}$ and the intermediate results are joined.

Q5. What is the disadvantage of the second approach?

With the second approach, the authors consider railroad paths with higher values of threshold as exceptions and exclude them from the computation of the overall threshold.

Q6. What is the relational algebra expression of the query?

The relational algebra expression of their query using this query execution plan is:

$$\sigma_{FR}(\sigma_{Q'_S}(\sigma_{Q_{DF}}(\sigma_{Q_T}(S_{schedules} \bowtie_{s_i} S_{stations})) \bowtie_{\langle x_i, y_i \rangle} S_{railroads})) \qquad (6)$$

Since the path deviation predicate provides a better selectivity as compared to the temporal selection predicate, this approach results in less data transfer between the servers and better query response time.

Q7. What is the reason why the temporal filter is inefficient?

The authors demonstrated that the complexity of the spatial selection operation and bad selectivity of the temporal selection operation render pure filter+semi-join query plans inefficient.

Q8. What are the main drawbacks of a pure filter+semi-jo?

Spatial filters (e.g., identifying all railroad segments that overlap with a given point) are computationally complex, resulting in long local or remote query processing time.

Q9. What is the effect of the bad selectivity of the temporal predicate?

In addition, the effect of the bad selectivity of the temporal predicate is eliminated by transferring the selected railroads over the network as opposed to the selected schedules.

Q10. What is the disadvantage of a constant value for the threshold?

The only disadvantage of a constant value for the threshold is that a large value of the path deviation for some railroads results in a large value of the overall threshold for the entire network.

Q11. What is the relational algebra expression to perform their query with this approach?

The relational algebra expression to perform their query with this approach is:

$$\sigma_{FR}((\sigma_{Q'_S} S_{spp}) \bowtie_{s_i} \sigma_{Q_T}(S_{schedules})) \qquad (10)$$

With this approach, the complexity of the spatial predicate is reduced as the authors use $\sigma_{Q'_S}$ on $S_{spp}$.

Q12. What is the way to perform the deviation based query?

It would be interesting to see how much moving object queries on real data would benefit from adaptive threshold computation of the deviation approach as compared to a single constant threshold.

Q13. How did the authors solve the problem of a spatial filter?

The authors investigated solutions to both problems by first joining each source with a helper spatial source and then 1) performing pre-computation on the combination of two spatial sources, or 2) replacing temporal filters with spatio-temporal ones on the combination of the spatial and the temporal sources.

Q14. What is the shortest railroad path between all possible stations?

Hence $N_r \gg N_s$.

• $Q_T = \mathit{Intersect}([t_a, t_b], [t_i, t_j])$: $Q_T$, representing the temporal predicate of the query, finds schedules that intersect a given time interval $[t_a, t_b]$.

• $Q_S = \mathit{Intersect}((x, y), \Psi(S_{railroads}))$, where $\Psi(S_{railroads})$ is a function that calculates the shortest railroad paths between all possible start and end point pairs $(\langle X_i, Y_i \rangle, \langle X_j, Y_j \rangle)$.

• $Q'_S = \mathit{Intersect}((x, y), \Phi(S_{railroads}, S_{stations}))$, where $\Phi(S_{railroads}, S_{stations})$ is a function that calculates the shortest railroad paths between all station pairs $(\langle x_i, y_i \rangle, \langle x_j, y_j \rangle)$.

Q15. What is the disadvantage of a constant threshold?

In real-world, for a flat area such as a desert, this would result in less conservative thresholds while for an area with several mountains, a more conservative threshold will be chosen.

Q16. What is the way to select the threshold?

This results in better selectivity for the path deviation selection predicate as the high value of the threshold for a particular region does not affect the regions with lower values of the threshold.

Q17. What are the reasons the authors ignored the unrealistic plans?

The authors ignore those plans that are unrealistic due to either large data transmission over the network or expensive local/remote computations.

Q18. What is the difference between the two different types of filters?

Since this filter works on the small set of point data (station coordinates) as opposed to the large set of line data (railroad vectors), it is not as computationally complex as the pure spatial filter.

Q19. What is the way to perform a query in a distributed environment?

In a distributed environment (e.g., WWW) with limited access control (i.e., read-only access), creating a new source (i.e., Sspp) in a remote server may not be possible which renders this approach impractical.
