
Efficiently querying moving objects with pre-defined paths in a distributed environment

TL;DR: A novel spatio-temporal filter, called the deviation filter, is proposed that exploits both the spatial and temporal characteristics of the sources to improve selectivity and, in turn, query response times.
Abstract: Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. We also report on our experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan.

Summary (2 min read)

1. INTRODUCTION

  • The explosive growth of the Internet has made a wealth of networked information available.
  • One solution to reduce the query processing time of moving objects with predefined paths and schedules is to precompute the required information and materialize it using a moving object data model such as the 3D Trajectory [3] model.
  • Different sources may contain overlapping information or only fragments of desired data.
  • The authors try to overcome the first obstacle by either performing a pre-computation step and then applying a less expensive function/filter, or delaying the spatial filter until the size of the spatial data (e.g., railroad vector data) has been reduced significantly.
  • Section 5 reports on their experimental observations.

2. PROBLEM DEFINITION

  • The director of the movie needs to make sure that no train passes by that location while they are shooting the movie.
  • This requires having information about the schedules of the trains running on the nearby railroads.
  • The following is the list of the simplified version of the sources available on the web and in their local databases, which can be used to provide necessary information for the director of the movie.
  • Hence, the function $\Psi$ constructs all paths between any possible start and end points (complexity $O(N_r^2)$) while $\Phi$ does the same operation on all possible station pairs (complexity $O(N_s^2)$).
  • The sources $S_{stations}$ and $S_{railroads}$ contain spatial information while $S_{schedules}$ has temporal content.

3. CENTRALIZED ENVIRONMENT

  • The authors integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ off-line using the following relational algebra expression to generate a 3D-trajectory source.
  • One can answer the queries described in Section 2 through an intersect operation on the 3D-trajectory source between each train trajectory and $(x, y, [t_a, t_b])$, where $(x, y)$ is the query point and $[t_a, t_b]$ is the given time interval.
  • The relational algebra expression for the intersect operation is given in Equation 2.
  • The cost of building the materialized view is usually very high as the sources usually contain large sets of data.
  • In addition, the limitations of the mediator server and/or regulations imposed by remote sources may restrict us from materializing the data locally.

4. DISTRIBUTED ENVIRONMENT

  • The authors describe four alternative approaches to integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ in a distributed environment.
  • The coordinates of the stations are not exploited to improve the selectivity of the temporal selection predicate.
  • The authors propose alternative algorithms to determine the threshold for the path deviation approach.
  • The authors use Equation 7 to compute the threshold: $P_{DT} = \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right)$ (7). This equation computes the maximum ratio between the actual length of the railroad connecting all pairs of the starting and ending points in $S_{railroads}$ and the Euclidean distance between them.

4.2.2 Adaptive Threshold

  • The authors propose two approaches to adaptively select the value of the threshold.
  • Subsequently, different path deviation predicates with different thresholds are applied to different regions.
  • Unlike the approach discussed in Section 4.1, the spatial predicate is performed before the join operation between $S_{spp}$ and the result of the selection on $S_{schedules}$.
  • This has the advantage that the spatial expression $\sigma_{Q'_S}$ can be applied earlier to $S_{spp}$.
  • The authors first perform a spatial selection on $S_{spp}$ to find all railroad paths that intersect with the given point $(x, y)$.

5.1 Implementation

  • The configuration for the experiments consisted of two SUN servers connected through a 100 Mbps LAN.
  • Both systems run Informix Universal Server 9.2 with ESRI Spatial datablade.
  • Random railroad segments connect each station with its 3 to 16 closest stations to provide different connectivity for stations.
  • Different values of start time for the given time interval, ta, were randomly selected.

5.2 Experimental Results

  • Figure 4 depicts the results of their experiments for the four alternative query plans discussed in Section 4.
  • The Y-axis shows the average query response time in seconds.
  • This approach assumes complete control over the remote server hosting $S_{schedules}$, because both $S_{spp}$ and the results of the selection predicate over $S_{spp}$ must be stored at the remote server.
  • This is because the filter has good selectivity only for short time intervals.
  • Note that although neither PTF nor $T_F S_J$ utilizes the expensive $\sigma_{Q_S}$ (because of PTF's precomputation step and $T_F S_J$'s delayed spatial operation), the overhead of the temporal data transfer over the network is so high that it cancels out all the low-complexity benefits of $\sigma_{Q'_S}$.


Efficiently Querying Moving Objects with Pre-defined
Paths in a Distributed Environment
Cyrus Shahabi, Mohammad R. Kolahdouzan, Snehal Thakkar,
Jose Luis Ambite
and Craig A. Knoblock
Department of Computer Science and
Information Sciences Institute
University of Southern California
Los Angeles, California 90089
[shahabi, kolahdoz, snehalth]@usc.edu [ambite, knoblock]@isi.edu
ABSTRACT
Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. We also report on our experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan.
1. INTRODUCTION
The explosive growth of the Internet has made a wealth of networked information available. Much of this information is geographical, spatial, temporal, or pertains to objects that have a spatial or temporal nature. The sources of this information are heterogeneous: traditional databases with spatial extensions, geographical information systems (GIS) software packages, mapping and imagery web sites, web sites with spatial information, etc. An increasing number of web sites have information of a geospatial or temporal character. For example, detailed satellite images can be obtained from sites such as www.terraserver.com; maps from www.mapquest.com; train schedules from www.amtrak.com; geolocated points of interest such as train stations from www.usgs.gov; geographical features such as railroad networks from www.nima.mil; etc. The number of sources and the quality and detail of the information available are continually growing all around the globe. In this paper, we focus our attention on how to efficiently query moving objects such as trains in a distributed environment such as the one mentioned above.
This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and ITR-0082826, NASA/JPL contract nr. 961518, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash/equipment gifts from NCR, IBM, Intel and SUN.
Copyright 2001 ACM XXXXXXXXX/XX/XX ...$5.00.
Recently there has been a growing interest in moving object databases that manage spatial objects whose positions change over time [2, 3, 4, 5, 6, 7, 8]. Example applications are those that query the locations of trains, cars and planes for a given time interval. The main challenge investigated by these studies is how to model the large spatio-temporal data needed to track the position of any object at any given time (in the past, present or future).
In this paper, we consider an environment where the content of the moving object database does not need to be modified to reflect the movement of the objects. We term this environment moving objects with predefined paths and schedules. An example application is to query the location of trains moving on a railroad network. By storing the schedules of trains' departures and arrivals, the locations of the stations and the vector data corresponding to the railroad network, we have enough information to query the location of any moving object (i.e., train) at any given time. Note that the database still needs to be modified (e.g., when a schedule changes), but it does not need to be updated (and/or appended) as the objects move around the network within the provided schedule. The challenge, however, is that queries that find the location of a train in a given time interval are time consuming, because of the expensive functions, such as the shortest path function, that need to be performed on large vector data, as well as the temporal intersections that need to be applied on large sets of time intervals.
One solution to reduce the query processing time of moving objects with predefined paths and schedules is to precompute the required information and materialize it using a moving object data model such as the 3D Trajectory [3] model. This is a feasible approach if we assume that the different schedule, railroad and station sources are all local and under our full control. However, in our assumed distributed environment, the sources of information that we would like to access are autonomous and dynamic. That is, we do not have administrative control over them and cannot modify their structure or write data to them. The sources can change their information without warning. Different sources may contain overlapping information or only fragments of the desired data.
Therefore, we propose alternative distributed query plans to realize the integration of spatial and temporal information (e.g., network of the railroads and schedules of the trains) from distributed, heterogeneous sources. We start by investigating traditional filter+semi-join plans, either applying the temporal filter first and then performing the spatial semi-join, or vice versa. However, we show that there are two main drawbacks with pure filter+semi-join plans. First, spatial filters (e.g., identifying all railroad segments that overlap with a given point) are computationally complex, resulting in long local or remote query processing times. Second, temporal filters (identifying all intervals that overlap with a given interval) usually have bad selectivity due to the large range of intervals covered by each instance in the temporal source (e.g., schedule table). That is, many schedules usually intersect with any query interval. Thus, temporal filters cannot effectively reduce the amount of data transferred over the network. We try to overcome the first obstacle by either performing a pre-computation step and then applying a less expensive function/filter, or delaying the spatial filter until we have significantly reduced the size of the spatial data (e.g., railroad vector data). We address the second obstacle by proposing a spatio-temporal filter (termed deviation filter), instead of temporal-only filters, which can also exploit the spatial characteristics of the data to improve the selectivity. Finally, we report on our experiments in evaluating and comparing the performances of our different moving object query plans. Although one version of the pre-computation approach outperforms all the other plans, the pre-computation approach may not be feasible due to limited control over remote sources. Therefore, we conclude that our deviation based approach, which is only marginally worse than the pre-computation approach, is the superior plan.
The remainder of this paper is organized as follows. We formally define the problem in Section 2. Sections 3 and 4 discuss the solutions to the problem for the centralized and distributed environments, respectively. Section 5 reports on our experimental observations. Finally, we discuss the conclusion and future work in Section 6.
2. PROBLEM DEFINITION
To better understand the problem, consider a scenario in which a movie is being produced at a certain location on a certain date for several hours. The director of the movie needs to make sure that no train passes by that location while they are shooting the movie. This requires having information about the schedules of the trains running on the nearby railroads.
The following is a simplified list of the sources available on the web and in our local databases that can be used to provide the necessary information for the director of the movie.
  • Names of the train stations and their geographical locations (from Silicon Mapping Solutions):
    $S_{stations}$ = Stations(Station-Name, Station-Point)
    where station-name is a character string that contains a unique name for each station and station-point represents the latitude and longitude of the station.
  • Up-to-date schedule information of the trains (from the train company web site):
    $S_{schedules}$ = Schedules(Train-ID, Departing-Station-Name, Departing-Time, Arriving-Station-Name, Arriving-Time)
  • The vector data containing the railroads' paths (from the NIMA gazetteer):
    $S_{railroads}$ = Railroads(Railroad-ID, Railroad-Path, Starting-Point, Ending-Point)
    where the railroad-path is a 2D line with starting-point and ending-point as its first and last vertices.
As a real-world example, the above sources of information for the United States contain around 1000 stations, 400,000 schedules, and 170,000 line segments for the railroad network. By appropriate spatio-temporal integration of the above sources, we are interested in processing the following types of queries:
  • Q1: Find the position of a train for a given time instant.
  • Q2: Given a geographical point and a time interval, find all the trains that pass through the point during the given time interval.
For the remainder of this paper, to simplify the discussion, we only focus on the more general query, i.e., Q2. We now formally define the sources of information and predicates of Q2 as:
  • Names of the stations:
    $s = \{(s_i) \mid 1 \le i \le N_s\}$, where $N_s$ is the total number of stations.
  • Locations of the stations:
    $S_{stations} = \{(s_i, (x_i, y_i)) \mid 1 \le i \le N_s\}$, where $(x_i, y_i)$ specify the geographical location of the station $s_i$.
  • Schedules of the trains:
    $S_{schedules} = \{V_k \mid V_k = (Train_{id}, s_i, s_j, [t_i, t_j]) \mid 1 \le i, j \le N_s;\ 1 \le k \le M\}$, where $M$ is the cardinality of schedules. Note that $M \gg N_s$.
  • Railroad network:
    $S_{railroads} = \{(\langle X_i, Y_i \rangle \ldots \langle X_j, Y_j \rangle) \mid 1 \le i, j \le N_r;\ (x_i, y_i) \subset (X, Y)\}$, where $N_r$ is the number of railroads in the railroad network, which is in the order of $N_s^2$, and $(X, Y)$ specifies the geographical location of the starting and ending points of each railroad segment. Note that not all the start and end points are stations. However, all stations are either a start or an end point or both. Hence $N_r \gg N_s$.

[Figure: railroad network with points $S_1$, $V_2$, $V_3$, $S_4$, $V_5$, $S_6$, $S_7$ and railroad segments $L_1$ to $L_7$]
Figure 1: An example of railroad networks
  • $Q_T = Intersect([t_a, t_b], [t_i, t_j])$
    $Q_T$, representing the temporal predicate of the query, finds schedules that intersect with a given time interval $[t_a, t_b]$.
  • $Q_S = Intersect((x, y), \Psi(S_{railroads}))$
    where $\Psi(S_{railroads})$ is a function that calculates the shortest¹ railroad paths between all possible start and end point pairs $(\langle X_i, Y_i \rangle, \langle X_j, Y_j \rangle)$.
  • $Q'_S = Intersect((x, y), \Phi(S_{railroads}, S_{stations}))$
    where $\Phi(S_{railroads}, S_{stations})$ is a function that calculates the shortest railroad paths between all station pairs $(\langle x_i, y_i \rangle, \langle x_j, y_j \rangle)$. Since the set of the station coordinates, $(x, y)$, is a small subset of the starting and ending points in the railroad network, $(X, Y)$, the $\Phi$ function is computationally less expensive than $\Psi$.
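The temporal predicate $Q_T$ can be illustrated with a minimal Python sketch. It assumes closed intervals and times expressed as plain numbers (hours of the day); the function names and the sample schedules are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval [start, end]; numeric hours are an assumption


def intersect(query: Interval, schedule: Interval) -> bool:
    """Q_T: True when [t_a, t_b] and [t_i, t_j] share at least one instant."""
    t_a, t_b = query
    t_i, t_j = schedule
    return t_a <= t_j and t_i <= t_b


def temporal_selectivity(query: Interval, schedules: List[Interval]) -> float:
    """Fraction of schedules that survive the temporal predicate."""
    kept = sum(1 for s in schedules if intersect(query, s))
    return kept / len(schedules)


# Hypothetical long-haul schedules, each covering many hours of the day.
schedules = [(0.0, 10.0), (2.0, 14.0), (6.0, 20.0), (12.0, 23.0)]
print(temporal_selectivity((9.0, 13.0), schedules))  # prints 1.0
```

In this toy data every schedule overlaps the four-hour query window, so the predicate removes nothing, which is the kind of weak selectivity the paper attributes to temporal-only filters.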
With $Q_S$ and $Q'_S$, it is not sufficient to find only those line segments that intersect with $(x, y)$; we must subsequently find all the paths that include those segments. Hence, the function $\Psi$ constructs all paths between any possible start and end points (complexity $O(N_r^2)$) while $\Phi$ does the same operation on all possible station pairs (complexity $O(N_s^2)$). During this path construction, there may be cases where there is more than one path between two points. In these cases, both $\Psi$ and $\Phi$ choose the shortest path, or actually never construct the other paths.
To illustrate the difference between the functions $\Psi$ and $\Phi$, consider the following example:
Example 2.1: Figure 1 illustrates a railroad network with 7 starting and ending points, out of which 4 are stations ($S_1$, $S_4$, $S_6$, $S_7$). Table 1 shows the results of applying the $\Psi$ and $\Phi$ functions to this network. Note that both functions only compute the shortest path between the pair of points among all possible paths. For example, there are two possible paths between $S_4$ and $S_7$. The path $(L_3, L_2, L_7)$ is computed, while $(L_4, L_5, L_6)$ is discarded.
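The example can be reproduced with a short Python sketch. The topology below is inferred from the paths listed in Table 1, and the segment lengths are assumptions chosen so that the shortest paths agree with that table; the paper gives neither explicitly. Dijkstra's algorithm is used here as one possible shortest-path routine, not necessarily the one the authors used.

```python
import heapq
from itertools import combinations

# Segment id -> (endpoint, endpoint, length).
# Topology inferred from Table 1; lengths are assumed for illustration.
SEGMENTS = {
    "L1": ("S1", "V2", 1.0),
    "L2": ("V2", "V3", 1.0),
    "L3": ("V3", "S4", 1.0),
    "L4": ("S4", "V5", 2.0),
    "L5": ("V5", "S6", 2.0),
    "L6": ("S6", "S7", 2.0),
    "L7": ("S7", "V2", 1.0),
}
STATIONS = ["S1", "S4", "S6", "S7"]


def build_graph(segments):
    """Undirected adjacency list: node -> [(neighbor, segment id, length)]."""
    graph = {}
    for seg_id, (a, b, length) in segments.items():
        graph.setdefault(a, []).append((b, seg_id, length))
        graph.setdefault(b, []).append((a, seg_id, length))
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the segment ids along the shortest path."""
    best = {source: 0.0}
    queue = [(0.0, source, [])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, seg_id, length in graph[node]:
            new_dist = dist + length
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor, path + [seg_id]))
    return None


def all_pairs_paths(graph, points):
    """One shortest path per unordered pair of the given points."""
    return {(a, b): shortest_path(graph, a, b) for a, b in combinations(points, 2)}


graph = build_graph(SEGMENTS)
psi = all_pairs_paths(graph, sorted(graph))  # every start/end point pair
phi = all_pairs_paths(graph, STATIONS)       # station pairs only
print(len(psi), len(phi))        # 21 6
print(phi[("S4", "S7")])         # ['L3', 'L2', 'L7']
```

The pair counts (21 versus 6) match the number of entries in the two columns of Table 1, which illustrates why restricting the computation to station pairs is cheaper.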
The sources $S_{stations}$ and $S_{railroads}$ contain spatial information while $S_{schedules}$ has temporal content. $Q_T$, $Q_S$ and $Q'_S$ represent the temporal and spatial predicates of the query Q2. Note that there are two alternative ways to evaluate the spatial part of the query, using either $Q_S$ or $Q'_S$. Figure 2 depicts possible query plans to integrate the sources in a distributed environment.
¹ We assume that the trains travel through the shortest path between two stations, as is true in the real-world application.
[Figure 2 panels: (a) Joining stations and schedules first; (b) Joining stations and railroads first. Diagram labels: $S_{schedules}$, $S_{stations}$, $S_{railroads}$, $S_{SPP}$, $Q_T$, $Q_S$, $Q'_S$, Deviation, Pre-computation, Final Refinement, Server 1, Server 2, Server 3.]
Figure 2: Alternative query plans to integrate sources in a distributed environment.
3. CENTRALIZED ENVIRONMENT
In this section, we briefly describe the solution to the problem described in Section 2 using a centralized environment. We integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ off-line using the following relational algebra expression to generate a 3D-trajectory source.
$$3D\_Trajectory(Train\_ID, Trajectory) := (S_{schedules} \bowtie_{s_i} S_{stations}) \bowtie_{\langle x_i, y_i \rangle} S_{railroads} \quad (1)$$
We model each train as a moving point in 2D or, equivalently, a line in 3D with time as its third dimension. With this model, we can answer the queries described in Section 2 through an intersect operation on the 3D-trajectory source between each train trajectory and $(x, y, [t_a, t_b])$, where $(x, y)$ is the query point and $[t_a, t_b]$ is the given time interval. The relational algebra expression for the intersect operation is given in Equation 2. This approach has been used in the moving objects literature.
$$\sigma_{Intersect((x, y, [t_a, t_b]), Trajectory)}(3D\_Trajectory) \quad (2)$$
There are disadvantages associated with the centralized approach. The cost of building the materialized view is usually very high as the sources usually contain large sets of data. Furthermore, update and insert operations to the original sources lead to expensive updates of the 3D trajectory source. In addition, the limitations of the mediator server and/or regulations imposed by remote sources may restrict us from materializing the data locally.
4. DISTRIBUTED ENVIRONMENT
In this section, we describe four alternative approaches to integrate the sources $S_{stations}$, $S_{schedules}$ and $S_{railroads}$ in a distributed environment. As depicted in Figure 2, the total number of query plans is more than four. However, we ignore those plans that are unrealistic due to either large data transmission over the network or expensive local/remote computations. In the following sections, we only briefly mention the unrealistic plans and the reasons we ignored them.

Result of the Ψ function | Result of the Φ function
$(L_1)$, $(L_1, L_2)$, $(L_1, L_2, L_3)$, $(L_1, L_2, L_3, L_4)$, $(L_1, L_7, L_6)$, $(L_1, L_7)$ | $(L_1, L_2, L_3)$, $(L_1, L_7)$, $(L_1, L_7, L_6)$
$(L_2)$, $(L_2, L_3)$, $(L_2, L_3, L_4)$, $(L_7, L_6)$, $(L_7)$ | $(L_3, L_2, L_7)$, $(L_4, L_5)$
$(L_3)$, $(L_3, L_4)$, $(L_2, L_7, L_6)$, $(L_2, L_7)$ | $(L_6)$
$(L_4)$, $(L_4, L_5)$, $(L_3, L_2, L_7)$ |
$(L_5)$, $(L_5, L_6)$ |
$(L_6)$ |
Table 1: Comparison of the Ψ and Φ functions.
4.1 Temporal Filter and Spatial Semi-join ($T_F S_J$)
With this method, depicted in Figure 2a, we join the helper source, $S_{stations}$, with $S_{schedules}$. This is required to generate a common attribute (i.e., station coordinates) for the further join operation with $S_{railroads}$. However, the coordinates of the stations are not exploited to improve the selectivity of the temporal selection predicate.
As shown in Figure 2a, $S_{schedules}$ is first filtered and joined with the helper source to generate intermediate results. Alternative approaches can then be considered to join the intermediate results with $S_{railroads}$. One approach is to perform $\sigma_{Q_S}$ on $S_{railroads}$ and transfer the results to the server that holds $S_{schedules}$. The advantage of this method is that the result of $\sigma_{Q_S}$ usually constitutes a small set of data that can be quickly transmitted over the network. On the other hand, $\sigma_{Q_S}$ is an expensive operation, rendering this method impractical. We do not consider this method in our experiments, as it is always outperformed by the other methods.
The considered approach, termed $T_F S_J$, avoids the expensive $\sigma_{Q_S}$ by transferring the results of the temporal query to the server that hosts $S_{railroads}$. The advantage of this method is that we can perform a simpler spatial expression, $\sigma_{Q'_S}$, after $S_{railroads}$ and the intermediate results are joined. This reduces the complexity of the query plan. However, due to the bad selectivity of the temporal predicate (especially for longer time intervals), a large amount of data needs to be transferred over the network, which may result in longer query processing time. We consider this query plan in our experiments as a comparison point. In Section 4.2, we describe a new approach which exploits the coordinates of the helper source to improve the selectivity factor of the temporal predicate.
With this query execution plan, our query can be performed in the following steps.
  • Temporal Selection: First, we perform a selection on $S_{schedules}$ using the departure and arrival times and the given time interval $[t_a, t_b]$ to find all schedules that are active during the given time interval. Next, we join the selected schedules with $S_{stations}$ using station-name to include the coordinates of the stations in the selected schedules.
  • Spatial Semi-join: Next, we perform a spatial semi-join between the selected schedules and $S_{railroads}$ using the coordinates of the stations and railroads.
  • Spatial Filter: The results are then filtered using the spatial selection expression $\sigma_{Q'_S}$ to select railroads that intersect with the given point $(x, y)$.
  • Final Refinement: Finally, we calculate the estimated time instant $t$ at which each train reaches the given point and exclude all trains for which $t$ has no intersection with the interval $[t_a, t_b]$.
The relational algebra expression for our query using the $T_F S_J$ query execution plan is:
$$\sigma_{FR}(\sigma_{Q'_S}(\sigma_{Q_T}(S_{schedules} \bowtie_{s_i} S_{stations}) \bowtie_{\langle x_i, y_i \rangle} S_{railroads})) \quad (3)$$
4.2 Deviation Based Approach (DA)
We propose a different approach, termed the Deviation Based Approach, which is similar to $T_F S_J$ except that it exploits the coordinates of the stations to perform a spatio-temporal selection predicate, resulting in better selectivity. This approach reduces the number of candidate schedules by filtering out the schedules that correspond to station pairs connected through a path far from the query point $(x, y)$.
The core idea behind this approach is as follows. First, given the coordinates of two stations, we can always compute the shortest distance (straight line) between them. However, in the real world, the railroad path between the two stations is not a straight line. Hence, we estimate how much the real path deviates from the straight line in the very worst case and define a term called Path Deviation to measure the deviation. Extending the 2D area of this deviation with the time dimension, we obtain a 3D capsule-shaped object. Meanwhile, the intermediate spatio-temporal tuples (station coordinates joined with schedule time intervals) can be conceptualized as straight lines in 3D space joining points $(x_i, y_i, t_a)$ and $(x_j, y_j, t_b)$. Consequently, the deviation filter only selects those lines that intersect with the 3D capsule-shaped object. This results in a much more effective filter as compared to a pure temporal filter. Meanwhile, since this filter works on the small set of point data (station coordinates) as opposed to the large set of line data (railroad vectors), it is not as computationally complex as the pure spatial filter.
Definition 4.1: Path Deviation is defined as the ratio of the sum of the Euclidean distances from the start point $P_S = (X_i, Y_i)$ and the end point $P_E = (X_j, Y_j)$ to a given point $(x, y)$, over the Euclidean distance between the start and end points of a railroad segment.
$$PD(P_S, P_E, (x, y)) = \frac{Distance(P_S, (x, y)) + Distance(P_E, (x, y))}{Distance(P_S, P_E)} \quad (4)$$
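As a quick numeric illustration of Equation 4, consider two query points for the same pair of end points. The coordinates are hypothetical and chosen only to keep the arithmetic simple.

```latex
% Hypothetical end points: P_S = (0, 0), P_E = (10, 0), so Distance(P_S, P_E) = 10.
\begin{align*}
(x, y) = (5, 0):\quad  & PD = \frac{5 + 5}{10} = 1 \\
(x, y) = (5, 12):\quad & PD = \frac{\sqrt{5^2 + 12^2} + \sqrt{5^2 + 12^2}}{10} = \frac{13 + 13}{10} = 2.6
\end{align*}
```

A point on the straight line between the end points yields the minimum value of 1, and the value grows as the point moves away from that line.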

Figure 3: Graphical representation of the path deviation approach
Based on Definition 4.1, we can compute the path deviation for each schedule that is active during the given time interval $[t_a, t_b]$, and exclude all schedules that have a path deviation greater than a pre-defined threshold. The path deviation threshold, $P_{DT}$, can be pre-determined or adaptively computed at query time. We present several algorithms to compute the path deviation threshold and prove that this approach does not result in any false drops (see Section 4.2.1).
The difference between the execution plan for this approach and the approach discussed in Section 4.1 is that the path deviation selection predicate is applied to the results of the temporal selection before transmission to the server which hosts $S_{railroads}$.
The path deviation selection expression ($\sigma_{Q_{DF}}$) for our query is given below.
$$\sigma_{Q_{DF}} = [PD(P_S, P_E, (x, y)) \le P_{DT}] \quad (5)$$
The graphical representation of this approach is shown in Figure 3. Applying the path deviation selection predicate is similar to creating a capsule-shaped object around the given point $(x, y)$. The length of the object depends on the given time interval $[t_a, t_b]$, while the width of the object depends on the threshold. We select the schedules for which a straight line between the starting and ending stations intersects with the capsule-shaped object.
The relational algebra expression of our query using this query execution plan is:
$$\sigma_{FR}(\sigma_{Q'_S}(\sigma_{Q_{DF}}(\sigma_{Q_T}(S_{schedules} \bowtie_{s_i} S_{stations})) \bowtie_{\langle x_i, y_i \rangle} S_{railroads})) \quad (6)$$
Since the path deviation predicate provides a better selectivity as compared to the temporal selection predicate, this approach results in less data transfer between the servers and better query response time.
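The effect on the amount of data shipped to the railroad server can be sketched with a small in-memory example in Python. The station coordinates, schedules, query and threshold value below are all hypothetical; the sketch only contrasts the number of tuples that survive the temporal filter alone with the number that survive the temporal filter followed by the deviation filter.

```python
import math

# Hypothetical station coordinates (helper source).
STATIONS = {
    "A": (0.0, 0.0), "B": (10.0, 0.0),
    "C": (0.0, 50.0), "D": (10.0, 50.0),
}

# Hypothetical schedules: (train id, departing station, arriving station, t_i, t_j).
SCHEDULES = [
    ("T1", "A", "B", 8.0, 10.0),
    ("T2", "C", "D", 8.5, 11.0),
    ("T3", "A", "B", 14.0, 16.0),
]


def temporal_filter(schedules, t_a, t_b):
    """Keep schedules whose [t_i, t_j] intersects [t_a, t_b]."""
    return [s for s in schedules if s[3] <= t_b and t_a <= s[4]]


def path_deviation(p_s, p_e, point):
    """Equation 4; assumes the two stations have distinct coordinates."""
    return (math.dist(p_s, point) + math.dist(p_e, point)) / math.dist(p_s, p_e)


def deviation_filter(schedules, stations, point, threshold):
    """Equation 5 applied to station pairs: keep tuples with PD <= threshold."""
    return [
        s for s in schedules
        if path_deviation(stations[s[1]], stations[s[2]], point) <= threshold
    ]


query_point, t_a, t_b = (5.0, 1.0), 9.0, 10.5
threshold = 1.5  # assumed value; Section 4.2.1 discusses how to choose it

after_temporal = temporal_filter(SCHEDULES, t_a, t_b)
after_deviation = deviation_filter(after_temporal, STATIONS, query_point, threshold)
print([s[0] for s in after_temporal])   # ['T1', 'T2']
print([s[0] for s in after_deviation])  # ['T1']
```

Train T2 is active during the query interval but runs between stations far from the query point, so only the deviation filter removes it before the transfer.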
4.2.1 Threshold Selection
In this section, we propose alternative algorithms to determine the threshold for the path deviation approach. Our first method computes a constant value for the threshold. We use Equation 7 to compute the threshold:
$$P_{DT} = \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right) \quad (7)$$
This equation computes the maximum ratio between the actual length of the railroad connecting all pairs of the starting and ending points in $S_{railroads}$ and the Euclidean distance between them. We prove that this equation guarantees no false drops, i.e., no candidate trains are excluded when using this value for the path deviation threshold.
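A minimal Python sketch of Equation 7 follows. It assumes each railroad is available as a list of (x, y) vertices whose first and last entries are $P_S$ and $P_E$; the function names, the handling of closed loops and the sample data are assumptions made for illustration.

```python
import math


def polyline_length(vertices):
    """Length of a railroad given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def constant_threshold(railroads):
    """Equation 7: the largest length-to-straight-line ratio over all railroads."""
    ratios = []
    for vertices in railroads:
        straight = math.dist(vertices[0], vertices[-1])
        if straight == 0:
            continue  # closed loop: ratio undefined, skipped in this sketch
        ratios.append(polyline_length(vertices) / straight)
    # A ratio is never below 1, so 1.0 is used when no railroad qualifies.
    return max(ratios, default=1.0)


# Hypothetical network: one straight track and one track that detours.
railroads = [
    [(0.0, 0.0), (10.0, 0.0)],              # ratio 10 / 10
    [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)],   # ratio (5 + 5) / 6
]
p_dt = constant_threshold(railroads)
```

The single detouring track sets the threshold for the whole network in this example, which is the sensitivity that motivates the adaptive variants.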
Theorem 4.2: For the Path Deviation Threshold $P_{DT}$ of Equation 7, the selection $\sigma_{Q_{DF}}$ results in no false drops.
Proof: We substitute $P_{DT}$ in Equation 5 with the value of $P_{DT}$ computed by Equation 7.
$$\sigma_{Q_{DF}} = \left[\frac{Distance(P_S, (x, y)) + Distance(P_E, (x, y))}{Distance(P_S, P_E)} \le \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right)\right] \quad (8)$$
$Distance(P_S, (x, y)) + Distance(P_E, (x, y))$ is the length of the shortest path, $l_{SE}$, between the starting and ending points that passes through the given point $(x, y)$. Hence, if $(x, y)$ is on a railroad path linking the starting and ending points, the length of the railroad must be greater than or equal to $l_{SE}$, which in turn implies that Equation 8 is satisfied. This means that none of the railroads that intersect with $(x, y)$ are excluded by the path deviation selection expression $\sigma_{Q_{DF}}$.
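The argument of the proof can also be written as a chain of inequalities. This is only a restatement; the symbol $r$, denoting a railroad that links $P_S$ to $P_E$ and contains $(x, y)$, is introduced here for illustration.

```latex
\[
\frac{Distance(P_S, (x, y)) + Distance(P_E, (x, y))}{Distance(P_S, P_E)}
  = \frac{l_{SE}}{Distance(P_S, P_E)}
  \le \frac{length(r)}{Distance(P_S, P_E)}
  \le \max\left(\frac{length(railroad)}{Distance(P_S, P_E)}\right)
  = P_{DT}
\]
```

The first inequality holds because the part of $r$ from $P_S$ to $(x, y)$ and the part from $(x, y)$ to $P_E$ are each at least as long as the corresponding straight lines; the second holds because the maximum is taken over all railroads, including $r$.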
The only disadvantage of a constant value for the threshold is that a large value of the path deviation for some railroads results in a large value of the overall threshold for the entire network. For example, in the real world this scenario happens if a railroad track is not a straight line between two stations (shortest distance) due to some natural limitation (e.g., the existence of a lake in the way). The larger the value of the threshold, the worse the selectivity of the path deviation selection predicate. This means that guaranteeing no false drops in the path deviation approach (i.e., satisfying Equation 8) may result in selecting and transferring more schedules.
4.2.2 Adaptive Threshold
We propose two approaches to adaptively select the value of
the threshold.
1. With the first approach, we divide the spatial data (e.g., $S_{railroads}$) into distinct geographical regions and calculate a separate threshold value for each region using Equation 7. In the real world, a flat area such as a desert would receive a less conservative (smaller) threshold, while an area with several mountains would receive a more conservative (larger) one. Subsequently, different path deviation predicates with different thresholds are applied to different regions. This results in better selectivity for the path deviation selection predicate, as a high threshold value for one region does not affect regions with lower threshold values.
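The region-based approach can be sketched as follows. This is a minimal illustration, assuming that every railroad polyline has already been assigned to exactly one region; the region names, the coordinates and the function `regional_thresholds` are hypothetical and not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def polyline_length(path):
    """Track length as the sum of the polyline's segment lengths."""
    return sum(dist(p, q) for p, q in zip(path, path[1:]))

def regional_thresholds(railroads_by_region):
    """One path deviation threshold per region (maximum ratio within that region)."""
    return {
        region: max(polyline_length(r) / dist(r[0], r[-1]) for r in tracks)
        for region, tracks in railroads_by_region.items()
    }

# Assumed partition: straight tracks in the desert, a long detour in the mountains.
railroads_by_region = {
    "desert": [[(0, 0), (10, 0)], [(0, 0), (3, 4), (6, 8)]],
    "mountain": [[(0, 0), (0, 10), (10, 10), (10, 0)]],
}
thresholds = regional_thresholds(railroads_by_region)

# A single constant threshold would have to cover the worst region.
global_threshold = max(thresholds.values())
```

Here the desert keeps a threshold of 1.0 even though the mountain region needs 3.0, whereas a single constant threshold would apply 3.0 everywhere.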

Citations
Proceedings Article
01 Jan 2015
TL;DR: In this article, an approach to build knowledge graphs by exploiting semantic technologies to reconcile the data continuously crawled from diverse sources, to scale to billions of triples extracted from the crawled content, and to support interactive queries on the data.
Abstract: There is a huge amount of data spread across the web and stored in databases that we can use to build knowledge graphs. However, exploiting this data to build knowledge graphs is difficult due to the heterogeneity of the sources, scale of the amount of data, and noise in the data. In this paper we present an approach to building knowledge graphs by exploiting semantic technologies to reconcile the data continuously crawled from diverse sources, to scale to billions of triples extracted from the crawled content, and to support interactive queries on the data. We applied our approach, implemented in the DIG system, to the problem of combating human trafficking and deployed it to six law enforcement agencies and several non-governmental organizations to assist them with finding traffickers and helping victims.

71 citations

Book ChapterDOI
11 Nov 2012
TL;DR: This paper hypothesizes new composite concepts defined as disjunctions of conjunctions of (RDF) types and value restrictions, which are called restriction classes, and generates alignments between these composite concepts, and presents an evaluation of this new algorithm to Geospatial, Biological Classification, and Genetics domains.
Abstract: Despite the increase in the number of linked instances in the Linked Data Cloud in recent times, the absence of links at the concept level has resulted in heterogeneous schemas, challenging the interoperability goal of the Semantic Web. In this paper, we address this problem by finding alignments between concepts from multiple Linked Data sources. Instead of only considering the existing concepts present in each ontology, we hypothesize new composite concepts defined as disjunctions of conjunctions of (RDF) types and value restrictions, which we call restriction classes, and generate alignments between these composite concepts. This extended concept language enables us to find more complete definitions and to even align sources that have rudimentary ontologies, such as those that are simple renderings of relational databases. Our concept alignment approach is based on analyzing the extensions of these concepts and their linked instances. Having explored the alignment of conjunctive concepts in our previous work, in this paper, we focus on concept coverings (disjunctions of restriction classes). We present an evaluation of this new algorithm to Geospatial, Biological Classification, and Genetics domains. The resulting alignments are useful for refining existing ontologies and determining the alignments between concepts in the ontologies, thus increasing the interoperability in the Linked Open Data Cloud.

63 citations

Journal ArticleDOI
TL;DR: The research presented in this paper introduces a relative representation of trajectories in space and time to represent space the way it is perceived by a moving observer acting in the environment, and to provide a complementary view to the usual absolute vision of space.
Abstract: The research presented in this paper introduces a relative representation of trajectories in space and time. The objective is to represent space the way it is perceived by a moving observer acting in the environment, and to provide a complementary view to the usual absolute vision of space. Trajectories are characterized from the perception of a moving observer where relative positions and relative velocities are the basic primitives. This allows for a formal identification of elementary trajectory configurations, and their relationships with the regions that compose the environment. The properties of the model are studied, including transitions and composition tables. These properties characterize trajectory transitions by the underlying processes that semantically qualify them. The approach provides a representation that might help the understanding of trajectory patterns in space and time.

48 citations


Cites background from "Efficiently querying moving objects..."

  • ...urban networks) can be also indexed using the geometrical properties of predefined paths [ 25 ]....

Proceedings ArticleDOI
01 Jan 2017
TL;DR: An approach to extract human settlement symbols in United States Geological Survey (USGS) historical topographic maps using contemporary building data as the contextual spatial layer using a Convolutional Neural Network for the recognition task.
Abstract: Information extraction from historical maps represents a persistent challenge due to inferior graphical quality and large data volume in digital map archives, which can hold thousands of digitized map sheets. In this paper, we describe an approach to extract human settlement symbols in United States Geological Survey (USGS) historical topographic maps using contemporary building data as the contextual spatial layer. The presence of a building in the contemporary layer indicates a high probability that the same building can be found at that location on the historical map. We describe the design of an automatic sampling approach using these contemporary data to collect thousands of graphical examples for the symbol of interest. These graphical examples are then used for robust learning to then carry out feature extraction in the entire map. We employ a Convolutional Neural Network (LeNet) for the recognition task. Results are promising and will guide the next steps in this research to provide an unsupervised approach to extracting features from historical maps.

34 citations

References
Proceedings ArticleDOI
07 Apr 1997
TL;DR: This work proposes a data model for representing moving objects in database systems called the Moving Objects Spatio-Temporal (MOST) data model, and devise an algorithm for processing FTL queries in MOST.
Abstract: We propose a data model for representing moving objects in database systems. It is called the Moving Objects Spatio-Temporal (MOST) data model. We also propose Future Temporal Logic (FTL) as the query language for the MOST model, and devise an algorithm for processing FTL queries in MOST.

668 citations

Journal ArticleDOI
TL;DR: A new spatio-temporal representation of spatiotemporal data based on traditional raster and vector representations is proposed, which is as well-suited for analysing overall temporal relationships of events and patterns of events throughout a geographical area as a temporally-based representation.
Abstract: Representations historically used within GIS assume a world that exists only in the present. Information contained within a spatial database may be added-to or modified over time, but a sense of change or dynamics through time is not maintained. This limitation of current GIS capabilities has recently received substantial attention, given the increasingly urgent need to better understand geographical processes and the cause-and-effect interrelationships between human activities and the environment. Models proposed so-far for the representation of spatiotemporal data are extensions of traditional raster and vector representations that can be seen as location- or feature-based, respectively, and are therefore best organized for performing either location-based or feature-based queries. Neither form is as well-suited for analysing overall temporal relationships of events and patterns of events throughout a geographical area as a temporally-based representation. In the current paper, a new spatio-tem...

526 citations

Journal ArticleDOI
TL;DR: This paper proposes a new line of research where moving points and moving regions are viewed as 3-D (2-D space+time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types.
Abstract: Spatio-temporal databases deal with geometries changing over time. In general, geometries cannot only change in discrete steps, but continuously, and we are talking about moving objects. If only the position in space of an object is relevant, then moving point is a basic abstraction; if also the extent is of interest, then the moving region abstraction captures moving as well as growing or shrinking regions. We propose a new line of research where moving points and moving regions are viewed as 3-D (2-D space+time) or higher-dimensional entities whose structure and behavior is captured by modeling them as abstract data types. Such types can be integrated as base (attribute) data types into relational, object-oriented, or other DBMS data models; they can be implemented as data blades, cartridges, etc. for extensible DBMSs. We expect these spatio-temporal data types to play a similarly fundamental role for spatio-temporal databases as spatial data types have played for spatial databases. The paper explains the approach and discusses several fundamental issues and questions related to it that need to be clarified before delving into specific designs of spatio- temporal algebras.

419 citations

Journal ArticleDOI
16 May 2000
TL;DR: A data model is formally defined for spatio-temporal databases supporting spatial objects with continuously changing position and extent, termed moving objects databases that includes complex evolving spatial structures such as line networks or multi-component regions with holes.
Abstract: We consider spatio-temporal databases supporting spatial objects with continuously changing position and extent, termed moving objects databases. We formally define a data model for such databases that includes complex evolving spatial structures such as line networks or multi-component regions with holes. The data model is given as a collection of data types and operations which can be plugged as attribute types into any DBMS data model (e.g. relational, or object-oriented) to obtain a complete model and query language. A particular novel concept is the sliced representation which represents a temporal development as a set of units, where unit types for spatial and other data types represent certain “simple” functions of time. We also show how the model can be mapped into concrete physical data structures in a DBMS environment.

377 citations

01 Jan 1997
TL;DR: The Moving Objects Spatio-Temporal (MOST) data model is proposed, and Future Temporal Logic (FTL) as the query language for the MOST model, and an algorithm for processing FTL queries in MOST is devised.
Abstract: In this paper we propose a data model for representing moving objects with uncertain positions in database systems. It is called the Moving Objects Spatio-Temporal (MOST) data model. We also propose Future Temporal Logic (FTL) as the query language for the MOST model, and devise an algorithm for processing FTL queries in MOST.

164 citations

Frequently Asked Questions (19)
Q1. What are the contributions mentioned in the paper "Efficiently querying moving objects with pre-defined paths in a distributed environment"?

The authors consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. The authors show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, the authors propose a novel spatio-temporal filter, called deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. The authors also report on their experiments in comparing the performances of the alternative query plans and conclude that the plan with spatio-temporal filter is the most viable and superior plan. 

The authors demonstrated that the complexity of the spatial selection operation and bad selectivity of the temporal selection operation render pure filter+semi-join query plans inefficient. The authors plan to extend this study in three ways. Finally, the authors want to incorporate their deviation-based query plan into the WorldInfo Assistant [1] application and show its usefulness for efficient query evaluation.

Due to the bad selectivity of the temporal predicate (especially for longer time intervals), a large amount of data needs to be transferred over the network, which may result in longer query processing time.

The advantage of this method is that the authors can perform a simpler spatial expression, $\sigma_{Q'_S}$, after $S_{railroads}$ and the intermediate results are joined.

With the second approach, the authors consider railroad paths with higher values of threshold as exceptions and exclude them from the computation of the overall threshold. 
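One way to picture this second approach is the sketch below. It assumes a user-chosen `cutoff` ratio above which a railroad counts as an exception; the cutoff, the railroad names and the function `threshold_with_exceptions` are hypothetical. How exception paths are treated at query time is not described here, so the sketch only lists them so that a caller can handle them separately.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def polyline_length(path):
    """Track length as the sum of the polyline's segment lengths."""
    return sum(dist(p, q) for p, q in zip(path, path[1:]))

def deviation_ratio(path):
    """Ratio of track length to straight-line distance between the end points."""
    return polyline_length(path) / dist(path[0], path[-1])

def threshold_with_exceptions(railroads, cutoff):
    """Return (threshold over regular railroads, sorted names of exception railroads).

    Railroads whose ratio exceeds `cutoff` (an assumed parameter) are treated as
    exceptions and left out of the threshold computation.
    """
    ratios = {name: deviation_ratio(path) for name, path in railroads.items()}
    regular = {name: r for name, r in ratios.items() if r <= cutoff}
    exceptions = sorted(name for name in ratios if name not in regular)
    # A ratio can never be below 1.0, so 1.0 is the tightest possible fallback.
    threshold = max(regular.values()) if regular else 1.0
    return threshold, exceptions

railroads = {
    "plain": [(0, 0), (10, 0)],                     # ratio 1.0
    "hill": [(0, 0), (6, 8), (12, 0)],              # ratio 20/12
    "lake": [(0, 0), (0, 10), (10, 10), (10, 0)],   # ratio 3.0
}
threshold, exceptions = threshold_with_exceptions(railroads, cutoff=2.0)
```

With the detour around the lake set aside, the overall threshold drops from 3.0 to about 1.67 for the remaining railroads.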

The relational algebra expression of their query using this query execution plan is:

$$\sigma_{FR}(\sigma_{Q'_S}(\sigma_{Q_{DF}}(\sigma_{Q_T}(S_{schedules} \bowtie_{s_i} S_{stations})) \bowtie_{\langle x_i, y_i \rangle} S_{railroads})) \quad (6)$$

Since the path deviation predicate provides better selectivity than the temporal selection predicate, this approach results in less data transfer between the servers and better query response time.

Spatial filters (e.g., identifying all railroad segments that overlap with a given point) are computationally complex, resulting in long local or remote query processing time.

In addition, the effect of the bad selectivity of the temporal predicate is eliminated by transferring the selected railroads over the network as opposed to the selected schedules.

The only disadvantage of a constant value for the threshold is that a large value of the path deviation for some railroads results in a large value of the overall threshold for the entire network. 

The relational algebra expression to perform their query with this approach is:

$$\sigma_{FR}((\sigma_{Q'_S} S_{spp}) \bowtie_{s_i} \sigma_{Q_T}(S_{schedules})) \quad (10)$$

With this approach, the complexity of the spatial predicate is reduced as the authors use $\sigma_{Q'_S}$ on $S_{spp}$.

It would be interesting to see how much moving object queries on real data would benefit from adaptive threshold computation of the deviation approach as compared to a single constant threshold. 

The authors investigated solutions to both problems by first joining each source with a helper spatial source and then 1) performing pre-computation on the combination of two spatial sources, or 2) replacing temporal filters with spatio-temporal ones on the combination of the spatial and the temporal sources. 

Hence $N_r \gg N_s$.
• $Q_T = Intersect([t_a, t_b], [t_i, t_j])$: $Q_T$, representing the temporal predicate of the query, finds schedules that intersect a given time interval $[t_a, t_b]$.
• $Q_S = Intersect((x, y), \Psi(S_{railroads}))$, where $\Psi(S_{railroads})$ is a function that calculates the shortest railroad paths between all possible start and end point pairs $(\langle X_i, Y_i \rangle, \langle X_j, Y_j \rangle)$.
• $Q'_S = Intersect((x, y), \Phi(S_{railroads}, S_{stations}))$, where $\Phi(S_{railroads}, S_{stations})$ is a function that calculates the shortest railroad paths between all station pairs $(\langle x_i, y_i \rangle, \langle x_j, y_j \rangle)$.
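The temporal predicate $Q_T$ amounts to an interval-overlap test. The following is a minimal sketch, assuming closed intervals given as (start, end) pairs in hours; the train names, the times and the function `intervals_intersect` are made up for illustration.

```python
def intervals_intersect(query, schedule):
    """True if closed intervals [ta, tb] and [ti, tj] share at least one instant."""
    (ta, tb), (ti, tj) = query, schedule
    return ti <= tb and ta <= tj

# Hypothetical schedules: train name -> (departure, arrival) in hours.
schedules = {
    "T1": (8.0, 9.5),
    "T2": (10.0, 12.0),
    "T3": (9.0, 11.0),
    "T4": (13.0, 14.0),
    "T5": (6.0, 7.0),
}
query_interval = (9.0, 10.0)

selected = sorted(
    name
    for name, interval in schedules.items()
    if intervals_intersect(query_interval, interval)
)
```

Three of the five sample trains overlap the query interval regardless of where they run, which illustrates why a purely temporal filter tends to select many schedules.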

In the real world, a flat area such as a desert would receive a less conservative threshold, while an area with several mountains would receive a more conservative one.

This results in better selectivity for the path deviation selection predicate as the high value of the threshold for a particular region does not affect the regions with lower values of the threshold. 

The authors ignore those plans that are unrealistic due to either large data transmission over the network or expensive local/remote computations.

Since this filter works on the small set of point data (station coordinates) as opposed to the large set of line data (railroad vectors), it is not as computationally complex as the pure spatial filter.

In a distributed environment (e.g., WWW) with limited access control (i.e., read-only access), creating a new source (i.e., $S_{spp}$) in a remote server may not be possible, which renders this approach impractical.