
Indexing uncertain spatio-temporal data

TL;DR: This work proposes novel approximation techniques in order to probabilistically bound the uncertain movement of objects and experimentally shows that it accelerates the existing, scan-based approach by orders of magnitude.
Abstract: The advances in sensing and telecommunication technologies allow the collection and management of vast amounts of spatio-temporal data combining location and time information. Due to physical and resource limitations of data collection devices (e.g., RFID readers, GPS receivers and other sensors) data are typically collected only at discrete points of time. In-between these discrete time instances, the positions of tracked moving objects are uncertain. In this work, we propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using a hierarchical index structure. To the best of our knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. We experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.

Summary (5 min read)

1. INTRODUCTION

  • Efficient management of large collections of spatio-temporal data pertaining to mobile entities whose locations change over time is paramount in a large variety of application domains: military applications, structural and environmental monitoring, disaster/rescue management and remediation, Geographic Information Systems (GIS), Location-Based Services (LBS).
  • The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturizations of the computing devices and development of embedded systems.
  • Movements between observations are usually modeled by beads; the probability that an object is located outside a bead is 0 at each point of time. To cope with storage constraints, the recorded object trajectories often undergo simplification, eliminating some recorded values.
  • Having object trajectories sampled only at discrete time-instants and/or simplified, renders the movement in-between samples uncertain and query evaluation challenging.

2. PRELIMINARIES

  • This section formally defines the type of spatio-temporal data that the authors index, the stochastic model that they use for uncertain trajectories derived from the data, and the query types that they handle.
  • To overcome this problem, in their proposal each uncertain spatio-temporal (UST) object o ∈ D is associated with an uncertain object trajectory, which is represented by a stochastic process.
  • Clearly, any naive approach that enumerates all possible worlds, is not feasible.
  • Another query type is the Probabilistic Spatio-Temporal τ ForAll Query ([8]), denoted as PSTτ∀Q, which requires an object to remain in the spatial window S^□ for the whole time window T^□.
  • By modeling the movement within a diamond using a Markov-chain model, the true probability that an object o satisfies a PSTτ∃ query can be computed in PTIME [8], exploiting that the matrix M is generally sparse (only a few states are directly connected to a single state).

3.1 Spatio-Temporal Approximation

  • This is done by considering all possible paths between state o(ti) at time ti and state o(tj) at time tj .
  • The set of state-time pairs Si reachable from o(ti) can be computed by performing tj − ti transitions using the Markov chain o.M(t) of o, starting from state o(ti) and memorizing all reachable (state, time) pairs.
  • Thus, the authors propose to conservatively approximate Si,j .
  • In the following, the authors will call this time dependent spatial approximation of o(t) in the time interval [ti, tj ] between two observations o(ti) and o(tj) a spatio-temporal diamond, denoted as (o, ti, tj).
  • A diamond is reminiscent of a time-parameterized rectangle, used to model the worst-case MBR for a set of moving objects in [19]; however, the way of deriving velocities is different in their case.

3.2 Spatio-Temporal Filter

  • Based on the spatio-temporal approximation of an uncertain object as described in the previous section, it is possible to perform filtering during query processing.
  • By doing so, the authors can adapt the techniques proposed in [19]: for each dimension d ∈ {1, ..., D}, they compute the points of time when the extents of the rectangles intersect in that dimension and the points of time when the extents of the diamond rectangle are fully covered by the query rectangle S^□.
  • In summary, the spatio-temporal filter can be used to identify uncertain object trajectories that intersect (remain in) the query region Q^□ with a probability of 100% or 0%.
  • Still, the probability threshold τ of the query is not considered by this filter.
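The filter described in the bullets above can be sketched for a single spatial dimension as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout (`x_i`, `x_j`, `t_i`, `t_j` and the four velocity bounds `v_i_min`, `v_i_max`, `v_j_min`, `v_j_max`) is a hypothetical encoding of a diamond, and the code simply scans the discrete time steps instead of computing the intersection times analytically as in [19].

```python
def diamond_interval(d, t):
    """Spatial interval covered by a 1-D diamond at time t (hypothetical layout)."""
    lo = max(d["x_i"] + (t - d["t_i"]) * d["v_i_min"],
             d["x_j"] - (d["t_j"] - t) * d["v_j_max"])
    hi = min(d["x_i"] + (t - d["t_i"]) * d["v_i_max"],
             d["x_j"] - (d["t_j"] - t) * d["v_j_min"])
    return lo, hi


def st_filter(diamonds, s_lo, s_hi, t_lo, t_hi):
    """Classify an object for an exists query on [s_lo, s_hi] x [t_lo, t_hi].

    Returns "true_hit" (probability 100%), "prune" (probability 0%),
    or "candidate" (the filter cannot decide).
    """
    intersects = False
    for d in diamonds:
        # Only the part of the query time window inside the diamond matters.
        for t in range(max(t_lo, d["t_i"]), min(t_hi, d["t_j"]) + 1):
            lo, hi = diamond_interval(d, t)
            if s_lo <= lo and hi <= s_hi:
                return "true_hit"      # diamond fully covered at time t
            if lo <= s_hi and s_lo <= hi:
                intersects = True      # diamond overlaps the window at time t
    return "candidate" if intersects else "prune"


# Illustrative diamond: observed at position 2 at t=0 and again at t=4,
# moving at most one unit per time step in either direction.
example = {"x_i": 2, "t_i": 0, "x_j": 2, "t_j": 4,
           "v_i_min": -1, "v_i_max": 1, "v_j_min": -1, "v_j_max": 1}

result_candidate = st_filter([example], 3.5, 6, 1, 3)
result_prune = st_filter([example], 5, 6, 1, 3)
result_hit = st_filter([example], 0, 4, 4, 4)
```

Objects classified as "candidate" are exactly those for which the probability threshold τ still has to be checked by a later filter or by refinement.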

3.3 Probabilistic UST-Object Approximation

  • The authors now propose a tighter approximation, based on the intuition that the set of possible paths within a diamond is generally not uniformly distributed: paths that are close to the direct connection between the observed locations often are more likely than extreme paths along the edges of the diamond.
  • To obtain the subdiamond, the authors scale the corresponding sides by a factor λ ∈ [0, 1] relative to the average velocity v_d^avg = (o(tj)[d] − o(ti)[d]) / (tj − ti).
  • The main challenge for correctness is to cope with temporal dependencies, i.e., the fact that the random variables o(ti) and o(ti + δt) are highly correlated.
  • To compute the probability of possible trajectories between o(ti) at ti and o(tj) at tj that are completely contained in (o, ti, tj , d, dir, λ), an intuitive approach is to start at o(ti) at time ti and perform tj−ti transitions using the Markov-chain o.M(t).
  • Any possible trajectory which reaches such a state is flagged.

3.5 Approximating Probabilistic Diamonds

  • The main goal of their index structure, proposed in Section 4, is to avoid expensive probability computations for subdiamonds.
  • Since the query window is not known in advance, 2·D computations (i.e., one for each dimension and direction) have to be performed in order to identify the optimal subdiamond for a given query and candidate object o.
  • Thus, the authors need to conservatively approximate the probability of probabilistic diamonds (o, ti, tj, d, dir, λ) for which λ ∉ Λ. They propose to use a conservative linear approximation of P(inside(o, (o, ti, tj, d, dir, λ))), which increases monotonically with λ, using the precomputed probability values.
  • The latter constraint is required to maintain the conservativeness property of the approximation, which will be required for pruning.
  • The authors model this as a linear programming problem: find a linear function l(λ) = a · λ+ b that minimizes the aggregate error with respect to the sample points, under the constraint that the approximation line does not exceed any of the sample values (e.g., the line in Figure 4(c)).
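The line-fitting step in the last bullet can be illustrated with a small sketch. It assumes that the "aggregate error" is the sum of the vertical gaps between the sample values and the line; the summary does not spell out the exact objective, so this is an assumption. Under that assumption the problem is a linear program in the two variables a and b, and an optimal solution lies on a line through two sample points that stays below all others, so enumerating point pairs is enough. The function name `conservative_line` and the sample values are hypothetical.

```python
from itertools import combinations


def conservative_line(lams, probs, eps=1e-12):
    """Fit l(lam) = a * lam + b with l(lam_k) <= P_k for all samples,
    minimising the summed gap sum_k (P_k - l(lam_k)).

    Needs at least two samples with distinct lam values.
    """
    samples = list(zip(lams, probs))
    best = None
    for (l1, p1), (l2, p2) in combinations(samples, 2):
        if l1 == l2:
            continue
        a = (p2 - p1) / (l2 - l1)
        b = p1 - a * l1
        # Discard lines that exceed any sample value (not conservative).
        if any(a * l + b > p + eps for l, p in samples):
            continue
        err = sum(p - (a * l + b) for l, p in samples)
        if best is None or err < best[0]:
            best = (err, a, b)
    if best is None:
        raise ValueError("need at least two samples with distinct lambda values")
    return best[1], best[2]


# Illustrative precomputed probabilities for lambda in {0, 0.5, 1}.
a, b = conservative_line([0.0, 0.5, 1.0], [0.1, 0.5, 0.6])
```

Note that such a line only bounds the sampled values; between two sample points the true function can deviate from it, which is the issue the probabilistic filter of Section 3.6 has to address.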

3.6 Probabilistic Filter

  • Using this line directly, may violate the conservativeness property, since the true function may have any monotonic increasing form, and thus, for a value λQ located in between two values λ1 and λ2 (λ1, λ2 ∈ Λ, λ1 < λQ < λ2) the probability is bounded by P (λ1) ≤ P (λQ) ≤ P (λ2).
  • In their running example, the function f(λ) is depicted in Figure 4(e).
  • In turn, when probing an uncertain trajectory approximation (o, ti, tj) on the query range Q^□, the authors only have to take the time range [ti, tj] into account; i.e., if the time range T^□ of the query spans beyond [ti, tj], they truncate T^□ accordingly.
  • This observation allows the authors to compute the probability P∃(o) that the whole chain of diamonds of o intersects a query window Q^□.
  • Thus, the exact probability P(sometimes(o, ti, tj, Q^□)) is computed using the technique proposed in [8].

4. THE UST-TREE

  • In the previous section, the authors showed that they can precompute a set of approximations for each object, which can be progressively used to prune an object during query evaluation.
  • The authors introduce the UST-tree, which is an R-tree-based hierarchical index structure, designed to organize the object approximations and efficiently prune objects that may not possibly qualify the query; for the remaining objects the query is directly verified based on their Markov models, as described in [8] (refinement step).
  • Section 4.1 describes the structure of the UST-tree and Section 4.2 presents a generic query processing algorithm for answering PSTτ∃ queries.

4.1 Architecture

  • The UST-tree index is a hierarchical disk-based index.
  • The basic structure is illustrated in Figure 5.
  • Intermediate node entries of the UST-tree have exactly the same structure as in an R-tree; i.e., each entry contains a pointer referencing its child node and the MBR of all MBR approximations stored in pointed subtree.
  • Note that the necklace of each object is decomposed into diamonds, which are stored independently in the leaf nodes of the tree.
  • Since the directory structure of the UST-tree is identical to that of the R-tree, the UST-tree uses the same methods as the R∗-tree [2] to handle updates.

4.2 Query Evaluation

  • Given a spatio-temporal query window Q^□, the UST-tree is hierarchically traversed starting from the root, recursively visiting entries whose MBRs intersect Q^□; i.e., the subtree of an intermediate entry e is pruned if e.mbr ∩ Q^□ = ∅.
  • If this filter fails, the authors proceed using (o, ti, tj) (ST-Diamond filter) by performing intersection tests against Q^□ as described in Section 3.2.
  • Note that sometimes multiple leaf entries associated with an object are required to prune an object or confirm whether it is a true hit.
  • For each object o which is not pruned (or reported as true hit), the authors accumulate in a list L(o) all upper bounds of its qualification probabilities from the leaf entries that index the diamonds of o.
  • After collecting all candidate objects, the qualification probabilities stored in the list L(o) for each candidate o are aggregated in order to derive the upper bound of the overall qualification probability P(∃t ∈ T^□ : o(t) ∈ S^□) for o, as described in Section 3.6.

5. EXPERIMENTAL EVALUATION

  • In order to evaluate the proposed techniques the authors used data derived from a real application and several synthetic data sets.
  • The real-world data are based on the trajectory data set from [27], which contains one-week trajectories of 10,357 taxis in Beijing.
  • In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.
  • The number of time steps between two successive observations was randomly chosen from the interval [10,15] if not stated otherwise.
  • The authors' experiments assess the construction cost of the UST-tree structure and its performance on query evaluation.

5.1 UST-tree Construction

  • The first experiment investigates the cost of index construction.
  • Still, this cost is reasonable, since the construction of a probabilistic diamond is comparable to the construction of 2 ·D · |Λ| subdiamonds, which in turn corresponds to one refinement step (considering the subdiamond as a query window).
  • The results reflect the theoretical considerations showing a quadratic runtime behavior with respect to both parameters.
  • Theoretically this parameter should have a linear impact on the construction time.
  • Is also excluded from larger subdiamonds in the same dimension and direction.

5.2 Query Performance

  • In the first set of query performance experiments, the authors compare the cost of using UST-tree with two competitors on synthetic data (see Figure 7).
  • This is attributed to the effectiveness of the different filter steps used by the UST-tree; the overhead of the UST-tree filter is negligible compared to the savings in refinement cost.
  • The percentage of the diamonds which can be pruned using the probabilistic filter remains rather constant (at around 30%) in comparison to the spatio-temporal filter.
  • Increasing the length of the query time window T^□ increases the number of refinement candidates.
  • This shows that the probabilistic filter copes better with more uncertainty in the data than the other two filters.

7. CONCLUSIONS

  • The authors proposed the UST-tree which is an index structure for uncertain spatio-temporal data.
  • The UST-tree adopts and incorporates state-of-the-art techniques from several fields of research in order to cope with the complexity of the data.
  • To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics.
  • Outside the scope of this work is the consideration of an object’s location before its first and after its last observation.
  • In both cases, the resulting diamond approximation would be unbounded.


Indexing Uncertain Spatio-Temporal Data
Tobias Emrich¹, Hans-Peter Kriegel¹, Nikos Mamoulis², Matthias Renz¹, Andreas Züfle¹
¹ Institute for Informatics, Ludwig-Maximilians-Universität München
² Department of Computer Science, University of Hong Kong
{emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de, nikos@cs.hku.hk
ABSTRACT
The advances in sensing and telecommunication technologies allow
the collection and management of vast amounts of spatio-temporal
data combining location and time information. Due to physical and
resource limitations of data collection devices (e.g., RFID readers,
GPS receivers and other sensors) data are typically collected only
at discrete points of time. In-between these discrete time instances,
the positions of tracked moving objects are uncertain. In this work,
we propose novel approximation techniques in order to probabilis-
tically bound the uncertain movement of objects; these techniques
allow for efficient and effective filtering during query evaluation
using a hierarchical index structure. To the best of our knowledge,
this is the first approach that supports query evaluation on
very large uncertain spatio-temporal databases, adhering to possi-
ble worlds semantics. We experimentally show that it accelerates
the existing, scan-based approach by orders of magnitude.
Categories and Subject Descriptors
H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Infor-
mation Search and Retrieval
Keywords
Uncertain Spatio-Temporal Data, Uncertain Trajectory, Indexing
1. INTRODUCTION
Efficient management of large collections of spatio-temporal data
pertaining to mobile entities whose locations change over time is
paramount in a large variety of application domains: military ap-
plications, structural and environmental monitoring, disaster/rescue
management and remediation, Geographic Information Systems (GIS),
Location-Based Services (LBS). The technological enabling fac-
tors for such applications are advances in sensing and communica-
tion/networking, along with the miniaturizations of the computing
devices and development of embedded systems. In almost every
application domain, the location data at different (discrete) time-
instants is obtained via some positioning devices, like GPS-enabled
mobile devices, RFID or road-side sensors. In addition, to reduce
the communication cost, improve the bandwidth utilization, and
CIKM’12, October 29–November 2, 2012, Maui, HI, USA.
Copyright 2012 ACM 978-1-4503-1156-4/12/10 ...$15.00.
[Figure 1: Spatio-Temporal Data. Location (space) plotted over time for an object o, with observations at $t_s$, $t_i$, $t_j$, $t_e$ and a query window Q. Annotation: movements between observations are usually modeled by beads; the probability of an object to be located outside a bead at each point of time is 0.]
cope with storage constraints, often the recorded object trajectories
undergo simplification, eliminating some recorded values. Hav-
ing object trajectories sampled only at discrete time-instants and/or
simplified, renders the movement in-between samples uncertain
and query evaluation challenging.
Consider an object o, moving in a one-dimensional space, as illustrated
in Figure 1. Having complete information about this trajectory enables
answering a query asking whether the object intersects a
spatio-temporal window Q. However, this task becomes difficult if only
a few positions (at times $\{t_s, t_i, t_j, t_e\}$ in the example) of
the exact trajectory are recorded. A simple interpolation
approach, which connects temporally consecutive observations by
line segments and assumes a movement with constant direction and
speed between these points, is unacceptable for applications where
probabilistic analysis of the uncertain movement is required. When
taking uncertainty under consideration, the main challenge is that
the space of possible (location, time) positions between two ob-
servations can grow very large. More importantly, the number of
possible trajectories between two observed locations explodes.
A common method to approximate possible locations between
two observations is the beads (or necklace) model ([11, 22]). This
model is based on some constraints about the motion of an ob-
ject. In particular, assuming a maximum speed in each direction of
each dimension, the possible locations that an object can visit be-
tween two exact observations is bounded. Recent work [8] follows
the pragmatic assumption that the uncertain movement of an object
between consecutive observations can be described by a Markov-
Chain model, which captures the time dependencies between con-
secutive locations. [8] shows how the space of possible worlds
(i.e., trajectories between consecutive observations) can be effi-
ciently analyzed by multiplying Markov-Chain transition matrices
and that probabilistic query evaluation can be facilitated by inte-
grating pruning mechanisms into the Markov Chain matrices. All
these are sufficient for the case where there are few queried objects,
following similar movements; however, if there is a large number
of objects in the database, with different movements, evaluating a
probabilistic spatio-temporal query directly against each object in-
dividually (i.e., a scan-based approach) would be very expensive.

In this paper, we propose an indexing framework to efficiently
cope with large spatio-temporal data bases. Our work also assumes
that the movement of each object follows a Markov-Chain model
(described in Section 2). The objective of our index (described in
Section 4) is to minimize the number of objects for which exact
probabilistic evaluation has to be performed. To achieve this goal,
in Section 3, we propose a number of uncertain spatio-temporal
(UST) object approximations, which are stored in the index and a
set of corresponding pruning methods, which use the approxima-
tions to efficiently eliminate objects that may not qualify a given
probabilistic query. Section 5 presents an extensive experimental
evaluation, which demonstrates the effectiveness of our indexing
approach. Related work is discussed in Section 6. Finally, Section
7 concludes the paper.
2. PRELIMINARIES
This section formally defines the type of spatio-temporal data
that we index, the stochastic model that we use for uncertain trajec-
tories derived from the data, and the query types that we handle.
Data: We consider a discrete time and space domain, i.e., the common assumption of many existing works (e.g. [18, 1, 10, 8]), where $S = \{s_1, \ldots, s_{|S|}\} \subseteq \mathbb{R}^D$ is a finite set of possible locations, which we call states, in a D-dimensional space and $T = \mathbb{N}_0^+$ is the time domain. Given this spatio-temporal domain, the (certain) movement of an object o corresponds to a trajectory represented as a function $o : T \rightarrow S$ of time defining the location $o(t) \in S$ of o at a certain point of time $t \in T$. We consider incomplete (and/or imprecise) spatio-temporal data, where the motion of an object is not recorded by a crisp trajectory. Instead, we are only given a set $o.T^{obs}$ of (location, time) observations for each object o. At any time $t \notin o.T^{obs}$, the position of o is uncertain, i.e., a random variable. In many applications, a stochastic model can be built and used to infer knowledge about this uncertainty.
Uncertain Data Model: We refer to the spatio-temporal approximation of a trajectory of an object o in a time interval spanned by two consecutive observations of o at $t_i$ and $t_j$ as a bead or diamond $\Diamond(o, t_i, t_j)$. The diamond can be computed by considering the maximum and minimum (signed) velocities of the object in each dimension [25]. The whole approximation of the trajectory based on a set $T^{obs}$ of observations (i.e., a chain of beads) is referred to as a necklace. For example, the movement of the object in Figure 1 is bounded by a chain of diamonds.
Existing studies on modeling uncertain trajectories ([23, 25, 24, 26, 15]) naively consider all possible trajectories bounded by a necklace equi-probable. However, given two consecutive observations $o(t_i)$ and $o(t_j)$ of object o, there are time dependencies between consecutive locations between $o(t_i)$ and $o(t_j)$, which render some locations in the corresponding diamond (e.g., those near the line segment that connects $o(t_i)$ and $o(t_j)$) more probable to be visited by o than others (e.g., those near the boundary of the diamond). Therefore, these models possibly yield incorrect inferences resulting in incorrect answers to probabilistic queries. To overcome this problem, in our proposal each uncertain spatio-temporal (UST) object $o \in \mathcal{D}$ is associated with an uncertain object trajectory, which is represented by a stochastic process. The stochastic process assigns to each $t \in T$ exactly one random variable; random variables at consecutive time moments can be mutually dependent. This dependency is vital in most applications, where the locations of an object at two close points of time are highly correlated. Thus, the uncertain trajectory $o(t)$ of object o comprises a set of (crisp) trajectories, each assigned with a probability indicating its likelihood to be the true trajectory of o. Thereby, each trajectory with a non-zero probability is called a possible world of o. Assuming that objects are mutually independent, our semantics comply with the classic possible worlds model [6]. If $t \in o.T^{obs}$ (i.e., the exact location of o at time t has been observed), then $o(t)$ corresponds to a (trivial) random variable having one possible location (i.e., state) with probability 1.
The main challenge in answering a probabilistic spatio-temporal
query is to correctly consider the possible worlds semantics in the
model. In other words, the query results should comply with the
possible worlds model. Naively, this can be done by evaluating the
query predicate on each possible world, and summing up the
probabilities of possible worlds satisfying the query predicate. In
general, the number of possible worlds to be considered is $O(|S|^{T})$;
exhaustively examining all of them requires exponential time, even
for finite time and space domains. Clearly, any naive approach that
enumerates all possible worlds, is not feasible.
In this work, we model the uncertain movement of an object
within a diamond, using a first-order Markov-chain model. This
approach models the movement between successive points in time,
based on background knowledge (e.g., physical laws) and has proven
capable of effectively capturing the behavior of real objects in prac-
tice. For instance, [1] and [10] show how Markov-models can ef-
fectively capture the movement of vehicles on road networks for
prediction purpose. In [18], it is shown that Markov-models can
also be used to model the indoor movement of people, as tracked
in RFID applications.
DEFINITION 1. A stochastic process $o(t)$, $t \in T$, is called a Markov-Chain if and only if
$$\forall t \in \mathbb{N}_0 \;\; \forall s_j, s_i, s_{t-1}, \ldots, s_0 \in S:$$
$$P(o(t+1) = s_j \mid o(t) = s_i, o(t-1) = s_{t-1}, \ldots, o(0) = s_0) = P(o(t+1) = s_j \mid o(t) = s_i),$$
where the conditional probability
$$o.P_{i,j}(t) := P(o(t+1) = s_j \mid o(t) = s_i)$$
is the (single-step) transition probability of object o from state $s_i$ to state $s_j$ at time t. The matrix $o.M(t) := (o.P_{i,j})_{i,j}$ is called transition matrix. Let $o.P(t) = (p_1, \ldots, p_{|S|})$ be the distribution vector of an object o at time t, where $p_i$ corresponds to the probability that o is located at state $s_i$ at time t. The distribution vector $o.P(t+1)$ can be inferred from $o.P(t)$ as follows:
$$o.P(t+1) = o.P(t) \cdot o.M(t)$$
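As a small illustration of the update rule in Definition 1, the following sketch propagates a distribution vector through a transition matrix using plain Python lists. It assumes a time-homogeneous chain (the same matrix at every step), which is a simplification of the time-dependent $o.M(t)$; the three-state matrix and the function names are made up for the example.

```python
def step(dist, M):
    """One transition: the row vector dist multiplied by the matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, steps):
    """Apply `steps` transitions to the distribution vector."""
    for _ in range(steps):
        dist = step(dist, M)
    return dist


# Hypothetical chain over three states on a line (each row sums to 1).
M = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

# The object was observed in the first state, so that state has probability 1.
start = [1.0, 0.0, 0.0]
after_two = propagate(start, M, 2)
```

Because most entries of a realistic transition matrix are zero, a practical implementation would store M sparsely; the dense loops here are only meant to mirror the formula.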
Queries: Within the scope of this paper, we focus on selection queries specified by the following parameters: (i) a spatial window $S^\Box \subseteq S$, (ii) a contiguous time window $T^\Box \subseteq T$, and (iii) a probability threshold τ. In the remainder, we use $Q^\Box = S^\Box \times T^\Box$ to denote the search space of a query. The most intuitive definition of a probabilistic spatio-temporal query is given below:
DEFINITION 2. [Probabilistic Spatio-Temporal τ Exists Query] Given a query window $S^\Box$ in space and a query window $T^\Box$ in time, a probabilistic τ spatio-temporal exists query (PSTτ∃Q) retrieves all objects $o \in \mathcal{D}$ such that $P(\exists t \in T^\Box : o(t) \in S^\Box) \ge \tau$; i.e., the trajectory of o intersects the query window $Q^\Box$ with probability at least τ.
For example, consider the trajectory of Figure 1 and assume that we only know for certain its observed locations at $\{t_s, t_i, t_j, t_e\}$; a PSTτ∃Q query defined by rectangle $Q^\Box$ would return the depicted trajectory only if the probability that the trajectory intersects $S^\Box$ at any time within $T^\Box$ is at least τ. Another query type is the Probabilistic Spatio-Temporal τ ForAll Query ([8]), denoted as (PSTτ∀Q), which requires an object to remain in the spatial window $S^\Box$ for the whole time window $T^\Box$. Due to space constraints, we will not discuss this query in this work, but note that our techniques proposed for PSTτ∃Q queries can easily be adapted.
By modeling the movement within a diamond using a Markov-chain model, the true probability that an object o satisfies a PSTτ∃ query can be computed in PTIME [8], exploiting that the matrix M is generally sparse (only a few states are directly connected to a single state). Still, query evaluation remains too expensive over a large spatio-temporal database, where we have to compute the qualification probabilities of all objects. In view of this, we define a set of approximations of uncertain object trajectories enabling spatio-temporal and probabilistic filtering in Section 3. We then show in Section 4 how we can organize these approximations in an index in order to perform efficient query evaluation.
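To make the contrast between possible-world enumeration and a polynomial-time computation concrete, here is a hedged sketch. It is not claimed to be the algorithm of [8]: it uses one standard forward-propagation idea (probability mass that enters the query window is moved into a separate "hit" accumulator so it is counted once), and it conditions only on a single observation at time 0, ignoring any later observation. The matrix, the window, and the function names are illustrative assumptions.

```python
from itertools import product


def exists_probability(M, start, window_states, t_lo, t_hi):
    """P(exists t in [t_lo, t_hi] with o(t) in window_states | o(0) = start)."""
    n = len(M)
    alive = [0.0] * n          # mass of worlds that have not hit the window yet
    alive[start] = 1.0
    hit = 0.0
    for t in range(t_hi + 1):
        if t > 0:
            alive = [sum(alive[i] * M[i][j] for i in range(n)) for j in range(n)]
        if t_lo <= t <= t_hi:
            for s in window_states:
                hit += alive[s]    # these worlds satisfy the predicate
                alive[s] = 0.0     # and must not be counted again
    return hit


def exists_probability_bruteforce(M, start, window_states, t_lo, t_hi):
    """Same probability by enumerating all |S|^t_hi possible worlds."""
    n = len(M)
    total = 0.0
    for tail in product(range(n), repeat=t_hi):
        path = (start,) + tail
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        if p > 0 and any(path[t] in window_states for t in range(t_lo, t_hi + 1)):
            total += p
    return total


# Hypothetical three-state chain; the query asks for the third state
# (index 2) during the time window [1, 3].
M = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
fast = exists_probability(M, 0, {2}, 1, 3)
slow = exists_probability_bruteforce(M, 0, {2}, 1, 3)
```

The forward pass costs one vector-matrix product per time step, whereas the enumeration grows exponentially with the length of the time window, which is why enumerating possible worlds is not an option for a whole database.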
3. APPROXIMATING UST-OBJECTS
In this section, we introduce (conservative) spatio-temporal as
well as probabilistic (conservative) spatio-temporal (UST-) object
approximations, which serve as building blocks for our proposed
index, to be presented in Section 4.
3.1 Spatio-Temporal Approximation
To bound the possible locations of an object o between two sub-
sequent observations (o(t
i
), o(t
j
)), we need to determine all state-
time pairs (s, t) S × T, t
i
t t
j
such that o has a non-zero
probability of being at state s at time t. This is done by consider-
ing all possible paths between state o(t
i
) at time t
i
and state o(t
j
)
at time t
j
. An example of a small set of such paths is depicted
in Figure 2(a). Here, we can see a set of five possible trajecto-
ries of an object o, i.e., all possible (state, time) pairs of o in
the time interval [t
i
, t
j
]. In practice, the number of possible paths
becomes very large. Nonetheless, we can efficiently compute the
set of possible (state, time) pairs using the Markov-chain model:
The set of state-time pairs S
i
reachable from o(t
i
) can be com-
puted by performing t
j
t
i
transitions using the Markov chain
o.M(t) of o, starting from state o(t
i
) and memorizing all reachable
(state, time) pairs. Similarly, we can compute S
j
as all state-time
pairs (s, t) S × T , t
i
t t
j
such that o can reach state o(t
j
)
at time t
j
by starting from state s at time t. S
j
can be computed in
a similar fashion, starting from state o(t
j
) and using the transposed
Markov chain o.M(t)
T
. The intersection S
i,j
= S
i
S
j
yields all
state-time pairs which are consistent with both observations. Let us
note that in practice, it is more efficient to compute S
i
and S
j
in a
parallel fashion, to reduce the explored space. When the computa-
tion of S
i
and S
j
meet at some time t
i
t t
j
, we can prune any
states which are not reachable by both s(t
i
) at time t
i
and s(t
j
) at
time t
j
. However, the number |S
i,j
| of possible state-time pairs in
S
i,j
can grow very large, so it is impractical for our index structure
(proposed in Section 4) to store all S
i,j
for each o DB in our
index structure. Thus, we propose to conservatively approximate
S
i,j
. The issue is to determine an appropriate approximation of
S
i,j
which tightly covers S
i,j
, while keeping the representation as
simple as possible. The basic idea is to build the approximation by
means of both object observations o(t
i
) and o(t
j
) with the corre-
sponding velocity of propagation in each dimension. To do so, we
first compute for the set of state-time pairs S
i
to derive the maxi-
mum and minimum possible velocity in the time interval [t
i
, t
j
]:
v
0
d
:= max
(s,t)S
i
(
s[d] o(t
i
)[d])
t t
i
)
v
6
d
:= min
(s,t)S
i
(
s[d] o(t
i
)[d])
t t
i
)
x
v
d
v
d
<
>
o(t
i
)
o(
t
)
d
o(
t
j
)
v
d
v
d
<
>
t
t
i
t
j
(a) Approximating trajectories
x
i
S
ij
(
)
S
i
,
j
(
o,t
i
,t
j
)
(
ot
i
t
j
)
t
i
t
j
(
o
,
t
i
,
t
j
)
t
(b) Diamond vs MBR
Figure 2: Spatio-Temporal Approximation.
where s[d] (o(t
i
)[d]) denotes the projection of state s (o(t
i
)) to
the d-th dimension. By definition, we can guarantee that for any
t
i
t t
j
it holds that
o(t)[d] o(t
i
)[d] + (t t
i
) · v
0
d
and
o(t)[d] o(t
i
)[d] + (t t
i
) · v
6
d
Furthermore, we bound the velocity of propagation at which o
can have reached state o(s
j
) at time t
j
from each location in the
state-space S
j
:
v
1
d
:= max
(s,t)S
j
(
o(t
j
)[d] s[d]
t
j
t
)
v
>
d
:= min
(s,t)S
j
(
o(t
j
)[d] s[d]
t
j
t
)
Again, we can bound the position of o in dimension 1 d D
at time t
i
t t
j
as follows:
o(t)[d] o(t
j
)[d] (t
j
t) · v
1
d
, and
o(t)[d] o(t
j
)[d] (t
j
t) · v
>
d
In summary, using the positions o(t
i
) at time t
i
and o(t
j
) at time
t
j
, and using velocities v
0
d
, v
6
d
, v
>
d
, v
1
d
, we can bound the random
variable of the position o(t) of o at time t
i
t t
j
by the interval
o(t)[d] I
d
(t) := [max(o(t
i
)[d]+(tt
i
)·v
6
d
, o(t
j
)[d](t
j
t)·v
>
d
)),
min(o(t
i
)[d] + (t t
i
) · v
0
d
, o(t
j
)[d] (t
j
t) · v
1
d
)] (1)
Deriving these intervals for each dimension yields an axis-parallel rectangle, approximating all possible positions of $o$ at time $t$. In the following, we will call this time-dependent spatial approximation of $o(t)$ in the time interval $[t_i, t_j]$ between two observations $o(t_i)$ and $o(t_j)$ a spatio-temporal diamond, denoted as $\Diamond(o, t_i, t_j)$. A nice geometric property of this approximation is that computing the intersection with the query window at each time $t$ is very fast. Another advantage is that existing spatial access methods (e.g., R-trees) can be easily used to efficiently organize these approximations. To store the approximation, we only need to store the $4 \cdot D$ real values $v^{\wedge}_{d,i}$, $v^{\vee}_{d,i}$, $v^{\wedge}_{d,j}$, $v^{\vee}_{d,j}$, $1 \le d \le D$. A diamond is reminiscent of a time-parameterized rectangle, used to model the worst-case MBR for a set of moving objects in [19]; however, the way of deriving velocities is different in our case. As an example, Figure 2(a) shows, for one dimension $d \in \{1, \dots, D\}$, positions $o(t_i)$ at time $t_i$ and $o(t_j)$ at time $t_j$. The diamond formed by the velocity bounds $v^{\wedge}_{d,i}$, $v^{\vee}_{d,i}$, $v^{\vee}_{d,j}$ and $v^{\wedge}_{d,j}$ conservatively approximates the possible (location, time) pairs. Note that it is possible to use a minimal bounding rectangle $\Box(o, t_i, t_j)$ instead of the diamond $\Diamond(o, t_i, t_j)$ to conservatively approximate the (location, time) space $S_{i,j}$. In cases, however, where the movement of an object in one dimension is biased in one direction, a rectangle may yield a very bad approximation (see Figure 2(b) for an example). Our index employs both approximations $\Box(o, t_i, t_j)$ and $\Diamond(o, t_i, t_j)$ for spatio-temporal pruning; $\Box(o, t_i, t_j)$ is used for high-level indexing and filtering, while $\Diamond(o, t_i, t_j)$ is used as a second-level filter.

Figure 3: Intersection between query and diamond. (a) Query intersection. (b) Dimension-wise intersections.
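The construction of Section 3.1 can be sketched in a few lines of Python for a single dimension. This is a minimal illustration, not the authors' implementation: the function names, the toy (position, time) pairs, and the use of one pair list for both $S_i$ and $S_j$ are assumptions made here. The two sides meeting at $o(t_j)$ are assigned geometrically: because a position at time $t$ equals $o(t_j)[d] - (t_j - t) \cdot v$, the upper side uses the smallest arrival velocity and the lower side the largest.

```python
from typing import Iterable, Tuple

Pair = Tuple[float, float]  # (position in dimension d, time); layout is an assumption


def velocity_bounds(pairs: Iterable[Pair], o_i: float, t_i: float,
                    o_j: float, t_j: float) -> Tuple[float, float, float, float]:
    """Return (up_i, low_i, up_j, low_j): the slopes of the four diamond sides."""
    pairs = list(pairs)
    # Velocities needed to reach each pair when starting at (o_i, t_i).
    from_i = [(s - o_i) / (t - t_i) for s, t in pairs if t > t_i]
    # Velocities needed to arrive at (o_j, t_j) when leaving each pair.
    to_j = [(o_j - s) / (t_j - t) for s, t in pairs if t < t_j]
    # Upper side at t_j: o_j - (t_j - t) * v is largest for the smallest v.
    return max(from_i), min(from_i), min(to_j), max(to_j)


def diamond_interval(t, o_i, t_i, o_j, t_j, bounds):
    """Interval I_d(t) of Equation 1 for one dimension."""
    up_i, low_i, up_j, low_j = bounds
    lower = max(o_i + (t - t_i) * low_i, o_j - (t_j - t) * low_j)
    upper = min(o_i + (t - t_i) * up_i, o_j - (t_j - t) * up_j)
    return lower, upper


# Hypothetical toy data: an object observed at position 0 (t = 0) and
# position 2 (t = 4) that moves by 0 or +1 per time step.
PAIRS = [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2), (1, 3), (2, 3), (2, 4)]
BOUNDS = velocity_bounds(PAIRS, o_i=0, t_i=0, o_j=2, t_j=4)

for t in range(5):
    print(t, diamond_interval(t, 0, 0, 2, 4, BOUNDS))
```

Only the four slopes per dimension are kept, which matches the $4 \cdot D$ stored values mentioned above; the interval at any time is recomputed from them on demand.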
3.2 Spatio-Temporal Filter
Based on the spatio-temporal approximation of an uncertain object as described in the previous section, it is possible to perform filtering during query processing.

If none of the diamonds assigned to an object $o \in \mathcal{D}$ intersects the query window, then $o$ is safely pruned. In turn, if a diamond of $o$ is inside the query window $S^{\Box}$ in space, i.e. fully covered by $S^{\Box}$, at some point of time $t \in T^{\Box}$, then $o$ is a true hit and, thus, $o$ can be immediately reported as result of the query. In order to employ the above spatio-temporal pruning conditions, for a diamond $\Diamond(o, t_i, t_j)$ of an object $o$ we need to determine the points of time when it intersects the query window $S^{\Box}$ in space, as well as the points of time when $\Diamond(o, t_i, t_j)$ is fully covered by $S^{\Box}$. For this purpose, it is helpful to focus on the spatial domain $S$ and interpret a diamond as well as the query as a time-parameterized (moving) rectangle. By doing so, we can adapt the techniques proposed in [19]: In general, a rectangle $R_1$ intersects (covers) another rectangle $R_2$ if and only if $R_1$ intersects (covers) $R_2$ in each dimension. Thus, for each spatial dimension $d$ ($d \in \{1, \dots, D\}$), we compute the points of time when the extents of the rectangles intersect in that dimension and the points of time when the extents of the diamond rectangle are fully covered by the query rectangle $S^{\Box}$.
For a single dimension $d$, with Equation 1 the query window given by $Q^{\Box}_d$ intersects the diamond given by $o(t_i)[d]$, $o(t_j)[d]$, $v^{\wedge}_{d,i}$, $v^{\vee}_{d,i}$, $v^{\wedge}_{d,j}$ and $v^{\vee}_{d,j}$ within the points of time

$$I_{int,d} := \{\, t \in (T^{\Box} \cap [t_i, t_j]) \mid I_d(t) \cap S^{\Box}_d \neq \emptyset \,\}.$$

Similarly, $Q^{\Box}_d$ fully covers the diamond within the points of time

$$I_{cov,d} := \{\, t \in (T^{\Box} \cap [t_i, t_j]) \mid I_d(t) \subseteq S^{\Box}_d \,\}.$$

An example is illustrated in Figure 3(a). To compute both sets $I_{int,d}$ and $I_{cov,d}$, we intersect the margins of the diamond with the query window, resulting in a set of time intervals, which subsequently have to be intersected accordingly in order to derive $I_{int,d}$ and $I_{cov,d}$. Now, let us consider the overall intersection time interval $I_{int} = \bigcap_{d=1}^{D} I_{int,d}$ (e.g., see Figure 3(b)) and the overall points of covering time $I_{cov} = \bigcap_{d=1}^{D} I_{cov,d}$.

If, for an object $o \in \mathcal{D}$, there is no diamond yielding a non-empty set $I_{int}$, $o$ can be safely pruned. If any diamond of $o$ yields a non-empty set $I_{cov}$, $o$ can be reported as result.
In summary, the spatio-temporal filter can be used to identify uncertain object trajectories having a probability of 100% or 0% intersecting (remaining in) the query region $Q^{\Box}$. Still, the probability threshold $\tau$ of the query is not considered by this filter. In addition, the object approximation may cover a lot of dead space if there exist outlier state-time pairs which determine one or more of the velocities, despite having a very low probability. In the following, we show how to exclude such unlikely outliers in order to shrink the approximation, while maintaining probabilistic guarantees that employ the probability threshold $\tau$.
3.3 Probabilistic UST-Object Approximation
We now propose a tighter approximation, based on the intuition that the set of possible paths within a diamond is generally not uniformly distributed: paths that are close to the direct connection between the observed locations often are more likely than extreme paths along the edges of the diamond. Therefore, given a query with threshold $\tau$, we can take advantage of a tighter approximation, which bounds all paths with cumulative probabilities $\tau$ to perform more effective pruning.
Based on this idea, we exploit the Markov-chain model in order to compute new diamonds, which are spatio-temporal subregions, called subdiamonds, of the (full) diamond $\Diamond(o, t_i, t_j)$, as depicted in Figure 4(a). For each such subdiamond, we will then show how to compute the cumulative probability of all possible trajectories of $o$ passing only through this subdiamond. Let us focus on restricting the diamond at one direction of one dimension; we choose one dimension $d \in \{1, \dots, D\}$, and one direction $dir \in \{\wedge, \vee\}$. Direction $\wedge$ ($\vee$) corresponds to the two diamond sides $v^{\wedge}_{d,i}$ and $v^{\wedge}_{d,j}$ ($v^{\vee}_{d,i}$ and $v^{\vee}_{d,j}$). To obtain the subdiamond, we scale the corresponding sides by a factor $\lambda \in [0, 1]$ relative to the average velocity

$$v^{avg}_d = \frac{o(t_j)[d] - o(t_i)[d]}{t_j - t_i}.$$

We obtain the adjusted velocity values for direction $\wedge$ as follows:

$$v^{\wedge,\lambda}_{d,i} = \big((v^{\wedge}_{d,i} - v^{avg}_d) \cdot \lambda\big) + v^{avg}_d = v^{\wedge}_{d,i} \cdot \lambda + v^{avg}_d \cdot (1 - \lambda)$$

and

$$v^{\wedge,\lambda}_{d,j} = \big((v^{\wedge}_{d,j} - v^{avg}_d) \cdot \lambda\big) + v^{avg}_d = v^{\wedge}_{d,j} \cdot \lambda + v^{avg}_d \cdot (1 - \lambda)$$

The adjusted velocity values for direction $\vee$ can be computed analogously. Thus, for a given diamond $\Diamond(o, t_i, t_j)$, dimension $d \in \{1, \dots, D\}$, direction $dir \in \{\wedge, \vee\}$ and scalar $\lambda \in [0, 1]$, we obtain a smaller diamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$, derived from $\Diamond(o, t_i, t_j)$ by scaling direction $dir$ in dimension $d$ by a factor of $\lambda$. Figure 4(b) illustrates some subdiamonds for one dimension, one direction, and various values of $\lambda$.
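As a small numeric check of the scaling rule, the following Python sketch interpolates a side velocity toward the average velocity. The function name and the sample values (observations at position 0 and 2, four time steps apart, with upper-side slopes 1 and 0) are illustrative assumptions only.

```python
def scale_velocity(v: float, v_avg: float, lam: float) -> float:
    """Scale a diamond side toward the average velocity by a factor lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (v - v_avg) * lam + v_avg


# Hypothetical diamond: o(t_i) = 0 at t_i = 0, o(t_j) = 2 at t_j = 4.
V_AVG = (2 - 0) / (4 - 0)      # average velocity, 0.5
UP_I, UP_J = 1.0, 0.0          # slopes of the two upper sides (assumed values)

for lam in (0.0, 0.5, 1.0):
    print(lam, scale_velocity(UP_I, V_AVG, lam), scale_velocity(UP_J, V_AVG, lam))
```

With $\lambda = 1$ the full diamond is retained, and with $\lambda = 0$ both sides collapse onto the direct line between the two observations.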
To use such subdiamonds for probabilistic pruning, we first need to compute the probability $P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ that object $o$ will remain within $\Diamond(o, t_i, t_j, d, dir, \lambda)$ for the whole time interval $[t_i, t_j]$, in a correct and efficient way. The main challenge for correctness is to cope with temporal dependencies, i.e. the fact that the random variables $o(t_i)$ and $o(t_i + \delta t)$ are highly correlated. Thus, we cannot simply treat all random variables $o(t)$ as mutually independent and aggregate their individual distributions. To illustrate this issue, consider Figure 4(a), where one subdiamond is depicted. Assume that each of the five possible trajectories has a probability of 0.2. We can see that three trajectories are completely contained in the subdiamond, so that the probability $P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ that $o$ fully remains in the subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ is 60%. However, multiplying for all time instants $t \in [t_i, t_j]$ the individual probabilities that $o$ is located in $\Diamond(o, t_i, t_j, d, dir, \lambda)$ at time $t$ produces an arbitrarily small and incorrect result, as time dependencies are ignored.

Figure 4: Probabilistic Diamonds. (a) A sub-diamond. (b) Possible sub-diamonds. (c) Linear optimization. (d) Computing $\lambda_{opt}$. (e) Conservative approximation.

Furthermore, due to the generally exponential number of possible trajectories, $P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ is too expensive to compute by iterating over all possible trajectories. Instead, we compute this probability efficiently and correctly, as follows.
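Before turning to that computation, the dependence issue described above can be checked numerically. The sketch below uses five made-up trajectories (the actual trajectories of Figure 4(a) are not recoverable from the text), each with probability 0.2, and records for every time instant whether the trajectory is inside the subdiamond. It compares the correct joint probability with the product of per-instant probabilities.

```python
from math import prod

# Hypothetical data: one row per trajectory, one flag per time instant
# (True = inside the subdiamond at that instant). Each row has probability 0.2.
TRAJECTORIES = [
    [True, True, True, True],
    [True, True, True, True],
    [True, True, True, True],
    [True, False, False, True],
    [True, True, False, True],
]

# Correct value: fraction of trajectories that are inside at every instant.
joint = sum(all(row) for row in TRAJECTORIES) / len(TRAJECTORIES)

# Incorrect value: treat the instants as independent and multiply marginals.
marginals = [sum(col) / len(TRAJECTORIES) for col in zip(*TRAJECTORIES)]
independent = prod(marginals)

print("joint:", joint)
print("marginals:", marginals)
print("product of marginals:", independent)
```

In this toy case the product of marginals underestimates the true probability, and the gap widens as more time instants are multiplied in.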
To compute the probability of possible trajectories between $o(t_i)$ at $t_i$ and $o(t_j)$ at $t_j$ that are completely contained in $\Diamond(o, t_i, t_j, d, dir, \lambda)$, an intuitive approach is to start at $o(t_i)$ at time $t_i$ and perform $t_j - t_i$ transitions using the Markov-chain $o.M(t)$. After each transition, we identify states that are outside $\Diamond(o, t_i, t_j, d, dir, \lambda)$. Any possible trajectory which reaches such a state is flagged. Upon reaching $t_j$, we only need to consider possible trajectories in state $o(t_j)$, since all other worlds have become impossible due to the observation of $o$ at $t_j$. The fraction of un-flagged worlds at state $o(t_j)$ at time $t_j$ yields the probability that $o$ completely remains in $\Diamond(o, t_i, t_j, d, dir, \lambda)$.
To formalize the above approach, we first rewrite the probability $P(\mathrm{inside}(o, \Diamond) \mid o(t_i), o(t_j))$ that the trajectory of $o$ remains in the subdiamond $\Diamond$,¹ given the observations $o(t_i)$, $o(t_j)$ at times $t_i, t_j \in o.T_{obs}$, using the definition of conditional probability:

$$P(\mathrm{inside}(o, \Diamond) \mid o(t_i), o(t_j)) = \frac{P(o(t_j) \mid \mathrm{inside}(o, \Diamond), o(t_i))}{P(o(t_j) \mid o(t_i))},$$

where $P(o(t_j) \mid \mathrm{inside}(o, \Diamond), o(t_i))$ denotes the probability that $o$ reaches the state $o(t_j)$ observed by observation $o(t_j)$, given that $o$, starting at $o(t_i)$ at time $t_i$, remains inside $\Diamond$. $P(o(t_j) \mid o(t_i))$ denotes the probability that state $o(t_j)$ at time $t_j$ is reached, given that $o$ starts at $o(t_i)$ at time $t_i$, regardless of whether $o$ remains in $\Diamond$.
3.4 Finding the optimal Probabilistic Diamond
In the previous section, we described how to compute the probability of a probabilistic diamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ from a diamond $\Diamond(o, t_i, t_j)$, dimension $d$, direction $dir$, and scaling factor $\lambda$. In this section we will show how to find, for a given query window $Q^{\Box}$ and a given query predicate, the subdiamond with the highest pruning power. Let us focus on PST$\exists\tau$ queries first. That is, our aim is to find a value for $d$, $dir$ and $\lambda$, such that the resulting subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ does not intersect $Q^{\Box}$, and at the same time it has a high probability $P(\mathrm{inside}(o, \Diamond))$. This probability can be used to prune $o$ as we will show later. Formally, we want to efficiently determine

$$\operatorname*{argmax}_{d \in \{1, \dots, D\},\ dir \in \{\vee, \wedge\},\ \lambda \in [0,1]} P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$$

constrained to $Q^{\Box} \cap \Diamond(o, t_i, t_j, d, dir, \lambda) = \emptyset$.

For a single dimension $d$, and the north direction, a possible situation is depicted in Figure 4(d). Here, the projection $\Diamond_d(o, t_i, t_j)$ of the full diamond $\Diamond(o, t_i, t_j)$ to the $d$-th dimension and the projections $Q^{\Box}_1[d]$ and $Q^{\Box}_2[d]$ of two query windows $Q^{\Box}_1$ and $Q^{\Box}_2$ are depicted. The aim is to find the largest value $\lambda_{opt}$ of $\lambda$, such that the corresponding probabilistic diamond $\Diamond(o, t_i, t_j, d, \wedge, \lambda_{opt})$, which we call the optimal subdiamond, does not intersect $Q^{\Box}_1$ ($Q^{\Box}_2$). To solve this problem, we distinguish between the following cases.

¹ Since the context is clear, we simply use $\Diamond$ to denote $\Diamond(o, t_i, t_j, d, dir, \lambda)$.
Case 1: the direct line between observations $(o(t_i), t_i)$ and $(o(t_j), t_j)$ in dimension $d$ intersects $Q^{\Box}[d]$. In this case, there cannot exist any $\lambda \in [0, 1]$ such that $Q^{\Box} \cap \Diamond(o, t_i, t_j, d, dir, \lambda) = \emptyset$. Therefore, our problem has no solution in dimension $d$, and $d$ is ignored.

Case 2: the direct line between $(o(t_i), t_i)$ and $(o(t_j), t_j)$ does not intersect $Q^{\Box}[d]$, and we assume without loss of generality that $Q^{\Box}[d]$ is located above this line.² In addition, in this case, the time value of the north corner $c$ of $\Diamond_d(o, t_i, t_j)$ is located in the interval $T^{\Box}$ (e.g., see $Q^{\Box}_2$ in Figure 4(d)).³ In this case, the edge $v^{\wedge}_{opt}$ of the optimal subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda_{opt})$ is given by $(o(t_i), t_i)$ and $(s, t)$, where $s$ corresponds to the lower bound of $S^{\Box}[d]$ and $t$ equals the time component of $c$.
Case 3: $Q^{\Box}$ is above the direct line between $(o(t_i), t_i)$ and $(o(t_j), t_j)$ (as in Case 2), but the time value of the north corner $c$ of $\Diamond_d(o, t_i, t_j)$ is not located in the time interval $T^{\Box}$ (e.g. $Q^{\Box}_1$ of Figure 4(d)). In this case, the optimal subdiamond must touch a corner of $Q^{\Box}[d]$ due to convexity of both $Q^{\Box}[d]$ and any diamond. If $Q^{\Box}[d]$ is located to the left of $c$ (the right direction is handled symmetrically), then the edge $v^{\wedge}_{opt}$ of the optimal subdiamond is given by the line between $(o(t_i), t_i)$ and the lower right corner of $Q^{\Box}[d]$ (e.g., see Figure 4(d)).
The optimal value $\lambda_{opt}$ for cases 2 and 3 equals the quotient

$$\lambda_{opt} = \frac{v^{\wedge}_{opt} - v^{avg}}{v^{\wedge}_{d,i} - v^{avg}},$$

i.e., the ratio of the maximum velocity of the optimal subdiamond to the maximum velocity of the full diamond, both taken relative to the average velocity $v^{avg} = \frac{o(t_j)[d] - o(t_i)[d]}{t_j - t_i}$.
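The quotient for $\lambda_{opt}$ can be illustrated with a short Python sketch. It assumes the touching point $(s, t)$ from Case 2 or Case 3 has already been determined, that the query window lies above the direct line, and that the maximum velocity of the full diamond exceeds the average velocity. The function names and numbers are hypothetical.

```python
def edge_velocity(o_i: float, t_i: float, s: float, t: float) -> float:
    """Slope of the edge from the observation (o_i, t_i) to the touching point (s, t)."""
    return (s - o_i) / (t - t_i)


def lambda_opt(v_opt: float, v_max: float, v_avg: float) -> float:
    """Scaling factor of the optimal subdiamond, relative to the average velocity."""
    if v_max <= v_avg:
        raise ValueError("expected v_max > v_avg for the upper direction")
    return (v_opt - v_avg) / (v_max - v_avg)


# Hypothetical diamond: o(t_i) = 0 at t_i = 0, o(t_j) = 2 at t_j = 4,
# maximum velocity 1.0; the query window's lower right corner is (1.5, 2).
V_AVG = (2 - 0) / (4 - 0)
V_OPT = edge_velocity(0, 0, s=1.5, t=2)
LAM = lambda_opt(V_OPT, v_max=1.0, v_avg=V_AVG)
print(V_OPT, LAM)
```

A result above 1 would indicate that the window does not touch the full diamond in this direction, so no shrinking is needed; that situation is outside the cases discussed above and is not handled by the sketch.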
After identifying the value for $\lambda_{opt}$, for a dimension $d$ and a direction $dir$, we can compute the probability of the corresponding subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda_{opt})$. Since we can guarantee that any path in this subdiamond does not intersect the query window, we can obtain a lower bound

$$P_{LB}(\mathrm{never}(o, t_i, t_j, Q^{\Box})) = P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda_{opt}))) \tag{2}$$

of the event that $o$ never intersects the query window in the time interval $[t_i, t_j]$. This directly yields an upper bound

$$P_{UB}(\mathrm{sometimes}(o, t_i, t_j, Q^{\Box})) = 1 - P(\mathrm{inside}(o, \Diamond(o, t_i, t_j, d, dir, \lambda_{opt}))) \tag{3}$$

of the probability that the reverse event that $o$ intersects the query

² If $Q^{\Box}[d]$ is below the line, we consider direction $dir = \vee$ symmetrically.

³ Corner $c$ is given by the intersection of lines $(o(t_i), t_i) + v^{\wedge}_{d,i}$ and $(o(t_j), t_j) + v^{\wedge}_{d,j}$.
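Equations 2 and 3 can be turned into a simple pruning test for a single diamond. The sketch below assumes, as a reading of the threshold semantics rather than a definition taken from this section, that a query with threshold $\tau$ only returns objects whose intersection probability is at least $\tau$. The function names and the sample probability are hypothetical.

```python
def bounds_from_subdiamond(p_inside: float):
    """Return (P_LB(never), P_UB(sometimes)) following Equations 2 and 3."""
    if not 0.0 <= p_inside <= 1.0:
        raise ValueError("p_inside must be a probability")
    return p_inside, 1.0 - p_inside


def can_prune(p_inside: float, tau: float) -> bool:
    """True if the upper bound already rules out reaching the threshold tau."""
    _, p_ub_sometimes = bounds_from_subdiamond(p_inside)
    return p_ub_sometimes < tau


# Hypothetical subdiamond that contains the object with probability 0.7.
print(bounds_from_subdiamond(0.7))
print(can_prune(0.7, tau=0.5), can_prune(0.7, tau=0.2))
```

When the test fails, the object is not a result yet; it simply remains a candidate for further filtering or refinement.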

Citations
Journal ArticleDOI
01 Nov 2013
TL;DR: In this article, the authors address probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model and propose a sampling approach which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database.
Abstract: Nearest neighbor (NN) queries in trajectory databases have received significant attention in the past, due to their applications in spatio-temporal data analysis. More recent work has considered the realistic case where the trajectories are uncertain; however, only simple uncertainty models have been proposed, which do not allow for accurate probabilistic search. In this paper, we fill this gap by addressing probabilistic nearest neighbor queries in databases with uncertain trajectories modeled by stochastic processes, specifically the Markov chain model. We study three nearest neighbor query semantics that take as input a query state or trajectory q and a time interval, and theoretically evaluate their runtime complexity. Furthermore we propose a sampling approach which uses Bayesian inference to guarantee that sampled trajectories conform to the observation data stored in the database. This sampling approach can be used in Monte-Carlo based approximation solutions. We include an extensive experimental study to support our theoretical results.

61 citations

Journal ArticleDOI
TL;DR: This work presents “Match Maker,” a negotiation-based model that hides exact location information data for system participants while implementing privacy preserving ride sharing, the first comprehensive approach that integrates privacy, safety and trust in a single model.
Abstract: Dynamic ride sharing is a service that enables shared vehicle rides in real time and on short notice. It can be an effective solution to counter the problem of increasing traffic jams at peak hours in cities. The growing use and popularity of smart phones and GPS-enabled devices provides us with tools required to efficiently implement ride sharing and significantly enhance carpooling. However, privacy and safety concerns are the main obstacles faced when encouraging people to use such a service. In this work, we present “Match Maker,” a negotiation-based model that hides exact location information data for system participants while implementing privacy preserving ride sharing. We use the concept of imprecision (not being precise about location of the user out of set of n locations) and follow the idea of obfuscation, which equates a higher degree of imprecision with a higher degree of privacy. We identify two attack types that could circumvent privacy preserving ride sharing. We compare the Match Maker model with the standard central trusted server model collecting precise location data, which we term eBay model. We provide the first comprehensive approach that integrates privacy, safety and trust in a single model. We present a recursive ellipse-based algorithm to compute an optimal driver path as well as three negotiation strategies for drivers and passengers. We conduct extensive experiments on real road networks and compare the strategies for privacy and effectiveness of ride sharing in terms of traffic load and vehicle km reduction. We show that ride sharing saves between 9% and 21% (on average 12%) of vehicle km if drivers are only prepared to accept slight detours of their usual trips. In the city of Melbourne, with 11.6 million trips a weekday and an average trip length of 10.2 km, this would save 14.2 million km per weekday.

43 citations


Cites background from "Indexing uncertain spatio-temporal ..."

  • ...There is a large amount of research using and extending the concept of spacetime prisms for moving object trajectory databases [Kuijpers and Othman 2009, 2010; Emrich et al. 2012]....


Journal ArticleDOI
Dongmei Huang1, Danfeng Zhao1, Lifei Wei1, Wang Zhenhua1, Du Yanling1 
TL;DR: A broad view about marine big data and its management is illustrated, a survey on key methods and models is made, an engineering instance is introduced that demonstrates the management architecture, and the existing challenges are discussed.
Abstract: It is aware that big data has gathered tremendous attentions from academic research institutes, governments, and enterprises in all aspects of information sciences. With the development of diversity of marine data acquisition techniques, marine data grow exponentially in last decade, which forms marine big data. As an innovation, marine big data is a double-edged sword. On the one hand, there are many potential and highly useful values hidden in the huge volume of marine data, which is widely used in marine-related fields, such as tsunami and red-tide warning, prevention, and forecasting, disaster inversion, and visualization modeling after disasters. There is no doubt that the future competitions in marine sciences and technologies will surely converge into the marine data explorations. On the other hand, marine big data also brings about many new challenges in data management, such as the difficulties in data capture, storage, analysis, and applications, as well as data quality control and data security. To highlight theoretical methodologies and practical applications of marine big data, this paper illustrates a broad view about marine big data and its management, makes a survey on key methods and models, introduces an engineering instance that demonstrates the management architecture, and discusses the existing challenges.

40 citations


Cites methods from "Indexing uncertain spatio-temporal ..."

  • ...In this case, current researches mainly falls into several classes: hash index [32], tree structure index [33, 34], time-led composite index [35, 36], index dynamically adjusted with data migration [37, 38], and index optimized with parallel processing [39]....


Proceedings ArticleDOI
19 May 2014
TL;DR: This tutorial provides a comprehensive overview of the different challenges involved in managing uncertain spatial and spatio-temporal data and presents state-of-the-art techniques for addressing them.
Abstract: Location-related data has a tremendous impact in many applications of high societal relevance and its growing volume from heterogeneous sources is one true example of a Big Data [1]. An inherent property of any spatio-temporal dataset is uncertainty due to various sources of imprecision. This tutorial provides a comprehensive overview of the different challenges involved in managing uncertain spatial and spatio-temporal data and presents state-of-the-art techniques for addressing them.

21 citations


Cites methods from "Indexing uncertain spatio-temporal ..."

  • ...Using the technique of uncertain generating functions [27], [45], a correct, effective and efficient solution to approximate kNNs will be presented....


Book ChapterDOI
21 Apr 2014
TL;DR: This work proposes two types of RNN queries based on a well established model for uncertain spatial temporal data based on stochastic processes, namely the Markov model, and is the first to consider RNN query on uncertain trajectory databases in accordance with the possible worlds semantics.
Abstract: Reverse nearest neighbor (RNN) queries in spatial and spatio-temporal databases have received significant attention in the database research community over the last decade. A reverse nearest neighbor (RNN) query finds the objects having a given query object as its nearest neighbor. RNN queries find applications in data mining, marketing analysis, and decision making. Most previous research on RNN queries over trajectory databases assume that the data are certain. In realistic scenarios, however, trajectories are inherently uncertain due to measurement errors or time-discretized sampling. In this paper, we study RNN queries in databases of uncertain trajectories. We propose two types of RNN queries based on a well established model for uncertain spatial temporal data based on stochastic processes, namely the Markov model. To the best of our knowledge our work is the first to consider RNN queries on uncertain trajectory databases in accordance with the possible worlds semantics. We include an extensive experimental evaluation on both real and synthetic data sets to verify our theoretical results.

17 citations

References
Proceedings ArticleDOI
01 May 1990
TL;DR: The R*-tree is designed which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory which clearly outperforms the existing R-tree variants.
Abstract: The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, queries and operations, we were able to design the R*-tree which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory. Using our standardized testbed in an exhaustive performance comparison, it turned out that the R*-tree clearly outperforms the existing R-tree variants: Guttman's linear and quadratic R-tree and Greene's variant of the R-tree. This superiority of the R*-tree holds for different types of queries and operations, such as map overlay, for both rectangles and multidimensional points in all experiments. From a practical point of view the R*-tree is very attractive because of the following two reasons: (1) it efficiently supports point and spatial data at the same time and (2) its implementation cost is only slightly higher than that of other R-trees.

4,686 citations


"Indexing uncertain spatio-temporal ..." refers methods in this paper

  • ...Although the R*-Tree has lower filter cost, the overall query performance of the UST-tree is around 3 times better than that of the R*-Tree (note the logarithmic scale)....


  • ...The UST-tree has higher filtering cost, since the representation of the probabilistic diamonds requires more space and the tree is larger than the R*-Tree, which only stores MBR approximations but incurs much higher I/O cost for refinements....


  • ...These MBRs are then indexed using a conventional R*-Tree [2]....


  • ...Another advantage is that existing spatial access methods (e.g., R-trees) can be easily used to efficiently organize these approximations....


  • ...The R*-Tree competitor approximates all possible locations (i.e. state-time pairs) between two successive observations of an object using only (o, ti, tj)....


Journal ArticleDOI
01 Oct 2003
TL;DR: This work presents a system that automatically clusters GPS data taken over an extended period of time into meaningful locations at multiple scales and incorporates these locations into a Markov model that can be consulted for use with a variety of applications in both single-user and collaborative scenarios.
Abstract: Wearable computers have the potential to act as intelligent agents in everyday life and to assist the user in a variety of tasks, using context to determine how to act. Location is the most common form of context used by these agents to determine the user's task. However, another potential use of location context is the creation of a predictive model of the user's future movements. We present a system that automatically clusters GPS data taken over an extended period of time into meaningful locations at multiple scales. These locations are then incorporated into a Markov model that can be consulted for use with a variety of applications in both single-user and collaborative scenarios.

1,211 citations


"Indexing uncertain spatio-temporal ..." refers background in this paper

  • ..., [1, 4, 10, 18], where Markov Chains were proved successful in modeling spatio-temporal data....


  • ...For instance, [1] and [10] show how Markov-models can effectively capture the movement of vehicles on road networks for prediction purpose....


Journal ArticleDOI
31 Aug 2004
TL;DR: It is shown that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods, and an optimization algorithm is described that can compute efficiently most queries.
Abstract: We describe a system that supports arbitrarily complex SQL queries on probabilistic databases. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can compute efficiently most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.

1,113 citations


"Indexing uncertain spatio-temporal ..." refers background in this paper

  • ...Assuming that objects are mutually independent, our semantics comply with the classic possible worlds model [6]....


Journal ArticleDOI
16 May 2000
TL;DR: A novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of moving objects and is capable of indexing objects moving in one-, two-, and three-dimensional space is proposed.
Abstract: The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. A comprehensive performance study is reported.

880 citations

Proceedings ArticleDOI
21 Aug 2011
TL;DR: A Cloud-based system computing customized and practically fast driving routes for an end user using (historical and real-time) traffic conditions and driver behavior, which accurately estimates the travel time of a route for a user; hence finding the fastest route customized for the user.
Abstract: This paper presents a Cloud-based system computing customized and practically fast driving routes for an end user using (historical and real-time) traffic conditions and driver behavior. In this system, GPS-equipped taxicabs are employed as mobile sensors constantly probing the traffic rhythm of a city and taxi drivers' intelligence in choosing driving directions in the physical world. Meanwhile, a Cloud aggregates and mines the information from these taxis and other sources from the Internet, like Web maps and weather forecast. The Cloud builds a model incorporating day of the week, time of day, weather conditions, and individual driving strategies (both of the taxi drivers and of the end user for whom the route is being computed). Using this model, our system predicts the traffic conditions of a future time (when the computed route is actually driven) and performs a self-adaptive driving direction service for a particular user. This service gradually learns a user's driving behavior from the user's GPS logs and customizes the fastest route for the user with the help of the Cloud. We evaluate our service using a real-world dataset generated by over 33,000 taxis over a period of 3 months in Beijing. As a result, our service accurately estimates the travel time of a route for a user; hence finding the fastest route customized for the user.

758 citations


"Indexing uncertain spatio-temporal ..." refers background in this paper

  • ...As a basis for the real world data served the trajectory data set containing one-week trajectories of 10,357 taxis in Beijing from [27]....


Frequently Asked Questions (17)
Q1. What contributions have the authors mentioned in the paper "Indexing uncertain spatio-temporal data"?

In this work, the authors propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using a hierarchical index structure. To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. The authors experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.

The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturization of computing devices and the development of embedded systems.

Thus quantifiers such as “always”, “sometimes”, “definitely” and “possibly” are used to indicate whether an object intersects a given spatio-temporal query window. 

For the observations of the real dataset the authors used the GPS data of taxis, where the time between two signals is between 2 and 20 minutes. 

The set of state-time pairs Si reachable from o(ti) can be computed by performing tj − ti transitions using the Markov chain o.M(t) of o, starting from state o(ti) and memorizing all reachable (state, time) pairs. 
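The reachable set can be sketched as a breadth-first expansion over the non-zero entries of the transition matrix. This is an illustration under assumptions not stated in the text: the chain is treated as time-homogeneous, states are integer indices, times are counted as offsets from $t_i$, and the function name and the small matrix are hypothetical.

```python
def reachable_pairs(M, start, steps):
    """Return all (state, time offset) pairs reachable from `start` within `steps` transitions."""
    pairs = {(start, 0)}
    frontier = {start}
    for k in range(1, steps + 1):
        # A state is reachable at step k if some state reachable at step k - 1
        # has a non-zero transition probability into it.
        frontier = {b for a in frontier for b, p in enumerate(M[a]) if p > 0}
        pairs |= {(b, k) for b in frontier}
    return pairs


# Hypothetical chain on three states: stay or move to the next state.
M = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

print(sorted(reachable_pairs(M, start=0, steps=2)))
```

Only reachability is tracked here, not probabilities, which is all that the velocity bounds of a diamond require.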

The prevalent approach is to bound the possible positions of an object at each point of time by a simple spatial structure, resulting in a spatio-temporal approximation.

The spatial extent of the query windows in each dimension was set to 0.1 and the duration of the queries was set to 10 time steps by default. 

In addition, to reduce the communication cost, improve the bandwidth utilization, and cope with storage constraints, often the recorded object trajectories undergo simplification, eliminating some recorded values.

In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.

In general, the number of possible worlds to be considered is $O(|S|^{T})$; exhaustively examining all of them requires exponential time, even for finite time and space domains.

The authors can also observe that the probabilistic filter can reduce the number of refinements by 30% after applying the sequence of spatio-temporal filters. 

The authors showed how the most common query types (spatio-temporal ∃- and ∀-window queries) can be efficiently processed using probabilistic bounds which are computed during index construction. 

Those connections correspond to the possible movements of an object in the space, and the authors randomly assigned probabilities to each connection such that the probabilities of all outgoing edges of a state sum to 1.

In a streaming scenario with several updates/insertions per second and large probabilistic diamonds (due to high speed of objects or large intervals between observations), the construction of probabilistic diamonds can be performed in parallel and is therefore still feasible. 

To avoid these computations at run-time, the authors propose to precompute, for each diamond (o, ti, tj) in D, probabilistic subdiamonds for each dimension and direction and for a set Λ of λ-values.

In particular, assuming a maximum speed in each direction of each dimension, the possible locations that an object can visit between two exact observations are bounded.
