
Indexing uncertain spatio-temporal data

Q: What contributions have the authors mentioned in the paper "Indexing uncertain spatio-temporal data" ?

In this work, the authors propose novel approximation techniques to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using a hierarchical index structure. To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. The authors experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.

Q: What is the common way to indicate whether an object intersects a given window?

Thus quantifiers such as “always”, “sometimes”, “definitely” and “possibly” are used to indicate whether an object intersects a given spatio-temporal query window.

Q: What data sets were used for the observations of taxis?

For the observations of the real dataset the authors used the GPS data of taxis, where the time between two signals is between 2 and 20 minutes.

Q: What is the common approach to bound the possible positions of an object at each point of time?

The prevalent approach is to bound the possible positions of an object at each point of time by a simple spatial structure, resulting in a spatio-temporal approximation.

Q: What were the default spatial extent and duration of the query windows?

The spatial extent of the query windows in each dimension was set to 0.1 and the duration of the queries was set to 10 time steps by default.

Q: Why is the author granting permission to make digital or hard copies of this work?

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

Q: What are the observations needed to build a database?

In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.

Q: What is the number of possible worlds to be considered?

In general, the number of possible worlds to be considered is O(|S|^T); exhaustively examining all of them requires exponential time, even for finite time and space domains.

Tobias Emrich¹, Hans-Peter Kriegel¹, Nikos Mamoulis², Matthias Renz¹, Andreas Züfle¹

Ludwig Maximilian University of Munich¹, University of Hong Kong²

29 Oct 2012-pp 395-404

TL;DR: This work proposes novel approximation techniques in order to probabilistically bound the uncertain movement of objects and experimentally shows that it accelerates the existing, scan-based approach by orders of magnitude.


Abstract: The advances in sensing and telecommunication technologies allow the collection and management of vast amounts of spatio-temporal data combining location and time information. Due to physical and resource limitations of data collection devices (e.g., RFID readers, GPS receivers and other sensors) data are typically collected only at discrete points of time. In-between these discrete time instances, the positions of tracked moving objects are uncertain. In this work, we propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using a hierarchical index structure. To the best of our knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. We experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.


Summary (5 min read)


1. INTRODUCTION

Efficient management of large collections of spatio-temporal data pertaining to mobile entities whose locations change over time is paramount in a large variety of application domains: military applications, structural and environmental monitoring, disaster/rescue management and remediation, Geographic Information Systems (GIS), Location-Based Services (LBS).
The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturizations of the computing devices and development of embedded systems.
Movements between observations are usually modeled by beads; the probability that an object is located outside a bead at any point of time is 0.
To cope with storage constraints, the recorded object trajectories often undergo simplification, eliminating some recorded values.
Having object trajectories sampled only at discrete time-instants and/or simplified renders the movement in-between samples uncertain and query evaluation challenging.

2. PRELIMINARIES

This section formally defines the type of spatio-temporal data that the authors index, the stochastic model that they use for uncertain trajectories derived from the data, and the query types that they handle.
To overcome this problem, in their proposal each uncertain spatio-temporal (UST) object o ∈ D is associated with an uncertain object trajectory, which is represented by a stochastic process.
Clearly, any naive approach that enumerates all possible worlds is not feasible.
Another query type is the Probabilistic Spatio-Temporal τ ForAll Query ([8]), denoted as (PSTτ∀Q), which requires an object to remain in the spatial window S^□ for the whole time window T^□.
By modeling the movement within a diamond using a Markov-chain model, the true probability that an object o satisfies a PSTτ∃ query can be computed in PTIME [8], exploiting that the matrix M is generally sparse (only a few states are directly connected to a single state).

3.1 Spatio-Temporal Approximation

This is done by considering all possible paths between state o(ti) at time ti and state o(tj) at time tj.
The set of state-time pairs Si reachable from o(ti) can be computed by performing tj − ti transitions using the Markov chain o.M(t) of o, starting from state o(ti) and memorizing all reachable (state, time) pairs.
Thus, the authors propose to conservatively approximate Si,j.
In the following, the authors call this time-dependent spatial approximation of o(t) in the time interval [ti, tj] between two observations o(ti) and o(tj) a spatio-temporal diamond, denoted as (o, ti, tj).
A diamond is reminiscent of a time-parameterized rectangle, used to model the worst-case MBR for a set of moving objects in [19]; however, the way of deriving velocities is different in their case.
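The forward and backward reachability computation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the transition matrix is taken to be time-homogeneous (the paper allows o.M(t) to depend on t), states are plain integer indices, and the function names `forward_reachable`, `backward_reachable` and `consistent_pairs` are invented for this example.

```python
def forward_reachable(M, start, steps):
    """layers[k] = states reachable from `start` in exactly k transitions."""
    layers = [{start}]
    for _ in range(steps):
        layers.append({j for i in layers[-1]
                       for j, p in enumerate(M[i]) if p > 0})
    return layers


def backward_reachable(M, end, steps):
    """layers[k] = states from which `end` is reachable in steps - k transitions."""
    n = len(M)
    layers = [{end}]
    for _ in range(steps):
        # Predecessors of the previous layer (equivalent to using the transposed matrix).
        layers.append({i for i in range(n)
                       if any(M[i][j] > 0 for j in layers[-1])})
    layers.reverse()
    return layers


def consistent_pairs(M, start, end, t_i, t_j):
    """(state, time) pairs consistent with both observations (the set S_{i,j})."""
    steps = t_j - t_i
    fwd = forward_reachable(M, start, steps)
    bwd = backward_reachable(M, end, steps)
    return {(s, t_i + k) for k in range(steps + 1) for s in fwd[k] & bwd[k]}


# Toy model: five states on a line; an object may stay or move to a neighbour.
n = 5
M = [[0.0] * n for _ in range(n)]
for i in range(n):
    nbrs = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    for j in nbrs:
        M[i][j] = 1.0 / len(nbrs)

# Observed at state 0 at time 0 and at state 2 at time 4.
pairs = consistent_pairs(M, start=0, end=2, t_i=0, t_j=4)
print(sorted(pairs, key=lambda p: (p[1], p[0])))
```

In this toy example the consistent pairs widen away from the first observation and narrow again towards the second, which is the shape that the diamond approximation is meant to cover.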

3.2 Spatio-Temporal Filter

Based on the spatio-temporal approximation of an uncertain object as described in the previous section, it is possible to perform filtering during query processing.
By doing so, the authors can adapt the techniques proposed in [19]: for each dimension d ∈ {1, ..., D}, they compute the points of time when the extents of the rectangles intersect in that dimension and the points of time when the extents of the diamond rectangle are fully covered by the query rectangle S^□.
In summary, the spatio-temporal filter can be used to identify uncertain object trajectories having a probability of 100% or 0% of intersecting (remaining in) the query region Q^□.
Still, the probability threshold τ of the query is not considered by this filter.
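To make the filter concrete, here is a minimal one-dimensional Python sketch. It derives the four bounding velocities of a diamond from a set of consistent (state, time) pairs and then classifies an exists query as pruned (0%), a true hit (100%) or a candidate. Several things are assumptions of this sketch rather than the paper's method: the intersection is checked per discrete time step instead of computing intersection times analytically as in [19], the bound from the second observation is derived directly from the velocity definitions, and the names `diamond_bounds` and `st_filter` as well as the sample data are invented.

```python
def diamond_bounds(pairs, pos, obs_i, obs_j):
    """Per-time-step (low, high) extent of the diamond in one dimension.

    pairs: set of (state, time) consistent with both observations
    pos:   dict mapping state -> coordinate
    obs_i, obs_j: (state, time) of the two observations
    """
    (s_i, t_i), (s_j, t_j) = obs_i, obs_j
    x_i, x_j = pos[s_i], pos[s_j]
    # Velocities of propagation away from the first observation ...
    from_i = [(pos[s] - x_i) / (t - t_i) for s, t in pairs if t > t_i]
    # ... and towards the second observation.
    to_j = [(x_j - pos[s]) / (t_j - t) for s, t in pairs if t < t_j]
    v_i_max, v_i_min = max(from_i), min(from_i)
    v_j_max, v_j_min = max(to_j), min(to_j)
    bounds = {}
    for t in range(t_i, t_j + 1):
        high = min(x_i + (t - t_i) * v_i_max, x_j - (t_j - t) * v_j_min)
        low = max(x_i + (t - t_i) * v_i_min, x_j - (t_j - t) * v_j_max)
        bounds[t] = (low, high)
    return bounds


def st_filter(bounds, s_low, s_high, t_start, t_end):
    """Classify an exists query against a diamond: 'prune', 'true_hit' or 'candidate'."""
    overlaps = False
    for t, (low, high) in bounds.items():
        if not t_start <= t <= t_end:
            continue
        if s_low <= low and high <= s_high:
            return "true_hit"        # extent fully covered at some query time
        if low <= s_high and s_low <= high:
            overlaps = True          # partial overlap: probability unknown
    return "candidate" if overlaps else "prune"


# Consistent pairs of a toy object observed at state 0 (t=0) and state 2 (t=4).
PAIRS = {(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2),
         (1, 3), (2, 3), (3, 3), (2, 4)}
POS = {s: float(s) for s in range(5)}   # state index doubles as coordinate

bounds = diamond_bounds(PAIRS, POS, obs_i=(0, 0), obs_j=(2, 4))
print(bounds)
print(st_filter(bounds, 3, 4, 0, 2))    # window lies outside the diamond
print(st_filter(bounds, 0, 3, 1, 1))    # diamond extent inside the window
print(st_filter(bounds, 2, 5, 2, 3))    # partial overlap only
```

Only the third kind of outcome needs further work, which is where the probability threshold τ and the probabilistic approximations of the following sections come in.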

3.3 Probabilistic UST-Object Approximation

The authors now propose a tighter approximation, based on the intuition that the set of possible paths within a diamond is generally not uniformly distributed: paths that are close to the direct connection between the observed locations often are more likely than extreme paths along the edges of the diamond.
To obtain the subdiamond, the authors scale the corresponding sides by a factor λ ∈ [0, 1] relative to the average velocity v^avg_d = (o(tj)[d] − o(ti)[d]) / (tj − ti).
The main challenge for correctness is to cope with temporal dependencies, i.e., the fact that the random variables o(ti) and o(ti + δt) are highly correlated.
To compute the probability of possible trajectories between o(ti) at ti and o(tj) at tj that are completely contained in (o, ti, tj, d, dir, λ), an intuitive approach is to start at o(ti) at time ti and perform tj − ti transitions using the Markov-chain o.M(t).
Any possible trajectory which reaches such a state is flagged.

3.5 Approximating Probabilistic Diamonds

The main goal of their index structure, proposed in Section 4, is to avoid expensive probability computations for subdiamonds.
Since the query window is not known in advance, 2·D computations (i.e., one for each dimension and direction) have to be performed in order to identify the optimal subdiamond for a given query and candidate object o.
Thus, the authors need to conservatively approximate the probability of probabilistic diamonds (o, ti, tj, d, dir, λ) for which λ ∉ Λ. They propose to use a conservative linear approximation of P(inside(o, (o, ti, tj, d, dir, λ))) which increases monotonically with λ, using the precomputed probability values.
The latter constraint is required to maintain the conservativeness property of the approximation, which will be required for pruning.
The authors model this as a linear programming problem: find a linear function l(λ) = a · λ+ b that minimizes the aggregate error with respect to the sample points, under the constraint that the approximation line does not exceed any of the sample values (e.g., the line in Figure 4(c)).
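The following Python sketch illustrates one way to obtain such a line for a handful of sample points. It is not the authors' solver: instead of calling a linear programming library, it relies on the fact that an optimum of this two-variable linear program is attained at a line passing through two sample points, and simply tries all pairs. The sample values and the name `conservative_line` are invented for illustration.

```python
from itertools import combinations


def conservative_line(samples, eps=1e-12):
    """Return (a, b) such that a*lam + b <= prob for every sample and the
    total gap sum(prob - (a*lam + b)) is as small as possible.

    samples: list of (lam, prob) pairs with at least two distinct lam values.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(samples, 2):
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Keep only lines that never exceed a sample value (conservativeness).
        if all(a * x + b <= y + eps for x, y in samples):
            gap = sum(y - (a * x + b) for x, y in samples)
            if best is None or gap < best[0]:
                best = (gap, a, b)
    if best is None:
        raise ValueError("need at least two samples with distinct lambda values")
    return best[1], best[2]


# Hypothetical precomputed probabilities P(inside) for lambda in a catalogue.
samples = [(0.0, 0.0), (0.25, 0.4), (0.5, 0.7), (0.75, 0.9), (1.0, 1.0)]
a, b = conservative_line(samples)
print(a, b)
```

For monotonically increasing samples the resulting slope is non-negative, so the line is itself monotonically increasing in λ. With many catalogue values a proper LP solver would scale better than this quadratic enumeration.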

3.6 Probabilistic Filter

Using this line directly may violate the conservativeness property, since the true function may have any monotonically increasing form, and thus, for a value λQ located in between two values λ1 and λ2 (λ1, λ2 ∈ Λ, λ1 < λQ < λ2) the probability is bounded by P(λ1) ≤ P(λQ) ≤ P(λ2).
In their running example, the function f(λ) is depicted in Figure 4(e).
In turn, when probing an uncertain trajectory approximation (o, ti, tj) on the query range Q^□, the authors only have to take the time range [ti, tj] into account; i.e., if the time range T^□ of the query spans beyond [ti, tj], they truncate T^□ accordingly.
This observation allows the authors to compute the probability P∃(o) that the whole chain of diamonds of o intersects a query window Q^□.
Thus, the exact probability P(sometimes(o, ti, tj, Q^□)) is computed using the technique proposed in [8].

4. THE UST-TREE

In the previous section, the authors showed that they can precompute a set of approximations for each object, which can be progressively used to prune an object during query evaluation.
The authors introduce the UST-tree, which is an R-tree-based hierarchical index structure, designed to organize the object approximations and efficiently prune objects that may not possibly qualify the query; for the remaining objects the query is directly verified based on their Markov models, as described in [8] (refinement step).
Section 4.1 describes the structure of the UST-tree and Section 4.2 presents a generic query processing algorithm for answering PSTτ∃ queries.

4.1 Architecture

The UST-tree index is a hierarchical disk-based index.
The basic structure is illustrated in Figure 5.
Intermediate node entries of the UST-tree have exactly the same structure as in an R-tree; i.e., each entry contains a pointer referencing its child node and the MBR of all MBR approximations stored in the referenced subtree.
Note that the necklace of each object is decomposed into diamonds, which are stored independently in the leaf nodes of the tree.
Since the directory structure of the UST-tree is identical to that of the R-tree, the UST-tree uses the same methods as the R∗-tree [2] to handle updates.

4.2 Query Evaluation

Given a spatio-temporal query window Q^□, the UST-tree is hierarchically traversed starting from the root, recursively visiting entries whose MBRs intersect Q^□; i.e., the subtree of an intermediate entry e is pruned if e.mbr ∩ Q^□ = ∅.
If this filter fails, the authors proceed using (o, ti, tj) (ST-Diamond filter) by performing intersection tests against Q^□ as described in Section 3.2.
Note that sometimes multiple leaf entries associated with an object are required to prune an object or confirm whether it is a true hit.
For each object o which is not pruned (or reported as true hit), the authors accumulate in a list L(o) all upper bounds of its qualification probabilities from the leaf entries that index the diamonds of o.
After collecting all candidate objects, the qualification probabilities stored in the list L(o) for each candidate o are aggregated in order to derive the upper bound of the overall qualification probability P(∃t ∈ T^□ : o(t) ∈ S^□) for o, as described in Section 3.6.
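The directory-level part of this procedure, i.e. descending only into entries whose MBR intersects the query window, can be sketched as follows. The node layout (nested dictionaries with `leaf`, `entries`, `mbr`, `child` and `diamond_id` keys) is purely hypothetical and chosen for brevity; the sketch covers only the MBR test and none of the diamond or probabilistic filters applied at the leaf level.

```python
def intersects(a, b):
    """Axis-aligned boxes given as one (low, high) pair per dimension, time included."""
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(a, b))


def search(node, query, out=None):
    """Collect ids of leaf entries whose MBR intersects the query window."""
    if out is None:
        out = []
    for entry in node["entries"]:
        if not intersects(entry["mbr"], query):
            continue                      # prune this entry (and its subtree)
        if node["leaf"]:
            out.append(entry["diamond_id"])
        else:
            search(entry["child"], query, out)
    return out


# Boxes are ((x_low, x_high), (t_low, t_high)).
leaf1 = {"leaf": True, "entries": [
    {"mbr": ((0, 2), (0, 4)), "diamond_id": "d1"},
    {"mbr": ((5, 8), (0, 4)), "diamond_id": "d2"},
]}
leaf2 = {"leaf": True, "entries": [
    {"mbr": ((1, 3), (4, 9)), "diamond_id": "d3"},
    {"mbr": ((10, 12), (4, 9)), "diamond_id": "d4"},
]}
root = {"leaf": False, "entries": [
    {"mbr": ((0, 8), (0, 4)), "child": leaf1},
    {"mbr": ((1, 12), (4, 9)), "child": leaf2},
]}

hits = search(root, ((2.5, 6), (3, 5)))
print(hits)
```

The surviving leaf entries would then be passed through the diamond-based filters, and only the remaining candidates would reach the refinement step.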

5. EXPERIMENTAL EVALUATION

In order to evaluate the proposed techniques the authors used data derived from a real application and several synthetic data sets.
The real-world data is based on the trajectory data set containing one-week trajectories of 10,357 taxis in Beijing from [27].
In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.
The number of time steps between two successive observations was randomly chosen from the interval [10,15] if not stated otherwise.
The authors' experiments assess the construction cost of the UST-tree structure and its performance on query evaluation.

5.1 UST-tree Construction

The first experiment investigates the cost of index construction.
Still, this cost is reasonable, since the construction of a probabilistic diamond is comparable to the construction of 2 ·D · |Λ| subdiamonds, which in turn corresponds to one refinement step (considering the subdiamond as a query window).
The results reflect the theoretical considerations showing a quadratic runtime behavior with respect to both parameters.
Theoretically this parameter should have a linear impact on the construction time.
Is also excluded from larger subdiamonds in the same dimension and direction.

5.2 Query Performance

In the first set of query performance experiments, the authors compare the cost of using UST-tree with two competitors on synthetic data (see Figure 7).
This is attributed to the effectiveness of the different filter steps used by the UST-tree; the overhead of the UST-tree filter is negligible compared to the savings in refinement cost.
The percentage of the diamonds which can be pruned using the probabilistic filter remains rather constant (at around 30%) in comparison to the spatio-temporal filter.
Increasing the length of the query time window T^□ increases the number of refinement candidates.
This shows that the probabilistic filter copes better with more uncertainty in the data than the other two filters.

7. CONCLUSIONS

The authors proposed the UST-tree which is an index structure for uncertain spatio-temporal data.
The UST-tree adopts and incorporates state-of-the-art techniques from several fields of research in order to cope with the complexity of the data.
To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics.
Outside the scope of this work is the consideration of an object’s location before its first and after its last observation.
In both cases, the resulting diamond approximation would be unbounded.


Figures (9)

Figure 2: Spatio-Temporal Approximation.

Figure 3: Intersection between query and diamond

Figure 7: Overall Performance (Synthetic Data Set)


Indexing Uncertain Spatio-Temporal Data

Tobias Emrich¹, Hans-Peter Kriegel¹, Nikos Mamoulis², Matthias Renz¹, Andreas Züfle¹

¹Institute for Informatics, Ludwig-Maximilians-Universität München

²Department of Computer Science, University of Hong Kong

{emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de, nikos@cs.hku.hk

ABSTRACT

The advances in sensing and telecommunication technologies allow the collection and management of vast amounts of spatio-temporal data combining location and time information. Due to physical and resource limitations of data collection devices (e.g., RFID readers, GPS receivers and other sensors) data are typically collected only at discrete points of time. In-between these discrete time instances, the positions of tracked moving objects are uncertain. In this work, we propose novel approximation techniques in order to probabilistically bound the uncertain movement of objects; these techniques allow for efficient and effective filtering during query evaluation using a hierarchical index structure. To the best of our knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics. We experimentally show that it accelerates the existing, scan-based approach by orders of magnitude.

Categories and Subject Descriptors

H.3.3 [INFORMATION STORAGE AND RETRIEVAL]: Information Search and Retrieval

Keywords

Uncertain Spatio-Temporal Data, Uncertain Trajectory, Indexing

1. INTRODUCTION

Efficient management of large collections of spatio-temporal data pertaining to mobile entities whose locations change over time is paramount in a large variety of application domains: military applications, structural and environmental monitoring, disaster/rescue management and remediation, Geographic Information Systems (GIS), and Location-Based Services (LBS). The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturization of computing devices and the development of embedded systems. In almost every application domain, the location data at different (discrete) time-instants is obtained via some positioning devices, like GPS-enabled mobile devices, RFID or road-side sensors. In addition, to reduce the communication cost, improve the bandwidth utilization, and cope with storage constraints, the recorded object trajectories often undergo simplification, eliminating some recorded values. Having object trajectories sampled only at discrete time-instants and/or simplified renders the movement in-between samples uncertain and query evaluation challenging.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

CIKM’12, October 29–November 2, 2012, Maui, HI, USA.

Figure 1: Spatio-Temporal Data. (Figure annotations: movements between observations are usually modeled by beads; the probability of an object being located outside a bead at each point of time is 0.)

Consider an object o, moving in a one-dimensional space, as illustrated in Figure 1. Having complete information about this trajectory enables answering a query asking whether the object intersects a spatio-temporal window Q. However, this task becomes difficult if only a few positions of the exact trajectory (those at the observation times in the example) are recorded. A simple interpolation approach, which connects temporally consecutive observations by line segments and assumes a movement with constant direction and speed between these points, is unacceptable for applications where probabilistic analysis of the uncertain movement is required. When taking uncertainty under consideration, the main challenge is that the space of possible (location, time) positions between two observations can grow very large. More importantly, the number of possible trajectories between two observed locations explodes.

A common method to approximate possible locations between two observations is the beads (or necklace) model ([11, 22]). This model is based on some constraints about the motion of an object. In particular, assuming a maximum speed in each direction of each dimension, the possible locations that an object can visit between two exact observations are bounded. Recent work [8] follows the pragmatic assumption that the uncertain movement of an object between consecutive observations can be described by a Markov-Chain model, which captures the time dependencies between consecutive locations. [8] shows how the space of possible worlds (i.e., trajectories between consecutive observations) can be efficiently analyzed by multiplying Markov-Chain transition matrices and that probabilistic query evaluation can be facilitated by integrating pruning mechanisms into the Markov-Chain matrices. All these are sufficient for the case where there are few queried objects, following similar movements; however, if there is a large number of objects in the database, with different movements, evaluating a probabilistic spatio-temporal query directly against each object individually (i.e., a scan-based approach) would be very expensive.

In this paper, we propose an indexing framework to efficiently cope with large spatio-temporal databases. Our work also assumes that the movement of each object follows a Markov-Chain model (described in Section 2). The objective of our index (described in Section 4) is to minimize the number of objects for which exact probabilistic evaluation has to be performed. To achieve this goal, in Section 3, we propose a number of uncertain spatio-temporal (UST) object approximations, which are stored in the index, and a set of corresponding pruning methods, which use the approximations to efficiently eliminate objects that may not qualify for a given probabilistic query. Section 5 presents an extensive experimental evaluation, which demonstrates the effectiveness of our indexing approach. Related work is discussed in Section 6. Finally, Section 7 concludes the paper.

2. PRELIMINARIES

This section formally defines the type of spatio-temporal data that we index, the stochastic model that we use for uncertain trajectories derived from the data, and the query types that we handle.

Data: We consider a discrete time and space domain, i.e., the common assumption of many existing works (e.g. [18, 1, 10, 8]), where S = {s_1, ..., s_|S|} ⊆ ℝ^D is a finite set of possible locations, which we call states, in a D-dimensional space and T = ℕ_0 is the time domain. Given this spatio-temporal domain, the (certain) movement of an object o corresponds to a trajectory represented as a function o : T → S of time defining the location o(t) ∈ S of o at a certain point of time t ∈ T. We consider incomplete (and/or imprecise) spatio-temporal data, where the motion of an object is not recorded by a crisp trajectory. Instead, we are only given a set o.T_obs of (location, time) observations for each object o. At any time t ∉ o.T_obs, the position of o is uncertain, i.e., a random variable. In many applications, a stochastic model can be built and used to infer knowledge about this uncertainty.

Uncertain Data Model: We refer to the spatio-temporal approximation of a trajectory of an object o in a time interval spanned by two consecutive observations of o at t_i and t_j as a bead or diamond (o, t_i, t_j). The diamond can be computed by considering the maximum and minimum (signed) velocities of the object in each dimension [25]. The whole approximation of the trajectory based on a set T_obs of observations (i.e., a chain of beads) is referred to as a necklace. For example, the movement of the object in Figure 1 is bounded by a chain of diamonds.

Existing studies on modeling uncertain trajectories ([23, 25, 24, 26, 15]) naively consider all possible trajectories bounded by a necklace equi-probable. However, given two consecutive observations o(t_i) and o(t_j) of object o, there are time dependencies between consecutive locations between o(t_i) and o(t_j), which render some locations in the corresponding diamond (e.g., those near the line segment that connects o(t_i) and o(t_j)) more probable to be visited by o than others (e.g., those near the boundary of the diamond). Therefore, these models possibly yield incorrect inferences resulting in incorrect answers to probabilistic queries. To overcome this problem, in our proposal each uncertain spatio-temporal (UST) object o ∈ D is associated with an uncertain object trajectory, which is represented by a stochastic process. The stochastic process assigns to each t ∈ T exactly one random variable; random variables at consecutive time moments can be mutually dependent. This dependency is vital in most applications, where the locations of an object at two close points of time are highly correlated. Thus, the uncertain trajectory o(t) of object o comprises a set of (crisp) trajectories, each assigned a probability indicating its likelihood to be the true trajectory of o. Thereby, each trajectory with a non-zero probability is called a possible world of o. Assuming that objects are mutually independent, our semantics comply with the classic possible worlds model [6]. If t ∈ o.T_obs (i.e., the exact location of o at time t has been observed), then o(t) corresponds to a (trivial) random variable having one possible location (i.e., state) with probability 1.

The main challenge in answering a probabilistic spatio-temporal query is to correctly consider the possible worlds semantics in the model. In other words, the query results should comply with the possible worlds model. Naively, this can be done by evaluating the query predicate on each possible world, and summing up the probabilities of possible worlds satisfying the query predicate. In general, the number of possible worlds to be considered is O(|S|^T); exhaustively examining all of them requires exponential time, even for finite time and space domains. Clearly, any naive approach that enumerates all possible worlds is not feasible.

In this work, we model the uncertain movement of an object within a diamond, using a first-order Markov-chain model. This approach models the movement between successive points in time, based on background knowledge (e.g., physical laws) and has proven capable of effectively capturing the behavior of real objects in practice. For instance, [1] and [10] show how Markov-models can effectively capture the movement of vehicles on road networks for prediction purposes. In [18], it is shown that Markov-models can also be used to model the indoor movement of people, as tracked in RFID applications.

DEFINITION 1. A stochastic process o(t), t ∈ T, is called a Markov-Chain if and only if

$$\forall t \in \mathbb{N}_0 \;\; \forall s_j, s_i, s_{t-1}, \ldots, s_0 \in S:$$

$$P(o(t+1) = s_j \mid o(t) = s_i, o(t-1) = s_{t-1}, \ldots, o(0) = s_0) = P(o(t+1) = s_j \mid o(t) = s_i)$$

where the conditional probability

$$o.P_{i,j}(t) := P(o(t+1) = s_j \mid o(t) = s_i)$$

is the (single-step) transition probability of object o from state s_i to state s_j at time t. The matrix o.M(t) := (o.P_{i,j})_{i,j} is called the transition matrix. Let o.P(t) = (p_1, ..., p_|S|) be the distribution vector of an object o at time t, where p_i corresponds to the probability that o is located at state s_i at time t. The distribution vector o.P(t + 1) can be inferred from o.P(t) as follows:

$$o.P(t+1) = o.P(t) \cdot o.M(t)$$
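As a small illustration of this update rule, the following Python sketch propagates a distribution vector through a transition matrix by a plain vector-matrix product. The three-state matrix `M` and the helper name `step` are made up for this example, and the matrix is assumed to be the same at every time step, whereas the model above allows o.M(t) to depend on t.

```python
def step(distribution, transition):
    """Compute o.P(t+1) = o.P(t) * o.M(t) for a row vector and a square matrix."""
    n = len(distribution)
    return [sum(distribution[i] * transition[i][j] for i in range(n))
            for j in range(n)]


# Hypothetical transition matrix over three states; each row sums to 1.
M = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]

p0 = [1.0, 0.0, 0.0]   # the object was observed at the first state
p1 = step(p0, M)       # distribution one time step later
p2 = step(p1, M)       # distribution two time steps later
print(p1, p2)
```

An observed position corresponds to a distribution vector with a single entry equal to 1, and every application of `step` spreads that mass over the states reachable in one more transition. A sparse matrix representation would be the natural choice for realistic state spaces.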

Queries: Within the scope of this paper, we focus on selection queries specified by the following parameters: (i) a spatial window S^□ ⊆ S, (ii) a contiguous time window T^□ ⊆ T, and (iii) a probability threshold τ. In the remainder, we use Q^□ = S^□ × T^□ to denote the search space of a query. The most intuitive definition of a probabilistic spatio-temporal query is given below:

DEFINITION 2. [Probabilistic Spatio-Temporal τ Exists Query] Given a query window S^□ in space and a query window T^□ in time, a probabilistic τ spatio-temporal exists query (PSTτ∃Q) retrieves all objects o ∈ D such that P(∃t ∈ T^□ : o(t) ∈ S^□) ≥ τ; i.e., the trajectory of o intersects the query window Q^□ with probability at least τ.

For example, consider the trajectory of Figure 1 and assume that we only know for certain its locations at the observation times; a PSTτ∃Q query defined by rectangle Q^□ would return the depicted trajectory only if the probability that the trajectory intersects S^□ at any time within T^□ is at least τ. Another query type is the Probabilistic Spatio-Temporal τ ForAll Query ([8]), denoted as (PSTτ∀Q), which requires an object to remain in the spatial window S^□ for the whole time window T^□. Due to space constraints, we will not discuss this query in this work, but note that our techniques proposed for PSTτ∃Q queries can easily be adapted.

By modeling the movement within a diamond using a Markov-chain model, the true probability that an object o satisfies a PSTτ∃ query can be computed in PTIME [8], exploiting that the matrix M is generally sparse (only a few states are directly connected to a single state). Still, query evaluation remains too expensive over a large spatio-temporal database, where we have to compute the qualification probabilities of all objects. In view of this, we define a set of approximations of uncertain object trajectories enabling spatio-temporal and probabilistic filtering in Section 3. We then show in Section 4 how we can organize these approximations in an index in order to perform efficient query evaluation.
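To give a feeling for why matrix-based evaluation avoids enumerating possible worlds, the sketch below computes an exists probability in two ways on a toy model: by propagating the probability mass of worlds that have not yet entered the window, and by brute-force enumeration of all paths. This is only an illustration of the general idea and not the algorithm of [8]: it conditions on a single observation at time 0 (no second observation), assumes a time-homogeneous matrix, and all names and numbers are invented.

```python
from itertools import product


def exists_probability(M, start, window_states, window_times, horizon):
    """P(there is a t in window_times with o(t) in window_states), given o(0) = start."""
    n = len(M)
    alive = [0.0] * n          # mass of worlds that have not hit the window yet
    alive[start] = 1.0
    hit = 0.0
    for t in range(horizon + 1):
        if t > 0:
            alive = [sum(alive[i] * M[i][j] for i in range(n)) for j in range(n)]
        if t in window_times:
            for s in window_states:
                hit += alive[s]    # these worlds satisfy the predicate ...
                alive[s] = 0.0     # ... and must not be counted again
    return hit


def exists_probability_naive(M, start, window_states, window_times, horizon):
    """Same probability by enumerating every possible world (exponential cost)."""
    n = len(M)
    total = 0.0
    for tail in product(range(n), repeat=horizon):
        path = (start,) + tail
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        if p > 0 and any(path[t] in window_states for t in window_times):
            total += p
    return total


M = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
fast = exists_probability(M, 0, {2}, {2, 3}, horizon=3)
slow = exists_probability_naive(M, 0, {2}, {2, 3}, horizon=3)
print(fast, slow)
```

The propagation performs one vector-matrix product per time step, while the enumeration visits |S|^horizon paths; both return the same value on this example.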

3. APPROXIMATING UST-OBJECTS

In this section, we introduce (conservative) spatio-temporal as well as probabilistic (conservative) spatio-temporal (UST-) object approximations, which serve as building blocks for our proposed index, to be presented in Section 4.

3.1 Spatio-Temporal Approximation

To bound the possible locations of an object $o$ between two subsequent observations $(o(t_i), o(t_j))$, we need to determine all state-time pairs $(s, t) \in S \times T$, $t_i \le t \le t_j$, such that $o$ has a non-zero probability of being at state $s$ at time $t$. This is done by considering all possible paths between state $o(t_i)$ at time $t_i$ and state $o(t_j)$ at time $t_j$. An example of a small set of such paths is depicted in Figure 2(a). Here, we can see a set of five possible trajectories of an object $o$, i.e., all possible (state, time) pairs of $o$ in the time interval $[t_i, t_j]$. In practice, the number of possible paths becomes very large. Nonetheless, we can efficiently compute the set of possible (state, time) pairs using the Markov-chain model: The set of state-time pairs $S_i$ reachable from $o(t_i)$ can be computed by performing $t_j - t_i$ transitions using the Markov chain $o.M(t)$ of $o$, starting from state $o(t_i)$ and memorizing all reachable (state, time) pairs. Similarly, we define $S_j$ as all state-time pairs $(s, t) \in S \times T$, $t_i \le t \le t_j$, such that $o$ can reach state $o(t_j)$ at time $t_j$ by starting from state $s$ at time $t$. $S_j$ can be computed in a similar fashion, starting from state $o(t_j)$ and using the transposed Markov chain $o.M(t)^T$. The intersection $S_{i,j} = S_i \cap S_j$ yields all state-time pairs which are consistent with both observations. Let us note that in practice, it is more efficient to compute $S_i$ and $S_j$ in a parallel fashion, to reduce the explored space. When the computations of $S_i$ and $S_j$ meet at some time $t_i \le t \le t_j$, we can prune any states which are not consistent with both $o(t_i)$ at time $t_i$ and $o(t_j)$ at time $t_j$. However, the number $|S_{i,j}|$ of possible state-time pairs in $S_{i,j}$ can grow very large, so it is impractical for our index structure (proposed in Section 4) to store $S_{i,j}$ for each $o \in DB$. Thus, we propose to conservatively approximate $S_{i,j}$. The issue is to determine an appropriate approximation of $S_{i,j}$ which tightly covers $S_{i,j}$, while keeping the representation as simple as possible. The basic idea is to build the approximation by means of both object observations $o(t_i)$ and $o(t_j)$ with the corresponding velocity of propagation in each dimension. To do so, we first use the set of state-time pairs $S_i$ to derive the maximum and minimum possible velocity in the time interval $[t_i, t_j]$:

$$v^{\max}_{i,d} := \max_{(s,t) \in S_i} \frac{s[d] - o(t_i)[d]}{t - t_i}$$

$$v^{\min}_{i,d} := \min_{(s,t) \in S_i} \frac{s[d] - o(t_i)[d]}{t - t_i}$$

where $s[d]$ ($o(t_i)[d]$) denotes the projection of state $s$ ($o(t_i)$) to the $d$-th dimension. By definition, we can guarantee that for any $t_i \le t \le t_j$ it holds that

$$o(t)[d] \le o(t_i)[d] + (t - t_i) \cdot v^{\max}_{i,d}$$

and

$$o(t)[d] \ge o(t_i)[d] + (t - t_i) \cdot v^{\min}_{i,d}$$

Figure 2: Spatio-Temporal Approximation. (a) Approximating trajectories. (b) Diamond vs MBR.

Furthermore, we bound the velocity of propagation at which $o$ can have reached state $o(t_j)$ at time $t_j$ from each location in the state space:

$$v^{\max}_{j,d} := \max_{(s,t) \in S_j} \frac{o(t_j)[d] - s[d]}{t_j - t}$$

$$v^{\min}_{j,d} := \min_{(s,t) \in S_j} \frac{o(t_j)[d] - s[d]}{t_j - t}$$

Again, we can bound the position of $o$ in dimension $1 \le d \le D$ at time $t_i \le t \le t_j$ as follows:

$$o(t)[d] \le o(t_j)[d] - (t_j - t) \cdot v^{\min}_{j,d}$$

and

$$o(t)[d] \ge o(t_j)[d] - (t_j - t) \cdot v^{\max}_{j,d}$$

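In summary, using the positions $o(t_i)$ at time $t_i$ and $o(t_j)$ at time $t_j$, and using velocities $v^{\min}_{i,d}$, $v^{\max}_{i,d}$, $v^{\min}_{j,d}$ and $v^{\max}_{j,d}$, we can bound the random variable of the position $o(t)$ of $o$ at time $t_i \le t \le t_j$ by the interval

$$o(t)[d] \in I_d(t) := \big[\max\big(o(t_i)[d] + (t - t_i) \cdot v^{\min}_{i,d},\; o(t_j)[d] - (t_j - t) \cdot v^{\max}_{j,d}\big),\; \min\big(o(t_i)[d] + (t - t_i) \cdot v^{\max}_{i,d},\; o(t_j)[d] - (t_j - t) \cdot v^{\min}_{j,d}\big)\big] \quad (1)$$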
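The construction up to Equation 1 can be sketched on a toy example. The following is a minimal illustration, not the authors' implementation: it assumes a one-dimensional state space of integer cells, a time-homogeneous chain given only by its non-zero transitions (the hypothetical dictionary `adj`), and it takes the velocity extremes over the forward set $S_i$ and the backward set $S_j$; pairs at the observation time itself are skipped to avoid a zero time difference. All function names are made up for this sketch.

```python
from fractions import Fraction

def forward_pairs(adj, start, t_start, t_end):
    """(state, time) pairs reachable from `start` at `t_start` (the set S_i)."""
    frontier, pairs = {start}, {(start, t_start)}
    for t in range(t_start + 1, t_end + 1):
        frontier = {n for s in frontier for n in adj[s]}
        pairs |= {(s, t) for s in frontier}
    return pairs

def backward_pairs(adj, goal, t_start, t_end):
    """(state, time) pairs from which `goal` at `t_end` can be reached (S_j)."""
    # Reversing every edge plays the role of the transposed chain.
    rev = {s: set() for s in adj}
    for s, successors in adj.items():
        for n in successors:
            rev[n].add(s)
    frontier, pairs = {goal}, {(goal, t_end)}
    for t in range(t_end - 1, t_start - 1, -1):
        frontier = {p for s in frontier for p in rev[s]}
        pairs |= {(s, t) for s in frontier}
    return pairs

def velocity_bounds(s_i_set, s_j_set, o_i, t_i, o_j, t_j):
    """Return (v_min_i, v_max_i, v_min_j, v_max_j) for one dimension."""
    from_i = [Fraction(s - o_i, t - t_i) for s, t in s_i_set if t > t_i]
    to_j = [Fraction(o_j - s, t_j - t) for s, t in s_j_set if t < t_j]
    return min(from_i), max(from_i), min(to_j), max(to_j)

def interval(t, o_i, t_i, o_j, t_j, bounds):
    """Interval I_d(t) of Equation 1 for one dimension."""
    v_min_i, v_max_i, v_min_j, v_max_j = bounds
    lo = max(o_i + (t - t_i) * v_min_i, o_j - (t_j - t) * v_max_j)
    hi = min(o_i + (t - t_i) * v_max_i, o_j - (t_j - t) * v_min_j)
    return lo, hi

# Toy chain: cells 0..8 on a line; the object stays or moves one cell per step.
adj = {s: {n for n in (s - 1, s, s + 1) if 0 <= n <= 8} for s in range(9)}
S_i = forward_pairs(adj, 2, 0, 4)    # observation o(0) = 2
S_j = backward_pairs(adj, 4, 0, 4)   # observation o(4) = 4
S_ij = S_i & S_j
bounds = velocity_bounds(S_i, S_j, 2, 0, 4, 4)
```

In this toy setting every pair in `S_ij` falls inside `interval(t, 2, 0, 4, 4, bounds)`, which is the conservativeness property the approximation relies on.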

Deriving these intervals for each dimension yields an axis-parallel rectangle, approximating all possible positions of $o$ at time $t$. In the following, we will call this time-dependent spatial approximation of $o(t)$ in the time interval $[t_i, t_j]$ between two observations $o(t_i)$ and $o(t_j)$ a spatio-temporal diamond, denoted as $\Diamond(o, t_i, t_j)$. A nice geometric property of this approximation is that computing the intersection with the query window at each time $t$ is very fast. Another advantage is that existing spatial access methods (e.g., R-trees) can be easily used to efficiently organize these approximations. To store the approximation, we only need to store the $4 \cdot D$ real values $v^{\min}_{i,d}$, $v^{\max}_{i,d}$, $v^{\min}_{j,d}$, $v^{\max}_{j,d}$, $1 \le d \le D$. A diamond is reminiscent of a time-parameterized rectangle, used to model the worst-case MBR for a set of moving objects in [19]; however, the way of deriving velocities is different in our case. As an example, Figure 2(a) shows for one dimension $d \in D$, positions $o(t_i)$ at time $t_i$ and $o(t_j)$ at time $t_j$. The diamond formed by the velocity bounds $v^{\min}_{i,d}$, $v^{\max}_{i,d}$, $v^{\min}_{j,d}$ and $v^{\max}_{j,d}$ conservatively approximates the possible (location, time) pairs. Note that it is possible to use a minimal bounding rectangle $\Box(o, t_i, t_j)$ instead of the diamond $\Diamond(o, t_i, t_j)$ to conservatively approximate the (location, time) space $S_{i,j}$. In cases, however, where the movement of an object in one dimension is biased in one direction, a rectangle may yield a very bad approximation (see Figure 2(b) for an example). Our index employs both approximations $\Box(o, t_i, t_j)$ and $\Diamond(o, t_i, t_j)$ for spatio-temporal pruning; $\Box(o, t_i, t_j)$ is used for high-level indexing and filtering, while $\Diamond(o, t_i, t_j)$ is used as a second-level filter.

Figure 3: Intersection between query and diamond. (a) Query intersection. (b) Dimension-wise intersections.

3.2 Spatio-Temporal Filter

Based on the spatio-temporal approximation of an uncertain object as described in the previous section, it is possible to perform filtering during query processing.

If none of the diamonds assigned to an object $o \in D$ intersects the query window, then $o$ is safely pruned. In turn, if a diamond of $o$ is inside the query window $S^{\Box}$ in space, i.e. fully covered by $S^{\Box}$, at any point of time $t \in T^{\Box}$, then $o$ is a true hit and, thus, $o$ can be immediately reported as result of the query. In order to employ the above spatio-temporal pruning conditions, for a diamond $\Diamond(o, t_i, t_j)$ of an object $o$ we need to determine the points of time when it intersects the query window $S^{\Box}$ in space, as well as the points of time when $\Diamond(o, t_i, t_j)$ is fully covered by $S^{\Box}$. For this purpose, it is helpful to focus on the spatial domain $S$ and interpret a diamond as well as the query as a time-parameterized (moving) rectangle. By doing so, we can adapt the techniques proposed in [19]: In general, a rectangle $R_1$ intersects (covers) another rectangle $R_2$ if and only if $R_1$ intersects (covers) $R_2$ in each dimension. Thus, for each spatial dimension $d$ ($d \in \{1, \dots, D\}$), we compute the points of time when the extents of the rectangles intersect in that dimension and the points of time when the extents of the diamond rectangle are fully covered by the query rectangle $S^{\Box}$.

For a single dimension $d$, with Equation 1 the query window given by $Q^{\Box}$ intersects the diamond given by $o(t_i)[d]$, $o(t_j)[d]$, $v^{\min}_{i,d}$, $v^{\max}_{i,d}$, $v^{\min}_{j,d}$ and $v^{\max}_{j,d}$ within the points of time

$$I_{int,d} := \{t \in (T^{\Box} \cap [t_i, t_j]) \mid I_d(t) \cap S^{\Box}[d] \neq \emptyset\}$$

Similarly, $Q^{\Box}$ fully covers the diamond within the points of time

$$I_{cov,d} := \{t \in (T^{\Box} \cap [t_i, t_j]) \mid I_d(t) \subseteq S^{\Box}[d]\}$$

An example is illustrated in Figure 3(a). To compute both sets $I_{int,d}$ and $I_{cov,d}$, we intersect the margins of the diamond with the query window, resulting in a set of time intervals, which subsequently have to be intersected accordingly in order to derive $I_{int,d}$ and $I_{cov,d}$. Now, let us consider the overall intersection time interval $I_{int} = \bigcap_{d=1}^{D} I_{int,d}$ (e.g., see Figure 3(b)) and the overall points of covering time $I_{cov} = \bigcap_{d=1}^{D} I_{cov,d}$.

If, for an object $o \in D$, there is no diamond yielding a non-empty set $I_{int}$, $o$ can be safely pruned. If any diamond of $o$ yields a non-empty set $I_{cov}$, $o$ can be reported as result.
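The pruning and true-hit conditions above can be illustrated with a short sketch. It is a simplification under stated assumptions: time is discrete, so the intersection and covering times are found by enumerating the query time steps rather than by intersecting the diamond margins analytically as the text describes. The tuple layout of a diamond and the names `filter_times` and `classify` are hypothetical.

```python
def diamond_interval(t, t_i, t_j, o_i, o_j, v_min_i, v_max_i, v_min_j, v_max_j):
    """Spatial extent of a diamond in one dimension at time t (Equation 1)."""
    lo = max(o_i + (t - t_i) * v_min_i, o_j - (t_j - t) * v_max_j)
    hi = min(o_i + (t - t_i) * v_max_i, o_j - (t_j - t) * v_min_j)
    return lo, hi

def filter_times(diamond, query_space, query_times):
    """Return (intersection times, covering times) over all dimensions.

    diamond     = (t_i, t_j, [per-dimension (o_i, o_j, v_min_i, v_max_i, v_min_j, v_max_j)])
    query_space = [per-dimension (q_lo, q_hi)]
    """
    t_i, t_j, dims = diamond
    t_int, t_cov = set(), set()
    for t in query_times:
        if not t_i <= t <= t_j:
            continue
        boxes = [diamond_interval(t, t_i, t_j, *dim) for dim in dims]
        pairs = list(zip(boxes, query_space))
        # A rectangle intersects (covers) another iff it does so in each dimension.
        if all(lo <= q_hi and q_lo <= hi for (lo, hi), (q_lo, q_hi) in pairs):
            t_int.add(t)
        if all(q_lo <= lo and hi <= q_hi for (lo, hi), (q_lo, q_hi) in pairs):
            t_cov.add(t)
    return t_int, t_cov

def classify(diamonds, query_space, query_times):
    """Spatio-temporal filter decision for one object and its diamonds."""
    results = [filter_times(d, query_space, query_times) for d in diamonds]
    if any(t_cov for _, t_cov in results):
        return "true hit"
    if not any(t_int for t_int, _ in results):
        return "pruned"
    return "candidate"

# One-dimensional toy diamond: o(0) = 2, o(4) = 4, all velocities in [-1, 1].
diamond = (0, 4, [(2, 4, -1, 1, -1, 1)])
```

Objects classified as "candidate" are the ones that the probabilistic techniques of the following sections have to deal with.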

In summary, the spatio-temporal filter can be used to identify uncertain object trajectories having a probability of 100% or 0% intersecting (remaining in) the query region $Q^{\Box}$. Still, the probability threshold $\tau$ of the query is not considered by this filter. In addition, the object approximation may cover a lot of dead space if there exist outlier state-time pairs which determine one or more of the velocities, despite having a very low probability. In the following, we show how to exclude such unlikely outliers in order to shrink the approximation, while maintaining probabilistic guarantees that employ the probability threshold $\tau$.

3.3 Probabilistic UST-Object Approximation

We now propose a tighter approximation, based on the intuition that the set of possible paths within a diamond is generally not uniformly distributed: paths that are close to the direct connection between the observed locations often are more likely than extreme paths along the edges of the diamond. Therefore, given a query with threshold $\tau$, we can take advantage of a tighter approximation, which bounds all paths with cumulative probabilities $\tau$ to perform more effective pruning.

Based on this idea, we exploit the Markov-chain model in order to compute new diamonds, which are spatio-temporal subregions, called subdiamonds, of the (full) diamond $\Diamond(o, t_i, t_j)$, as depicted in Figure 4(a). For each such subdiamond, we will then show how to compute the cumulative probability of all possible trajectories of $o$ passing only through this subdiamond. Let us focus on restricting the diamond at one direction of one dimension; we choose one dimension $d \in D$, and one direction $dir \in \{\wedge, \vee\}$. Direction $\wedge$ ($\vee$) corresponds to the two diamond sides $v^{\max}_{i,d}$ and $v^{\min}_{j,d}$ ($v^{\min}_{i,d}$ and $v^{\max}_{j,d}$). To obtain the subdiamond, we scale the corresponding sides by a factor $\lambda \in [0, 1]$ relative to the average velocity

$$v_{avg} := \frac{o(t_j)[d] - o(t_i)[d]}{t_j - t_i}$$

We obtain the adjusted velocity values for direction $\wedge$ as follows:

$$v^{\max,\lambda}_{i,d} = ((v^{\max}_{i,d} - v_{avg}) \cdot \lambda) + v_{avg} = v^{\max}_{i,d} \cdot \lambda + v_{avg} \cdot (1 - \lambda)$$

and

$$v^{\min,\lambda}_{j,d} = ((v^{\min}_{j,d} - v_{avg}) \cdot \lambda) + v_{avg} = v^{\min}_{j,d} \cdot \lambda + v_{avg} \cdot (1 - \lambda)$$

The adjusted velocity values for direction $\vee$ can be computed analogously. Thus, for a given diamond $\Diamond(o, t_i, t_j)$, dimension $d \in D$, direction $dir \in \{\wedge, \vee\}$ and scalar $\lambda \in [0, 1]$, we obtain a smaller diamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$, derived from $\Diamond(o, t_i, t_j)$ by scaling direction $dir$ in dimension $d$ by a factor of $\lambda$. Figure 4(b) illustrates some subdiamonds for one dimension, the $\wedge$ direction and for various values of $\lambda$.
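The scaling step can be made concrete with a small sketch for a single dimension. It assumes that the north direction ($\wedge$) consists of the two sides bounding the diamond from above, i.e. the maximum velocity leaving the first observation and the minimum velocity arriving at the second, and that the south direction ($\vee$) consists of the other two; the function name and argument order are hypothetical.

```python
def scale_diamond(o_i, t_i, o_j, t_j, v_min_i, v_max_i, v_min_j, v_max_j,
                  direction, lam):
    """Velocities of the subdiamond obtained by scaling one direction by lam.

    Returns (v_min_i, v_max_i, v_min_j, v_max_j) of the subdiamond.
    """
    v_avg = (o_j - o_i) / (t_j - t_i)

    def shrink(v):
        # v * lam + v_avg * (1 - lam): lam = 1 keeps the side, lam = 0 moves it
        # onto the direct line between the two observations.
        return v * lam + v_avg * (1 - lam)

    if direction == "north":   # upper sides of the diamond
        return v_min_i, shrink(v_max_i), shrink(v_min_j), v_max_j
    if direction == "south":   # lower sides of the diamond
        return shrink(v_min_i), v_max_i, v_min_j, shrink(v_max_j)
    raise ValueError("direction must be 'north' or 'south'")

# Toy diamond: o(0) = 2, o(4) = 4, all velocities in [-1, 1], so v_avg = 0.5.
half_north = scale_diamond(2, 0, 4, 4, -1, 1, -1, 1, "north", 0.5)
```

With these numbers the upper boundary at time 2 drops from 4 (full diamond) to 3.5, halfway towards the direct line at 3, while the lower boundary is unchanged.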

Figure 4: Probabilistic Diamonds. (a) A sub-diamond. (b) Possible sub-diamonds. (d) Computing $\lambda_{opt}$. (e) Conservative approx.

To use such subdiamonds for probabilistic pruning, we first need to compute the probability $P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ that object $o$ will remain within $\Diamond(o, t_i, t_j, d, dir, \lambda)$ for the whole time interval $[t_i, t_j]$, in a correct and efficient way. The main challenge for correctness is to cope with temporal dependencies, i.e. the fact that the random variables $o(t)$ and $o(t + \delta t)$ are highly correlated. Thus, we cannot simply treat all random variables $o(t)$ as mutually independent and aggregate their individual distributions. To illustrate this issue, consider Figure 4(a), where one subdiamond is depicted. Assume that each of the five possible trajectories has a probability of 0.2. We can see that three trajectories are completely contained in the subdiamond, so that the probability $P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ that $o$ fully remains in the subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ is 60%. However, multiplying for all time instants $t \in [t_i, t_j]$ the individual probabilities that $o$ is located in $\Diamond(o, t_i, t_j, d, dir, \lambda)$ at time $t$ produces an arbitrarily small and incorrect result, as time dependencies are ignored. Furthermore, due to the generally exponential number of possible trajectories, $P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$ is too expensive to compute by iterating over all possible trajectories. Instead, we compute this probability efficiently and correctly, as follows.

To compute the probability of possible trajectories between $o(t_i)$ at $t_i$ and $o(t_j)$ at $t_j$ that are completely contained in $\Diamond(o, t_i, t_j, d, dir, \lambda)$, an intuitive approach is to start at $o(t_i)$ at time $t_i$ and perform $t_j - t_i$ transitions using the Markov chain $o.M(t)$. After each transition, we identify states that are outside $\Diamond(o, t_i, t_j, d, dir, \lambda)$. Any possible trajectory which reaches such a state is flagged. Upon reaching $t_j$, we only need to consider possible trajectories in state $o(t_j)$, since all other worlds have become impossible due to the observation of $o$ at $t_j$. The fraction of un-flagged worlds at state $o(t_j)$ at time $t_j$ yields the probability that $o$ completely remains in $\Diamond(o, t_i, t_j, d, dir, \lambda)$.

To formalize the above approach, we first rewrite the probability $P(inside(o, \Diamond) \mid o(t_i), o(t_j))$ that the trajectory of $o$ remains in the subdiamond $\Diamond$, given the observations $o(t_i)$, $o(t_j)$ at times $t_i, t_j \in o.T_{obs}$, using the definition of conditional probability:

$$P(inside(o, \Diamond) \mid o(t_i), o(t_j)) = \frac{P(o(t_j) \mid inside(o, \Diamond), o(t_i))}{P(o(t_j) \mid o(t_i))}$$

where $P(o(t_j) \mid inside(o, \Diamond), o(t_i))$ denotes the probability that $o$ reaches the state $o(t_j)$ observed by observation $o(t_j)$, given that $o$, starting at $o(t_i)$ at time $t_i$, remains inside $\Diamond$. $P(o(t_j) \mid o(t_i))$ denotes the probability that state $o(t_j)$ at time $t_j$ is reached, given that $o$ starts at $o(t_i)$ at time $t_i$, regardless of whether $o$ remains in $\Diamond$.

3.4 Finding the optimal Probabilistic Diamond

In the previous section, we described how to compute the probability of a probabilistic diamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ from a diamond $\Diamond(o, t_i, t_j)$, dimension $d$, direction $dir$, and scaling factor $\lambda$. In this section we will show how to find, for a given query window $Q^{\Box}$ and a given query predicate, the subdiamond with the highest pruning power. Let us focus on PST$\tau\exists$ queries first. That is, our aim is to find a value for $d$, $dir$ and $\lambda$, such that the resulting subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda)$ does not intersect $Q^{\Box}$, and at the same time it has a high probability $P(inside(o, \Diamond))$ (since the context is clear, we simply use $\Diamond$ to denote $\Diamond(o, t_i, t_j, d, dir, \lambda)$). This probability can be used to prune $o$ as we will show later. Formally, we want to efficiently determine

$$\operatorname*{argmax}_{d \in D,\; dir \in \{\vee, \wedge\},\; \lambda \in [0,1]} P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda)))$$

constrained to $Q^{\Box} \cap \Diamond(o, t_i, t_j, d, dir, \lambda) = \emptyset$.

For a single dimension $d$, and the north direction, a possible situation is depicted in Figure 4(d). Here, the projection $\Diamond_d(o, t_i, t_j)$ of the full diamond $\Diamond(o, t_i, t_j)$ to the $d$-th dimension and the projections $Q^{\Box}_1[d]$ and $Q^{\Box}_2[d]$ of two query windows $Q^{\Box}_1$ and $Q^{\Box}_2$ are depicted. The aim is to find the largest value $\lambda^{\exists}_{opt}$ of $\lambda$, such that the corresponding probabilistic diamond $\Diamond(o, t_i, t_j, d, \wedge, \lambda^{\exists}_{opt})$, which we call optimal subdiamond, does not intersect $Q^{\Box}_1$ ($Q^{\Box}_2$). To solve this problem, we distinguish between the following cases.

Case 1: the direct line between observations $(o(t_i), t_i)$ and $(o(t_j), t_j)$ in dimension $d$ intersects $Q^{\Box}[d]$. In this case, there cannot exist any $\lambda \in [0, 1]$ such that $Q^{\Box} \cap \Diamond(o, t_i, t_j, d, dir, \lambda) = \emptyset$. Therefore, our problem has no solution in dimension $d$, and $d$ is ignored.

Case 2: the direct line between $(o(t_i), t_i)$ and $(o(t_j), t_j)$ does not intersect $Q^{\Box}[d]$, and we assume without loss of generality that $Q^{\Box}[d]$ is located above this line. In addition, in this case, the time value of the north corner $c$ of $\Diamond_d(o, t_i, t_j)$ is located in the interval $T^{\Box}$ (e.g., see one of the two query windows in Figure 4(d)). In this case, the edge $v_{opt}$ of the optimal subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda^{\exists}_{opt})$ is given by $(o(t_i), t_i)$ and $(s, t)$, where $s$ corresponds to the lower bound of $Q^{\Box}[d]$ and $t$ equals the time component of $c$.

Case 3: $Q^{\Box}$ is above the direct line between $(o(t_i), t_i)$ and $(o(t_j), t_j)$ (as in Case 2), but the time value of the north corner $c$ of $\Diamond_d(o, t_i, t_j)$ is not located in the time interval $T^{\Box}$ (e.g., the other query window in Figure 4(d)). In this case, the optimal subdiamond must touch a corner of $Q^{\Box}[d]$ due to convexity of both $Q^{\Box}[d]$ and any diamond. If $Q^{\Box}[d]$ is located to the left of $c$ (the right direction is handled symmetrically), then the edge $v_{opt}$ of the optimal subdiamond is given by the line between $(o(t_i), t_i)$ and the lower right corner of $Q^{\Box}[d]$ (e.g., see Figure 4(d)).

The optimal value $\lambda^{\exists}_{opt}$ for cases 2 and 3 equals the quotient

$$\lambda^{\exists}_{opt} = \frac{v_{opt} - v_{avg}}{v^{\max}_{i,d} - v_{avg}}$$

i.e., the fraction of the maximum velocity of the optimal subdiamond and the maximum velocity of the full diamond, both normalized by the average velocity $v_{avg} = \frac{o(t_j)[d] - o(t_i)[d]}{t_j - t_i}$.
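The case analysis can be sketched for one dimension and the north direction. Note that this is a reformulation, not the procedure as stated in the text: instead of selecting a corner by case, it evaluates the ratio (query lower bound minus direct line) / (full upper boundary minus direct line) at the candidate touching times (the ends of the query time window and the corner time) and takes the smallest value, which coincides with Cases 2 and 3 in the geometry the text describes. The sketch assumes a non-degenerate diamond (maximum velocity above and minimum arrival velocity below the average velocity) and integer or `Fraction` inputs; the function name and parameters are hypothetical.

```python
from fractions import Fraction

def lambda_opt_north(o_i, t_i, o_j, t_j, v_max_i, v_min_j,
                     q_lo, q_hi, tq_lo, tq_hi):
    """Largest scaling factor whose north-scaled subdiamond stays below the query.

    Returns None in Case 1 (the direct line crosses the query window).
    """
    v_avg = Fraction(o_j - o_i) / (t_j - t_i)

    def line(t):    # direct connection between the two observations
        return o_i + v_avg * (t - t_i)

    def upper(t):   # upper boundary of the full diamond
        return min(o_i + (t - t_i) * v_max_i, o_j - (t_j - t) * v_min_j)

    lo_t, hi_t = max(tq_lo, t_i), min(tq_hi, t_j)
    if lo_t > hi_t:
        return Fraction(1)           # query time window misses [t_i, t_j]
    y_a, y_b = sorted((line(lo_t), line(hi_t)))
    if q_lo <= y_b and q_hi >= y_a:
        return None                  # Case 1: no solution in this dimension
    if q_hi < y_a:
        raise ValueError("query is below the direct line: use the south direction")
    # Time of the north corner: intersection of the two upper sides.
    t_c = Fraction(o_j - o_i + t_i * v_max_i - t_j * v_min_j) / (v_max_i - v_min_j)
    # Candidate touching times; at t_i and t_j the diamond is a single point.
    candidates = [t for t in (lo_t, hi_t, t_c)
                  if lo_t <= t <= hi_t and t_i < t < t_j]
    ratios = [(q_lo - line(t)) / (upper(t) - line(t)) for t in candidates]
    return min(ratios + [Fraction(1)])

# Toy diamond: o(0) = 2, o(4) = 4, v_max_i = 1, v_min_j = -1; north corner at time 3.
lam_case2 = lambda_opt_north(2, 0, 4, 4, 1, -1, Fraction(9, 2), 6, 2, 3)
lam_case3 = lambda_opt_north(2, 0, 4, 4, 1, -1, Fraction(11, 4), 6, 1, 1)
```

At the returned value the subdiamond touches the boundary of the query window, so an implementation that needs a strictly empty intersection would have to treat that boundary case explicitly.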

After identifying the value for $\lambda^{\exists}_{opt}$, for a dimension $d$ and a direction $dir$, we can compute the probability of the corresponding subdiamond $\Diamond(o, t_i, t_j, d, dir, \lambda^{\exists}_{opt})$. Since we can guarantee that any path in this subdiamond does not intersect the query window, we can obtain a lower bound

$$P_{LB}(never(o, t_i, t_j, Q^{\Box})) = P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda^{\exists}_{opt}))) \quad (2)$$

of the probability of the event that $o$ never intersects the query window in the time interval $[t_i, t_j]$. This directly yields an upper bound

$$P_{UB}(sometimes(o, t_i, t_j, Q^{\Box})) = 1 - P(inside(o, \Diamond(o, t_i, t_j, d, dir, \lambda^{\exists}_{opt}))) \quad (3)$$

of the probability of the reverse event that $o$ intersects the query window.

Footnote (Case 2): If $Q^{\Box}[d]$ is below the line, we consider direction $dir = \vee$ symmetrically.

Footnote (Case 2): Corner $c$ is given by the intersection of lines $(o(t_i), t_i) + v^{\max}_{i,d}$ and $(o(t_j), t_j) + v^{\min}_{j,d}$.
