Indexing uncertain spatio-temporal data
Summary (5 min read)
1. INTRODUCTION
- Efficient management of large collections of spatio-temporal data pertaining to mobile entities whose locations change over time is paramount in a large variety of application domains: military applications, structural and environmental monitoring, disaster/rescue management and remediation, Geographic Information Systems (GIS), Location-Based Services (LBS).
- The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturization of computing devices and the development of embedded systems.
- Movements between observations are usually modeled by beads: the probability that an object is located outside a bead at any point of time is 0. To cope with storage constraints, the recorded object trajectories often undergo simplification, which eliminates some recorded values.
- Sampling object trajectories only at discrete time instants and/or simplifying them renders the movement in between samples uncertain and query evaluation challenging.
2. PRELIMINARIES
- This section formally defines the type of spatio-temporal data that the authors index, the stochastic model that they use for uncertain trajectories derived from the data, and the query types that they handle.
- To overcome this problem, in their proposal each uncertain spatio-temporal (UST) object o ∈ D is associated with an uncertain object trajectory, which is represented by a stochastic process.
- Clearly, any naive approach that enumerates all possible worlds is not feasible.
- Another query type is the Probabilistic Spatio-Temporal τ ForAll Query ([8]), denoted as (PSTτ∀Q), which requires an object to remain in the spatial window S2 for the whole time window T2.
- By modeling the movement within a diamond using a Markov-chain model, the true probability that an object o satisfies a PSTτ∃ query can be computed in PTIME [8], exploiting the fact that the matrix M is generally sparse (only a few states are directly connected to a single state).
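The following is a minimal sketch of how an "exists" probability can be evaluated on a sparse Markov chain without enumerating possible worlds. It is a simplified illustration, not the algorithm of [8]: it conditions only on the first observation and ignores any later observation. The function name `exists_probability`, the dictionary layout of the transition matrix, and the toy chain `chain` are assumptions made for this example.

```python
from collections import defaultdict

def exists_probability(transitions, start_state, window_states, t_start, t_end):
    """Probability that a chain starting in start_state at time 0 visits a
    state of window_states at some time t with t_start <= t <= t_end.

    transitions: dict state -> list of (next_state, probability); only the
    non-zero entries of the (sparse) transition matrix are stored.
    """
    dist = {start_state: 1.0}  # mass of worlds that have not hit the window yet
    hit = 0.0                  # mass of worlds that already entered the window
    for t in range(t_end + 1):
        if t > 0:
            nxt = defaultdict(float)
            for s, p in dist.items():
                for s2, q in transitions[s]:
                    nxt[s2] += p * q
            dist = dict(nxt)
        if t >= t_start:
            # Worlds inside the window are counted once and then removed,
            # so that no world contributes twice.
            for s in list(dist):
                if s in window_states:
                    hit += dist.pop(s)
    return hit

# Toy chain (hypothetical): states 0..4 on a line; stay or move right, 0.5 each.
chain = {s: [(s, 0.5), (min(s + 1, 4), 0.5)] for s in range(5)}
p = exists_probability(chain, start_state=0, window_states={2}, t_start=2, t_end=3)
```

The cost is one sparse matrix-vector product per time step, which is why sparsity of M matters for the running time.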
3.1 Spatio-Temporal Approximation
- This is done by considering all possible paths between state o(ti) at time ti and state o(tj) at time tj .
- The set of state-time pairs Si reachable from o(ti) can be computed by performing tj − ti transitions using the Markov chain o.M(t) of o, starting from state o(ti) and memorizing all reachable (state, time) pairs.
- Thus, the authors propose to conservatively approximate Si,j .
- In the following, the authors will call this time dependent spatial approximation of o(t) in the time interval [ti, tj ] between two observations o(ti) and o(tj) a spatio-temporal diamond, denoted as (o, ti, tj).
- A diamond is reminiscent of a time-parameterized rectangle, used to model the worst-case MBR for a set of moving objects in [19]; however, the way of deriving velocities is different in their case.
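As an illustration of the reachability computation described above, the sketch below collects the (state, time) pairs that lie on some possible path between two observations by intersecting a forward pass from o(ti) with a backward pass from o(tj). The one-dimensional state space, the `successors` adjacency dictionary, and the helper names are assumptions made for this example; the per-time bounds it returns are what a diamond would then conservatively enclose.

```python
def reachable_pairs(successors, s_i, t_i, s_j, t_j):
    """Map each time t in [t_i, t_j] to the states that lie on at least one
    possible path from (s_i, t_i) to (s_j, t_j).

    successors: dict state -> set of states reachable in one transition.
    """
    # Forward pass: states reachable from the first observation.
    forward = {t_i: {s_i}}
    for t in range(t_i + 1, t_j + 1):
        forward[t] = {s2 for s in forward[t - 1] for s2 in successors[s]}
    # Backward pass: states from which the second observation is reachable.
    backward = {t_j: {s_j}}
    for t in range(t_j - 1, t_i - 1, -1):
        backward[t] = {s for s in successors if successors[s] & backward[t + 1]}
    return {t: forward[t] & backward[t] for t in range(t_i, t_j + 1)}

def bounds_per_time(pairs, position):
    """Minimum and maximum position of the reachable states at each time."""
    return {t: (min(position[s] for s in states), max(position[s] for s in states))
            for t, states in pairs.items()}

# Toy example (hypothetical): states 0..4 on a line, moving at most one step.
successors = {s: {max(s - 1, 0), s, min(s + 1, 4)} for s in range(5)}
position = {s: s for s in range(5)}
pairs = reachable_pairs(successors, s_i=0, t_i=0, s_j=2, t_j=4)
bounds = bounds_per_time(pairs, position)
```

In this toy example the bounds widen after the first observation and narrow again toward the second one, which is the shape that gives the diamond its name.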
3.2 Spatio-Temporal Filter
- Based on the spatio-temporal approximation of an uncertain object as described in the previous section, it is possible to perform filtering during query processing.
- By doing so, the authors can adapt the techniques proposed in [19]: for each dimension, they compute the points of time when the extents of the rectangles intersect in that dimension and the points of time when the extent of the diamond rectangle is fully covered by the query rectangle S2.
- In summary, the spatio-temporal filter can be used to identify uncertain object trajectories having a probability of 100% or 0% intersecting (remaining in) the query region Q2.
- Still, the probability threshold τ of the query is not considered by this filter.
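A rough sketch of such a filter for an "exists" query is given below. It assumes a diamond defined by two observations plus a maximum speed per dimension and direction, and it checks discrete time steps instead of solving for the intersection times analytically as in [19]. The `Diamond` class, its field names, and the three result labels are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Diamond:
    t_i: int
    t_j: int
    x_i: list      # observed position at t_i, one value per dimension
    x_j: list      # observed position at t_j, one value per dimension
    v_plus: list   # assumed maximum speed in positive direction, per dimension
    v_minus: list  # assumed maximum speed in negative direction, per dimension

    def extent(self, d, t):
        """Interval of possible positions in dimension d at time t."""
        lo = max(self.x_i[d] - self.v_minus[d] * (t - self.t_i),
                 self.x_j[d] - self.v_plus[d] * (self.t_j - t))
        hi = min(self.x_i[d] + self.v_plus[d] * (t - self.t_i),
                 self.x_j[d] + self.v_minus[d] * (self.t_j - t))
        return lo, hi

def st_filter_exists(diamond, window, t_from, t_to):
    """Return 'true hit' (probability 1), 'drop' (probability 0) or 'candidate'.

    window: list of (lo, hi) per spatial dimension; [t_from, t_to] is the
    query time range, truncated to the lifetime of the diamond.
    """
    may_intersect = False
    for t in range(max(t_from, diamond.t_i), min(t_to, diamond.t_j) + 1):
        boxes = [diamond.extent(d, t) for d in range(len(window))]
        if all(w_lo <= lo and hi <= w_hi
               for (lo, hi), (w_lo, w_hi) in zip(boxes, window)):
            return "true hit"  # all possible positions lie inside the window
        if all(lo <= w_hi and w_lo <= hi
               for (lo, hi), (w_lo, w_hi) in zip(boxes, window)):
            may_intersect = True
    return "candidate" if may_intersect else "drop"

# Toy 1D diamond (hypothetical values).
d = Diamond(t_i=0, t_j=4, x_i=[0], x_j=[4], v_plus=[2], v_minus=[1])
```

Objects labeled "candidate" are exactly the ones for which the threshold τ matters, which this filter cannot exploit.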
3.3 Probabilistic UST-Object Approximation
- The authors now propose a tighter approximation, based on the intuition that the set of possible paths within a diamond is generally not uniformly distributed: paths that are close to the direct connection between the observed locations often are more likely than extreme paths along the edges of the diamond.
- To obtain the subdiamond, the authors scale the corresponding sides by a factor λ ∈ [0, 1] relative to the average velocity $v^{avg}_d = \frac{o(t_j)[d] - o(t_i)[d]}{t_j - t_i}$.
- The main challenge for correctness is to cope with temporal dependencies, i.e., the fact that the random variables o(ti) and o(ti + δt) are highly correlated.
- To compute the probability of possible trajectories between o(ti) at ti and o(tj) at tj that are completely contained in (o, ti, tj , d, dir, λ), an intuitive approach is to start at o(ti) at time ti and perform tj−ti transitions using the Markov-chain o.M(t).
- Any possible trajectory which reaches such a state is flagged.
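One way to sketch this computation is to propagate two distributions in parallel: one over all paths and one over paths that never leave the allowed region, and then condition both on the second observation. Dropping the probability mass of leaving paths plays the role of flagging here. This is an illustration under assumptions, not necessarily the authors' procedure; `prob_inside`, `allowed_at`, and the toy chain `walk` are hypothetical names.

```python
def step(dist, transitions, allowed=None):
    """One transition; mass moving to a state outside `allowed` is discarded."""
    nxt = {}
    for s, p in dist.items():
        for s2, q in transitions[s]:
            if allowed is None or s2 in allowed:
                nxt[s2] = nxt.get(s2, 0.0) + p * q
    return nxt

def prob_inside(transitions, s_i, s_j, steps, allowed_at):
    """P(every visited state lies in allowed_at(k) | start in s_i and end in
    s_j after `steps` transitions), obtained by Bayesian conditioning."""
    free = {s_i: 1.0}  # all paths
    kept = {s_i: 1.0}  # paths that never left the allowed region
    for k in range(1, steps + 1):
        free = step(free, transitions)
        kept = step(kept, transitions, allowed_at(k))
    reach = free.get(s_j, 0.0)
    if reach == 0.0:
        raise ValueError("second observation is unreachable from the first")
    return kept.get(s_j, 0.0) / reach

# Toy chain (hypothetical): move left, stay, or move right with probability 1/3.
walk = {s: [(s - 1, 1 / 3), (s, 1 / 3), (s + 1, 1 / 3)] for s in range(-2, 3)}
# Region that cuts off the positive side, the same at every step.
p_inside = prob_inside(walk, s_i=0, s_j=0, steps=2, allowed_at=lambda k: {-2, -1, 0})
```

In the toy example three paths lead from 0 back to 0 in two steps, and two of them avoid the positive side, so the conditional probability is 2/3. Conditioning on the end state is what accounts for the temporal dependency between successive positions.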
3.5 Approximating Probabilistic Diamonds
- The main goal of their index structure, proposed in Section 4, is to avoid expensive probability computations for subdiamonds.
- Since the query window is not known in advance, 2 · D computations (i.e., one for each dimension and direction) have to be performed in order to identify the optimal subdiamond for a given query and candidate object o.
- Thus, the authors need to conservatively approximate the probability of probabilistic diamonds (o, ti, tj, d, dir, λ) for which λ ∉ Λ. They propose to use a conservative linear approximation of P(inside(o, (o, ti, tj, d, dir, λ))), which increases monotonically with λ, using the precomputed probability values.
- The latter constraint is required to maintain the conservativeness property of the approximation, which will be required for pruning.
- The authors model this as a linear programming problem: find a linear function l(λ) = a · λ + b that minimizes the aggregate error with respect to the sample points, under the constraint that the approximation line does not exceed any of the sample values (e.g., the line in Figure 4(c)).
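The linear program above has only two unknowns (a and b), so a small instance can be solved without an LP solver: an optimal solution can always be found at a vertex of the feasible region, i.e., on a line through two sample points that does not exceed any other sample. The sketch below enumerates these candidate lines. It assumes the aggregate error is the sum of the vertical gaps; the function name and the sample values are hypothetical.

```python
from itertools import combinations

def conservative_line(lambdas, probs, eps=1e-12):
    """Return (a, b, error) of the line a*λ + b that stays at or below all
    sample points and minimizes the summed gap to them."""
    best = None
    for (l1, p1), (l2, p2) in combinations(zip(lambdas, probs), 2):
        if l1 == l2:
            continue  # a vertical line is not a function of λ
        a = (p2 - p1) / (l2 - l1)
        b = p1 - a * l1
        # Conservativeness: the line must not exceed any sample value.
        if any(a * l + b > p + eps for l, p in zip(lambdas, probs)):
            continue
        error = sum(p - (a * l + b) for l, p in zip(lambdas, probs))
        if best is None or error < best[2]:
            best = (a, b, error)
    return best

# Hypothetical precomputed values for Λ = {0, 0.25, 0.5, 0.75, 1}.
sample_lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
sample_probs = [0.2, 0.5, 0.7, 0.9, 1.0]
a, b, err = conservative_line(sample_lambdas, sample_probs)
```

For these samples the chosen line connects the first and the last point. The enumeration is quadratic in |Λ|, which is acceptable for the small sets of λ values considered here; a general LP solver would be the more usual choice for larger inputs.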
3.6 Probabilistic Filter
- Using this line directly may violate the conservativeness property, since the true function may have any monotonically increasing form, and thus, for a value λQ located in between two values λ1 and λ2 (λ1, λ2 ∈ Λ, λ1 < λQ < λ2), the probability is bounded by P(λ1) ≤ P(λQ) ≤ P(λ2).
- In their running example, the function f(λ) is depicted in Figure 4(e).
- In turn, when probing an uncertain trajectory approximation (o, ti, tj) on the query range Q2, the authors only have to take the time range [ti, tj] into account; i.e., if the time range T2 of the query spans beyond [ti, tj], they truncate T2 accordingly.
- This observation allows the authors to compute the probability P∃(o) that the whole chain of diamonds of o intersects a query window Q2.
- Thus, the exact probability P(sometimes(o, ti, tj, Q2)) is computed using the technique proposed in [8].
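The monotonicity bound P(λ1) ≤ P(λQ) can be turned into a simple pruning test, sketched below. The reasoning assumed here is that a trajectory which stays inside a subdiamond that does not touch the query window cannot intersect that window, so 1 − P(inside) is an upper bound on the intersection probability. The function names and the sample values are assumptions made for this example.

```python
from bisect import bisect_right

def inside_lower_bound(lambdas, probs, lam_q):
    """Conservative lower bound for P(inside) at lam_q: the precomputed value
    of the largest λ1 in Λ with λ1 <= lam_q (0.0 if there is none).

    lambdas must be sorted ascending; probs[k] belongs to lambdas[k].
    """
    k = bisect_right(lambdas, lam_q)
    return probs[k - 1] if k > 0 else 0.0

def intersect_upper_bound(lambdas, probs, lam_q):
    """Upper bound on the probability of intersecting the query window, given
    that the subdiamond for lam_q does not touch the window."""
    return 1.0 - inside_lower_bound(lambdas, probs, lam_q)

def can_prune(lambdas, probs, lam_q, tau):
    """True if the diamond alone cannot reach the probability threshold tau."""
    return intersect_upper_bound(lambdas, probs, lam_q) < tau

# Hypothetical precomputed values for one dimension and direction.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [0.2, 0.5, 0.7, 0.9, 1.0]
```

With these values, a query that leaves the subdiamond for λQ = 0.6 untouched yields an upper bound of about 0.3, so a threshold of τ = 0.4 would prune the diamond while τ = 0.2 would not.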
4. THE UST-TREE
- In the previous section, the authors showed that they can precompute a set of approximations for each object, which can be progressively used to prune an object during query evaluation.
- The authors introduce the UST-tree, which is an R-tree-based hierarchical index structure designed to organize the object approximations and efficiently prune objects that cannot possibly qualify for the query; for the remaining objects, the query is directly verified based on their Markov models, as described in [8] (refinement step).
- Section 4.1 describes the structure of the UST-tree and Section 4.2 presents a generic query processing algorithm for answering PSTτ∃ queries.
4.1 Architecture
- The UST-tree index is a hierarchical disk-based index.
- The basic structure is illustrated in Figure 5.
- Intermediate node entries of the UST-tree have exactly the same structure as in an R-tree; i.e., each entry contains a pointer referencing its child node and the MBR of all MBR approximations stored in the referenced subtree.
- Note that the necklace of each object is decomposed into diamonds, which are stored independently in the leaf nodes of the tree.
- Since the directory structure of the UST-tree is identical to that of the R-tree, the UST-tree uses the same methods as the R*-tree [2] to handle updates.
4.2 Query Evaluation
- Given a spatio-temporal query window Q2, the UST-tree is hierarchically traversed starting from the root, recursively visiting entries whose MBRs intersect Q2; i.e., the subtree of an intermediate entry e is pruned if e.mbr ∩ Q2 = ∅.
- If this filter fails, the authors proceed using (o, ti, tj) (ST-Diamond filter) by performing intersection tests against Q2 as described in Section 3.2.
- Note that sometimes multiple leaf entries associated with an object are required to prune an object or confirm whether it is a true hit.
- For each object o which is not pruned (or reported as true hit), the authors accumulate in a list L(o) all upper bounds of its qualification probabilities from the leaf entries that index the diamonds of o.
- After collecting all candidate objects, the qualification probabilities stored in the list L(o) for each candidate o are aggregated in order to derive the upper bound of the overall qualification probability P(∃t ∈ T2 : o(t) ∈ S2) for o, as described in Section 3.6.
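The traversal and aggregation steps can be sketched as follows. The sketch makes two simplifying assumptions that are not taken from the summary: each leaf entry already carries an upper bound for its diamond (standing in for the output of the filters of Section 3), and the bounds of one object are combined as 1 − ∏(1 − u), which treats the diamonds of an object as independent given its observations. `Node`, `collect_bounds`, and `candidates` are hypothetical names.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # ((lo, hi), ...) per dimension, time included
    children: list = field(default_factory=list)   # used by intermediate nodes
    entries: list = field(default_factory=list)    # leaf: (object_id, mbr, upper_bound)

def intersects(a, b):
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

def collect_bounds(node, query, bounds):
    """Fill bounds[o] (the list L(o)) with the upper bounds of all diamonds
    of o whose MBR intersects the query window."""
    if not intersects(node.mbr, query):
        return  # the whole subtree is pruned
    for oid, mbr, upper in node.entries:
        if intersects(mbr, query):
            bounds.setdefault(oid, []).append(upper)
    for child in node.children:
        collect_bounds(child, query, bounds)

def candidates(root, query, tau):
    """Objects whose aggregated upper bound still reaches the threshold tau."""
    bounds = {}
    collect_bounds(root, query, bounds)
    result = {}
    for oid, uppers in bounds.items():
        p_upper = 1.0 - math.prod(1.0 - u for u in uppers)
        if p_upper >= tau:
            result[oid] = p_upper  # still needs refinement
    return result

# Toy tree (hypothetical): dimensions are (x, t).
leaf_a = Node(mbr=((0, 4), (0, 10)), entries=[
    ("o1", ((0, 2), (0, 5)), 0.1), ("o1", ((2, 4), (5, 10)), 0.2),
    ("o2", ((0, 1), (0, 10)), 0.9)])
leaf_b = Node(mbr=((10, 14), (0, 10)), entries=[("o3", ((10, 14), (0, 10)), 0.9)])
root = Node(mbr=((0, 14), (0, 10)), children=[leaf_a, leaf_b])
remaining = candidates(root, query=((0, 5), (0, 10)), tau=0.3)
```

In the toy tree, o3 is never visited because its leaf does not intersect the query, and o1 is pruned because its two diamonds together only reach an upper bound of 0.28.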
5. EXPERIMENTAL EVALUATION
- In order to evaluate the proposed techniques the authors used data derived from a real application and several synthetic data sets.
- The real-world data is based on the trajectory data set from [27], which contains one-week trajectories of 10,357 taxis in Beijing.
- In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.
- The number of time steps between two successive observations was randomly chosen from the interval [10,15] if not stated otherwise.
- The authors' experiments assess the construction cost of the UST-tree structure and its performance on query evaluation.
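To make the setup concrete, the sketch below derives observations from a Markov chain by simulating a random walk and recording the state after a random number of time steps drawn from [10, 15]. This only illustrates the described procedure; the function name, the ring-shaped toy chain, and the fixed seed are assumptions and are not taken from the experiments.

```python
import random

def sample_observations(transitions, start_state, n_obs, rng, gap=(10, 15)):
    """Simulate one object and return a list of (time, state) observations.

    transitions: dict state -> list of (next_state, probability).
    gap: inclusive range for the number of steps between two observations.
    """
    state, t = start_state, 0
    observations = [(t, state)]
    while len(observations) < n_obs:
        for _ in range(rng.randint(*gap)):
            targets, weights = zip(*transitions[state])
            state = rng.choices(targets, weights=weights)[0]
            t += 1
        observations.append((t, state))
    return observations

# Toy chain (hypothetical): 20 states on a ring; stay or advance, 0.5 each.
ring = {s: [(s, 0.5), ((s + 1) % 20, 0.5)] for s in range(20)}
obs = sample_observations(ring, start_state=0, n_obs=5, rng=random.Random(0))
```

Every pair of successive observations produced this way spans one diamond of the object.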
5.1 UST-tree Construction
- The first experiment investigates the cost of index construction.
- Still, this cost is reasonable, since the construction of a probabilistic diamond is comparable to the construction of 2 ·D · |Λ| subdiamonds, which in turn corresponds to one refinement step (considering the subdiamond as a query window).
- The results reflect the theoretical considerations showing a quadratic runtime behavior with respect to both parameters.
- Theoretically this parameter should have a linear impact on the construction time.
- Is also excluded from larger subdiamonds in the same dimension and direction.
5.2 Query Performance
- In the first set of query performance experiments, the authors compare the cost of using the UST-tree with two competitors on synthetic data (see Figure 7).
- This is attributed to the effectiveness of the different filter steps used by the UST-tree; the overhead of the UST-tree filter is negligible compared to the savings in refinement cost.
- The percentage of the diamonds which can be pruned using the probabilistic filter remains rather constant (at around 30%) in comparison to the spatio-temporal filter.
- Increasing the length of the query time window T 2 increases the number of refinement candidates.
- This shows that the probabilistic filter copes better with more uncertainty in the data than the other two filters.
7. CONCLUSIONS
- The authors proposed the UST-tree which is an index structure for uncertain spatio-temporal data.
- The UST-tree adopts and incorporates state-of-the-art techniques from several fields of research in order to cope with the complexity of the data.
- To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics.
- Outside the scope of this work is the consideration of an object's location before its first and after its last observation.
- In both cases, the resulting diamond approximation would be unbounded.
References
4,686 citations
"Indexing uncertain spatio-temporal ..." refers methods in this paper
...Although the R*-Tree has lower filter cost, the overall query performance of the UST-tree is around 3 times better than that of the R*-Tree (note the logarithmic scale)....
[...]
...The UST-tree has higher filtering cost, since the representation of the probabilistic diamonds requires more space and the tree is larger than the R*-Tree, which only stores MBR approximations but incurs much higher I/O cost for refinements....
[...]
...These MBRs are then indexed using a conventional R*-Tree [2]....
[...]
...Another advantage is that existing spatial access methods (e.g., R-trees) can be easily used to efficiently organize these approximations....
[...]
...The R*-Tree competitor approximates all possible locations (i.e. state-time pairs) between two successive observations of an object using only (o, ti, tj)....
[...]
1,211 citations
"Indexing uncertain spatio-temporal ..." refers background in this paper
..., [1, 4, 10, 18], where Markov Chains were proved successful in modeling spatio-temporal data....
[...]
...For instance, [1] and [10] show how Markov-models can effectively capture the movement of vehicles on road networks for prediction purpose....
[...]
1,113 citations
"Indexing uncertain spatio-temporal ..." refers background in this paper
...Assuming that objects are mutually independent, our semantics comply with the classic possible worlds model [6]....
[...]
880 citations
758 citations
"Indexing uncertain spatio-temporal ..." refers background in this paper
...As a basis for the real world data served the trajectory data set containing one-week trajectories of 10,357 taxis in Beijing from [27]....
[...]
Frequently Asked Questions (17)
Q2. What are the technological enabling factors for such applications?
The technological enabling factors for such applications are advances in sensing and communication/networking, along with the miniaturization of computing devices and the development of embedded systems.
Q3. What is the common way to indicate whether an object intersects a given window?
Thus quantifiers such as “always”, “sometimes”, “definitely” and “possibly” are used to indicate whether an object intersects a given spatio-temporal query window.
Q4. What data sets were used for the observations of taxis?
For the observations of the real dataset the authors used the GPS data of taxis, where the time between two signals is between 2 and 20 minutes.
Q5. How can the authors compute the set of possible state-time pairs of o?
The set of state-time pairs Si reachable from o(ti) can be computed by performing tj − ti transitions using the Markov chain o.M(t) of o, starting from state o(ti) and memorizing all reachable (state, time) pairs.
Q6. What is the common approach to bound the possible positions of an object at each point of time?
The prevalent approach is to bound the possible positions of an object at each point of time by a simple spatial structure, resulting in a spatio-temporal approximation.
Q7. What were the default spatial extent and duration of the query windows?
The spatial extent of the query windows in each dimension was set to 0.1 and the duration of the queries was set to 10 time steps by default.
Q9. What are the observations needed to build a database?
In addition to the positions of the states and the transition matrix, the authors further need observations from each object in order to build a database.
Q10. What is the number of possible worlds to be considered?
In general, the number of possible worlds to be considered is O(|S|^|T|); exhaustively examining all of them requires exponential time, even for finite time and space domains.
Q11. By how much does the probabilistic filter reduce the number of refinements?
The authors can also observe that the probabilistic filter can reduce the number of refinements by 30% after applying the sequence of spatio-temporal filters.
Q12. How can the authors efficiently process the common query types?
The authors showed how the most common query types (spatio-temporal ∃- and ∀-window queries) can be efficiently processed using probabilistic bounds which are computed during index construction.
Q13. How were the probabilities of the possible movements of an object assigned?
Those connections correspond to the possible movements of an object in the space, and the authors randomly assigned probabilities to each connection such that the probabilities of all outgoing edges of a state sum up to 1.
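A minimal sketch of such an assignment is shown below: each outgoing connection of a state receives a random positive weight, and the weights are normalized so that every row of the transition matrix sums to 1. The function name, the input layout, and the uniform weights are assumptions; the summary does not state which distribution was used.

```python
import random

def random_transition_rows(edges, rng):
    """edges: dict state -> list of connected states.
    Returns dict state -> list of (next_state, probability)."""
    rows = {}
    for state, targets in edges.items():
        # 1 - random() lies in (0, 1], so the row total is never zero.
        weights = [1.0 - rng.random() for _ in targets]
        total = sum(weights)
        rows[state] = [(t, w / total) for t, w in zip(targets, weights)]
    return rows

# Toy graph (hypothetical): each of 6 states is connected to itself and two neighbors.
graph = {s: [s, (s + 1) % 6, (s + 2) % 6] for s in range(6)}
rows = random_transition_rows(graph, random.Random(42))
```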
Q14. How can the construction of probabilistic diamonds be performed in parallel?
In a streaming scenario with several updates/insertions per second and large probabilistic diamonds (due to high speed of objects or large intervals between observations), the construction of probabilistic diamonds can be performed in parallel and is therefore still feasible.
Q15. How do the authors avoid the computations at run-time?
To avoid these computations at run-time, the authors propose to precompute, for each diamond (o, ti, tj) in D, probabilistic subdiamonds for each dimension and direction and for a set Λ of λ values.
Q16. What is the bounded distance between two exact observations?
In particular, assuming a maximum speed in each direction of each dimension, the possible locations that an object can visit between two exact observations is bounded.
Q17. What is the approach to evaluate a large uncertain database?
To the best of their knowledge, this is the first approach that supports query evaluation on very large uncertain spatio-temporal databases, adhering to possible worlds semantics.