Efficiently querying moving objects with pre-defined paths in a distributed environment
Summary (2 min read)
1. INTRODUCTION
- The explosive growth of the Internet has made a wealth of networked information available.
- One solution to reduce the query processing time of moving objects with predefined paths and schedules is to precompute the required information and materialize it using a moving object data model such as the 3D Trajectory [3] model.
- Different sources may contain overlapping information or only fragments of desired data.
- The authors try to overcome the first obstacle by either performing a pre-computation step and then applying a less expensive function/filter, or delaying the spatial filter until they have significantly reduced the size of the spatial data (e.g., railroad vector data).
- Section 5 reports on their experimental observations.
2. PROBLEM DEFINITION
- The director of the movie needs to make sure that no train passes by that location while they are shooting the movie.
- This requires having information about the schedules of the trains running on the nearby railroads.
- The following is the list of the simplified version of the sources available on the web and in their local databases, which can be used to provide necessary information for the director of the movie.
- Hence, the function Ψ constructs all paths between any possible start and end points (complexity $O(N_r^2)$) while Φ does the same operation on all possible station pairs (complexity $O(N_s^2)$).
- The sources Sstations and Srailroads contain spatial information while Sschedules has temporal content.
3. CENTRALIZED ENVIRONMENT
- The authors integrate the sources Sstations, Sschedules and Srailroads off-line using the following relational algebra expression to generate a 3D-trajectory source.
- One can answer the queries described in Section 2 through an intersect operation on the 3D-trajectory source between each train trajectory and (x, y, [ta, tb]), where (x, y) is the query point and [ta, tb] is the given time interval.
- The relational algebra expression for the intersect operation is given in Equation 2.
- The cost of building the materialized view is usually very high as the sources usually contain large sets of data.
- In addition, the limitations of the mediator server and/or regulations imposed by remote sources may restrict us from materializing the data locally.
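The intersect operation against a materialized 3D trajectory can be illustrated with a minimal sketch. It assumes each train trajectory is stored as time-ordered `(x, y, t)` samples with linear motion between samples; the function name `trajectory_intersects` and the tolerance `tol` are hypothetical and not taken from the paper.

```python
from math import hypot

def trajectory_intersects(trajectory, x, y, ta, tb, tol=1e-9):
    """Return True if the train passes through (x, y) at some time in [ta, tb].

    trajectory: list of (x, y, t) samples ordered by t (assumed representation).
    Motion between consecutive samples is assumed to be linear.
    """
    for (x1, y1, t1), (x2, y2, t2) in zip(trajectory, trajectory[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len_sq = dx * dx + dy * dy
        if seg_len_sq == 0:
            # The train is stationary (e.g., dwelling at a station) during [t1, t2].
            if hypot(x - x1, y - y1) <= tol and t1 <= tb and ta <= t2:
                return True
            continue
        # Fraction along the segment of the projection of the query point.
        f = ((x - x1) * dx + (y - y1) * dy) / seg_len_sq
        if not 0.0 <= f <= 1.0:
            continue
        # The query point must lie on the segment itself, not merely project onto it.
        if hypot(x - (x1 + f * dx), y - (y1 + f * dy)) > tol:
            continue
        # Interpolate the time at which the train reaches the query point.
        t_pass = t1 + f * (t2 - t1)
        if ta <= t_pass <= tb:
            return True
    return False

train = [(0, 0, 0), (10, 0, 10), (10, 10, 20)]
print(trajectory_intersects(train, 5, 0, 4, 6))   # passes (5, 0) at t = 5
print(trajectory_intersects(train, 5, 0, 6, 8))   # same point, later interval
```

The check runs once per trajectory segment, which is cheap at query time; the expensive part in this setting is building and storing the trajectories in the first place.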
4. DISTRIBUTED ENVIRONMENT
- The authors describe four alternative approaches to integrate the sources Sstations, Sschedules and Srailroads in a distributed environment.
- The coordinates of the stations are not exploited to improve the selectivity of the temporal selection predicate.
- The authors propose alternative algorithms to determine the threshold for the path deviation approach.
- The authors use Equation 7 to compute the threshold:
$$PDT = \max\left(\frac{\mathit{length}(\mathit{railroad})}{\mathit{Distance}(P_S, P_E)}\right) \qquad (7)$$
This equation computes the maximum ratio, over all pairs of starting and ending points in Srailroads, between the actual length of the railroad connecting the pair and the Euclidean distance between them.
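As a rough illustration of Equation 7, the sketch below computes the threshold for railroads given as polylines. The polyline representation and the names `polyline_length` and `path_deviation_threshold` are assumptions made for this example.

```python
from math import hypot

def polyline_length(points):
    """Sum of the Euclidean lengths of consecutive segments."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def path_deviation_threshold(railroads):
    """Maximum ratio of railroad length to straight-line distance (Equation 7).

    railroads: iterable of polylines, each a list of (x, y) vertices whose
    first and last vertices are the start point P_S and end point P_E.
    """
    ratios = []
    for points in railroads:
        (xs, ys), (xe, ye) = points[0], points[-1]
        straight = hypot(xe - xs, ye - ys)
        if straight == 0:
            continue  # closed loop: the ratio is undefined, so it is skipped here
        ratios.append(polyline_length(points) / straight)
    return max(ratios)  # raises ValueError if no railroad has a defined ratio

railroads = [
    [(0, 0), (3, 4)],            # straight track: ratio 1.0
    [(0, 0), (0, 3), (4, 3)],    # L-shaped track: length 7, distance 5
]
print(path_deviation_threshold(railroads))
```

A ratio of 1 means the track is a straight line; larger values indicate tracks that wind further away from the direct connection between their endpoints.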
4.2.2 Adaptive Threshold
- The authors propose two approaches to adaptively select the value of the threshold.
- Subsequently, different path deviation predicates with different thresholds are applied to different regions.
- Unlike the approach discussed in Section 4.1, the spatial predicate is performed before the join operation between Sspp and the result of the selection on Sschedules.
- This has the advantage that the spatial expression σQ′S can be applied earlier to Sspp.
- The authors first perform a spatial selection on Sspp to find all railroad paths that intersect with the given point (x, y).
5.1 Implementation
- The configuration for the experiments consisted of two SUN servers connected through a 100 Mbps LAN.
- Both systems run Informix Universal Server 9.2 with ESRI Spatial datablade.
- Random railroad segments connect each station with its 3 to 16 closest stations to provide different connectivity for stations.
- Different values of start time for the given time interval, ta, were randomly selected.
5.2 Experimental Results
- Figure 4 depicts the results of their experiments for the four alternative query plans discussed in Section 4.
- The Y-axis shows the average query response time in seconds.
- This approach assumes complete control over the remote server hosting Sschedules, because both Sspp and the results of the selection predicate over Sspp must be stored at the remote server.
- This is because the filter has good selectivity only for short time intervals.
- Note that although neither PTF nor TF SJ uses the expensive σQS (because of PTF's precomputation step and TF SJ's delayed spatial operation), the overhead of the temporal data transfer over the network is so high that it cancels out all the low-complexity benefits of σQ′S.
Frequently Asked Questions (19)
Q2. What have the authors stated for future works in "Efficiently querying moving objects with pre-defined paths in a distributed environment" ?
The authors demonstrated that the complexity of the spatial selection operation and bad selectivity of the temporal selection operation render pure filter+semi-join query plans inefficient. The authors plan to extend this study in three ways. Finally, the authors want to incorporate their deviation based query plan into the WorldInfo Assistant [ 1 ] application and show its usefulness for efficient query evaluation.
Q3. Why is the TF SJ query plan used?
Due to the bad selectivity of the temporal predicate (especially for longer time intervals), a large amount of data needs to be transferred over the network, which may result in longer query processing time.
Q4. What is the advantage of this method?
The advantage of this method is that the authors can perform a simpler spatial expression, σQ′S, after Srailroads and intermediate results are joined.
Q5. What is the disadvantage of the second approach?
With the second approach, the authors consider railroad paths with higher values of threshold as exceptions and exclude them from the computation of the overall threshold.
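A minimal sketch of this exclusion step follows. The summary does not say how exceptions are identified, so the `cutoff` parameter, the dictionary layout, and the function name `threshold_excluding_exceptions` are all assumptions made for illustration.

```python
def threshold_excluding_exceptions(ratios, cutoff):
    """Compute the overall threshold while setting aside exceptional railroads.

    ratios: dict mapping a railroad id to its length / straight-line-distance ratio.
    cutoff: hypothetical ratio above which a railroad is treated as an exception.
    Returns (overall_threshold, exceptions); the threshold is None if every
    railroad is an exception.
    """
    regular = {rid: r for rid, r in ratios.items() if r <= cutoff}
    exceptions = {rid: r for rid, r in ratios.items() if r > cutoff}
    threshold = max(regular.values()) if regular else None
    return threshold, exceptions

ratios = {"r1": 1.1, "r2": 1.3, "r3": 4.0}
threshold, exceptions = threshold_excluding_exceptions(ratios, cutoff=2.0)
print(threshold, exceptions)
```

In this example the single winding railroad `r3` no longer inflates the threshold applied to the rest of the network. The excluded railroads would still need to be checked some other way so that no answers are lost; the summary does not describe how.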
Q6. What is the relational algebra expression of the query?
The relational algebra expression of their query using this query execution plan is:
$$\sigma_{FR}\Big(\sigma_{Q'_S}\big(\sigma_{Q_{DF}}(\sigma_{Q_T}(S_{schedules} \bowtie_{s_i} S_{stations})) \bowtie_{\langle x_i, y_i \rangle} S_{railroads}\big)\Big) \qquad (6)$$
Since the path deviation predicate provides better selectivity than the temporal selection predicate, this approach results in less data transfer between the servers and a better query response time.
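The summary does not spell out the exact form of the path deviation predicate. One form that is consistent with the threshold definition, offered here purely as an assumption, is an ellipse test on station coordinates: if a railroad between stations A and B is no longer than PDT times the straight-line distance between A and B, then by the triangle inequality every point p on that railroad satisfies dist(A, p) + dist(p, B) ≤ PDT · dist(A, B). The function name `may_pass_through` is hypothetical.

```python
from math import dist  # Python 3.8+

def may_pass_through(station_a, station_b, point, pdt):
    """Conservative point-only filter (assumed form of the deviation predicate).

    Returns False only when no railroad between the two stations whose length
    is at most pdt * dist(station_a, station_b) could contain `point`.
    """
    direct = dist(station_a, station_b)
    detour = dist(station_a, point) + dist(point, station_b)
    return detour <= pdt * direct

a, b = (0.0, 0.0), (10.0, 0.0)
print(may_pass_through(a, b, (5.0, 1.0), pdt=1.4))    # close to the direct line
print(may_pass_through(a, b, (5.0, 10.0), pdt=1.4))   # far off the direct line
```

A test of this kind touches only three points per station pair, so it can discard most candidate schedules before any railroad vector data is examined. Pairs that pass still need the exact spatial check.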
Q7. What is the reason why the temporal filter is inefficient?
The authors demonstrated that the complexity of the spatial selection operation and bad selectivity of the temporal selection operation render pure filter+semi-join query plans inefficient.
Q8. What are the main drawbacks of a pure filter+semi-join query plan?
Spatial filters (e.g., identifying all railroad segments that overlap with a given point) are computationally complex, resulting in long local or remote query processing time.
Q9. What is the effect of the bad selectivity of the temporal predicate?
In addition, the effect of the bad selectivity of the temporal predicate is eliminated by transferring the selected railroads over the network as opposed to the selected schedules.
Q10. What is the disadvantage of a constant value for the threshold?
The only disadvantage of a constant value for the threshold is that a large value of the path deviation for some railroads results in a large value of the overall threshold for the entire network.
Q11. What is the relational algebra expression to perform their query with this approach?
The relational algebra expression to perform their query with this approach is:
$$\sigma_{FR}\big((\sigma_{Q'_S} S_{spp}) \bowtie_{s_i} \sigma_{Q_T}(S_{schedules})\big) \qquad (10)$$
With this approach, the complexity of the spatial predicate is reduced as the authors use σQ′S on Sspp.
Q12. What is the way to perform the deviation based query?
It would be interesting to see how much moving object queries on real data would benefit from adaptive threshold computation of the deviation approach as compared to a single constant threshold.
Q13. How did the authors solve the problem of a spatial filter?
The authors investigated solutions to both problems by first joining each source with a helper spatial source and then 1) performing pre-computation on the combination of two spatial sources, or 2) replacing temporal filters with spatio-temporal ones on the combination of the spatial and the temporal sources.
Q14. What is the shortest railroad path between all possible stations?
Hence $N_r \gg N_s$.
- QT = Intersect([ta, tb], [ti, tj]): QT, representing the temporal predicate of the query, finds schedules that intersect a given time interval [ta, tb].
- QS = Intersect((x, y), Ψ(Srailroads)), where Ψ(Srailroads) is a function that calculates the shortest railroad paths between all possible start and end point pairs (<Xi, Yi>, <Xj, Yj>).
- Q′S = Intersect((x, y), Φ(Srailroads, Sstations)), where Φ(Srailroads, Sstations) is a function that calculates the shortest railroad paths between all station pairs (<xi, yi>, <xj, yj>).
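The temporal predicate QT reduces to a closed-interval overlap test. The sketch below assumes times are plain numbers (for example, minutes since midnight) and that a schedule row is a `(train_id, ti, tj)` tuple; both the row layout and the function names are illustrative assumptions.

```python
def intervals_intersect(ta, tb, ti, tj):
    """True if the closed intervals [ta, tb] and [ti, tj] share at least one instant."""
    return ta <= tj and ti <= tb

def temporal_filter(schedules, ta, tb):
    """Keep schedule rows whose travel interval [ti, tj] overlaps [ta, tb]."""
    return [row for row in schedules if intervals_intersect(ta, tb, row[1], row[2])]

schedules = [
    ("T1", 480, 540),   # 08:00 to 09:00
    ("T2", 600, 660),   # 10:00 to 11:00
    ("T3", 530, 610),   # 08:50 to 10:10
]
print(temporal_filter(schedules, 535, 545))
```

Widening the query interval lets more schedules through, which is why the selectivity of this filter degrades for longer time intervals.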
Q15. What is the disadvantage of a constant threshold?
In the real world, for a flat area such as a desert, this would result in less conservative thresholds, while for an area with several mountains, a more conservative threshold will be chosen.
Q16. What is the way to select the threshold?
This results in better selectivity for the path deviation selection predicate as the high value of the threshold for a particular region does not affect the regions with lower values of the threshold.
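The effect of region-specific thresholds can be sketched as follows, assuming the length-to-distance ratios have already been grouped by region. How regions are delimited is not described in this summary, so the region labels and the function name `regional_thresholds` are hypothetical.

```python
def regional_thresholds(ratios_by_region):
    """Per-region threshold: the maximum length/distance ratio within each region."""
    return {region: max(ratios)
            for region, ratios in ratios_by_region.items() if ratios}

ratios_by_region = {
    "desert": [1.05, 1.10],      # nearly straight tracks
    "mountains": [1.80, 2.60],   # winding tracks
}
per_region = regional_thresholds(ratios_by_region)
single = max(per_region.values())  # what a single constant threshold would be
print(per_region, single)
```

With a single constant threshold, the desert region would be filtered with the mountain value; with per-region thresholds it keeps its own, tighter bound.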
Q17. What are the reasons the authors ignored the unrealistic plans?
The authors ignore those plans that are unrealistic due to either large data transmission over the network or expensive local/remote computations.
Q18. What is the difference between the two different types of filters?
Since this filter works on the small set of point data (station coordinates) as opposed to the large set of line data (railroad vectors), it is not as computationally complex as the pure spatial filter.
Q19. What is the way to perform a query in a distributed environment?
In a distributed environment (e.g., WWW) with limited access control (i.e., read-only access), creating a new source (i.e., Sspp) in a remote server may not be possible, which renders this approach impractical.