Mining frequent spatio-temporal sequential patterns
Summary
1 Introduction
- The movement of an object (i.e., trajectory) can be described by a sequence of spatial locations sampled at consecutive timestamps (e.g., with the use of Global Positioning System (GPS) devices).
- Buses move along series of streets repeatedly, people go to and return from work following more or less the same routes, etc.
- Unfortunately, pattern discovery techniques in transactional databases are not readily applicable for finding sequential patterns in spatio-temporal data.
3.1 Motivation
- Locations are not repeated exactly in every instance of a movement pattern.
- A naive method is to use a regular grid (or some predefined spatial decomposition) to divide the space into regions, taking a user-defined parameter G: the approximate number of intervals into which each axis is split.
- This approach may miss some frequent patterns whose instances are divided between different grid-based patterns.
- An alternative conversion technique adds the ids of cells that intersect with the line segments connecting consecutive locations to the transformed sequence.
- Motivated by line simplification techniques ([3]), the authors represent segments of the spatio-temporal series by directed line segments.
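The naive grid conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the row-major cell numbering, and the step that collapses consecutive repeats are all assumptions made for the example.

```python
def to_cell_ids(points, G):
    """Map each (x, y) location to the id of the grid cell containing it.

    Assumption: the grid covers the bounding box of the data and each
    axis is split into exactly G equal intervals (row-major cell ids).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a zero extent so the divisions below are always defined.
    width = (max(xs) - min_x) or 1.0
    height = (max(ys) - min_y) or 1.0
    ids = []
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        col = min(int((x - min_x) / width * G), G - 1)
        row = min(int((y - min_y) / height * G), G - 1)
        ids.append(row * G + col)
    return ids


def collapse_repeats(ids):
    """Drop consecutive duplicates so a stay inside one cell is one symbol."""
    out = []
    for c in ids:
        if not out or out[-1] != c:
            out.append(c)
    return out


trajectory = [(0.0, 0.0), (1.0, 0.5), (4.9, 0.5), (5.1, 0.5), (10.0, 10.0)]
cells = to_cell_ids(trajectory, G=2)
symbols = collapse_repeats(cells)
```

Note that the locations (4.9, 0.5) and (5.1, 0.5) are very close, yet they land in different cells. This is the boundary effect that can split the instances of one pattern across several grid-based patterns.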
3.2 Problem definition
- Given sij, the authors define its representative line segment lij with starting point (xi, yi) and ending point (xj, yj).
- lgh is not close to lij, because the point in the upper-right part lies at a distance greater than 5.0 from lij.
- Given a support threshold min_sup, P is frequent if its support exceeds min_sup.
- The parameter values depend on the application domain, or can be tuned as part of the mining process [2].
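The closeness example above rests on the distance from a point to a line segment. The sketch below shows one way such a check could be computed. The helper names and the rule "every point must lie within eps of the segment" are assumptions for illustration; the summary does not spell out the paper's full closeness definition.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a, b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection of p, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def all_within(points, a, b, eps):
    """True if every point lies within eps of segment (a, b). Hypothetical rule."""
    return all(point_segment_distance(p, a, b) <= eps for p in points)


start, end = (0.0, 0.0), (4.0, 0.0)
near = all_within([(1.0, 1.0), (3.0, 4.0)], start, end, eps=5.0)
far = all_within([(1.0, 1.0), (3.0, 6.0)], start, end, eps=5.0)
```

Here `far` is false because a single point lies more than 5.0 away from the segment, which mirrors the situation described for lgh and lij.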
4.1 Discovering frequent singular patterns
- The segmentation (line simplification) algorithm ([3, 5, 6]) is used to convert the location series into a sequence of segments, so that each raw segment can be abstracted by a line segment.
- The DP (Douglas-Peucker) algorithm [3] is a classical top-down approach for this problem. [6] provides an online algorithm for splitting a sequence into segments of quite good quality.
- The algorithm selects the segment s with median length, i.e., the median of the lengths of the segments in Segs, as the seed for the initial spatial region r.
- Let lsi be the representative line segment for si.
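The top-down Douglas-Peucker idea mentioned above can be sketched as follows. This is a generic textbook-style version, not the paper's implementation: it measures distance to the segment rather than to the infinite line, uses an explicit stack instead of recursion, and the tolerance name `eps` is an assumption.

```python
import math


def _dist(p, a, b):
    """Distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Return the sorted indices of the points kept by the simplification."""
    if len(points) < 3:
        return list(range(len(points)))
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:
        lo, hi = stack.pop()
        # Find the interior point farthest from the chord (lo, hi).
        worst, worst_d = None, eps
        for i in range(lo + 1, hi):
            d = _dist(points[i], points[lo], points[hi])
            if d > worst_d:
                worst, worst_d = i, d
        if worst is not None:
            # The chord is too coarse: split at the farthest point.
            keep.add(worst)
            stack.append((lo, worst))
            stack.append((worst, hi))
    return sorted(keep)


series = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 5.0), (4.0, 0.0)]
kept = douglas_peucker(series, eps=1.0)
# Each pair of consecutive kept indices delimits one segment of the series.
segments = list(zip(kept, kept[1:]))
```

Each resulting index pair (i, j) delimits a raw segment, and the line from point i to point j plays the role of its representative line segment.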
4.2 Deriving longer patterns
- SR preserves the motion continuity of the object by showing how it moves among regions.
- The concatenation of some regions may not be frequent.
- For example, r1, r2 and r3 are frequently visited, but the path r2r3 is not frequent.
- This section discusses how to detect the longer frequent patterns.
4.2.1 Level-wise mining
- This approach suffers from the disadvantage that SR needs to be scanned many times.
- This constraint can help reduce the number of generated candidates, as follows.
- The authors first construct a connectivity graph for all the spatial regions in SR.
- The weight of the edge from ri to rj is the frequency with which rirj appears in the sequence.
- In addition, assume that result contains only one pattern starting from r3: P′ = r3r4r6r7.
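A connectivity graph of the kind described above can be sketched with a counter over consecutive region pairs. The function names and the dictionary representation are assumptions for illustration; the threshold test follows the summary's wording that a pattern is frequent when its support exceeds min_sup.

```python
from collections import Counter


def connectivity_graph(region_seq):
    """Count how often region b directly follows region a in the sequence."""
    edges = Counter()
    for a, b in zip(region_seq, region_seq[1:]):
        edges[(a, b)] += 1
    return edges


def frequent_edges(edges, min_sup):
    """Keep only edges whose weight exceeds min_sup (strict comparison)."""
    return {e: w for e, w in edges.items() if w > min_sup}


SR = ["r1", "r2", "r1", "r2", "r3", "r1", "r2"]
graph = connectivity_graph(SR)
strong = frequent_edges(graph, min_sup=1)
```

In this toy sequence only the edge from r1 to r2 survives the threshold, so a candidate of length two such as r2r3 never needs to be generated or counted.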
4.2.2 Mining using the substring tree
- The authors propose a substring tree structure to facilitate counting of long substrings with different elements.
- The substring tree is a rooted directed tree whose root links to multiple substring sub-trees.
- The process continues until the authors see the fifth element r1.
- Each element in the stack comprises a pattern, its count and a level, indicating whether the pattern has reached a leaf or not.
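To give a feel for counting substrings with a tree, here is a simplified trie-based sketch. It is not the authors' substring tree: the node layout, the `max_len` cap, and the rule of stopping a substring as soon as an element repeats are assumptions based on the description of "substrings with different elements".

```python
def build_substring_trie(seq, max_len):
    """Insert, from every start position, the longest substring of distinct
    elements (capped at max_len), incrementing a count on each node passed."""
    root = {"count": 0, "children": {}}
    for start in range(len(seq)):
        node = root
        seen = set()
        for sym in seq[start:start + max_len]:
            if sym in seen:
                # Assumed rule: stop extending once an element repeats.
                break
            seen.add(sym)
            node = node["children"].setdefault(
                sym, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


def frequent_substrings(root, min_sup):
    """Traverse with an explicit stack, pruning nodes at or below min_sup."""
    result = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for sym, child in node["children"].items():
            # A child's count never exceeds its parent's, so pruning is safe.
            if child["count"] > min_sup:
                pattern = prefix + (sym,)
                result[pattern] = child["count"]
                stack.append((pattern, child))
    return result


SR = ["r1", "r2", "r3", "r1", "r2", "r3", "r1"]
trie = build_substring_trie(SR, max_len=3)
patterns = frequent_substrings(trie, min_sup=1)
```

A single pass over the sequence fills all counts, which is the main attraction compared with rescanning the sequence once per pattern length.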
5 Experiments
- This section evaluates the proposed approach on real and synthetic data.
- The real data contain tracked bus movements in Patras, Greece.
- The generator takes three parameters, |p|, n, and m. |p| is the number of line segments constituting circular paths (i.e., patterns) of the movement.
- The artificial series are described in the related experiments.
- For each value in the set, the authors cluster the y coordinates of the sample points and derive dense ranges of y values.
5.2 Effectiveness and efficiency study
- The authors examine the effectiveness of their method taking as input a raw bus movement sequence shown in Figure 5a, which contains 6921 locations.
- This is quite coarse, since the movement inside each cell is unknown.
- Table 1b compares the total time spent by the proposed methods and by the grid methods, which use the substring tree for finding longer patterns.
- This happens because many cells in the sequence become outliers in this case, so Grid II discovers shorter patterns (whereas Grid I finds longer ones, since it does not introduce intermediate cells at a sharp movement).
6 Conclusion
- The authors modeled the problem of mining sequential patterns from spatio-temporal data by considering both spatial and temporal information.
- Singular frequent patterns are found effectively, by grouping segments not only by similar shape (like previous work in time-series mining), but also by closeness in space.
- In addition, the authors employed special properties of the problem (spatial connectivity, closeness) and a newly proposed substring tree to accelerate search for longer patterns.