A novel hyperstring based descriptor for an improved representation of motion trajectory and retrieval of similar video shots with static camera
01 Nov 2012, pp. 174-177
TL;DR: A framework is proposed for representing the trajectory of a moving object using a novel hyperstring based approach that unifies structural and kinematic features for an improved representation of the trajectory and efficient retrieval of video shots.
Abstract: A framework has been proposed for representing the trajectory of a moving object, using a novel hyperstring based approach for efficient retrieval of video shots. The hyperstring based model unifies both the structural and kinematic features for an improved representation of the trajectory. A Constraint-driven Adjacency Graph matching (CAGM) algorithm has been proposed to measure the similarity between a pair of query and model hyperstrings. Experiments have been performed on benchmark datasets of trajectories (one synthetic and three real-world video shots), to assess the performance (using Precision-Recall metric) of the proposed model. Results have been compared with two similar published works on video retrieval using trajectories, to demonstrate the superiority of our proposed framework.
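The abstract does not spell out how the hyperstring itself is constructed, so the sketch below is only a hypothetical illustration of the kind of structural (heading, turning) and kinematic (speed) features such a trajectory descriptor could unify; all names and parameters are assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's hyperstring construction): per-step
# structural and kinematic features extracted from a 2D centroid trajectory.
import numpy as np

def trajectory_features(points, fps=25.0):
    """points: (N, 2) array of object centroids, one per frame."""
    pts = np.asarray(points, dtype=float)
    disp = np.diff(pts, axis=0)                   # per-frame displacement
    speed = np.linalg.norm(disp, axis=1) * fps    # kinematic feature: speed
    heading = np.arctan2(disp[:, 1], disp[:, 0])  # structural feature: direction
    turn = np.diff(np.unwrap(heading))            # change of direction per step
    return speed, heading, turn

# Example: a quarter-circle trajectory sampled over 50 frames
t = np.linspace(0, np.pi / 2, 50)
traj = np.stack([np.cos(t), np.sin(t)], axis=1)
speed, heading, turn = trajectory_features(traj)
```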
Citations
TL;DR: This paper focuses on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real world video shot, and captures the geometric property of the parameterized STV using the Gaussian curvature computed at each point on its surface.
Abstract: In this paper, we address the problem of Content Based Video Retrieval using a multivariate time series modeling of features. We particularly focus on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real-world video shot. The STV intrinsically holds the video content by capturing the dynamics of the appearance of the foreground object over time, and hence can be considered as a dynamical system. We have captured the geometric property of the parameterized STV using the Gaussian curvature computed at each point on its surface. The change of Gaussian curvature over time is then modeled as a Linear Dynamical System (LDS). Due to its capability to efficiently model the dynamics of a multivariate signal, an Auto Regressive Moving Average (ARMA) model is used to represent the time series data. Parameters of the ARMA model are then used for video content representation. To discriminate between a pair of video shots (time series), we have used the subspace angle between a pair of feature vectors formed using ARMA model parameters. Experiments are done on four publicly available benchmark datasets, shot using a static camera. We present both qualitative and quantitative analysis of our proposed framework. Comparative results with three recent works on video retrieval also show the efficiency of our proposed framework.
9 citations
Cites methods from "A novel hyperstring based descripto..."
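The comparison step mentioned in the abstract above relies on subspace angles between ARMA-parameter subspaces. As a minimal sketch, the snippet below computes only that angle-based dissimilarity, assuming each shot has already been summarized by a d x k parameter matrix; the ARMA/LDS estimation itself is not reproduced here.

```python
# Illustrative sketch: principal (subspace) angles between two feature subspaces.
import numpy as np

def subspace_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                  # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)                  # orthonormal basis of span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def subspace_distance(A, B):
    """A simple dissimilarity: sum of squared sines of the principal angles."""
    return float(np.sum(np.sin(subspace_angles(A, B)) ** 2))

# Example with random 20-dimensional, rank-3 "parameter" matrices
rng = np.random.default_rng(0)
query, model = rng.standard_normal((20, 3)), rng.standard_normal((20, 3))
print(subspace_distance(query, model))
```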
TL;DR: This paper presents generalized spatiotemporal analysis and lookup tool (GESTALT), an unsupervised framework for content-based video retrieval that takes a query video and retrieves “similar” videos from the database.
Abstract: This paper presents generalized spatiotemporal analysis and lookup tool (GESTALT), an unsupervised framework for content-based video retrieval. GESTALT takes a query video and retrieves “similar” videos from the database. Motion and dynamics of appearance (shape) patterns of a prominent moving foreground object are considered as the key components of the video content and captured using corresponding feature descriptors. GESTALT automatically segments the moving foreground object from the given query video shot and estimates the motion trajectory. A graph-based framework is used to explicitly capture the structural and kinematic properties of the motion trajectory, while an improved version of an existing spatiotemporal feature descriptor is proposed to model the change in object shape and movement over time. A combined match cost, computed as a convex combination of the two match scores obtained with these two feature descriptors, is used to rank-order the retrieved video shots. Effectiveness of GESTALT is shown using extensive experimentation, and a comparative study with recent techniques exhibits its superiority.
7 citations
Cites methods from "A novel hyperstring based descripto..."
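The ranking step described in the abstract above, a convex combination of two match costs, can be sketched as follows; the weight alpha, the names, and the assumption that both costs are already normalized to comparable scales are illustrative, not GESTALT's actual settings.

```python
# Minimal sketch of combining two normalized match costs and rank-ordering shots.
import numpy as np

def combined_cost(traj_cost, shape_cost, alpha=0.5):
    """Convex combination of two match costs, 0 <= alpha <= 1."""
    return alpha * np.asarray(traj_cost) + (1.0 - alpha) * np.asarray(shape_cost)

def rank_shots(traj_costs, shape_costs, alpha=0.5):
    """Return database indices sorted from best (lowest cost) to worst."""
    return np.argsort(combined_cost(traj_costs, shape_costs, alpha))

# Example: 5 database shots compared against one query
traj_costs  = [0.30, 0.10, 0.70, 0.25, 0.55]
shape_costs = [0.40, 0.20, 0.10, 0.60, 0.50]
print(rank_shots(traj_costs, shape_costs, alpha=0.6))
```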
TL;DR: VIDCAR, an unsupervised framework for Content-Based Video Retrieval (CBVR) that represents the dynamics of a spatio-temporal model extracted from video shots, achieves higher precision-recall than competing methods on five datasets.
Abstract: This paper presents VIDeo Content Analysis and Retrieval (VIDCAR), an unsupervised framework for Content-Based Video Retrieval (CBVR) that represents the dynamics of the spatio-temporal model extracted from video shots. We propose Dynamic Multi Spectro Temporal-Curvature Scale Space (DMST-CSS), an improved feature descriptor for enhancing the performance of the CBVR task. Our primary contribution is the representation of the dynamics of the evolution of the MST-CSS surface. Unlike the earlier MST-CSS descriptor [22], which extracts geometric features only after the evolving MST-CSS surface converges to its final formation, DMST-CSS captures the dynamics of the evolution (formation) of the surface and is thus more robust. We represent the dynamics of the MST-CSS surface as a multivariate time series to obtain the DMST-CSS descriptor. A global kernel alignment technique has been adapted to compute a match cost between query and model DMST-CSS descriptors. In our experiments on five datasets, VIDCAR achieved higher precision-recall than the competing methods.
3 citations
Cites methods from "A novel hyperstring based descripto..."
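The abstract mentions a global kernel alignment technique but does not define it here. As a hedged illustration only, the sketch below computes the classical kernel-alignment score between Gram matrices built from two equally long multivariate time series; the paper's actual matching scheme may differ, and the RBF kernel and gamma are assumptions.

```python
# Illustrative kernel-alignment match cost between two multivariate time series.
import numpy as np

def rbf_gram(X, gamma=1.0):
    """Gram matrix of an RBF kernel over the T frame-wise feature vectors in X (T, d)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_alignment(K1, K2):
    """Normalized Frobenius inner product of two Gram matrices (1 = identical structure)."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

rng = np.random.default_rng(1)
query, model = rng.standard_normal((40, 6)), rng.standard_normal((40, 6))
score = kernel_alignment(rbf_gram(query), rbf_gram(model))
match_cost = 1.0 - score   # higher alignment implies lower cost
print(match_cost)
```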
References
TL;DR: This paper constructs video representations in terms of local space-time features, integrates them with SVM classification schemes, and presents action recognition results on a new database of human actions.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other related approaches for action recognition.
3,051 citations
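A generic bag-of-features plus SVM pipeline in the spirit of the abstract above can be sketched with scikit-learn. The paper's space-time feature detector and kernel choices are not reproduced; random descriptors stand in for real local features.

```python
# Illustrative bag-of-features + SVM recognition pipeline (not the paper's exact setup).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bof_histogram(descriptors, vocab):
    """Normalized histogram of visual-word assignments for one video's local descriptors."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

rng = np.random.default_rng(0)
# Pretend each of 20 training videos yields ~100 local space-time descriptors (dim 16)
train_desc = [rng.standard_normal((100, 16)) + label for label in (0, 1) for _ in range(10)]
labels = [0] * 10 + [1] * 10

vocab = KMeans(n_clusters=32, n_init=10, random_state=0).fit(np.vstack(train_desc))
X = np.array([bof_histogram(d, vocab) for d in train_desc])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:2]))
```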
TL;DR: This work introduces a novel descriptor based on motion boundary histograms, which is robust to camera motion and consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos.
Abstract: Feature trajectories have been shown to be efficient for representing videos. Typically, they are extracted using the KLT tracker or by matching SIFT descriptors between frames. However, the quality as well as the quantity of these trajectories is often not sufficient. Inspired by the recent success of dense sampling in image classification, we propose an approach to describe videos by dense trajectories. We sample dense points from each frame and track them based on displacement information from a dense optical flow field. Given a state-of-the-art optical flow algorithm, our trajectories are robust to fast irregular motions as well as shot boundaries. Additionally, dense trajectories cover the motion information in videos well. We also investigate how to design descriptors to encode the trajectory information. We introduce a novel descriptor based on motion boundary histograms, which is robust to camera motion. This descriptor consistently outperforms other state-of-the-art descriptors, in particular in uncontrolled realistic videos. We evaluate our video description in the context of action classification with a bag-of-features approach. Experimental results show a significant improvement over the state of the art on four datasets of varying difficulty, i.e. KTH, YouTube, Hollywood2 and UCF sports.
2,320 citations
"A novel hyperstring based descripto..." refers methods in this paper
TL;DR: The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known; it is also shown to be robust to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
Abstract: Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach by Gorelick et al. (2004) for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure and orientation. We show that these features are useful for action recognition, detection and clustering. The method is fast, does not require video alignment and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, non-rigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action and low-quality video.
2,077 citations
TL;DR: The method is fast, does not require video alignment, and is applicable in many scenarios where the background is known; it is also shown to be robust to partial occlusions, nonrigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
Abstract: Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach [14] for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure, and orientation. We show that these features are useful for action recognition, detection, and clustering. The method is fast, does not require video alignment, and is applicable in (but not limited to) many scenarios where the background is known. Moreover, we demonstrate the robustness of our method to partial occlusions, nonrigid deformations, significant changes in scale and viewpoint, high irregularities in the performance of an action, and low-quality video.
1,776 citations
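The core Poisson-equation step shared by the two entries above can be illustrated in 2D with simple Jacobi iterations; the papers work on volumetric space-time shapes, so this reduced version is only meant to show the computation, and the iteration count is an arbitrary choice.

```python
# Hedged 2D illustration: solve Laplacian(U) = -1 inside a binary silhouette
# with U = 0 on the background, via Jacobi iterations.
import numpy as np

def poisson_interior(mask, n_iter=500):
    """mask: (H, W) boolean silhouette. Returns U with Laplacian(U) ~= -1 inside."""
    U = np.zeros(mask.shape, dtype=float)
    for _ in range(n_iter):
        # average of the four neighbours plus the constant source term
        nb = (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
              np.roll(U, 1, 1) + np.roll(U, -1, 1))
        U = np.where(mask, 0.25 * (nb + 1.0), 0.0)   # Dirichlet U = 0 outside
    return U

# Example: a rectangular "torso" silhouette; large U marks the shape interior
mask = np.zeros((40, 30), dtype=bool)
mask[5:35, 8:22] = True
U = poisson_interior(mask)
print(U.max())
```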
TL;DR: Efficiency figures show that the proposed technique for motion detection outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate.
Abstract: This paper presents a technique for motion detection that incorporates several innovative mechanisms. For example, our proposed technique stores, for each pixel, a set of values taken in the past at the same location or in the neighborhood. It then compares this set to the current pixel value in order to determine whether that pixel belongs to the background, and adapts the model by choosing randomly which values to substitute from the background model. This approach differs from those based upon the classical belief that the oldest values should be replaced first. Finally, when the pixel is found to be part of the background, its value is propagated into the background model of a neighboring pixel. We describe our method in full detail (including pseudo-code and the parameter values used) and compare it to other background subtraction techniques. Efficiency figures show that our method outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate. We also analyze the performance of a downscaled version of our algorithm to the absolute minimum of one comparison and one byte of memory per pixel. It appears that even such a simplified version of our algorithm performs better than mainstream techniques.
1,563 citations
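A much-simplified, grayscale sketch of the sample-based background model described above is given below. The parameter values and the slow per-pixel Python loops are illustrative only; the published method is considerably more refined and far faster.

```python
# Simplified sample-based background subtraction sketch (not the published implementation).
import numpy as np

N_SAMPLES, RADIUS, MIN_MATCHES, SUBSAMPLE = 20, 20, 2, 16
rng = np.random.default_rng(0)

def init_model(first_frame):
    """Fill each pixel's sample set with copies of its value in the first frame."""
    return np.repeat(first_frame[..., None], N_SAMPLES, axis=2).astype(np.int32)

def segment_and_update(frame, model):
    h, w = frame.shape
    fg = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            matches = np.count_nonzero(np.abs(model[y, x] - int(frame[y, x])) < RADIUS)
            if matches < MIN_MATCHES:
                fg[y, x] = True                      # foreground: model untouched
                continue
            if rng.integers(SUBSAMPLE) == 0:         # random-in-time model update
                model[y, x, rng.integers(N_SAMPLES)] = frame[y, x]
            if rng.integers(SUBSAMPLE) == 0:         # propagate value into a neighbour
                ny = np.clip(y + rng.integers(-1, 2), 0, h - 1)
                nx = np.clip(x + rng.integers(-1, 2), 0, w - 1)
                model[ny, nx, rng.integers(N_SAMPLES)] = frame[y, x]
    return fg

# Example: static background with a moving bright square
frames = [np.full((40, 40), 50, dtype=np.uint8) for _ in range(5)]
for i, f in enumerate(frames[1:], 1):
    f[10:20, 5 * i:5 * i + 10] = 200
model = init_model(frames[0])
masks = [segment_and_update(f, model) for f in frames[1:]]
print(masks[-1].sum())   # number of pixels flagged as foreground in the last frame
```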