Proceedings ArticleDOI

Enhancing the MST-CSS Representation Using Robust Geometric Features, for Efficient Content Based Video Retrieval (CBVR)

TL;DR: Comparative study with the existing MST-CSS representation and two state-of-the-art methods for CBVR shows enhanced performance on one synthetic and two real-world datasets.
Abstract: Multi-Spectro-Temporal Curvature Scale Space (MST-CSS) had been proposed as a video content descriptor in an earlier work, where peak and saddle points were used as feature points. These are inadequate to capture the salient features of the MST-CSS surface, producing poor retrieval results. To overcome this, we propose EMST-CSS (Enhanced MST-CSS), a better feature representation with an improved matching method for CBVR (Content Based Video Retrieval). A comparative study with the existing MST-CSS representation and two state-of-the-art methods for CBVR shows enhanced performance on one synthetic and two real-world datasets.
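The "robust geometric features" of the title boil down to classifying surface points by Gaussian (K) and mean (H) curvature: elliptic points with downward-bending normal sections are peaks. A minimal sketch of that classification on a synthetic height field (a Gaussian bump stands in for a real MST-CSS surface; grid size and the height function are illustrative assumptions):

```python
import numpy as np

def curvatures(z, spacing=1.0):
    # Gaussian (K) and mean (H) curvature of a height field z = f(x, y),
    # using finite-difference derivatives (Monge-patch formulas, upward normal).
    zy, zx = np.gradient(z, spacing)
    zxy, zxx = np.gradient(zx, spacing)
    zyy, _ = np.gradient(zy, spacing)
    denom = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / denom**2
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy
         + (1 + zy**2) * zxx) / (2 * denom**1.5)
    return K, H

x = np.linspace(-2, 2, 81)
X, Y = np.meshgrid(x, x)
Z = np.exp(-(X**2 + Y**2))            # single synthetic bump -> one peak
K, H = curvatures(Z, spacing=x[1] - x[0])

# Candidate peaks: elliptic (K > 0) with negative mean curvature (H < 0)
peak_mask = (K > 0) & (H < 0)
iy, ix = np.unravel_index(np.argmax(Z), Z.shape)
print(peak_mask[iy, ix])              # the bump's apex classifies as a peak
```

The same K/H sign test distinguishes peaks (K>0, H<0) from pits (K>0, H>0) and saddles (K<0), which is why curvature-based detection is less brittle than thresholding the surface directly.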
Citations
Journal ArticleDOI
TL;DR: The existing challenges in the STIP detection for video, such as low time efficiency, poor robustness with respect to camera movement, illumination change, perspective occlusion, and background clutter are summarized.
Abstract: Recently, increasing attention has been paid to the detection of spatio-temporal interest points (STIPs), which has become a key technique and research focus in the field of computer vision. Its applications include human action recognition, video surveillance, video summarization, and content-based video retrieval. A substantial amount of work on STIP detection has been done by many researchers. This paper presents a comprehensive review of STIP detection algorithms. We first give a detailed introduction to and analysis of the existing STIP detection algorithms, which are robust in detecting interest points for video in the spatio-temporal domain. Next, we summarize the existing challenges in STIP detection for video, such as low time efficiency and poor robustness with respect to camera movement, illumination change, perspective occlusion, and background clutter. This paper also presents application scenarios of STIP and discusses potential development trends in STIP detection.

41 citations


Cites background from "Enhancing the MST-CSS Representatio..."

  • ...Moreover, Enhanced Multi-Spectro-Temporal Curvature Scale Space (EMSTCSS), a novel feature representation, was proposed in paper [28]....

Proceedings ArticleDOI
01 Dec 2013
TL;DR: Performance of STAR has been evaluated qualitatively and quantitatively using precision-recall metric on benchmark video datasets having unconstrained video shots, to exhibit efficiency of STAR.
Abstract: This paper presents the design of STAR (Spatio-Temporal Analysis and Retrieval), an unsupervised Content Based Video Retrieval (CBVR) System. STAR's key insight and primary contribution is that it models video content using a joint spatio-temporal feature representation and retrieves videos from the database which have similar moving object and trajectories of motion. Foreground moving blobs from a moving camera video shot are extracted, along with a trajectory for camera motion compensation, to form the space-time volume (STV). The STV is processed to obtain the EMST-CSS representation, which can discriminate across different categories of videos. Performance of STAR has been evaluated qualitatively and quantitatively using precision-recall metric on benchmark video datasets having unconstrained video shots, to exhibit efficiency of STAR.

9 citations


Cites methods from "Enhancing the MST-CSS Representatio..."

  • ...The method proposed in [14], works under the constraint that videos were shot using a static camera....

  • ...We have extended the EMST-CSS [14] representation for moving camera videos....

  • ...The interclass dissimilarity in the structure of the EMST-CSS surface is evident and exploited for CBVR. Geometric properties (Mean and Gaussian curvature) are used to detect peaks and ridges on the EMST-CSS surface and stored as content descriptors for representing the video shot....

  • ...The work reported in [14] uses features extracted from an Extended Multi Spectro Temporal-Curvature Scale Space (EMST-CSS) surface, as a video content descriptor for CBVR task....

Journal ArticleDOI
TL;DR: This paper focuses on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real world video shot, and captures the geometric property of the parameterized STV using the Gaussian curvature computed at each point on its surface.
Abstract: In this paper, we address the problem of Content Based Video Retrieval using multivariate time series modeling of features. We particularly focus on representing the dynamics of geometric features on the Spatio-Temporal Volume (STV) created from a real-world video shot. The STV intrinsically holds the video content by capturing the dynamics of the appearance of the foreground object over time, and hence can be considered as a dynamical system. We have captured the geometric property of the parameterized STV using the Gaussian curvature computed at each point on its surface. The change of Gaussian curvature over time is then modeled as a Linear Dynamical System (LDS). Due to its capability to efficiently model the dynamics of a multivariate signal, an Auto Regressive Moving Average (ARMA) model is used to represent the time series data. Parameters of the ARMA model are then used for video content representation. To discriminate between a pair of video shots (time series), we have used the subspace angle between a pair of feature vectors formed using ARMA model parameters. Experiments are done on four publicly available benchmark datasets, shot using a static camera. We present both qualitative and quantitative analysis of our proposed framework. Comparative results with three recent works on video retrieval also show the efficiency of our proposed framework.
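The subspace-angle comparison used above is a standard construction: principal angles between the column spaces of two matrices, computed from the singular values of the product of their orthonormal bases. A sketch on two generic feature matrices (the cited work builds these matrices from ARMA parameters; the toy inputs here are illustrative):

```python
import numpy as np

def subspace_angles(A, B):
    # Principal angles between the column spaces of A and B:
    # orthonormalize, then arccos of the singular values of Qa^T Qb.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))   # radians, ascending

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # span{e1, e2}
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])   # span{e1, e3}
theta = subspace_angles(A, B)
print(np.round(theta, 4))   # shared direction e1 gives angle 0; e2 vs e3 gives pi/2
```

Small principal angles mean the two dynamical models span nearly the same subspace, so the sum (or largest) of the angles serves directly as a dissimilarity score between shots.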

9 citations


Cites methods or result from "Enhancing the MST-CSS Representatio..."

  • ...ods of CBVR [10,21,25], using spatio-temporal approach....

  • ...al [25], b Gao and Yang [21], c EMST-CSS [10], and d our proposed method....

  • ...Figure 13 depicts precision-recall graphs for experiments on four different benchmark datasets, comparing our results with [21,25] and [10]....

  • ...In one of our earlier work [10], a joint spatio-temporal representation of a video object has been analyzed, where shape and trajectory features are combined to generate an STV....

  • ...Since ARMA models allows for general dynamic relationships among variables in a system, it is more adequate to represent the dynamics of the STV and thus provides more accurate retrieval results than [10,21,25]....

Journal ArticleDOI
TL;DR: This paper presents generalized spatiotemporal analysis and lookup tool (GESTALT), an unsupervised framework for content-based video retrieval that takes a query video and retrieves “similar” videos from the database.
Abstract: This paper presents generalized spatiotemporal analysis and lookup tool (GESTALT), an unsupervised framework for content-based video retrieval. GESTALT takes a query video and retrieves “similar” videos from the database. Motion and dynamics of appearance (shape) patterns of a prominent moving foreground object are considered as the key components of the video content and captured using corresponding feature descriptors. GESTALT automatically segments the moving foreground object from the given query video shot and estimates the motion trajectory. A graph-based framework is used to explicitly capture the structural and kinematics property of the motion trajectory, while an improved version of an existing spatiotemporal feature descriptor is proposed to model the change in object shape and movement over time. A combined match cost is computed as a convex combination of the two match scores, using these two feature descriptors, which is used to rank-order the retrieved video shots. Effectiveness of GESTALT is shown using extensive experimentation, and comparative study with recent techniques exhibits its superiority.

7 citations


Cites methods from "Enhancing the MST-CSS Representatio..."

  • ...After each iteration, zero-crossing contours are detected on this evolving STV surface as features to represent the video object and stacked in a 3D (u, v, σ ) space yielding the Multi-SpectroTemporal Curvature Scale Space (MST-CSS) surface [26]....

Journal ArticleDOI
TL;DR: VIDCAR, an unsupervised framework for Content-Based Video Retrieval (CBVR) that represents the dynamics of the spatio-temporal model extracted from video shots, is shown to achieve higher precision-recall than competing methods on five datasets.
Abstract: This paper presents VIDeo Content Analysis and Retrieval (VIDCAR), an unsupervised framework for Content-Based Video Retrieval (CBVR) using a representation of the dynamics in the spatio-temporal model extracted from video shots. We propose Dynamic Multi Spectro Temporal-Curvature Scale Space (DMST-CSS), an improved feature descriptor for enhancing the performance of the CBVR task. Our primary contribution is in representing the dynamics of the evolution of the MST-CSS surface. Unlike the earlier MST-CSS descriptor [22], which extracts geometric features after the evolving MST-CSS surface converges to a final formation, DMST-CSS captures the dynamics of the evolution (formation) of the surface and is thus more robust. We have represented the dynamics of the MST-CSS surface as a multivariate time series to obtain a DMST-CSS descriptor. A global kernel alignment technique has been adapted to compute a match cost between query and model DMST-CSS descriptors. In our experiments, VIDCAR achieved higher precision-recall than competing methods on five datasets.
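The "global kernel alignment" match score mentioned above is, in its standard form, the normalized Frobenius inner product between two kernel (Gram) matrices. A minimal sketch (the linear kernel and random data here are illustrative assumptions, not the descriptors VIDCAR actually uses):

```python
import numpy as np

def kernel_alignment(K1, K2):
    # Alignment A(K1, K2) = <K1, K2>_F / (||K1||_F * ||K2||_F), in [-1, 1];
    # 1.0 means the two Gram matrices encode identical pairwise structure.
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))

X = np.random.default_rng(1).normal(size=(10, 3))
K = X @ X.T                       # linear-kernel Gram matrix of 10 samples
print(kernel_alignment(K, K))     # a kernel aligns perfectly with itself
```

Turning alignment (a similarity) into the match *cost* the abstract describes is then a matter of negation or 1 − alignment.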

3 citations


Cites background or methods from "Enhancing the MST-CSS Representatio..."

  • ...(final formation) of the evolving MST-CSS surface, while ignoring the dynamics of evolution [12,22]....

  • ...In all the previous methods published earlier for MST-CSS [22] representation or its variants [9,12], features have been extracted from the final form of the MST-CSS surface....

  • ...This is formed by stacking a sequence of zero-crossing contours (ZCC) detected on the evolving STV [12,22] surface....

  • ...An extension of [12] was proposed in [9] to make the CBVR method applicable for moving camera video (MCV) shots....

  • ...In one of our earlier work [12], a joint spatio-temporal representation of a video object has been analyzed....

References
Journal ArticleDOI
TL;DR: It is shown how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space, which makes the generalized Hough transform a kind of universal transform that can be used to find arbitrarily complex shapes.
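The voting idea behind the GHT (which the main paper adapts for surface registration) can be shown in a few lines for the translation-only case: each scene point votes for candidate reference-point locations using the template's offset table. This is a deliberately reduced sketch; the full R-table indexed by gradient orientation, and the rotation/scale search, are omitted.

```python
import numpy as np

# Template: four corners of a square; the R-table reduces to offsets
# from each boundary point to a chosen reference point (the centroid).
template = np.array([(0, 0), (0, 2), (2, 0), (2, 2)])
ref = template.mean(axis=0)
offsets = ref - template

shift = np.array([5, 7])
scene = template + shift          # same shape, translated instance

# Accumulator: every (scene point, offset) pair casts one vote.
acc = {}
for p in scene:
    for off in offsets:
        vote = (float(p[0] + off[0]), float(p[1] + off[1]))
        acc[vote] = acc.get(vote, 0) + 1

best = max(acc, key=acc.get)
print(best, acc[best])            # the true reference location wins all 4 votes
```

The peak in the accumulator localizes the shape even though no point-to-point correspondence was ever established, which is what makes Hough-style voting robust to missing and spurious points.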

4,310 citations


"Enhancing the MST-CSS Representatio..." refers background or methods in this paper

  • ...For registering two (query and gallery) MST-CSS surfaces we have adapted and compared three different techniques: (i) Generalized Hough Transform (GHT) [19] based registration, (ii) Iterative Closest Points (ICP) [20] and (iii) Coherent Point Drift (CPD) [21]....

Proceedings ArticleDOI
01 May 2001
TL;DR: An implementation is demonstrated that is able to align two range images in a few tens of milliseconds, assuming a good initial guess, and has potential application to real-time 3D model acquisition and model-based tracking.
Abstract: The ICP (Iterative Closest Point) algorithm is widely used for geometric alignment of three-dimensional models when an initial estimate of the relative pose is known. Many variants of ICP have been proposed, affecting all phases of the algorithm from the selection and matching of points to the minimization strategy. We enumerate and classify many of these variants, and evaluate their effect on the speed with which the correct alignment is reached. In order to improve convergence for nearly-flat meshes with small features, such as inscribed surfaces, we introduce a new variant based on uniform sampling of the space of normals. We conclude by proposing a combination of ICP variants optimized for high speed. We demonstrate an implementation that is able to align two range images in a few tens of milliseconds, assuming a good initial guess. This capability has potential application to real-time 3D model acquisition and model-based tracking.
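The core ICP loop the abstract describes, stripped of all the variants the paper surveys, is: match each source point to its nearest target point, solve the closed-form rigid fit (Kabsch/SVD), apply, repeat. A minimal sketch with brute-force matching on a toy grid (the point set and perturbation are illustrative; real variants change the sampling, matching, and minimization steps):

```python
import numpy as np

def best_rigid(src, dst):
    # Closed-form least-squares rotation R and translation t with dst ~ R src + t.
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=10):
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest-neighbour correspondences
        d = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d.argmin(1)]
        R, t = best_rigid(cur, matched)
        cur = cur @ R.T + t
    return cur

g = np.arange(-2.0, 3.0)
dst = np.array([(x, y) for x in g for y in g])        # 25-point grid target
ang = 0.1
R_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
src = dst @ R_true.T + np.array([0.1, -0.05])          # small rigid perturbation
aligned = icp(src, dst)
print(np.abs(aligned - dst).max())                     # near-zero residual
```

As the abstract stresses, this only works given a good initial guess: the small perturbation here keeps every nearest-neighbour match correct, so one Kabsch solve already recovers the exact transform.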

4,059 citations


"Enhancing the MST-CSS Representatio..." refers background or methods in this paper

  • ...For registering two (query and gallery) MST-CSS surfaces we have adapted and compared three different techniques: (i) Generalized Hough Transform (GHT) [19] based registration, (ii) Iterative Closest Points (ICP) [20] and (iii) Coherent Point Drift (CPD) [21]....

Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper constructs video representations in terms of local space-time features, integrates these representations with SVM classification schemes, and presents action recognition results that demonstrate the method's advantage over related approaches.
Abstract: Local space-time features capture local events in video and can be adapted to the size, the frequency and the velocity of moving patterns. In this paper, we demonstrate how such features can be used for recognizing complex motion patterns. We construct video representations in terms of local space-time features and integrate such representations with SVM classification schemes for recognition. For the purpose of evaluation we introduce a new video database containing 2391 sequences of six human actions performed by 25 people in four different scenarios. The presented results of action recognition justify the proposed method and demonstrate its advantage compared to other relative approaches for action recognition.

3,238 citations

Journal ArticleDOI
TL;DR: A view-based approach to the representation and recognition of human movement is presented, and a recognition method matching temporal templates against stored instances of views of known actions is developed.
Abstract: A view-based approach to the representation and recognition of human movement is presented. The basis of the representation is a temporal template-a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using aerobics exercises as a test domain, we explore the representational power of a simple, two component version of the templates: The first value is a binary value indicating the presence of motion and the second value is a function of the recency of motion in a sequence. We then develop a recognition method matching temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real-time on standard platforms.
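The two-component template in the abstract is easy to make concrete: the recency component is the classic Motion History Image (MHI), where each pixel holds a value that is stamped to a maximum on motion and decays linearly afterwards. A toy sketch with frame differencing as the motion detector (the 1×8 "video" and the decay step of 1 are illustrative assumptions):

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    # Decay older motion by one step, then stamp current motion with tau.
    mhi = np.maximum(mhi - 1, 0)
    mhi[motion_mask] = tau
    return mhi

tau = 5
frames = np.zeros((6, 1, 8))          # 1x8 strip; a bright pixel moves right
for t in range(6):
    frames[t, 0, t] = 1.0

mhi = np.zeros((1, 8))
for t in range(1, 6):
    motion = np.abs(frames[t] - frames[t - 1]) > 0.5   # crude motion mask
    mhi = update_mhi(mhi, motion, tau)

print(mhi.astype(int))                # ramp: higher value = more recent motion
```

The resulting gradient of values encodes *how* the motion unfolded in a single static image, which is exactly what lets the method match actions with a plain template comparison.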

2,932 citations

Journal ArticleDOI
TL;DR: A probabilistic method, called the Coherent Point Drift (CPD) algorithm, is introduced for both rigid and nonrigid point set registration and a fast algorithm is introduced that reduces the method computation complexity to linear.
Abstract: Point set registration is a key component in many computer vision tasks. The goal of point set registration is to assign correspondences between two sets of points and to recover the transformation that maps one point set to the other. Multiple factors, including an unknown nonrigid spatial transformation, large dimensionality of point set, noise, and outliers, make the point set registration a challenging problem. We introduce a probabilistic method, called the Coherent Point Drift (CPD) algorithm, for both rigid and nonrigid point set registration. We consider the alignment of two point sets as a probability density estimation problem. We fit the Gaussian mixture model (GMM) centroids (representing the first point set) to the data (the second point set) by maximizing the likelihood. We force the GMM centroids to move coherently as a group to preserve the topological structure of the point sets. In the rigid case, we impose the coherence constraint by reparameterization of GMM centroid locations with rigid parameters and derive a closed form solution of the maximization step of the EM algorithm in arbitrary dimensions. In the nonrigid case, we impose the coherence constraint by regularizing the displacement field and using the variational calculus to derive the optimal transformation. We also introduce a fast algorithm that reduces the method computation complexity to linear. We test the CPD algorithm for both rigid and nonrigid transformations in the presence of noise, outliers, and missing points, where CPD shows accurate results and outperforms current state-of-the-art methods.

2,429 citations


"Enhancing the MST-CSS Representatio..." refers background or methods in this paper

  • ...For registering two (query and gallery) MST-CSS surfaces we have adapted and compared three different techniques: (i) Generalized Hough Transform (GHT) [19] based registration, (ii) Iterative Closest Points (ICP) [20] and (iii) Coherent Point Drift (CPD) [21]....

  • ...We have adapted CPD [21] as it outperforms (verified experimentally) the other two techniques....
