Showing papers in "Computer Vision and Image Understanding in 2012"


Journal ArticleDOI
TL;DR: An automatic, video-based analysis of the events in Duisburg is presented and methods for the detection and early warning of dangerous situations during mass events are proposed.

170 citations


Journal ArticleDOI
TL;DR: A new integrated framework that addresses the problems of thermal-visible video registration, sensor fusion, and people tracking for far-range videos is proposed and shown to obtain better results for both image registration and tracking than separate image registration and tracking methods.

154 citations


Journal ArticleDOI
TL;DR: An approach for anomaly detection and localization, in video surveillance applications, based on spatio-temporal features that capture scene dynamic statistics together with appearance is proposed, and outperforms other state-of-the-art real-time approaches.

149 citations


Journal ArticleDOI
TL;DR: An efficient combination of algorithms is proposed for the automated localization of the optic disc and macula in retinal fundus images; it combines the predictions of multiple algorithms, benefiting from their strengths and compensating for their weaknesses.

142 citations


Journal ArticleDOI
TL;DR: This paper presents a novel approach for robust and selective STIP detection, by applying surround suppression combined with local and temporal constraints, and introduces a novel vocabulary building strategy by combining spatial pyramid and vocabulary compression techniques, resulting in improved performance and efficiency.

141 citations


Journal ArticleDOI
TL;DR: Experiments showed that the proposed statistical approach to visual texture description outperforms existing static texture classification methods and is comparable to the top dynamic texture classification techniques.

115 citations


Journal ArticleDOI
TL;DR: A new vision-based framework for driver foot behavior analysis is proposed, using optical-flow-based foot tracking and a Hidden Markov Model (HMM) based technique to characterize the temporal foot behavior.

109 citations


Journal ArticleDOI
TL;DR: This paper reviews the existing methods designed to calibrate any central omnivision system and analyzes their advantages and drawbacks through an in-depth comparison using simulated and real data.

80 citations


Journal ArticleDOI
TL;DR: In this article, a discriminant multiple coupled latent subspace framework is proposed to find the sets of projection directions for different poses such that the projected images of the same subject in different poses are maximally correlated in the latent space.

80 citations


Journal ArticleDOI
TL;DR: The discriminant movement representation combined with camera viewpoint identification and a nearest centroid classification step leads to a high human movement classification accuracy.

79 citations


Journal ArticleDOI
TL;DR: By exploiting contextual information, the proposed system is able to make more accurate detections, especially of behaviours that are suspicious only in some contexts while being normal in others, and gives critical feedback to the system designers to refine the system.

Journal ArticleDOI
TL;DR: Five core optimization constraints, which are used by 13 methods together with different optimization techniques, are identified, and some of the 13 methods are combined with robust estimation techniques such as m-functions or RANSAC to improve estimates for noisy visual motion fields.

Journal ArticleDOI
TL;DR: A novel online framework for behavior understanding in visual workflows is presented, capable of achieving high recognition rates in real time, using a Bayesian filter supported by hidden Markov models and a novel re-adjustment framework of behavior recognition and classification.

Journal ArticleDOI
TL;DR: This paper presents a principled approach to learning a semantic vocabulary from a large amount of video words using Diffusion Maps embedding, and conjectures that the mid-level features produced by similar video sources must lie on a certain manifold.

Journal ArticleDOI
TL;DR: This work presents a graph matching method to solve the point-set correspondence problem, which is posed as one of mixture modelling, and uses a true continuous underlying correspondence variable.

Journal ArticleDOI
TL;DR: 3D articulated tracking avoids the need for view-based models, specific camera viewpoints, and constrained domains and provides a natural benchmark for evaluating the performance of 3D pose tracking methods (vs. conventional Euclidean joint error metrics).

Journal ArticleDOI
TL;DR: This paper presents an integrated solution for the problem of detecting, tracking and identifying vehicles in a tunnel surveillance application, taking into account practical constraints including real-time operation, poor imaging conditions, and a decentralized architecture.

Journal ArticleDOI
TL;DR: This paper attacks the key problem of camera pose estimation, in an automatic and efficient way, by matching vanishing points with 3D directions derived from a 3D range model, and utilizing low-level linear features.

Journal ArticleDOI
TL;DR: The proposed approach was applied for the segmentation of internal brain structures in magnetic resonance images and shows the relevance of the optimization criteria and the interest of the backtracking procedure to guarantee good and consistent results.

Journal ArticleDOI
TL;DR: A novel moving object detection algorithm is proposed, for which an illumination change model, a chromaticity difference model, and a brightness ratio model are developed that estimate the intensity difference and intensity ratio of false foreground pixels, respectively.

Journal ArticleDOI
TL;DR: The proposed tracker enhances the recently suggested FragTrack algorithm to employ an adaptive cue integration scheme by embedding the original tracker into a particle filter framework, associating a reliability value to each fragment that describes a different part of the target object and dynamically adjusting these reliabilities at each frame with respect to the current context.

Journal ArticleDOI
TL;DR: An accurate and fast approach for MR-image segmentation of brain tissues that is robust to anatomical variations and takes an average of less than 1 min for completion on modern PCs is presented.

Journal ArticleDOI
TL;DR: This article defines the so-called bio-inspired features associated with an input video, based on the average activity of MT cells, and shows how these features can be used in a standard classification method to perform action recognition.

Journal ArticleDOI
TL;DR: This work uses multiple relatively-shifted LR range images, where the motion between the LR images serves as a cue for super-resolution, and exploits a cue from segmentation of an optical image of the same scene, which constrains pixels in the same color segment to have similar range values.

Journal ArticleDOI
TL;DR: MMTrack (max-margin tracker), a single-target tracker that linearly combines constant and adaptive appearance features, is introduced and a system combining a variety of appearance features and a motion model is demonstrated, with the parameters of these features learned jointly in a coherent learning framework.

Journal ArticleDOI
TL;DR: The minimal levels of linear correlation between the outputs produced by the proposed strategy and other state-of-the-art techniques suggest that the fusion of both recognition techniques significantly improves performance, which is regarded as a positive step towards the development of extremely ambitious types of biometric recognition.

Journal ArticleDOI
TL;DR: Experimental validation on data from two different datasets illustrates the significant biometric authentication potential of the proposed framework in realistic scenarios, whereby the user is unobtrusively observed, while the use of the static anthropometric profile is seen to significantly improve performance with respect to state-of-the-art approaches.

Journal ArticleDOI
TL;DR: In this paper, a set of composed complex-cue image descriptors is introduced and evaluated with respect to the problems of recognizing previously seen object instances from previously unseen views, and classifying previously unseen objects into visual categories.

Journal ArticleDOI
TL;DR: A generic framework is presented in which images are modelled as order-less sets of weighted visual features, each associated with a weight factor that may inform its relevance; it is suggested that, if dense sampling is used, different schemes to weight local features can be evaluated, leading to results that are often better than the combination of multiple sampling schemes.

Journal ArticleDOI
TL;DR: An efficient solution to extract, in real-time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario is presented.