
Showing papers by "Andrea Cavallaro published in 2010"


Journal ArticleDOI
TL;DR: Video analytics, loosely defined as autonomous understanding of events occurring in a scene monitored by multiple video cameras, has been rapidly evolving in the last two decades, but practical surveillance systems deployed today are not yet capable of autonomous analysis of complex events in the field of view of cameras.
Abstract: Video analytics, loosely defined as autonomous understanding of events occurring in a scene monitored by multiple video cameras, has been rapidly evolving in the last two decades. Despite this effort, practical surveillance systems deployed today are not yet capable of autonomous analysis of complex events in the field of view of cameras. This is a serious deficiency as video feeds from millions of surveillance cameras worldwide are not analyzed in real time and thus cannot help with accident, crime or terrorism prevention, and mitigation, issues critical to the contemporary society. Today, these feeds are, at best, recorded to facilitate post-event video forensics.

74 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm provides typically 10 to 15 dB better approximation of perfect Gaussian blurring than the blended Gaussian pyramid blurring approach when using a bank of just eight filters.
Abstract: This paper presents a computationally efficient algorithm for smoothly space-variant Gaussian blurring of images. The proposed algorithm uses a specialized filter bank with optimal filters computed through principal component analysis. This filter bank approximates perfect space-variant Gaussian blurring to arbitrarily high accuracy and at greatly reduced computational cost compared to the brute force approach of employing a separate low-pass filter at each image location. This is particularly important for spatially variant image processing such as foveated coding. Experimental results show that the proposed algorithm provides typically 10 to 15 dB better approximation of perfect Gaussian blurring than the blended Gaussian pyramid blurring approach when using a bank of just eight filters.
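
As an illustration of the filter-bank idea described above, here is a minimal Python sketch (not the paper's implementation): a set of Gaussian kernels spanning the local blur range is decomposed with PCA, the image is convolved once with each basis filter, and the outputs are blended per pixel. The per-pixel sigma map, kernel size and sampling density are assumed inputs and illustrative choices.

```python
# Minimal sketch of approximating smoothly space-variant Gaussian blurring
# with a small PCA-derived filter bank. Parameters are illustrative.
import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(sigma, size=21):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def space_variant_blur(image, sigma_map, n_filters=8, size=21):
    # sigma_map: per-pixel blur level (assumed > 0).
    # 1. Sample Gaussian kernels over the sigma range found in sigma_map.
    sigmas = np.linspace(sigma_map.min(), sigma_map.max(), 64)
    kernels = np.stack([gaussian_kernel(s, size).ravel() for s in sigmas])

    # 2. PCA on the kernel set: mean kernel plus principal basis filters.
    mean_k = kernels.mean(axis=0)
    _, _, vt = np.linalg.svd(kernels - mean_k, full_matrices=False)
    basis = vt[:n_filters]                      # (n_filters, size*size)
    coeffs = (kernels - mean_k) @ basis.T       # projection of each kernel

    # 3. Convolve the image once with the mean filter and each basis filter.
    responses = [convolve(image, mean_k.reshape(size, size))]
    responses += [convolve(image, b.reshape(size, size)) for b in basis]

    # 4. Blend per pixel: interpolate each basis coefficient at the local
    #    sigma and combine the precomputed filter responses.
    out = responses[0].copy()
    for j in range(n_filters):
        cj = np.interp(sigma_map, sigmas, coeffs[:, j])
        out += cj * responses[j + 1]
    return out
```

With n_filters=8, as in the paper's experiments, only nine convolutions are needed regardless of how many distinct blur levels appear in the sigma map.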

46 citations


Journal ArticleDOI
TL;DR: This work uses motion vectors extracted over a region of interest (ROI) as features, and a non-linear, graph-based manifold learning algorithm coupled with a supervised novelty classifier to label segments of a video sequence for abnormal visual event detection.
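
A rough sketch of such a pipeline, with stand-ins: Isomap plays the role of the non-linear, graph-based manifold learner and a one-class SVM plays the role of the novelty classifier; the motion-vector features, data sizes and parameters are invented for illustration.

```python
# Sketch: embed per-segment motion features on a learned manifold, then flag
# segments that fall far from the normal training data as abnormal.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_segments = rng.normal(0.0, 1.0, size=(200, 32))            # training (normal only)
test_segments = np.vstack([rng.normal(0.0, 1.0, size=(20, 32)),
                           rng.normal(4.0, 1.0, size=(5, 32))])   # last 5 abnormal

# Learn a low-dimensional manifold from the normal motion features...
embedder = Isomap(n_neighbors=10, n_components=3).fit(normal_segments)
train_emb = embedder.transform(normal_segments)

# ...then train a novelty classifier on the embedded normal samples.
detector = OneClassSVM(nu=0.05, gamma="scale").fit(train_emb)

# Segments mapped far from the normal manifold are flagged as abnormal (-1).
labels = detector.predict(embedder.transform(test_segments))
print(labels)
```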

40 citations


Journal ArticleDOI
TL;DR: This work presents a content-aware multi-camera selection technique that uses object- and frame-level features and compares the proposed approach with a maximum score based camera selection criterion and demonstrates a significant decrease in camera flickering.
Abstract: We present a content-aware multi-camera selection technique that uses object- and frame-level features. First, objects are detected using a color-based change detector. Next, trajectory information for each object is generated using multi-frame graph matching. Finally, multiple features including size and location are used to generate an object score. At frame level, we consider total activity, event score, number of objects and cumulative object score. These features are used to generate score information using a multivariate Gaussian distribution. The best view is then selected using a Dynamic Bayesian Network (DBN), which utilizes camera network information. The DBN employs previous view information to select the current view, thus increasing resilience to frequent switching. The performance of the proposed approach is demonstrated on three multi-camera setups with semi-overlapping fields of view: a basketball game, an indoor airport surveillance scenario and a synthetic outdoor pedestrian dataset. We compare the proposed view selection approach with a maximum-score-based camera selection criterion and demonstrate a significant decrease in camera flickering. The performance of the proposed approach is also validated through subjective testing.
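
The temporal-prior idea can be illustrated with a small sketch: per-frame camera scores (assumed to come from the object- and frame-level features above) are re-ranked with a simple switching penalty that stands in for the DBN's dependence on the previous view. The penalty value and the comparison with frame-wise maximum-score selection are illustrative.

```python
# Sketch of temporally smoothed view selection versus per-frame argmax.
import numpy as np

def select_views(scores, switch_penalty=0.2):
    """scores: (n_frames, n_cameras) array of frame-level view scores."""
    n_frames, n_cams = scores.shape
    views = np.empty(n_frames, dtype=int)
    views[0] = int(np.argmax(scores[0]))
    for t in range(1, n_frames):
        adjusted = scores[t].copy()
        # Penalise every camera except the currently selected one, so a new
        # view must beat the current view by a margin before we switch.
        adjusted[np.arange(n_cams) != views[t - 1]] -= switch_penalty
        views[t] = int(np.argmax(adjusted))
    return views

# Example: noisy scores for 3 cameras; count the camera switches.
rng = np.random.default_rng(1)
scores = rng.random((100, 3))
print("switches (smoothed): ", int(np.sum(np.diff(select_views(scores)) != 0)))
print("switches (max-score):", int(np.sum(np.diff(np.argmax(scores, axis=1)) != 0)))
```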

40 citations


Proceedings ArticleDOI
29 Aug 2010
TL;DR: Experimental results on a real underground station dataset show that the linear approach is better suited for cases where the subspace learning is restricted to the labeled samples, whereas the non-linear approach is preferable in the presence of additional unlabeled data.
Abstract: On-line abnormality detection in video without the use of object detection and tracking is a desirable task in surveillance. We address this problem for the case when labeled information about normal events is limited and information about abnormal events is not available. We formulate this problem as a one-class classification, where multiple local novelty classifiers (detectors) are used to first learn normal actions based on motion information and then to detect abnormal instances. Each detector is associated to a small region of interest and is trained over labeled samples projected on an appropriate subspace. We discover this subspace by using both labeled and unlabeled segments. We investigate the use of subspace learning and compare two methodologies based on linear (Principal Components Analysis) and on non-linear subspace learning (Locality Preserving Projections), respectively. Experimental results on a real underground station dataset show that the linear approach is better suited for cases where the subspace learning is restricted to the labeled samples, whereas the non-linear approach is preferable in the presence of additional unlabeled data.
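
A minimal sketch of one local detector following the linear branch described above: motion features of a single region of interest are projected onto a PCA subspace learned from normal samples, and a segment is flagged as abnormal when its reconstruction error exceeds a threshold. Substituting Locality Preserving Projections for PCA would give the non-linear branch; the feature dimensionality, number of components and threshold quantile are illustrative.

```python
# Sketch of a per-ROI novelty detector: PCA subspace + reconstruction error.
import numpy as np
from sklearn.decomposition import PCA

class LocalNoveltyDetector:
    def __init__(self, n_components=5, quantile=0.99):
        self.pca = PCA(n_components=n_components)
        self.quantile = quantile

    def fit(self, normal_features):
        self.pca.fit(normal_features)
        errors = self._reconstruction_error(normal_features)
        self.threshold = np.quantile(errors, self.quantile)
        return self

    def _reconstruction_error(self, x):
        recon = self.pca.inverse_transform(self.pca.transform(x))
        return np.linalg.norm(x - recon, axis=1)

    def predict(self, x):
        # True = abnormal: the sample lies far from the normal subspace.
        return self._reconstruction_error(x) > self.threshold

rng = np.random.default_rng(2)
detector = LocalNoveltyDetector().fit(rng.normal(size=(500, 24)))
print(detector.predict(rng.normal(size=(3, 24)) + 5.0))   # expected: abnormal
```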

25 citations


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter presents a multi-view track-before-detect approach that consistently detects and recognizes multiple simultaneous objects in a common view, based on motion models.
Abstract: Multi-view trackers combine data from different camera views to estimate the temporal evolution of objects across a monitored area. Data to be combined can be represented by object features (such as position, color and silhouette) or by object trajectories in each view. In this chapter, we classify and survey state-of-the-art multi-view tracking algorithms and discuss their applications and algorithmic limitations. Moreover, we present a multi-view track-before-detect approach that consistently detects and recognizes multiple simultaneous objects in a common view, based on motion models. This approach estimates the temporal evolution of objects from noisy data, given their motion model, without an explicit object detection stage.
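
To make the track-before-detect idea concrete, here is a minimal single-object, single-view sketch (not the chapter's multi-view formulation): a particle filter propagates a constant-velocity motion model and weights particles directly by a noisy measurement map, so no explicit detection stage is needed. The noise levels and the resampling scheme are illustrative simplifications.

```python
# One step of a toy track-before-detect particle filter.
import numpy as np

def tbd_step(particles, weights, measurement_map, dt=1.0, accel_noise=1.0):
    """particles: (N, 4) array of [x, y, vx, vy]; weights: (N,) normalized;
    measurement_map: non-negative per-pixel measurement intensities."""
    h, w = measurement_map.shape
    # Predict: constant-velocity motion model plus process noise on velocity.
    particles[:, 0] += particles[:, 2] * dt
    particles[:, 1] += particles[:, 3] * dt
    particles[:, 2:] += np.random.normal(0.0, accel_noise, particles[:, 2:].shape)

    # Update: weight each particle by the raw (noisy) measurement at its
    # position, instead of using thresholded detections.
    xi = np.clip(particles[:, 0].astype(int), 0, w - 1)
    yi = np.clip(particles[:, 1].astype(int), 0, h - 1)
    weights = weights * (measurement_map[yi, xi] + 1e-12)
    weights /= weights.sum()

    # State estimate: weighted mean position before resampling.
    estimate = weights @ particles[:, :2]

    # Resample (multinomial for brevity; systematic resampling is preferable).
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles)), estimate
```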

18 citations


Book ChapterDOI
19 Dec 2010

15 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: A taxonomy is proposed and a comparative evaluation of online quality estimators for video object tracking shows that the Observation Likelihood measure is an appropriate quality measure for overall tracking performance evaluation, while the Template Inverse Matching measure is appropriate to detect the start and the end instants of tracking failures.
Abstract: Failure of tracking algorithms is inevitable in real and on-line tracking systems. The online estimation of the track quality is therefore desirable for detecting tracking failures while the algorithm is operating. In this paper, we propose a taxonomy and present a comparative evaluation of online quality estimators for video object tracking. The measures are compared over a heterogeneous video dataset with standard sequences. Among other results, the experiments show that the Observation Likelihood (OL) measure is an appropriate quality measure for overall tracking performance evaluation, while the Template Inverse Matching (TIM) measure is appropriate to detect the start and the end instants of tracking failures.
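
One plausible reading of a template-inverse-matching style check, sketched under the assumption that the tracker exposes its current template and the previous frame and position: the current template is matched back into the previous frame, and a large displacement from the known previous position indicates a likely failure. OpenCV's normalized cross-correlation is used as the matcher for illustration.

```python
# Sketch of a backward (inverse) template-matching consistency score.
import numpy as np
import cv2

def inverse_matching_score(prev_frame_gray, prev_pos, current_template_gray):
    """prev_pos: top-left (x, y) of the template in the previous frame."""
    # Match the current template back into the previous frame.
    result = cv2.matchTemplate(prev_frame_gray, current_template_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    # Distance between where the template matches and where the object was:
    # small values suggest consistent tracking, large values a likely failure.
    return float(np.hypot(max_loc[0] - prev_pos[0], max_loc[1] - prev_pos[1]))
```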

13 citations


Book ChapterDOI
01 Jan 2010
TL;DR: A soft partitional algorithm based on non-parametric Mean-shift clustering is presented, validated on real datasets and compared with state-of-the-art approaches, based on objective evaluation metrics.
Abstract: We present a scene understanding strategy for video sequences based on clustering object trajectories. In this chapter, we discuss a set of relevant feature spaces for trajectory representation and we critically analyze their relative merits. Next, we examine various trajectory clustering methods that can be employed to learn activity models, based on their classification into hierarchical and partitional algorithms. In particular, we focus on parametric and non-parametric partitional algorithms and discuss the limitations of existing approaches. To overcome the limitations of state-of-the-art approaches we present a soft partitional algorithm based on non-parametric Mean-shift clustering. The proposed algorithm is validated on real datasets and compared with state-of-the-art approaches, based on objective evaluation metrics.
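
A minimal sketch of the clustering step, assuming one of the simpler trajectory feature spaces discussed in the chapter: each trajectory is resampled to a fixed number of points, flattened into a vector, and clustered with scikit-learn's Mean-shift. The resampling length and bandwidth quantile are illustrative choices.

```python
# Sketch: cluster object trajectories with non-parametric Mean-shift.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

def resample_trajectory(traj, n_points=16):
    """traj: (T, 2) array of (x, y) positions; returns a (2*n_points,) vector."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_points)
    x = np.interp(t_new, t_old, traj[:, 0])
    y = np.interp(t_new, t_old, traj[:, 1])
    return np.concatenate([x, y])

def cluster_trajectories(trajectories):
    features = np.stack([resample_trajectory(t) for t in trajectories])
    bandwidth = estimate_bandwidth(features, quantile=0.2)
    labels = MeanShift(bandwidth=bandwidth).fit_predict(features)
    return labels   # one activity-cluster label per trajectory
```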

11 citations


Proceedings ArticleDOI
03 Dec 2010
TL;DR: The technique is based on the analysis of the dynamics in the scene and allows us to overcome the challenges due to frequent occlusions of the ball and its similarity in appearance with the background; the location of the ball can be estimated with an average accuracy of 82%.
Abstract: We present a technique for estimating the location of the ball during a basketball game without using a detector. The technique is based on the analysis of the dynamics in the scene and allows us to overcome the challenges due to frequent occlusions of the ball and its similarity in appearance with the background. Based on the assumption that the ball is the point of focus of the game and that the motion flow of the players is dependent on its position during attack actions, the most probable candidates for the ball location are extracted from each frame. These candidates are then validated over time using a Kalman filter. Experimental results on a real basketball dataset show that the location of the ball can be estimated with an average accuracy of 82%.
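
The validation stage can be sketched as a standard constant-velocity Kalman filter smoothing the per-frame ball-location candidates (assumed to come from the motion-flow analysis above); the noise covariances below are illustrative, not the paper's settings.

```python
# Sketch: constant-velocity Kalman filtering of per-frame ball candidates.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],       # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.5                # process noise (illustrative)
R = np.eye(2) * 5.0                # candidate (measurement) noise (illustrative)

def kalman_track(candidates):
    """candidates: sequence of (x, y) ball-location candidates, one per frame."""
    x = np.array([*candidates[0], 0.0, 0.0])
    P = np.eye(4) * 100.0
    estimates = []
    for z in candidates:
        # Predict with the constant-velocity model.
        x, P = F @ x, F @ P @ F.T + Q
        # Update with the current candidate location.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z, dtype=float) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        estimates.append(x[:2].copy())
    return np.array(estimates)
```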

9 citations


Book ChapterDOI
01 Jan 2010
TL;DR: This chapter presents an interaction modeling framework formulated as a state sequence estimation problem using time-series analysis, and Bayesian network-based methods and their variants are studied for the analysis of interactions in videos.
Abstract: Detection and tracking algorithms generate useful information in the form of trajectories, from which the behaviors and the interactions of moving objects can be inferred through the analysis of spatio-temporal features. Interactions occur either between a dynamic and a static object, or between multiple dynamic objects. This chapter presents an interaction modeling framework formulated as a state sequence estimation problem using time-series analysis. Bayesian network-based methods and their variants are studied for the analysis of interactions in videos. Moreover, techniques such as the Coupled Hidden Markov Model are also discussed for more complex interactions, such as those between multiple dynamic objects. Finally, interaction modeling is demonstrated on real surveillance and sport sequences.
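
As a toy illustration of treating interaction analysis as state-sequence estimation, the sketch below fits a single Gaussian HMM with hmmlearn to a made-up observation sequence of inter-object distance and relative speed; a Coupled Hidden Markov Model, as discussed above, would link two such chains. The features, data and number of states are invented for illustration.

```python
# Sketch: infer a hidden interaction-state sequence from trajectory features.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(3)
# Toy observations: [distance, relative speed] per frame for one object pair,
# simulating an "approaching" phase followed by a "together" phase.
approach = np.column_stack([np.linspace(50, 5, 60), np.full(60, -0.8)])
together = np.column_stack([np.full(40, 5.0), np.zeros(40)])
observations = np.vstack([approach, together]) + rng.normal(0, 0.5, (100, 2))

model = hmm.GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(observations)
states = model.predict(observations)   # most likely hidden state per frame
print(states)
```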

Journal ArticleDOI
TL;DR: This poster presents a probabilistic procedure to characterize the response of the immune system to laser-spot assisted, 3D image analysis of EMT.
Abstract: Reference EPFL-ARTICLE-171979, doi:10.1155/2010/560927. View record in Web of Science. Record created on 2011-12-16, modified on 2017-05-10.

Journal ArticleDOI
TL;DR: This Special Issue covers the state of the art and recent advances in several aspects of multi-sensor detection, tracking, planning and their applications, and discusses a distributed solution of the pose estimation problem that is robust to errors due to occlusions.

01 Jan 2010
TL;DR: An event detection approach based on local feature modeling, using spatio-temporal cuboids and perspective normalization (QMUL-ACTIVA 3 / p-baseline 1), is discussed.
Abstract: We discuss an event detection approach based on local feature modeling, using spatio-temporal cuboids and perspective normalization (QMUL-ACTIVA 3 / p-baseline 1). Motion information is compared against examples of events learned from a training dataset to define a similarity measure. This similarity measure is then analyzed in both space and time to identify frames containing instances of the event of interest (a person running in an airport building). Features are analyzed locally to enable the differentiation of simultaneously occurring events in different portions of an image frame. The performance is quantified on the TRECVID 2010 surveillance event detection dataset.
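
A minimal sketch of the local-feature idea under stated assumptions: each spatio-temporal cuboid is described by a magnitude-weighted orientation histogram of optical flow, and is scored by histogram intersection against the closest training example of the event. The histogram bins and the similarity measure are illustrative choices, not the paper's exact settings.

```python
# Sketch: cuboid motion descriptors and a similarity score to trained events.
import numpy as np

def cuboid_features(flow_block, n_bins=8):
    """flow_block: (T, H, W, 2) optical-flow patch; returns an orientation histogram."""
    fx, fy = flow_block[..., 0].ravel(), flow_block[..., 1].ravel()
    angles = np.arctan2(fy, fx)
    magnitudes = np.hypot(fx, fy)
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi),
                           weights=magnitudes)
    return hist / (hist.sum() + 1e-12)

def event_similarity(cuboid_hist, training_hists):
    # Similarity to the closest training example (1 = identical, 0 = disjoint),
    # using histogram intersection as the measure.
    return max(np.minimum(cuboid_hist, h).sum() for h in training_hists)
```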

Book ChapterDOI
19 Dec 2010


Book ChapterDOI
19 Dec 2010