
Showing papers on "Object detection published in 2001"


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection that processes images extremely rapidly while achieving high detection rates, built on a new image representation called the "integral image", which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

18,620 citations
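The "integral image" described above can be built with two cumulative sums, after which the sum of any rectangle costs four array lookups regardless of its size. A minimal NumPy sketch (function names are mine, not the paper's):

```python
import numpy as np

def integral_image(img):
    """Padded integral image: ii[y, x] = sum of img[:y, :x].
    Two cumulative sums build it in a single pass over the image."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom, left:right] from just four array lookups,
    which is what makes rectangle features cheap at any scale."""
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

img = np.arange(16).reshape(4, 4)
assert box_sum(integral_image(img), 1, 1, 3, 3) == img[1:3, 1:3].sum()
```

The padding row and column of zeros spare the usual edge-case branches when a rectangle touches the image border.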


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

10,592 citations


Proceedings Article
01 Jan 2001
TL;DR: Viola et al. propose a visual object detection framework capable of processing images extremely rapidly while achieving high detection rates, using a new image representation called the integral image, which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [4]. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems [16, 11, 14, 10, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second. Author email: {Paul.Viola,Mike.J.Jones}@compaq.com. © Compaq Computer Corporation, 2001.

1,648 citations


Journal ArticleDOI
TL;DR: Results suggest that the improvement in performance is due to the component-based approach and the ACC data classification architecture, which is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.
Abstract: We present a general example-based framework for detecting objects in static images by components. The technique is demonstrated by developing a system that locates people in cluttered scenes. The system is structured with four distinct example-based detectors that are trained to separately find the four components of the human body: the head, legs, left arm, and right arm. After ensuring that these components are present in the proper geometric configuration, a second example-based classifier combines the results of the component detectors to classify a pattern as either a "person" or a "nonperson." We call this type of hierarchical architecture, in which learning occurs at multiple stages, an adaptive combination of classifiers (ACC). We present results that show that this system performs significantly better than a similar full-body person detector. This suggests that the improvement in performance is due to the component-based approach and the ACC data classification architecture. The algorithm is also more robust than the full-body person detection method in that it is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.

1,115 citations


Journal ArticleDOI
TL;DR: In this article, color edges in an image are first obtained automatically by combining an improved isotropic edge detector and a fast entropic thresholding technique; the centroids between adjacent edge regions are taken as the initial seeds for seeded region growing (SRG), and these seeds are then replaced by the centroids of the generated homogeneous image regions as the required additional pixels are incorporated step by step.
Abstract: We propose a new automatic image segmentation method. Color edges in an image are first obtained automatically by combining an improved isotropic edge detector and a fast entropic thresholding technique. After the obtained color edges have provided the major geometric structures in an image, the centroids between these adjacent edge regions are taken as the initial seeds for seeded region growing (SRG). These seeds are then replaced by the centroids of the generated homogeneous image regions by incorporating the required additional pixels step by step. Moreover, the results of color-edge extraction and SRG are integrated to provide homogeneous image regions with accurate and closed boundaries. We also discuss the application of our image segmentation method to automatic face detection. Furthermore, semantic human objects are generated by a seeded region aggregation procedure which takes the detected faces as object seeds.

619 citations
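The seeded-region-growing step can be illustrated with a minimal sketch: starting from a seed pixel, 4-connected neighbours are absorbed while their intensity stays close to the running region mean. The fixed tolerance test here is a simplification of the paper's homogeneity criterion:

```python
import numpy as np
from collections import deque

def grow_region(img, seed, tol=10):
    """Minimal seeded region growing on a grayscale image: breadth-first
    absorb 4-connected neighbours whose intensity stays within `tol` of
    the running region mean (a simplified homogeneity test)."""
    h, w = img.shape
    mask = np.zeros((h, w), bool)
    mask[seed] = True
    total, count = float(img[seed]), 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(img[ny, nx] - total / count) <= tol:
                    mask[ny, nx] = True
                    total += float(img[ny, nx])
                    count += 1
                    frontier.append((ny, nx))
    return mask

img = np.full((5, 5), 200, dtype=np.uint8)
img[1:4, 1:4] = 50                       # dark homogeneous blob
assert grow_region(img, (2, 2)).sum() == 9   # only the 3x3 blob is absorbed
```

In the paper's pipeline the seeds come from centroids between color-edge regions rather than being chosen by hand.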


Proceedings ArticleDOI
25 Aug 2001
TL;DR: A technique for shadow detection and suppression used in a system for moving visual object detection and tracking; the major novelty is an analysis carried out in the HSV color space to improve the accuracy in detecting shadows.
Abstract: Video-surveillance and traffic analysis systems can be heavily improved using vision-based techniques able to extract, manage and track objects in the scene. However, problems arise due to shadows. In particular, moving shadows can affect the correct localization, measurements and detection of moving objects. This work aims to present a technique for shadow detection and suppression used in a system for moving visual object detection and tracking. The major novelty of the shadow detection technique is the analysis carried out in the HSV color space to improve the accuracy in detecting shadows. Signal processing and optic motivations of the approach proposed are described. The integration and exploitation of the shadow detection module into the system are outlined and experimental results are shown and evaluated.

497 citations
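The intuition behind HSV-based shadow detection is that a cast shadow attenuates the value channel by a bounded factor while leaving hue and saturation nearly unchanged. A hedged sketch of such a test (the thresholds and exact conditions are illustrative, not the paper's tuned ones):

```python
import numpy as np

def shadow_mask(frame_hsv, bg_hsv, alpha=0.4, beta=0.9, tau_s=0.15, tau_h=30.0):
    """Flag a pixel as shadow when its value channel is attenuated by a
    bounded factor relative to the background while hue and saturation
    stay close. Thresholds are illustrative, not the paper's tuned values.
    Assumes H in degrees [0, 360) and S, V in [0, 1]."""
    h, s, v = (frame_hsv[..., i].astype(float) for i in range(3))
    bh, bs, bv = (bg_hsv[..., i].astype(float) for i in range(3))
    ratio = np.divide(v, bv, out=np.ones_like(v), where=bv > 0)
    hue_diff = np.minimum(np.abs(h - bh), 360.0 - np.abs(h - bh))  # circular distance
    return ((alpha <= ratio) & (ratio <= beta)
            & (np.abs(s - bs) <= tau_s) & (hue_diff <= tau_h))

bg = np.array([[[100.0, 0.5, 0.8]]])     # background pixel (H, S, V)
sh = np.array([[[102.0, 0.45, 0.5]]])    # same surface, darkened: shadow
assert shadow_mask(sh, bg)[0, 0]
assert not shadow_mask(bg, bg)[0, 0]     # unchanged pixel: ratio 1 > beta
```

The lower bound `alpha` keeps very dark foreground objects from being misread as shadow; the upper bound `beta` rejects pixels that were barely darkened at all.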


Journal ArticleDOI
01 Mar 2001
TL;DR: The GuideCane is a device designed to help blind or visually impaired users navigate safely and quickly among obstacles and other hazards; steering corrections are conveyed to the user through a very noticeable force felt in the handle.
Abstract: The GuideCane is a device designed to help blind or visually impaired users navigate safely and quickly among obstacles and other hazards. During operation, the user pushes the lightweight GuideCane forward. When the GuideCane's ultrasonic sensors detect an obstacle, the embedded computer determines a suitable direction of motion that steers the GuideCane and the user around it. The steering action results in a very noticeable force felt in the handle, which easily guides the user without any conscious effort on his/her part.

397 citations


Journal ArticleDOI
TL;DR: A complete and self-contained theoretical derivation of a subpixel target detector using the generalized likelihood ratio test (GLRT) approach and the linear mixing model (LMM) to characterize the targets and the interfering background is provided.
Abstract: Relative to multispectral sensing, hyperspectral sensing can increase the detectability of pixel and subpixel size targets by exploiting finer detail in the spectral signatures of targets and natural backgrounds. Over the past several years, different algorithms for the detection of full-pixel or subpixel targets with known spectral signature have been developed. The authors take a closer and more in-depth look at the class of subpixel target detection algorithms that explore the linear mixing model (LMM) to characterize the targets and the interfering background. Sensor noise is modeled as a Gaussian random vector with uncorrelated components of equal variance. The paper makes three key contributions. First, it provides a complete and self-contained theoretical derivation of a subpixel target detector using the generalized likelihood ratio test (GLRT) approach and the LMM. Some other widely used algorithms are obtained as byproducts. The performance of the resulting detector, under the postulated model, is discussed in great detail to illustrate the effects of the various operational factors. Second, it introduces a systematic approach to investigate how well the adopted model characterizes the data, and how robust the detection algorithm is to model-data mismatches. Finally, it compares the derived algorithms with regard to two desirable properties: capacity to operate in constant false alarm rate mode and ability to increase the separation between target and background.

387 citations
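For intuition about the detector's form: in the special case of a known signature in white Gaussian noise of unknown variance, the GLRT is a monotone function of a normalized matched-filter statistic. The sketch below uses that simplified case, not the paper's full LMM-based derivation:

```python
import numpy as np

def matched_filter_stat(x, s):
    """Detection statistic for a known signature s in white Gaussian noise
    of unknown variance: the GLRT in this special case is a monotone
    function of the squared normalized correlation below (a simplified
    stand-in for the paper's full LMM-based derivation)."""
    s_u = s / np.linalg.norm(s)
    return float(np.dot(s_u, x) ** 2 / np.dot(x, x))

rng = np.random.default_rng(0)
sig = np.ones(50)                        # hypothetical target signature
background = rng.normal(size=50)         # pure-noise pixel
target = background + 10.0 * sig / np.linalg.norm(sig)  # target plus noise
assert matched_filter_stat(target, sig) > matched_filter_stat(background, sig)
```

Because the statistic is scale-invariant in both `x` and `s`, its null distribution does not depend on the unknown noise variance, which is the essence of the constant-false-alarm-rate property the paper discusses.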


Journal ArticleDOI
TL;DR: An unconditionally stable numerical scheme is used to implement a fast version of the geodesic active contour model, based on the Weickert-Romeney-Viergever additive operator splitting (AOS) scheme, useful for object segmentation in images.
Abstract: We use an unconditionally stable numerical scheme to implement a fast version of the geodesic active contour model. The proposed scheme is useful for object segmentation in images, like tracking moving objects in a sequence of images. The method is based on the Weickert-Romeney-Viergever additive operator splitting (AOS) scheme. It is applied at small regions, motivated by the Adalsteinsson-Sethian level set narrow band approach, and uses Sethian's (1996) fast marching method for re-initialization. Experimental results demonstrate the power of the new method for tracking in color movies.

379 citations


Journal ArticleDOI
TL;DR: The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real world application: the indexing of images and videos.
Abstract: Detecting faces in images with complex backgrounds is a difficult task. Our approach, which obtains state of the art results, is based on a neural network model: the constrained generative model (CGM). Generative, since the goal of the learning process is to evaluate the probability that the model has generated the input data, and constrained since some counter-examples are used to increase the quality of the estimation performed by the model. To detect side view faces and to decrease the number of false alarms, a conditional mixture of networks is used. To decrease the computational time cost, a fast search algorithm is proposed. The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real world application: the indexing of images and videos.

369 citations


Patent
12 Nov 2001
TL;DR: An object detection system for detecting instances of an object in a digital image includes an image integrator and an object detector, which includes a classifier (classification function) and image scanner.
Abstract: An object detection system for detecting instances of an object in a digital image includes an image integrator and an object detector, which includes a classifier (classification function) and an image scanner. The image integrator receives an input image and calculates an integral image representation of the input image. The image scanner scans the image in same-sized subwindows. The object detector uses a cascade of homogeneous classification functions or classifiers to classify the subwindows as to whether each subwindow is likely to contain an instance of the object. Each classifier evaluates one or more features of the object to determine the presence of such features in a subwindow that would indicate the likelihood of an instance of the object in the subwindow.

Patent
16 Aug 2001
TL;DR: In this paper, a near object detection (NOD) system is described that includes a plurality of sensors, each providing detection coverage in a predetermined coverage zone and each including a transmit antenna for transmitting a first RF signal and a receive antenna for receiving a second RF signal.
Abstract: A near object detection (NOD) system includes a plurality of sensors, each of the sensors for providing detection coverage in a predetermined coverage zone and each of the sensors including a transmit antenna for transmitting a first RF signal, a receive antenna for receiving a second RF signal and means for sharing information between each of the plurality of sensors in the NOD system.

Journal ArticleDOI
TL;DR: A real-time system for pedestrian tracking in sequences of grayscale images acquired by a stationary camera, intended for integration with a traffic control application such as a pedestrian control scheme at intersections; it can also be used to detect and track humans in front of vehicles.
Abstract: This paper presents a real-time system for pedestrian tracking in sequences of grayscale images acquired by a stationary camera. The objective is to integrate this system with a traffic control application such as a pedestrian control scheme at intersections. The proposed approach can also be used to detect and track humans in front of vehicles. Furthermore, the proposed schemes can be employed for the detection of several diverse traffic objects of interest (vehicles, bicycles, etc.). The system outputs the spatio-temporal coordinates of each pedestrian during the period the pedestrian is in the scene. Processing is done at three levels: raw images, blobs, and pedestrians. Blob tracking is modeled as a graph optimization problem. Pedestrians are modeled as rectangular patches with a certain dynamic behavior. Kalman filtering is used to estimate pedestrian parameters. The system was implemented on a Datacube MaxVideo 20 equipped with a Datacube Max860 and was able to achieve a peak performance of over 30 frames per second. Experimental results based on indoor and outdoor scenes demonstrated the system's robustness under many difficult situations such as partial or full occlusions of pedestrians.
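The Kalman filtering step for pedestrian parameters can be sketched with a constant-velocity model over the patch centroid; the noise levels and model below are illustrative choices, not the paper's:

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-2, r=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter for a
    tracked centroid. State x = [px, py, vx, vy]; measurement z = [px, py].
    Process/measurement noise levels q, r are illustrative, not the paper's."""
    F = np.eye(4)
    F[0, 2] = F[1, 3] = dt                 # constant-velocity transition
    H = np.zeros((2, 4))
    H[0, 0] = H[1, 1] = 1.0                # only position is observed
    x = F @ x                              # predict state
    P = F @ P @ F.T + q * np.eye(4)        # predict covariance
    S = H @ P @ H.T + r * np.eye(2)        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)                # correct with the measurement
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Track a synthetic pedestrian drifting 1 pixel/frame along x.
x, P = np.zeros(4), 10.0 * np.eye(4)
for t in range(1, 6):
    x, P = kalman_step(x, P, np.array([float(t), 0.0]))
assert x[2] > 0.3                          # velocity estimate emerges
```

Although only position is measured, the filter recovers velocity through the cross-covariance that the constant-velocity model builds between the two.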

Journal ArticleDOI
TL;DR: This work describes a system that detects and constructs 3D models for rectilinear buildings with either flat or symmetric gable roofs from multiple aerial images; the multiple images need not be stereo pairs (i.e., they may be acquired at different times).
Abstract: Automatic detection and description of cultural features, such as buildings, from aerial images is becoming increasingly important for a number of applications. This task also offers an excellent domain for studying the general problems of scene segmentation, 3D inference, and shape description under highly challenging conditions. We describe a system that detects and constructs 3D models for rectilinear buildings with either flat or symmetric gable roofs from multiple aerial images; the multiple images, however, need not be stereo pairs (i.e., they may be acquired at different times). Hypotheses for rectangular roof components are generated by grouping lines in the images hierarchically; the hypotheses are verified by searching for presence of predicted walls and shadows. The hypothesis generation process combines the tasks of hierarchical grouping with matching at successive stages. Overlap and containment relations between 3D structures are analyzed to resolve conflicts. This system has been tested on a large number of real examples with good results, some of which are included in the paper along with their evaluations.

Book ChapterDOI
01 Oct 2001
TL;DR: This paper presents a framework and details of the key components for real-time, automatic exploitation of aerial video for surveillance applications, and developed real time, image-processing techniques for 2-D/3-D frame-to-frame alignment, change detection, camera control, and tracking of independently moving objects in cluttered scenes.
Abstract: There is growing interest in performing aerial surveillance using video cameras. Compared to traditional framing cameras, video cameras provide the capability to observe ongoing activity within a scene and to automatically control the camera to track the activity. However, the high data rates and relatively small field of view of video cameras present new technical challenges that must be overcome before such cameras can be widely used. In this paper, we present a framework and details of the key components for real-time, automatic exploitation of aerial video for surveillance applications. The framework involves separating an aerial video into the natural components corresponding to the scene. Three major components of the scene are the static background geometry, moving objects, and appearance of the static and dynamic components of the scene. In order to delineate videos into these scene components, we have developed real time, image-processing techniques for 2-D/3-D frame-to-frame alignment, change detection, camera control, and tracking of independently moving objects in cluttered scenes. The geo-location of video and tracked objects is estimated by registration of the video to controlled reference imagery, elevation maps, and site models. Finally static, dynamic and reprojected mosaics may be constructed for compression, enhanced visualization, and mapping applications.

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A method for automatically learning components by using 3-D head models is proposed, which has the advantage that no manual interaction is required for choosing and extracting components.
Abstract: We present a component-based, trainable system for detecting frontal and near-frontal views of faces in still gray images. The system consists of a two-level hierarchy of Support Vector Machine (SVM) classifiers. On the first level, component classifiers independently detect components of a face. On the second level, a single classifier checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face. We propose a method for automatically learning components by using 3-D head models. This approach has the advantage that no manual interaction is required for choosing and extracting components. Experiments show that the component-based system is significantly more robust against rotations in depth than a comparable system trained on whole face patterns.

Patent
08 Aug 2001
TL;DR: A system and a method for detecting and identifying an object, using the Internet and handheld terminals such as mobile phones in combination with Bluetooth or DECT technology to communicate information relating to the object.
Abstract: The present invention relates to a system and a method for detecting and identifying an object. More specifically the invention relates to a tag for attachment e.g. to luggage, the tag being adapted for transmission of an identifiable signal and a receiver for detecting and identifying the signal. The invention is concerned with use of the Internet and handheld terminals such as mobile phones in combination with Bluetooth™ or DECT technology for communicating information in relation to the object.

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A system to detect passenger cars in aerial images, where cars appear as small objects, posed as a 3D object recognition problem to account for variation in viewpoint and shadow.
Abstract: We present a system to detect passenger cars in aerial images where cars appear as small objects. We pose this as a 3D object recognition problem to account for the variation in viewpoint and the shadow. We started from psychological tests to find important features for human detection of cars. Based on these observations, we selected the boundary of the car body, the boundary of the front windshield and the shadow as the features. Some of these features are affected by the intensity of the car and whether or not there is a shadow along it. This information is represented in the structure of the Bayesian network that we use to integrate all features. Experiments show very promising results even on some very challenging images.

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A simple probabilistic framework for modeling the relationship between context and object properties is introduced, representing global context information in terms of the spatial layout of spectral components and serving as an effective procedure for context driven focus of attention and scale-selection on real-world scenes.
Abstract: There is general consensus that context can be a rich source of information about an object's identity, location and scale. However, the issue of how to formalize contextual influences is still largely open. Here we introduce a simple probabilistic framework for modeling the relationship between context and object properties. We represent global context information in terms of the spatial layout of spectral components. The resulting scheme serves as an effective procedure for context-driven focus of attention and scale selection on real-world scenes. Based on a simple holistic analysis of an image, the scheme is able to accurately predict object locations and sizes.

Proceedings ArticleDOI
26 Sep 2001
TL;DR: This work presents a general-purpose method for segmentation of moving visual objects (MVO) based on an object-level classification in MVO, ghosts and shadows, which uses motion and shadow information to selectively exclude from the background model MVO and their shadows, while retaining ghosts.
Abstract: Many approaches to moving object detection for traffic monitoring and video surveillance proposed in the literature are based on background suppression methods. How to correctly and efficiently update the background model and how to deal with shadows are two of the more distinguishing and challenging features of such approaches. This work presents a general-purpose method for segmentation of moving visual objects (MVO) based on an object-level classification into MVO, ghosts and shadows. Background suppression needs the background model to be estimated and updated: we use motion and shadow information to selectively exclude from the background model MVO and their shadows, while retaining ghosts. The color information (in the HSV color space) is exploited for shadow suppression and, consequently, enhances both MVO segmentation and background update.

Proceedings ArticleDOI
19 Jun 2001
TL;DR: A new adaptive reconstruction scheme for calculating range images as well as sharp images is presented, and new so-called focus measures are introduced and compared to the classic approaches.
Abstract: Light microscopy enlarges the viewing angle while decreasing the depth of focus. This leads to mainly blurred images if the specimen being observed exhibits significant height changes. In computer vision, solving this problem is known as 'shape from focus'. Algorithms exist that perform both the calculation of a sharp image and the recovery of the three-dimensional structure of the specimen. In this paper, three classic approaches for detecting sharp image regions are evaluated. Three new so-called focus measures are introduced and compared to the classic approaches. A new adaptive reconstruction scheme for calculating range images as well as sharp images is presented. Experimental results on synthetic and real data demonstrate the performance of the proposed algorithm.
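A classic focus measure of the kind evaluated here scores local sharpness by the energy of a discrete Laplacian; the specific measure below is a common textbook choice, not necessarily one of the paper's three new measures:

```python
import numpy as np

def focus_measure(img):
    """Score sharpness as the mean energy of a discrete 4-neighbour
    Laplacian; in shape-from-focus, the frame (or region) maximizing
    such a measure is taken as the in-focus one."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(np.mean(lap ** 2))

yy, xx = np.mgrid[0:32, 0:32]
sharp = ((xx + yy) % 2).astype(float)    # checkerboard: maximal detail
flat = np.full((32, 32), 0.5)            # defocused stand-in: no detail
assert focus_measure(sharp) > focus_measure(flat)
```

Applied per pixel neighbourhood across an image stack, the index of the maximizing frame gives the depth map that shape-from-focus recovers.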

Journal ArticleDOI
TL;DR: The proposed projection pursuit (PP) method projects a high-dimensional data set into a low-dimensional data space while retaining desired information of interest, utilizing a projection index to explore projections of interestingness.
Abstract: The authors present a projection pursuit (PP) approach to target detection. Unlike most developed target detection algorithms, which require statistical models such as linear mixing, the proposed PP projects a high-dimensional data set into a low-dimensional data space while retaining desired information of interest. It utilizes a projection index to explore projections of interestingness. For target detection applications in hyperspectral imagery, an interesting structure of an image scene is one caused by man-made targets in a large unknown background. Such targets can be viewed as anomalies in an image scene because their size is relatively small compared to their background surroundings. As a result, detecting small targets in an unknown image scene reduces to finding the outliers of background distributions. Skewness, defined as the normalized third moment of the sample distribution, measures the asymmetry of the distribution, and kurtosis, defined as the normalized fourth moment, measures its flatness; both are sensitive to outliers. So, using skewness and kurtosis as a basis for designing a projection index may be effective for target detection. In order to find an optimal projection index, an evolutionary algorithm is also developed to avoid being trapped in local optima. The hyperspectral image experiments show that the proposed PP method provides an effective means for target detection.
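A projection index built from skewness and kurtosis can be sketched directly; the quadratic combination used below is one plausible choice, not necessarily the paper's exact index:

```python
import numpy as np

def projection_index(X, w):
    """Index of 'interestingness' for data X projected onto direction w:
    squared skewness plus squared excess kurtosis of the standardized
    projection. Both moments are sensitive to the outliers that small
    targets produce. The quadratic combination is an illustrative choice."""
    y = X @ (w / np.linalg.norm(w))
    y = (y - y.mean()) / y.std()
    skew = np.mean(y ** 3)               # normalized third moment
    kurt = np.mean(y ** 4) - 3.0         # excess normalized fourth moment
    return skew ** 2 + kurt ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))           # Gaussian "background"
X[:5, 0] += 8.0                          # a few target-like outliers on axis 0
assert projection_index(X, np.array([1.0, 0.0])) > projection_index(X, np.array([0.0, 1.0]))
```

Both terms vanish for a pure Gaussian background, so maximizing the index (here by the paper's evolutionary search, not shown) steers the projection toward the anomalous target directions.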

Proceedings ArticleDOI
29 Oct 2001
TL;DR: An algorithm is described that identifies range readings in areas that were detected earlier as free, and is able to track a moving person walking around while consuming only about 2% of the available processing power.
Abstract: In the field of computer vision, the detection and tracking of moving objects from a moving observer is a complex and computationally demanding task. Using a laser range scanner instead of a camera, the problem can be simplified dramatically. An algorithm is described that identifies range readings in areas that were detected earlier as free. This is done without incorporating any grid maps, which are inherently memory- and computation-intensive. The algorithm proved robust in a real-time test in a furnished living room: it is able to track a moving person walking around while consuming only about 2% of the available processing power.

Journal ArticleDOI
TL;DR: An object segmentation and tracking algorithm for visual surveillance applications is proposed that generates motion trajectories, sets a motion model using polynomial curve fitting, and provides an efficient way of indexing and searching based on object-specific features at different semantic levels.
Abstract: This paper proposes an object segmentation and tracking algorithm for visual surveillance applications. In order to detect moving objects from a dynamic background scene which may have temporal clutters such as swaying plants, we devised an adaptive background update method and a motion classification rule. A two-dimensional token-based tracking system using a Kalman filter is designed to track individual objects under occlusion conditions. We propose a new occlusion reasoning approach where we consider two different types of occlusion: explicit occlusion and implicit occlusion. By tracking individual objects with segmented data, we can generate motion trajectories and set a motion model using polynomial curve fitting. The trajectory model is used as an indexing key for accessing the individual object in the semantic level. We also propose an efficient way of indexing and searching based on object-specific features at different semantic levels. The proposed searching scheme supports various queries including query by example, query by sketch, and query on weighting parameters for event-based retrieval. When retrieving an interested video clip, the system returns the best matching event in the similarity order. In addition, we implement a temporal event graph for direct accessing and browsing of a specific event in the video sequence.
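The trajectory-modelling step via polynomial curve fitting can be sketched in a few lines; the quadratic model and the NumPy routine are my illustrative choices:

```python
import numpy as np

# Sketch of the trajectory-modelling step: fit a low-order polynomial to a
# tracked object's x-coordinate over time. The fitted coefficients could
# then act as a compact indexing key for trajectory-based retrieval.
t = np.arange(10, dtype=float)           # frame indices
xs = 2.0 + 1.5 * t + 0.1 * t ** 2        # synthetic noise-free track
coeffs = np.polyfit(t, xs, deg=2)        # returns [a2, a1, a0], highest power first
assert np.allclose(coeffs, [0.1, 1.5, 2.0])
```

Fitting each coordinate separately gives a fixed-length descriptor per object, which is what makes the trajectory usable as a similarity-search key.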

Journal ArticleDOI
TL;DR: A method for deformable shape detection and recognition, in which deformable shape templates partition the image into a globally consistent interpretation determined in part by the minimum description length principle.
Abstract: A method for deformable shape detection and recognition is described. Deformable shape templates are used to partition the image into a globally consistent interpretation, determined in part by the minimum description length principle. Statistical shape models enforce the prior probabilities on global, parametric deformations for each object class. Once trained, the system autonomously segments deformed shapes from the background, while not merging them with adjacent objects or shadows. The formulation can be used to group image regions obtained via any region segmentation algorithm, e.g., texture, color, or motion. The recovered shape models can be used directly in object recognition. Experiments with color imagery are reported.

Proceedings ArticleDOI
01 Feb 2001
TL;DR: In this paper, the authors derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions.
Abstract: In a known environment, objects may be tracked in multiple views using a set of background models. Stereo-based models can be illumination-invariant, but often have undefined values which inevitably lead to foreground classification errors. We derive dense stereo models for object tracking using long-term, extended dynamic-range imagery, and by detecting and interpolating uniform but unoccluded planar regions. Foreground points are detected quickly in new images using pruned disparity search. We adopt a "late-segmentation" strategy, using an integrated plan-view density representation. Foreground points are segmented into object regions only when a trajectory is finally estimated, using a dynamic programming-based method. Object entry and exit are optimally determined and are not restricted to special spatial zones.
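The stereo background model and plan-view density representation above can be illustrated with a small sketch. The disparity margin, the NaN convention for undefined background values, and both function names are assumptions for illustration only:

```python
import numpy as np

def stereo_foreground(disparity, bg_disparity, margin=2.0):
    """Illumination-invariant foreground test (sketch): a pixel is
    foreground when it is significantly nearer (larger disparity) than
    the background model.  NaN background entries are undefined and
    never classified as foreground."""
    valid = ~np.isnan(bg_disparity)
    fg = np.zeros(disparity.shape, dtype=bool)
    fg[valid] = disparity[valid] > bg_disparity[valid] + margin
    return fg

def plan_view_density(xs, zs, x_edges, z_edges):
    """Accumulate foreground points into a ground-plane (plan-view)
    occupancy histogram, the integrated representation used for
    late segmentation into object trajectories."""
    density, _, _ = np.histogram2d(xs, zs, bins=[x_edges, z_edges])
    return density
```

Trajectory estimation would then operate on a time series of such plan-view maps rather than on per-frame pixel segmentations.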

Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new method for tracking rigid objects in image sequences using template matching using a Kalman filter to make the template adapt to changes in object orientation or illumination is proposed.
Abstract: We propose a new method for tracking rigid objects in image sequences using template matching. A Kalman filter is used to make the template adapt to changes in object orientation or illumination. This approach is novel, since in tracking the Kalman filter has mainly been used to smooth the object trajectory. The performance of the Kalman filter is further improved by employing a robust and adaptive filtering algorithm. Special attention is paid to occlusion handling.
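Using a Kalman filter to adapt the template itself (rather than the trajectory) can be sketched as an independent per-pixel filter whose state is the template intensity. This is a minimal sketch under that assumption; the noise parameters `q` and `r` are illustrative values, not from the paper:

```python
def kalman_template_update(template, var, observed, q=1.0, r=4.0):
    """Per-element Kalman update of a tracking template (sketch): each
    template intensity is a state estimated from the matched image patch,
    letting the template adapt to illumination or orientation change.
    Works elementwise on scalars or NumPy arrays.
    q: process noise variance, r: measurement noise variance."""
    var_pred = var + q                  # predict: state assumed constant,
                                        # uncertainty grows by q
    k = var_pred / (var_pred + r)       # Kalman gain
    template_new = template + k * (observed - template)
    var_new = (1.0 - k) * var_pred
    return template_new, var_new
```

Occlusion handling would correspond to skipping the update (or inflating `r`) when the match residual is implausibly large, so occluders are not absorbed into the template.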

Book
01 Nov 2001
TL;DR: An edited volume on video-based surveillance systems, spanning industrial applications, detection and tracking, event detection and analysis, and distributed architectures for third-generation (3G) surveillance.
Abstract: Part I: Industrial Applications. 1. Real-time Video Analysis at Siemens Corporate Research N. Paragios, et al. 2. Aerial Video Surveillance and Exploitation R. Kumar. 3. Two Examples of Indoor and Outdoor Surveillance Systems I. Pavlidis, V. Morellas. 4. Visual Surveillance in Retail Stores and in the Home T. Brodsky, et al. Part II: Detection and Tracking. 5. Detecting and Tracking People in Complex Scenes Y. Kuno. 6. Bayesian Modality Fusion for Tracking Multiple People with a Multi-Camera System T.-H. Chang, S. Gong. 7. Tracking Groups of People for Video Surveillance F. Cupillard, et al. 8. Colour-Invariant Motion Detection under Fast Illumination Changes M. Xu, T. Ellis. 9. Face and Facial Feature Tracking: Using the Active Appearance Algorithm J. Ahlberg. 10. Object Tracking and Shoslif Tree Based Classification using Shape and Colour Features L. Marcenaro, et al. 11. An Improved Adaptive Background Mixture Model for Real-Time Tracking with Shadow Detection P. KaewTraKulPong, R. Bowden. 12. The Sakbot System for Moving Object Detection and Tracking R. Cucchiara, et al. 13. Assessment of Image Processing Techniques as a means of Improving Personal Security in Public Transport L.M. Fuentes, S.A. Velastin. 14. On the use of Colour Filtering in an Integrated Real-Time People Tracking System N.T. Siebel, S.J. Maybank. Part III: Event Detection and Analysis. 15. Modelling and Recognition of Human Actions using a Stochastic Approach E.B. Koller-Meier, L. Van Gool. 16. VIGILANT: Content-Querying of Video Surveillance Streams D. Greenhill, et al. 17. Evaluation of a Self-learning Event Detector C. Kaas, et al. 18. Automated Detection of Localized Visual Events over varying Temporal Scales J. Sherrah, S. Gong. 19. Real-Time Visual Recognition of Dynamic Arm Gestures H.H. Aviles-Arriaga, L.E. Sucar-Succar. Part IV: Distributed Architectures. 20. Distributed Multi-Sensor Surveillance: Issues and recent advances P.K. Varshney, I.L. Coman. 21. Intelligence Distribution of a Third Generation People Counting System Transmitting Information over an Urban Digital Radio Link C.S. Regazzoni, et al. 22. A Comparison between Continuous and Burst Recognition-driven Transmission Policies in Distributed 3G Surveillance Systems F. Oberti, et al. Index.

Patent
28 Feb 2001
TL;DR: In this paper, a coarse-to-fine object detection strategy coupled with exhaustive object search across different positions and scales results in an efficient and accurate object detection scheme, and the object detection then proceeds with sampling of the quantized wavelet coefficients at different image window locations on the input image and efficient lookup of pre-computed log-likelihood tables to determine object presence.
Abstract: An object finder program for detecting presence of a 3D object in a 2D image containing a 2D representation of the 3D object. The object finder uses the wavelet transform of the input 2D image for object detection. A pre-selected number of view-based detectors are trained on sample images prior to performing the detection on an unknown image. These detectors then operate on the given input image and compute a quantized wavelet transform for the entire input image. The object detection then proceeds by sampling the quantized wavelet coefficients at different image window locations on the input image and efficiently looking up pre-computed log-likelihood tables to determine object presence. The object finder's coarse-to-fine object detection strategy, coupled with exhaustive object search across different positions and scales, results in an efficient and accurate object detection scheme. The object finder detects a 3D object over a wide range of angular variation (e.g., 180 degrees) by combining a small number of detectors, each specialized to a narrow sub-range of that variation.
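The table-lookup scoring step described above (sampling quantized wavelet coefficients and summing precomputed log-likelihoods) can be illustrated with a small sketch. The table layout and both function names are assumptions for illustration; the patent's actual feature sampling is more elaborate:

```python
import numpy as np

def window_log_likelihood(quantized_coeffs, llr_tables):
    """Table-based window scoring (sketch): for each sampled, quantized
    wavelet coefficient, look up the precomputed log-likelihood ratio
    log P(q | object) - log P(q | non-object) and sum over the window.
    llr_tables[i] is the table for coefficient position i."""
    return sum(float(table[q]) for q, table in zip(quantized_coeffs, llr_tables))

def detect_window(quantized_coeffs, llr_tables, threshold=0.0):
    """Declare object presence when the summed log-likelihood ratio
    exceeds a decision threshold."""
    return window_log_likelihood(quantized_coeffs, llr_tables) > threshold
```

A coarse-to-fine strategy would evaluate cheap, low-resolution coefficients first and abandon a window early once its partial sum falls below a rejection threshold.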

Proceedings ArticleDOI
10 Oct 2001
TL;DR: Two methods of detecting and tracking objects in color video are presented and color and edge histograms are explored as ways to model the background and foreground of a scene.
Abstract: Two methods of detecting and tracking objects in color video are presented. Color and edge histograms are explored as ways to model the background and foreground of a scene. The two types of methods are evaluated to determine their speed, accuracy and robustness. Histogram comparison techniques are used to compute similarity values that aid in identifying regions of interest. Foreground objects are detected and tracked by dividing each video frame into smaller regions (cells) and comparing the histogram of each cell to the background model. Results are presented for video sequences of human activity.
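The per-cell histogram comparison described above can be sketched as follows. The cell size, bin count, similarity threshold, and use of histogram intersection as the comparison measure are illustrative assumptions, not details from the paper:

```python
import numpy as np

def hist_intersection(h1, h2):
    """Normalized histogram intersection similarity in [0, 1]."""
    return np.minimum(h1, h2).sum() / max(h2.sum(), 1e-9)

def foreground_cells(frame, bg_hists, cell=16, bins=16, thresh=0.7):
    """Per-cell change detection (sketch): compare each cell's gray-level
    histogram with the stored background histogram for that cell; a low
    similarity marks the cell as containing a foreground object.
    frame: (H, W) grayscale; bg_hists: (H//cell, W//cell, bins)."""
    h, w = frame.shape
    out = np.zeros((h // cell, w // cell), dtype=bool)
    for i in range(h // cell):
        for j in range(w // cell):
            patch = frame[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
            out[i, j] = hist_intersection(hist, bg_hists[i, j]) < thresh
    return out
```

Tracking would then link connected groups of foreground cells across frames, with the histogram itself doubling as an appearance model for matching.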