scispace - formally typeset

Showing papers on "Object-class detection published in 2001"


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.

18,620 citations
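As a concrete illustration of the "integral image" idea from this abstract (a generic sketch, not the authors' implementation), any rectangle sum over the image reduces to four table lookups:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over img[top..bottom][left..right] in constant time."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

img = [[1, 2], [3, 4]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 1, 1))  # 10
```

This constant-time rectangle sum is what lets the Haar-like features be evaluated at any position and scale without rescanning pixels.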


Journal ArticleDOI
TL;DR: A comprehensive and critical survey of face detection algorithms, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods, is presented.

1,565 citations


Book ChapterDOI
TL;DR: A two-step process that allows both coarse detection and exact localization of faces is presented and an efficient implementation is described, making this approach suitable for real-time applications.
Abstract: The localization of human faces in digital images is a fundamental step in the process of face recognition. This paper presents a shape comparison approach to achieve fast, accurate face detection that is robust to changes in illumination and background. The proposed method is edge-based and works on grayscale still images. The Hausdorff distance is used as a similarity measure between a general face model and possible instances of the object within the image. The paper describes an efficient implementation, making this approach suitable for real-time applications. A two-step process that allows both coarse detection and exact localization of faces is presented. Experiments were performed on a large test set and rated with a new validation measure.

984 citations
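The Hausdorff distance used here as a similarity measure can be sketched on point sets (e.g., edge points of the model and of an image window); `model` and `window` below are toy sets for illustration, not data from the paper:

```python
import math

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance to a's nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

model = [(0, 0), (0, 1)]
window = [(0, 0), (0, 3)]
print(hausdorff(model, window))  # 2.0
```

A small Hausdorff distance means every model edge point lies close to some image edge point and vice versa, which is why thresholding it yields a face/non-face decision per window.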


Journal ArticleDOI
Rainer Lienhart1
TL;DR: This survey emphasizes those different core concepts underlying the different detection schemes for the three most widely used video transition effects: hard cuts, fades and dissolves.
Abstract: A large number of shot boundary detection, or equivalently, transition detection techniques have been developed in recent years. They can all be classified based on a few core concepts underlying the different detection schemes. This survey emphasizes those core concepts for the three most widely used video transition effects: hard cuts, fades and dissolves. For each concept, one or a few very sound and thoroughly tested representative approaches are presented in detail, while others are only listed. Whenever reliable performance numbers could be found in the literature, they are mentioned. Guidelines for practitioners in video processing are also given.

311 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: In this paper, the problem of finding point correspondences in images by way of an approach to template matching that is robust under affine distortions is addressed by applying "geometric blur" to both the template and the image, resulting in a falloff in similarity that is close to linear in the norm of the distortion.
Abstract: We address the problem of finding point correspondences in images by way of an approach to template matching that is robust under affine distortions. This is achieved by applying "geometric blur" to both the template and the image, resulting in a fall-off in similarity that is close to linear in the norm of the distortion between the template and the image. Results in wide baseline stereo correspondence, face detection, and feature correspondence are included.

310 citations


Journal ArticleDOI
TL;DR: A number of evolutionary agents are uniformly distributed in the 2-D image environment to detect the skin-like pixels and segment each face-like region by activating their evolutionary behaviors, and wavelet decomposition is applied to each region to detect possible facial features.

192 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A system to detect passenger cars in aerial images where cars appear as small objects is presented as a 3D object recognition problem to account for the variation in viewpoint and the shadow.
Abstract: We present a system to detect passenger cars in aerial images where cars appear as small objects. We pose this as a 3D object recognition problem to account for the variation in viewpoint and the shadow. We started from psychological tests to find important features for human detection of cars. Based on these observations, we selected the boundary of the car body, the boundary of the front windshield and the shadow as the features. Some of these features are affected by the intensity of the car and whether or not there is a shadow along it. This information is represented in the structure of the Bayesian network that we use to integrate all features. Experiments show very promising results even on some very challenging images.

179 citations
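The abstract's Bayesian-network integration of features can be illustrated in a deliberately simplified form: if the evidences were independent, they would combine in odds form (the likelihood ratios below are invented for illustration, not the paper's learned network):

```python
def combine_evidence(prior, likelihood_ratios):
    """Combine independent feature evidences in odds form:
    posterior odds = prior odds * product of likelihood ratios.
    A naive-Bayes simplification of the paper's structured network."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

# Likelihood ratios P(feature | car) / P(feature | no car): illustrative only.
p = combine_evidence(0.1, [4.0, 3.0, 2.0])  # body, windshield, shadow found
print(round(p, 3))  # 0.727
```

The actual Bayesian network additionally encodes dependencies, e.g. that the shadow feature's reliability depends on whether a shadow is present at all.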


Journal ArticleDOI
TL;DR: A novel eye detection method for gray intensity images that uses multiple cues to detect eye windows from a face image, with the variance projection function used for eye detection and verification.

176 citations
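The variance projection function named in this TL;DR is simply the variance of gray levels along each row (or column) of a window; eye regions show up as variance peaks because they mix dark pupil and bright sclera pixels. A minimal sketch with a toy two-row image:

```python
def vpf_rows(img):
    """Vertical variance projection: variance of gray levels per row."""
    out = []
    for row in img:
        mean = sum(row) / len(row)
        out.append(sum((p - mean) ** 2 for p in row) / len(row))
    return out

# A uniform row has zero variance; a row mixing dark and bright
# pixels has a large one, which is what marks the eye band.
img = [[100, 100, 100, 100],
       [20, 200, 200, 20]]
print(vpf_rows(img))  # [0.0, 8100.0]
```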


Proceedings ArticleDOI
07 Jul 2001
TL;DR: Experimental results show that fusion of evidence from multiple views produces better results than using the result from a single view, and that this kernel machine based approach to learning nonlinear mappings for multi-view face detection and pose estimation yields high detection and low false alarm rates.
Abstract: Face images are subject to changes in view and illumination. Such changes cause the data distribution to be highly nonlinear and complex in the image space. It is desirable to learn a nonlinear mapping from the image space to a low dimensional space such that the distribution becomes simpler and tighter, and therefore more predictable, for better modeling of faces. In this paper we present a kernel machine based approach for learning such nonlinear mappings. The aim is to provide an effective view-based representation for multi-view face detection and pose estimation. Assuming that the view is partitioned into a number of distinct ranges, one nonlinear view-subspace is learned for each (range of) view from a set of example face images of that view (range), by using kernel principal component analysis (KPCA). Projections of the data onto the view-subspaces are then computed as view-based nonlinear features. Multi-view face detection and pose estimation are performed by classifying a face into one of the facial views or into the nonface class, by using a multi-class kernel support vector classifier (KSVC). Experimental results show that fusion of evidence from multiple views produces better results than using the result from a single view, and that our approach yields high detection and low false alarm rates in face detection and good accuracy in pose estimation, in comparison with the linear counterpart composed of linear principal component analysis (PCA) feature extraction and Fisher linear discriminant based classification (FLDC).

127 citations


Patent
28 Feb 2001
TL;DR: In this paper, a coarse-to-fine object detection strategy coupled with exhaustive object search across different positions and scales results in an efficient and accurate object detection scheme, and the object detection then proceeds with sampling of the quantized wavelet coefficients at different image window locations on the input image and efficient lookup of pre-computed log-likelihood tables to determine object presence.
Abstract: An object finder program for detecting the presence of a 3D object in a 2D image containing a 2D representation of that object. The object finder uses the wavelet transform of the input 2D image for object detection. A pre-selected number of view-based detectors are trained on sample images prior to performing detection on an unknown image. These detectors then operate on the given input image and compute a quantized wavelet transform for the entire input image. Object detection then proceeds by sampling the quantized wavelet coefficients at different image window locations on the input image and efficiently looking up pre-computed log-likelihood tables to determine object presence. The object finder's coarse-to-fine object detection strategy, coupled with an exhaustive object search across different positions and scales, results in an efficient and accurate object detection scheme. The object finder detects a 3D object over a wide range of angular variation (e.g., 180 degrees) by combining a small number of detectors, each specialized to a small part of this range.

120 citations


Proceedings ArticleDOI
15 Jul 2001
TL;DR: A universal and robust model of the human skin color that caters for all human races is developed; the model's ability to detect solid skin regions in color images is extremely useful in applications such as face detection and recognition, and human gesture analysis.
Abstract: We propose a new image classification technique that utilizes neural networks to classify skin and non-skin pixels in color images. The aim is to develop a universal and robust model of the human skin color that caters for all human races. The model's ability to detect solid skin regions in color images is extremely useful in applications such as face detection and recognition, and human gesture analysis. Experimental results show that the neural network classifiers can consistently achieve up to 90% accuracy in skin color detection.
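As a hedged sketch of the skin/non-skin pixel classification idea, here is a single-neuron stand-in for the paper's trained neural network; the weights and bias are illustrative assumptions, not the trained values:

```python
import math

def skin_score(r, g, b, w=(7.0, -3.0, -3.0), bias=-0.8):
    """Single-neuron stand-in for a trained skin/non-skin classifier.
    Inputs are normalised chromaticities; the weights and bias here
    are illustrative, not values trained as in the paper."""
    s = r + g + b
    if s == 0:
        return 0.0
    nr, ng, nb = r / s, g / s, b / s
    z = w[0] * nr + w[1] * ng + w[2] * nb + bias
    return 1.0 / (1.0 + math.exp(-z))  # probability that the pixel is skin

print(skin_score(220, 150, 120) > 0.5)  # True: reddish pixel scores as skin
print(skin_score(60, 90, 200) > 0.5)    # False: bluish pixel rejected
```

Normalising to chromaticities is one common way to reduce sensitivity to brightness, which supports the "all human races" robustness goal stated in the abstract.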

Proceedings ArticleDOI
01 Jan 2001
TL;DR: Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
Abstract: We present a two-step method to speed up object detection systems in computer vision that use Support Vector Machines (SVMs) as classifiers. In a first step we perform feature reduction by choosing relevant image features according to a measure derived from statistical learning theory. In a second step we build a hierarchy of classifiers. On the bottom level, a simple and fast classifier analyzes the whole image and rejects large parts of the background. On the top level, a slower but more accurate classifier performs the final detection. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
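The two-level hierarchy described above (fast bottom-level rejection, slower top-level verification) can be sketched generically; `fast_clf`, `slow_clf` and the thresholds below are toy stand-ins, not the paper's SVMs:

```python
def hierarchical_detect(windows, fast_clf, slow_clf):
    """Two-level hierarchy: a cheap classifier rejects most background
    windows; the expensive classifier runs only on the survivors."""
    survivors = [w for w in windows if fast_clf(w)]
    return [w for w in survivors if slow_clf(w)]

# Toy stand-ins: each "window" is a mean-intensity value.
fast = lambda w: w > 50          # crude reject of dark background
slow = lambda w: 80 < w < 120    # slower, more selective final test
print(hierarchical_detect([10, 30, 90, 200, 100], fast, slow))  # [90, 100]
```

The speed-up comes from the fact that the expensive classifier only ever sees the small fraction of windows that pass the cheap one.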

Patent
19 Nov 2001
TL;DR: In this paper, a robot device includes a face tracing module (M2) for tracing a face moving in an image taken by a CCD camera, a face detecting module (M1) for detecting the face data on the face in the image taken by the imaging means, on the basis of the face tracing information from M2, and a face identifying module (M3) for identifying a specific face on the basis of the face data detected by M1.
Abstract: A robot device includes a face tracing module (M2) for tracing a face moving in an image taken by a CCD camera, a face detecting module (M1) for detecting the face data on the face in the image taken by the imaging means, on the basis of the face tracing information by the face tracing module (M2), and a face identifying module (M3) for identifying a specific face on the basis of the face data detected by the face detecting module (M1).

Proceedings ArticleDOI
07 Oct 2001
TL;DR: This paper presents an unsupervised color segmentation technique to divide skin-detected pixels into a set of homogeneous regions, which can be used in face detection applications or any other application that may require color segmentation.
Abstract: This paper presents an unsupervised color segmentation technique to divide skin-detected pixels into a set of homogeneous regions, which can be used in face detection applications or any other application that may require color segmentation. The algorithm is carried out in two processing stages, in which the chrominance and luminance information are used consecutively. For each stage a novel algorithm which combines pixel- and region-based color segmentation techniques is used. The algorithm has proven to be effective on a large number of test images.

Proceedings ArticleDOI
07 Oct 2001
TL;DR: A method for eye tracking built in five stages is presented: coarse and fine face detection, finding the eye region of maximum probability, mapping the pupil/iris location, and pupil/iris detection; the face detection algorithm reached 99% and 100% correct detection rates on two databases.
Abstract: A non-invasive interface to track eye position using digital image processing techniques is under development. Information about head and eye position is obtained from digital images. The objective is to develop an interface that detects eye position based only on digital image processing algorithms, free of electrodes or other electronic devices. We propose a method for eye tracking built in five stages: coarse and fine face detection, finding the eye region of maximum probability, mapping the pupil/iris location, and pupil/iris detection. Using frontal face images obtained from a database, the probability maps for the eye region were built. Only gray levels are considered for this computation (8 bits). The algorithms for face and eye detection were assessed on 102 images from the Purdue database and on 897 images from a video sequence. The face detection algorithm reached 99% and 100% correct detection rates on the two databases, respectively. On the same databases the pupil/iris detection algorithm reached 85.3% and 98.4% correct detection, respectively.

Proceedings ArticleDOI
07 May 2001
TL;DR: An image and video indexing approach that combines face detection and face recognition methods that is able to discriminate between three different newscasters and an interviewed person is presented.
Abstract: This paper presents an image and video indexing approach that combines face detection and face recognition methods. Images of a database or frames of a video sequence are scanned for faces by a neural network-based face detector. The extracted faces are then grouped into clusters by a combination of a face recognition method using pseudo two-dimensional hidden Markov models and a k-means clustering algorithm. Each resulting main cluster consists of the face images of one person. In a subsequent step, the detected faces are labeled as one of the different people in the video sequence or the image database and the occurrence of the people can be evaluated. The results of the proposed approach on a TV broadcast news sequence are presented. It is demonstrated that the system is able to discriminate between three different newscasters and an interviewed person.

Proceedings ArticleDOI
01 Jan 2001
TL;DR: A face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds is proposed; it detects skin regions over the entire image and then generates face candidates based on the spatial arrangement of the skin patches.
Abstract: Human face detection is often the first step in applications such as video surveillance, human computer interface, face recognition, and image database management. We propose a face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds. Our method detects skin regions over the entire image, and then generates face candidates based on the spatial arrangement of these skin patches. The algorithm constructs eye, mouth, and boundary maps for verifying each face candidate. Experimental results demonstrate successful detection over a wide variety of facial variations in color, position, scale, rotation, pose, and expression from several photo collections.

Proceedings ArticleDOI
Yu-Fei Ma1, Hong-Jiang Zhang1
01 Aug 2001
TL;DR: The proposed method for detecting moving objects based on spatio-temporal entropy is more robust to noise than traditional difference-based methods and can be used in real-time surveillance.
Abstract: Moving object detection is an important task in video analysis. In this paper, we propose a new method for detecting moving objects based on spatio-temporal entropy. By measuring color variation in multiple successive frames, a spatio-temporal entropy image (STEI) is formed. Morphological operations are then employed to extract salient motions or moving objects from the STEI. Since the STEI is a statistical measurement of variation, the proposed method is more robust to noise than traditional difference-based methods. Experimental results show that the proposed approach is effective for moving object detection and can be used in real-time surveillance.
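The spatio-temporal entropy measurement can be sketched per neighbourhood: gather a pixel's gray levels over successive frames and compute the Shannon entropy of their histogram (a generic sketch, not the authors' code):

```python
import math
from collections import Counter

def st_entropy(values):
    """Shannon entropy (bits) of the gray levels in one spatio-temporal
    neighbourhood (same pixel gathered over successive frames)."""
    counts = Counter(values)
    n = len(values)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

static = [128] * 8                              # unchanging pixel
moving = [10, 240, 10, 240, 10, 240, 10, 240]   # flicker caused by motion
print(st_entropy(static), st_entropy(moving))   # 0.0 1.0
```

High-entropy pixels form the bright regions of the STEI; thresholding and morphological cleanup then yield the moving-object mask.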

Proceedings ArticleDOI
01 Dec 2001
TL;DR: This paper investigates applications of a new representation for images, the similarity template, a probabilistic representation of the similarity of pixels in an image patch that enables the decomposition of a class of objects into component parts over which robust statistics of color can be approximated.
Abstract: This paper investigates applications of a new representation for images, the similarity template. A similarity template is a probabilistic representation of the similarity of pixels in an image patch. It has application to detection of a class of objects, because it is reasonably invariant to the color of a particular object. Further, it enables the decomposition of a class of objects into component parts over which robust statistics of color can be approximated. These regions can be used to create a factored color model that is useful for recognition. Detection results are shown on a system that learns to detect a class of objects (pedestrians) in static scenes based on examples of the object provided automatically by a tracking system. Applications of the factored color model to image indexing and anomaly detection are pursued on a database of images of pedestrians.

Proceedings ArticleDOI
08 Dec 2001
TL;DR: A new method for detecting faces in a video sequence where detection is not limited to frontal views and based on the method developed by Schneiderman, which guarantees the accuracy of accumulation as well as a continuous detection.
Abstract: This paper presents a new method for detecting faces in a video sequence where detection is not limited to frontal views. The three novel contributions of the paper are: (1) Accumulation of detection probabilities over a sequence. This allows coherent detection over time to be obtained, as well as independence from thresholds. (2) Prediction of the detection parameters, which are position, scale and pose. This guarantees the accuracy of accumulation as well as continuous detection. (3) The way pose is represented. The representation is based on the combination of two detectors, one for frontal views and one for profiles. Face detection is fully automatic and is based on the method developed by Schneiderman [13]. It uses local histograms of wavelet coefficients represented with respect to a coordinate frame fixed to the object. A probability of detection is obtained for each image position, several scales and the two detectors. The probabilities of detection are propagated over time using a Condensation filter and factored sampling. Prediction is based on a zero order model for position, scale and pose; the update uses the probability maps produced by the detection routine. Experiments show a clear improvement over frame-based detection results.

Journal ArticleDOI
TL;DR: This work addresses the question of how the visual system classifies images into face and non-face patterns and focuses on face detection in impoverished images, which allow for an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance.
Abstract: The ability to detect faces in images is of critical ecological significance. It is a pre-requisite for other important face perception tasks such as person identification, gender classification and affect analysis. Here we address the question of how the visual system classifies images into face and non-face patterns. We focus on face detection in impoverished images, which allow us to explore information thresholds required for different levels of performance. Our experimental results provide lower bounds on image resolution needed for reliable discrimination between face and non-face patterns and help characterize the nature of facial representations used by the visual system under degraded viewing conditions. Specifically, they enable an evaluation of the contribution of luminance contrast, image orientation and local context on face-detection performance. Research reported in this paper was supported in part by funds from the Defense Advanced Research Projects Agency and a Sloan fellowship for neuroscience to PS.

01 Dec 2001
TL;DR: A motion detection and tracking algorithm is presented for monitoring the pedestrians in an outdoor scene from a fixed camera and a Bayesian network is constructed to reason about the uncertainty in the tracking.
Abstract: A motion detection and tracking algorithm is presented for monitoring pedestrians in an outdoor scene from a fixed camera. A mixture of Gaussians is used to model each pixel of the background image, making the model adaptive to the dynamic scene. Colour chromaticity is used as the image representation, which results in illumination-invariant change detection in a daylit environment. To correctly interpret objects that are occluded, merged, split or exit from the scene, a scene model is created and the motion of each object is predicted. A Bayesian network is constructed to reason about the uncertainty in the tracking. Results for detecting and tracking the moving objects in the PETS sequences are demonstrated.
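A single-pixel sketch of the mixture-of-Gaussians background model (one Stauffer/Grimson-style update step; `alpha`, `match_k` and the replacement variance are illustrative parameters, not values from the paper):

```python
def mog_update(pixel, gaussians, alpha=0.05, match_k=2.5):
    """One update step for a single pixel.
    gaussians: list of [weight, mean, var] components.
    Returns True when the pixel matches a component (background)
    and False when no component explains it (foreground)."""
    matched = None
    for g in gaussians:
        w, mu, var = g
        if (pixel - mu) ** 2 <= (match_k ** 2) * var:
            matched = g
            break
    for g in gaussians:
        g[0] *= (1 - alpha)                 # decay all weights
    if matched is not None:
        matched[0] += alpha                 # boost the matched component
        _, mu, var = matched
        matched[1] = (1 - alpha) * mu + alpha * pixel
        matched[2] = (1 - alpha) * var + alpha * (pixel - matched[1]) ** 2
        return True
    gaussians[-1] = [alpha, float(pixel), 400.0]  # replace weakest (assume sorted)
    return False

g = [[0.7, 100.0, 25.0], [0.3, 200.0, 25.0]]
print(mog_update(102, g))  # True  (matches the dominant background mode)
print(mog_update(30, g))   # False (no mode matches -> foreground)
```

Because the components keep adapting, slow lighting changes are absorbed into the background while genuine pedestrians remain foreground.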

Book ChapterDOI
TL;DR: It is shown that edge orientation is a powerful local image feature to model objects like faces for detection purposes and a simple and efficient method for template matching and object modeling based solely on edge orientation information is presented.
Abstract: In this paper we describe our ongoing work on real-time face detection in grey level images using edge orientation information. We will show that edge orientation is a powerful local image feature for modelling objects like faces for detection purposes. We will present a simple and efficient method for template matching and object modeling based solely on edge orientation information. We also show how to obtain an optimal face model in the edge orientation domain from a set of training images. Unlike many approaches that model the grey level appearance of the face, our approach is computationally very fast. It takes less than 0.08 seconds on a Pentium II 500MHz for a 320×240 image to be processed using a multi-resolution search with six resolution levels. We demonstrate the capability of our detection method on an image database of 17000 images taken from more than 2900 different people. The variations in head size, lighting and background are considerable. The obtained detection rate is more than 93% on that database.
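The edge-orientation matching idea can be sketched with central-difference gradients and an angular distance that wraps at pi; the matching score below is an illustrative simplification, not the authors' exact measure:

```python
import math

def edge_orientations(img):
    """Gradient orientation (radians, mod pi) at interior pixels,
    computed with central differences."""
    h, w = len(img), len(img[0])
    out = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            if gx or gy:
                out[(y, x)] = math.atan2(gy, gx) % math.pi
    return out

def orientation_distance(a, b):
    """Mean angular difference over positions where both have an edge;
    an illustrative matching score."""
    common = a.keys() & b.keys()
    if not common:
        return math.pi / 2
    diffs = []
    for p in common:
        d = abs(a[p] - b[p])
        diffs.append(min(d, math.pi - d))   # orientations wrap at pi
    return sum(diffs) / len(diffs)

ramp = [[x * 10 for x in range(4)] for _ in range(4)]  # vertical edge pattern
print(orientation_distance(edge_orientations(ramp), edge_orientations(ramp)))  # 0.0
```

Orientation (unlike raw gradient magnitude) is largely invariant to lighting changes, which is what makes it attractive for the fast matching the abstract describes.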

Proceedings ArticleDOI
15 Jul 2001
TL;DR: A technique for detecting goals during a football match by using images acquired by a single camera placed externally to the field, which can be thought of as an electronic linesman which helps the referee in establishing the occurrence of a goal during afootball match.
Abstract: We present a technique for detecting goals during a football match using images acquired by a single camera placed externally to the field. The method requires no modification of either the ball or the goalmouth. Due to the attitude of the camera with respect to the football ground, the system can be thought of as an electronic linesman which helps the referee establish the occurrence of a goal during a football match. The occurrence of the event is established by detecting the ball and comparing its position with the location of the goalpost in the image. The ball detection technique relies on a supervised learning scheme, called support vector machines, for classification. The examples used for training are appropriately filtered versions of views of the object to be detected, previously stored in the form of image patterns. We have extensively tested the technique on real images in which the ball is both fully visible and partially occluded. The performance of the proposed detection scheme is measured in terms of detection rate, false positive rate and precision of ball localisation in the image.

Proceedings ArticleDOI
01 Dec 2001
TL;DR: It is shown that a natural tree structure is not required, and a mixture of trees is used for both frontal and view-invariant face detection, and by modeling faces as collections of features the authors can establish an intrinsic coordinate frame for a face, and estimate the out-of-plane rotation of a face.
Abstract: Efficient detection of objects in images is complicated by variations of object appearance due to intra-class object differences, articulation, lighting, occlusions, and aspect variations. To reduce the search required for detection, we employ the bottom-up approach where we find candidate image features and associate some of them with parts of the object model. We represent objects as collections of local features, and would like to allow any of them to be absent, with only a small subset sufficient for detection; furthermore, our model should allow efficient correspondence search. We propose a model, Mixture of Trees, that achieves these goals. With a mixture of trees, we can model the individual appearances of the features, relationships among them, and the aspect, and handle occlusions. Independences captured in the model make efficient inference possible. In our earlier work, we have shown that mixtures of trees can be used to model objects with a natural tree structure, in the context of human tracking. Now we show that a natural tree structure is not required, and use a mixture of trees for both frontal and view-invariant face detection. We also show that by modeling faces as collections of features we can establish an intrinsic coordinate frame for a face, and estimate the out-of-plane rotation of a face.

Proceedings ArticleDOI
08 Dec 2001
TL;DR: Kernel Discriminant Analysis is developed to extract the significant non-linear discriminating features which maximise the between- class variance and minimise the within-class variance in multi-view face images.
Abstract: Recognising faces with large pose variation is more challenging than recognition in a fixed view, e.g. frontal view, due to the severe non-linearity caused by rotation in depth, self-shading and self-occlusion. To address this problem, a multi-view dynamic face model is designed to extract the shape-and-pose-free facial texture patterns from multi-view face images. Kernel Discriminant Analysis is developed to extract the significant non-linear discriminating features which maximise the between-class variance and minimise the within-class variance. By using the kernel technique, this process is equivalent to a Linear Discriminant Analysis in a high-dimensional feature space which can be solved conveniently. The identity surfaces are then constructed from these non-linear discriminating features. Face recognition can be performed dynamically from an image sequence by matching an object trajectory and model trajectories on the identity surfaces.

Proceedings ArticleDOI
07 Oct 2001
TL;DR: The paper proposes the application of majority voting on the output of several support vector machines in order to select the most suitable learning machine for frontal face detection and results indicate a significant reduction of the rate of false positive patterns.
Abstract: The paper proposes the application of majority voting on the output of several support vector machines in order to select the most suitable learning machine for frontal face detection. The first experimental results indicate a significant reduction of the rate of false positive patterns.
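The majority-voting rule on several SVM outputs is straightforward to sketch (a generic illustration, where 1 means the machine votes "face"):

```python
def majority_vote(classifier_outputs):
    """Declare a face only when strictly more than half of the
    support vector machines agree; ties are rejected."""
    votes = sum(classifier_outputs)
    return votes * 2 > len(classifier_outputs)

print(majority_vote([1, 1, 0]))     # True
print(majority_vote([1, 0, 0, 1]))  # False (a tie is rejected)
```

Requiring a strict majority is what drives down the false-positive rate: a pattern must fool most of the machines at once to slip through.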

01 Jan 2001
TL;DR: A new method for detection of complex curvatures such as corners, circles, and star patterns is presented, based on a second degree local polynomial model.
Abstract: This thesis presents a new method for detection of complex curvatures such as corners, circles, and star patterns. The method is based on a second degree local polynomial model applied to a local ...

Journal ArticleDOI
01 Aug 2001
TL;DR: Two novel video object extraction schemes are proposed, specifically designed for two different scenarios of content-based video analysis applications, where automatic detection of newly appearing objects is important in envisaging on-line object-oriented applications as well as object-based coding.
Abstract: In this paper, we propose two novel video object (VO) extraction schemes, specifically designed for two different scenarios of content-based video analysis applications. One is a change detection-based VO extraction algorithm appropriate for surveillance-type video sequences, where automatic detection of newly appearing objects is important in envisaging on-line object-oriented applications as well as object-based coding. The other is an object tracking-based method, which is especially robust to video sequences with a moving background, although human intervention is needed in the process. In both cases, the semantically meaningful video objects are obtained by a final regularization stage realized by means of a cascade of morphological filters. Experimental results obtained on the MPEG-4 test sequences are presented for both schemes.
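The final regularization by morphological filters can be illustrated in 1-D on a binary change mask: an opening (erosion followed by dilation) removes small noise blobs while preserving larger object regions (a generic sketch, not the paper's specific filter cascade):

```python
def erode(mask, k=1):
    """Binary erosion of a 1-D mask with a (2k+1)-sample window."""
    return [int(all(mask[max(0, i - k):i + k + 1])) for i in range(len(mask))]

def dilate(mask, k=1):
    """Binary dilation of a 1-D mask with a (2k+1)-sample window."""
    return [int(any(mask[max(0, i - k):i + k + 1])) for i in range(len(mask))]

def open_(mask, k=1):
    """Opening = erosion then dilation: removes blobs thinner than 2k+1."""
    return dilate(erode(mask, k), k)

noisy = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0]  # isolated 1 is change-detection noise
print(open_(noisy))  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```

In 2-D the same idea, often followed by a closing, yields the compact, semantically meaningful object masks the abstract refers to.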

Proceedings Article
03 Jan 2001
TL;DR: This paper shows that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search.
Abstract: The most popular algorithms for object detection require the use of exhaustive spatial and scale search procedures. In such approaches, an object is defined by means of local features. In this paper we show that including contextual information in object detection procedures provides an efficient way of cutting down the need for exhaustive search. We present results with real images showing that the proposed scheme is able to accurately predict likely object classes, locations and sizes.