
Showing papers on "Face detection published in 2001"


Proceedings ArticleDOI
01 Dec 2001
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
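The integral image the abstract describes can be sketched in a few lines: each entry stores the sum of all pixels above and to the left, so the sum over any rectangle takes four table lookups regardless of the rectangle's size. A minimal pure-Python sketch (function names are our own, not the paper's):

```python
def integral_image(img):
    """Build a summed-area table: ii[y][x] = sum of img[0..y][0..x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # add the running row sum to the column total from the row above
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in the inclusive rectangle, using four lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total
```

Because each rectangle sum is O(1), the rectangular Haar-like features used by the detector cost the same at every scale, which is what makes the rapid multi-scale scan feasible.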

18,620 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A new image representation called the “Integral Image” is introduced which allows the features used by the detector to be computed very quickly and a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions.
Abstract: This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.

10,592 citations


Proceedings Article
01 Jan 2001
TL;DR: Viola et al. as mentioned in this paper proposed a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates using a new image representation called the integral image, which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [4]. The third contribution is a method for combining classifiers in a “cascade” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems [16, 11, 14, 10, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second. Author email: {Paul.Viola,Mike.J.Jones}@compaq.com. © Compaq Computer Corporation, Cambridge Research Laboratory, 2001.

1,648 citations


Journal ArticleDOI
TL;DR: A comprehensive and critical survey of face detection algorithms, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods, is presented.

1,565 citations


Book ChapterDOI
TL;DR: A two-step process that allows both coarse detection and exact localization of faces is presented and an efficient implementation is described, making this approach suitable for real-time applications.
Abstract: The localization of human faces in digital images is a fundamental step in the process of face recognition. This paper presents a shape comparison approach to achieve fast, accurate face detection that is robust to changes in illumination and background. The proposed method is edge-based and works on grayscale still images. The Hausdorff distance is used as a similarity measure between a general face model and possible instances of the object within the image. The paper describes an efficient implementation, making this approach suitable for real-time applications. A two-step process that allows both coarse detection and exact localization of faces is presented. Experiments were performed on a large test set and rated with a new validation measurement.
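The Hausdorff distance used here as the similarity measure is easy to state directly. A brute-force sketch over 2-D point sets follows; the paper's implementation is far more efficient, so this version only illustrates the definition:

```python
import math

def directed_hausdorff(A, B):
    """h(A, B): the largest distance from any point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the worse of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

A small Hausdorff distance between the transformed face-model edge points and the image's edge points indicates a likely face location; the brute-force version above is O(|A|·|B|) per comparison.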

984 citations


Proceedings Article
03 Jan 2001
TL;DR: A new variant of AdaBoost is proposed as a mechanism for training the simple classifiers used in the cascade in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval).
Abstract: This paper develops a new approach for extremely fast detection in domains where the distribution of positive and negative examples is highly skewed (e.g. face detection or database retrieval). In such domains a cascade of simple classifiers each trained to achieve high detection rates and modest false positive rates can yield a final detector with many desirable features, including high detection rates, very low false positive rates, and fast performance. Achieving extremely high detection rates, rather than low error, is not a task typically addressed by machine learning algorithms. We propose a new variant of AdaBoost as a mechanism for training the simple classifiers used in the cascade. Experimental results in the domain of face detection show the training algorithm yields significant improvements in performance over conventional AdaBoost. The final face detection system can process 15 frames per second, achieves over 90% detection, and a false positive rate of 1 in 1,000,000.
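The arithmetic behind a cascade's appeal is simple: detection and false-positive rates multiply across independent stages. With illustrative per-stage rates (assumed here, not taken from the paper), ten stages already land near the headline numbers quoted above:

```python
def cascade_rates(d, f, stages):
    """Overall detection rate and false-positive rate of a cascade where
    every stage keeps a fraction d of faces and f of non-face windows."""
    return d ** stages, f ** stages

# Hypothetical stage rates: 99% of faces pass, 25% of background passes.
D, F = cascade_rates(0.99, 0.25, 10)
# D is roughly 0.90 (over 90% detection); F is under one in a million.
```

This is why each stage can afford to be individually weak: a very modest per-stage false-positive rate compounds into a tiny overall rate, while the per-stage detection rate must stay near 1 to avoid losing faces.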

559 citations


Proceedings ArticleDOI
08 Dec 2001
TL;DR: A two-step statistical modeling approach that integrates both a global parametric model and a local nonparametric model is developed, which can generate photorealistic face images.
Abstract: In this paper, we study face hallucination, or synthesizing a high-resolution face image from low-resolution input, with the help of a large collection of high-resolution face images. We develop a two-step statistical modeling approach that integrates both a global parametric model and a local nonparametric model. First, we derive a global linear model to learn the relationship between the high-resolution face images and their smoothed and down-sampled lower resolution ones. Second, the residual between an original high-resolution image and the reconstructed high-resolution image by a learned linear model is modeled by a patch-based nonparametric Markov network, to capture the high-frequency content of faces. By integrating both global and local models, we can generate photorealistic face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated faces.

379 citations


Journal ArticleDOI
TL;DR: The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real world application: the indexing of images and videos.
Abstract: Detecting faces in images with complex backgrounds is a difficult task. Our approach, which obtains state of the art results, is based on a neural network model: the constrained generative model (CGM). Generative, since the goal of the learning process is to evaluate the probability that the model has generated the input data, and constrained since some counter-examples are used to increase the quality of the estimation performed by the model. To detect side view faces and to decrease the number of false alarms, a conditional mixture of networks is used. To decrease the computational time cost, a fast search algorithm is proposed. The level of performance reached, in terms of detection accuracy and processing time, allows us to apply this detector to a real world application: the indexing of images and videos.

369 citations


Journal ArticleDOI
TL;DR: The approach is sequential testing which is coarse-to-fine both in the exploration of poses and in the representation of objects, and the spatial distribution of processing is highly skewed and detection is rapid, but at the expense of (isolated) false alarms which could be eliminated with localized, more intensive, processing.
Abstract: We study visual selection: Detect and roughly localize all instances of a generic object class, such as a face, in a greyscale scene, measuring performance in terms of computation and false alarms. Our approach is sequential testing which is coarse-to-fine both in the exploration of poses and in the representation of objects. All the tests are binary and indicate the presence or absence of loose spatial arrangements of oriented edge fragments. Starting from training examples, we recursively find larger and larger arrangements which are “decomposable,” which implies the probability of an arrangement appearing on an object decays slowly with its size. Detection means finding a sufficient number of arrangements of each size along a decreasing sequence of pose cells. At the beginning, the tests are simple and universal, accommodating many poses simultaneously, but the false alarm rate is relatively high. Eventually, the tests are more discriminating, but also more complex and dedicated to specific poses. As a result, the spatial distribution of processing is highly skewed and detection is rapid, but at the expense of (isolated) false alarms which, presumably, could be eliminated with localized, more intensive, processing.

325 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: In this paper, the problem of finding point correspondences in images by way of an approach to template matching that is robust under affine distortions is addressed by applying "geometric blur" to both the template and the image, resulting in a falloff in similarity that is close to linear in the norm of the distortion.
Abstract: We address the problem of finding point correspondences in images by way of an approach to template matching that is robust under affine distortions. This is achieved by applying "geometric blur" to both the template and the image, resulting in a fall-off in similarity that is close to linear in the norm of the distortion between the template and the image. Results in wide baseline stereo correspondence, face detection, and feature correspondence are included.

310 citations


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A method of speeding up the non-linear support vector machine by considering the RVs sequentially and ceasing the evaluation as soon as a face is deemed too unlikely, obviating the need to evaluate the remaining RVs.
Abstract: This paper describes an algorithm for finding faces within an image. The basis of the algorithm is to run an observation window at all possible positions, scales and orientations within the image. A non-linear support vector machine is used to determine whether or not a face is contained within the observation window. The non-linear support vector machine operates by comparing the input patch to a set of support vectors (which can be thought of as face and anti-face templates). Each support vector is scored by some nonlinear function against the observation window and if the resulting sum is over some threshold a face is indicated. Because of the huge search space that is considered, it is imperative to investigate ways to speed up the support vector machine. Within this paper we suggest a method of speeding up the non-linear support vector machine. A set of reduced set vectors (RVs) is calculated from the support vectors. The RVs are considered sequentially, and if at any point a face is deemed too unlikely, the sequential evaluation ceases, obviating the need to evaluate the remaining RVs. The idea is that we only need to apply a subset of the RVs to eliminate things that are obviously not a face (thus reducing the computation). The key then is to explore the RVs in the right order, and a method for this is proposed.
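The early-exit idea can be sketched generically. The scoring functions and per-stage thresholds below are placeholders, not the paper's actual reduced set vectors; the point is only the control flow that lets most windows stop after a few evaluations:

```python
def sequential_evaluate(rv_scores, thresholds, patch):
    """Apply reduced-set-vector scores one at a time; reject as soon as the
    running score falls below that stage's threshold (thresholds are assumed
    to be tuned so that true faces are almost never rejected early)."""
    total = 0.0
    for score, thresh in zip(rv_scores, thresholds):
        total += score(patch)
        if total < thresh:
            return False  # clearly not a face; remaining RVs never evaluated
    return True  # survived every stage: report a face
```

Ordering the RVs so that the most discriminative ones come first maximizes how early typical background patches are rejected, which is the ordering problem the paper addresses.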

Proceedings ArticleDOI
01 Dec 2001
TL;DR: A method for automatically learning components by using 3-D head models, which has the advantage that no manual interaction is required for choosing and extracting components.
Abstract: We present a component-based, trainable system for detecting frontal and near-frontal views of faces in still gray images. The system consists of a two-level hierarchy of Support Vector Machine (SVM) classifiers. On the first level, component classifiers independently detect components of a face. On the second level, a single classifier checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face. We propose a method for automatically learning components by using 3-D head models. This approach has the advantage that no manual interaction is required for choosing and extracting components. Experiments show that the component-based system is significantly more robust against rotations in depth than a comparable system trained on whole face patterns.

Journal ArticleDOI
TL;DR: A number of evolutionary agents are uniformly distributed in the 2-D image environment to detect the skin-like pixels and segment each face-like region by activating their evolutionary behaviors, and wavelet decomposition is applied to each region to detect possible facial features.

Journal ArticleDOI
TL;DR: A novel eye detection method for gray intensity images that uses multiple cues to detect eye windows from a face image, together with a variance projection function for eye detection and verification.

Journal ArticleDOI
TL;DR: A system that constructs textured 3D face models from videos with minimal user interaction and the goal is to allow an untrained user with a PC and an ordinary camera to create and instantly animate his/her face model in no more than a few minutes.
Abstract: Generating realistic 3D human face models and facial animations has been a persistent challenge in computer graphics. We have developed a system that constructs textured 3D face models from videos with minimal user interaction. Our system takes images and video sequences of a face with an ordinary video camera. After five manual clicks on two images to tell the system where the eye corners, nose top and mouth corners are, the system automatically generates a realistic looking 3D human head model and the constructed model can be animated immediately. A user with a PC and an ordinary camera can use our system to generate his/her face model in a few minutes. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: An efficient algorithm for human face detection and facial feature extraction is devised using the genetic algorithm and the eigenface technique, and the lighting effect and orientation of the faces are considered and solved.

01 Jan 2001
TL;DR: A statistical shape-from-shading model is developed to recover face shape from a single image, and to synthesize the same face under new illumination to build a simple and fast classifier that was not possible before because of a lack of training data.
Abstract: We propose a model- and exemplar-based approach for face recognition. This problem has been previously tackled using either models or exemplars, with limited success. Our idea uses models to synthesize many more exemplars, which are then used in the learning stage of a face recognition system. To demonstrate this, we develop a statistical shape-from-shading model to recover face shape from a single image, and to synthesize the same face under new illumination. We then use this to build a simple and fast classifier that was not possible before because of a lack of training data.


Proceedings ArticleDOI
28 Dec 2001
TL;DR: This paper describes an experiment on the face recognition problem that combines eigenfaces and a neural network, which can represent face pictures with several coefficients instead of having to use the whole picture.
Abstract: In this paper we describe an experiment on the face recognition problem, combining eigenfaces and a neural network. Eigenfaces are applied to extract the relevant information in a face image, which is important for identification. Using this we can represent face pictures with several coefficients (about twenty) instead of having to use the whole picture. Neural networks are used to recognize the face through learning correct classification of the coefficients calculated by the eigenface algorithm. The network is first trained on the pictures from the face database, and then it is used to identify the face pictures given to it. Eight subjects (persons) were used in a database of 80 face images. A recognition accuracy of 95.6% was achieved with vertically oriented frontal views of a human face.
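Representing a face with about twenty coefficients amounts to projecting the mean-subtracted, flattened image onto the leading eigenfaces. A toy sketch in two dimensions, assuming the eigenfaces are already computed and orthonormal (names and data are illustrative only):

```python
def project(face, mean, eigenfaces):
    """Coefficients of a flattened face image on orthonormal eigenfaces."""
    centered = [p - m for p, m in zip(face, mean)]
    return [sum(c * e for c, e in zip(centered, ef)) for ef in eigenfaces]

def reconstruct(coeffs, mean, eigenfaces):
    """Approximate the face back from its coefficients: mean + sum(w_k * ef_k)."""
    out = list(mean)
    for w, ef in zip(coeffs, eigenfaces):
        out = [o + w * e for o, e in zip(out, ef)]
    return out
```

The coefficient vector, not the raw pixels, is what gets fed to the neural network classifier, which is why the input dimensionality drops from thousands of pixels to a handful of numbers.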

Proceedings Article
03 Jan 2001
TL;DR: An algorithm for automatically learning discriminative components of objects with SVM classifiers based on growing image parts by minimizing theoretical bounds on the error probability of an SVM, which suggests performance at a significantly better level than other face detection systems.
Abstract: We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at a significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier which may be relevant for biological models of visual recognition.

Proceedings ArticleDOI
07 Jul 2001
TL;DR: Experimental results show that fusion of evidence from multiple views can produce better results than using the result from a single view, and that this kernel machine based approach for learning nonlinear mappings for multi-view face detection and pose estimation yields high detection and low false alarm rates.
Abstract: Face images are subject to changes in view and illumination. Such changes cause data distribution to be highly nonlinear and complex in the image space. It is desirable to learn a nonlinear mapping from the image space to a low dimensional space such that the distribution becomes simpler, tighter and therefore more predictable for better modeling of faces. In this paper we present a kernel machine based approach for learning such nonlinear mappings. The aim is to provide an effective view-based representation for multi-view face detection and pose estimation. Assuming that the view is partitioned into a number of distinct ranges, one nonlinear view-subspace is learned for each (range of) view from a set of example face images of that view (range), by using kernel principal component analysis (KPCA). Projections of the data onto the view-subspaces are then computed as view-based nonlinear features. Multi-view face detection and pose estimation are performed by classifying a face into one of the facial views or into the nonface class, by using a multi-class kernel support vector classifier (KSVC). Experimental results show that fusion of evidence from multiple views can produce better results than using the result from a single view; and that our approach yields high detection and low false alarm rates in face detection and good accuracy in pose estimation, in comparison with the linear counterpart composed of linear principal component analysis (PCA) feature extraction and Fisher linear discriminant based classification (FLDC).

Journal ArticleDOI
TL;DR: Two methods of eye detection in a face image are described: the face is first detected as a large flesh-colored region, and anthropometric data are then used to estimate the size and separation of the eyes.

Journal ArticleDOI
TL;DR: A mixture-of-Gaussians modeling of the color space, provides a robust representation that can accommodate large color variations, as well as highlights and shadows, in face-color modeling and segmentation.

Book
31 Aug 2001
TL;DR: This book contains a comprehensive survey on existing face detection methods, which will serve as the entry point for new researchers embarking on such topics, and in-depth discussion on motion segmentation algorithms and applications which will benefit more seasoned graduate students or researchers interested in motion pattern recognition.
Abstract: With the ubiquity of new information technology and media, more effective and friendly methods for human computer interaction (HCI) are being developed which do not rely on traditional devices such as keyboards, mice and displays. The first step for any intelligent HCI system is face detection, and one of most friendly HCI systems is hand gesture. Face Detection and Gesture Recognition for Human-Computer Interaction introduces the frontiers of vision-based interfaces for intelligent human computer interaction with focus on two main issues: face detection and gesture recognition. The first part of the book reviews and discusses existing face detection methods, followed by a discussion on future research. Performance evaluation issues on the face detection methods are also addressed. The second part discusses an interesting hand gesture recognition method based on a generic motion segmentation algorithm. The system has been tested with gestures from American Sign Language with promising results. We conclude this book with comments on future work in face detection and hand gesture recognition. Face Detection and Gesture Recognition for Human-Computer Interaction will interest those working in vision-based interfaces for intelligent human computer interaction. It also contains a comprehensive survey on existing face detection methods, which will serve as the entry point for new researchers embarking on such topics. Furthermore, this book also covers in-depth discussion on motion segmentation algorithms and applications, which will benefit more seasoned graduate students or researchers interested in motion pattern recognition.

Patent
15 May 2001
TL;DR: In this article, face detection is used to provide an automatic enhancement of the appearance of an image based on knowledge of human faces in the image, which may have more pleasing lightness, contrast and/or color levels.
Abstract: An image enhancement apparatus and a corresponding method use face detection to provide for automatic enhancement of appearances of an image based on knowledge of human faces in the image. By modifying and transforming the image automatically using facial information, the image, including the human faces in the image, may have more pleasing lightness, contrast, and/or color levels. The image enhancement method may also automatically reduce or remove any red eye artifact without human intervention, leading to images with more pleasing appearances.

Journal ArticleDOI
TL;DR: The results show that orientation-selective Gabor filters enhance differences in pose and that different filter orientations are optimal at different poses, while principal component analysis was found to provide an identity-invariant representation in which similarities can be calculated more robustly.

Journal ArticleDOI
TL;DR: A robust and efficient human face detection system that can detect multiple faces in complex backgrounds is presented and experimental results reveal that the proposed method is better than traditional methods in terms of efficiency and accuracy.

Proceedings ArticleDOI
15 Jul 2001
TL;DR: A universal and robust model of the human skin color that caters for all human races is developed, and the ability to detect solid skin regions in color images with the model is extremely useful in applications such as face detection and recognition, and human gesture analysis.
Abstract: We propose a new image classification technique that utilizes neural networks to classify skin and non-skin pixels in color images. The aim is to develop a universal and robust model of the human skin color that caters for all human races. The ability to detect solid skin regions in color images with the model is extremely useful in applications such as face detection and recognition, and human gesture analysis. Experimental results show that the neural network classifiers can consistently achieve up to 90% accuracy in skin color detection.


Proceedings ArticleDOI
01 Jan 2001
TL;DR: Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
Abstract: We present a two-step method to speed up object detection systems in computer vision that use Support Vector Machines (SVMs) as classifiers. In a first step we perform feature reduction by choosing relevant image features according to a measure derived from statistical learning theory. In a second step we build a hierarchy of classifiers. On the bottom level, a simple and fast classifier analyzes the whole image and rejects large parts of the background. On the top level, a slower but more accurate classifier performs the final detection. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
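The two-level hierarchy described above can be sketched generically; `coarse_pass` and `fine_pass` below are placeholder classifiers standing in for the fast bottom-level and accurate top-level SVMs:

```python
def hierarchical_detect(windows, coarse_pass, fine_pass):
    """Level 1: a cheap classifier scans every candidate window and discards
    most of the background. Level 2: a slower, more accurate classifier runs
    only on the survivors, so its cost is paid on a small fraction of windows."""
    survivors = [w for w in windows if coarse_pass(w)]
    return [w for w in survivors if fine_pass(w)]
```

The speed-up comes from the asymmetry: if the coarse stage rejects, say, 99% of windows, the expensive classifier's runtime shrinks by roughly that factor, while overall accuracy is set mostly by the fine stage.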