
Showing papers on "Face detection published in 2002"


Journal ArticleDOI
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations


Proceedings ArticleDOI
Rainer Lienhart1, J. Maydt1
10 Dec 2002
TL;DR: This paper introduces a novel set of rotated Haar-like features that significantly enrich the simple features of the Viola et al. scheme, which is based on a boosted cascade of simple feature classifiers.
Abstract: Recently Viola et al. [2001] have introduced a rapid object detection scheme based on a boosted cascade of simple feature classifiers. In this paper we introduce a novel set of rotated Haar-like features. These novel features significantly enrich the simple features of Viola et al. and can also be calculated efficiently. With these new rotated features our sample face detector achieves, on average, a 10% lower false alarm rate at a given hit rate. We also present a novel post-optimization procedure for a given boosted cascade, improving the false alarm rate on average by a further 12.5%.

3,133 citations
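The features described above are rectangle sums evaluated in constant time from an integral image. As a rough sketch (upright features only; the rotated features introduced in this paper additionally need a 45-degree summed-area table, which is omitted here), a two-rectangle edge feature might be computed like this:

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[0:y+1, 0:x+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """O(1) sum of the w x h rectangle whose top-left pixel is (x, y)."""
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
        if x > 0:
            total += ii[y - 1, x - 1]
    return total

def haar_two_rect(ii, x, y, w, h):
    """Upright two-rectangle (edge) feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

A boosted cascade then thresholds many such feature values and combines them into stage classifiers.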


Journal ArticleDOI
TL;DR: A face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds is proposed. Based on a novel lighting compensation technique and a nonlinear color transformation, this method detects skin regions over the entire image and generates face candidates based on the spatial arrangement of these skin patches.
Abstract: Human face detection plays an important role in applications such as video surveillance, human computer interface, face recognition, and face image database management. We propose a face detection algorithm for color images in the presence of varying lighting conditions as well as complex backgrounds. Based on a novel lighting compensation technique and a nonlinear color transformation, our method detects skin regions over the entire image and then generates face candidates based on the spatial arrangement of these skin patches. The algorithm constructs eye, mouth, and boundary maps for verifying each face candidate. Experimental results demonstrate successful face detection over a wide range of facial variations in color, position, scale, orientation, 3D pose, and expression in images from several photo collections (both indoors and outdoors).

2,075 citations
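The paper's lighting compensation and nonlinear color transformation are not reproduced here; as a minimal sketch of the general skin-detection step it builds on, the following classifies pixels with a fixed CbCr box (the threshold values are commonly quoted illustrative numbers, not taken from this paper):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) float RGB array (0-255) to YCbCr (ITU-R BT.601)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Binary skin map from a fixed CbCr box (illustrative thresholds)."""
    ycbcr = rgb_to_ycbcr(np.asarray(rgb, dtype=float))
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

Connected skin regions in such a mask would then be grouped into face candidates by their spatial arrangement, as the abstract describes.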


Journal ArticleDOI
TL;DR: This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.
Abstract: The automatic recognition of human faces presents a significant challenge to the pattern recognition research community. Typically, human faces are very similar in structure, with minor differences from person to person. They are actually within one class of "human face". Furthermore, lighting conditions change, while facial expressions and pose variations further complicate the face recognition task, making it one of the difficult problems in pattern analysis. This paper proposes a novel concept: namely, that faces can be recognized using a line edge map (LEM). The LEM, a compact face feature, is generated for face coding and recognition. A thorough investigation of the proposed concept is conducted which covers all aspects of human face recognition, i.e. face recognition under (1) controlled/ideal conditions and size variations, (2) varying lighting conditions, (3) varying facial expressions, and (4) varying pose. The system performance is also compared with the eigenface method, one of the best face recognition techniques, and with reported experimental results of other methods. A face pre-filtering technique is proposed to speed up the search process. It is very encouraging to find that the proposed face recognition technique has performed better than the eigenface method in most of the comparison experiments. This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.

505 citations


Book ChapterDOI
Stan Z. Li1, Long Zhu1, ZhenQiu Zhang, Andrew Blake1, Hong-Jiang Zhang1, Heung-Yeung Shum1 
28 May 2002
TL;DR: FloatBoost incorporates the idea of Floating Search into AdaBoost to solve the non-monotonicity problem encountered in the sequential search of AdaBoost and leads to the first real-time multi-view face detection system in the world.
Abstract: A new boosting algorithm, called FloatBoost, is proposed to overcome the monotonicity problem of sequential AdaBoost learning. AdaBoost [1, 2] is a sequential forward search procedure using the greedy selection strategy. The premise offered by the sequential procedure can break down when the monotonicity assumption, i.e. that adding a new feature to the current set does not decrease the value of the performance criterion, is violated. FloatBoost incorporates the idea of Floating Search [3] into AdaBoost to solve the non-monotonicity problem encountered in the sequential search of AdaBoost. We then present a system which learns to detect multi-view faces using FloatBoost. The system uses a coarse-to-fine, simple-to-complex architecture called a detector-pyramid. FloatBoost learns the component detectors in the pyramid and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers. This work leads to the first real-time multi-view face detection system in the world. It runs at 200 ms per image of size 320x240 pixels on a Pentium-III CPU of 700 MHz. A live demo will be shown at the conference.

489 citations
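FloatBoost's key ingredient is the floating search of Pudil et al.: after each greedy forward addition, previously selected elements are conditionally removed whenever doing so improves the criterion. A generic sketch of that selection loop (with a caller-supplied, hypothetical `criterion` function; FloatBoost applies the same idea to weak classifiers inside AdaBoost, which this sketch does not implement) might look like:

```python
def floating_select(features, criterion, k):
    """Sequential forward floating selection sketch.

    features  : list of candidate feature ids (assumes len(features) >= k)
    criterion : function mapping a list of ids -> score (higher is better)
    k         : target subset size
    """
    selected = []
    while len(selected) < k:
        # Forward step: greedily add the single best new feature.
        best = max((f for f in features if f not in selected),
                   key=lambda f: criterion(selected + [f]))
        selected.append(best)
        # Backward (floating) step: drop members while that improves the score.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            score = criterion(selected)
            for f in list(selected):
                rest = [g for g in selected if g != f]
                if criterion(rest) > score:
                    selected = rest
                    improved = True
                    break
    return selected
```

The backward step is what breaks the monotonicity assumption of plain sequential search: a feature that looked good early can be evicted once better combinations appear.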


Journal ArticleDOI
TL;DR: A recently proposed distributed neural system for face perception, with minor modifications, can accommodate the psychological findings with moving faces.

466 citations


Proceedings ArticleDOI
10 Dec 2002
TL;DR: Efficient post-processing techniques, namely noise removal, shape criteria, elliptic curve fitting and face/non-face classification, are proposed in order to further refine skin segmentation results for the purpose of face detection.
Abstract: This paper presents a new human skin color model in YCbCr color space and its application to human face detection. Skin colors are modeled by a set of three Gaussian clusters, each of which is characterized by a centroid and a covariance matrix. The centroids and covariance matrices are estimated from a large set of training samples after a k-means clustering process. Pixels in a color input image can be classified into skin or non-skin based on their Mahalanobis distances to the three clusters. Efficient post-processing techniques, namely noise removal, shape criteria, elliptic curve fitting and face/non-face classification, are proposed in order to further refine the skin segmentation results for the purpose of face detection.

287 citations
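A minimal sketch of the classification rule described in the abstract: a pixel's chrominance is labeled skin if its squared Mahalanobis distance to the nearest of the Gaussian clusters falls below a threshold (the centroids, inverse covariances and threshold are assumed inputs that would come from the k-means training step):

```python
import numpy as np

def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance of vector x to a Gaussian cluster."""
    d = x - mean
    return float(d @ cov_inv @ d)

def classify_pixel(cbcr, centroids, cov_invs, threshold):
    """Skin if the nearest cluster (in squared Mahalanobis distance)
    is within the threshold; the paper uses three such clusters."""
    d = min(mahalanobis_sq(cbcr, m, ci) for m, ci in zip(centroids, cov_invs))
    return d <= threshold
```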


Proceedings ArticleDOI
20 May 2002
TL;DR: This paper presents progress toward an integrated, robust, real-time face detection and demographic analysis system that combines estimates from many facial detections in order to reduce the error rate.
Abstract: This paper presents progress toward an integrated, robust, real-time face detection and demographic analysis system. Faces are detected and extracted using the fast algorithm proposed by P. Viola and M.J. Jones (2001). Detected faces are passed to a demographic (gender and ethnicity) classifier which uses the same architecture as the face detector. This demographic classifier is extremely fast, and delivers error rates slightly better than the best-known classifiers. To counter the unconstrained and noisy sensing environment, demographic information is integrated across time for each individual. Therefore, the final demographic classification combines estimates from many facial detections in order to reduce the error rate. The entire system processes 10 frames per second on an 800-MHz Intel Pentium III.

282 citations
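The abstract does not specify how the per-frame demographic estimates are fused; one plausible scheme, shown only as an illustration, is to pool the classifier's per-frame class probabilities in log-odds space, treating detections as independent evidence:

```python
import math

def combine_logodds(probs):
    """Fuse per-frame P(class) estimates for one tracked person by
    summing log-odds; a single uninformative frame (p = 0.5) adds
    nothing, while consistent weak evidence accumulates."""
    s = sum(math.log(p / (1 - p)) for p in probs)
    return 1 / (1 + math.exp(-s))
```

Three mildly confident frames at p = 0.6 already yield a fused probability above any single frame's, which is the error-reduction effect the paper describes.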


Patent
04 Mar 2002
TL;DR: In this paper, the authors proposed a detector-pyramid architecture for real-time multi-view face detection, which uses a sequence of detectors of increasing complexity and face/non-face discriminating thresholds to quickly discard non-faces at the earliest stage possible.
Abstract: A system and method for real-time multi-view (i.e. not just frontal view) face detection. The system and method use a sequence of detectors of increasing complexity and face/non-face discriminating thresholds to quickly discard non-faces at the earliest stage possible, thus saving much computation compared to prior art systems. The detector-pyramid architecture for multi-view face detection uses a coarse-to-fine and simple-to-complex scheme. This architecture effectively and efficiently solves the problem of lengthy processing that precludes real-time face detection by discarding most non-face sub-windows using the simplest possible features at the earliest possible stage. This leads to the first real-time multi-view face detection system, with accuracy almost as good as that of the state-of-the-art system yet 270 times faster, allowing real-time performance.

210 citations


Patent
Yingli Tian1, Rudolf Maarten Bolle1
17 Jan 2002
TL;DR: In this paper, a face detector is used to detect the pose and position of a face and find the facial components; then a set of geometrical facial features and three histograms in zones of the mouth are extracted.
Abstract: A system and method for automatically detecting neutral, expressionless faces in digital images and video is described. First, a face detector is used to detect the pose and position of a face and find the facial components. Second, the detected face is normalized to a standard-size face. Then a set of geometrical facial features and three histograms in zones of the mouth are extracted. Finally, by feeding these features to a classifier, the system detects whether or not the face is neutral and expressionless.

189 citations


Patent
17 Oct 2002
TL;DR: In this article, a face imaging system for recordal and/or automated identity confirmation, including a camera unit and camera unit controller, is presented, which includes a video camera, a rotatable mirror system for directing images of the security area into the video camera and a ranging unit for detecting the presence of a target and for providing target range data, comprising distance, angle and width information, to the camera unit.
Abstract: A face imaging system for recordal and/or automated identity confirmation, including a camera unit and a camera unit controller. The camera unit includes a video camera, a rotatable mirror system for directing images of the security area into the video camera, and a ranging unit for detecting the presence of a target and for providing target range data, comprising distance, angle and width information, to the camera unit controller. The camera unit controller includes software for detecting face images of the target, tracking of detected face images, and capture of high quality face images. A communication system is provided for sending the captured face images to an external controller for face verification, face recognition and database searching. Face detection and face tracking is performed using the combination of video images and range data and the captured face images are recorded and/or made available for face recognition and searching.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: A set of seven metrics is proposed for quantifying different aspects of a detection algorithm's performance; they will be used to evaluate algorithms for detecting text, faces, moving people and vehicles.
Abstract: The continuous development of object detection algorithms is ushering in the need for evaluation tools to quantify algorithm performance. In this paper a set of seven metrics are proposed for quantifying different aspects of a detection algorithm's performance. The strengths and weaknesses of these metrics are described. They are implemented in the Video Performance Evaluation Resource (ViPER) system and will be used to evaluate algorithms for detecting text, faces, moving people and vehicles. Results for running two previous text-detection algorithms on a common data set are presented.
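The seven ViPER metrics themselves are not listed in the abstract; most detection metrics, however, build on an overlap test between detected and ground-truth boxes. A generic sketch of such a kernel (IoU matching with precision/recall, not the paper's exact definitions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def precision_recall(detections, truths, thresh=0.5):
    """Greedy one-to-one matching of detections to ground truth
    at an IoU threshold; returns (precision, recall)."""
    unmatched = list(truths)
    tp = 0
    for d in detections:
        for t in unmatched:
            if iou(d, t) >= thresh:
                tp += 1
                unmatched.remove(t)
                break
    prec = tp / len(detections) if detections else 0.0
    rec = tp / len(truths) if truths else 0.0
    return prec, rec
```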

Book ChapterDOI
TL;DR: A novel color texture-based method for object detection in images that produces robust and efficient LP detection as time-consuming color texture analyses for less relevant pixels are restricted, leaving only a small part of the input image to be analyzed.
Abstract: This paper presents a novel color texture-based method for object detection in images. To demonstrate our technique, a vehicle license plate (LP) localization system is developed. A support vector machine (SVM) is used to analyze the color textural properties of LPs. No external feature extraction module is used; rather, the color values of the raw pixels that make up the color textural pattern are fed directly to the SVM, which works well even in high-dimensional spaces. Next, LP regions are identified by applying a continuously adaptive mean-shift algorithm (CAMShift) to the results of the color texture analysis. The combination of CAMShift and SVMs produces not only robust but also efficient LP detection, as time-consuming color texture analyses for less relevant pixels are restricted, leaving only a small part of the input image to be analyzed.

Patent
Fabrice Lestideau1
31 May 2002
TL;DR: In this paper, a method for locating human faces, if present, in a cluttered scene captured on a digital image, relies on a two-step process, the first being the detection of segments with a high probability of being human skin in the color image, and the second step is the analysis of features within each of those boundary boxes to determine which of the segments are likely to be a human face.
Abstract: A method (100) of locating human faces, if present, in a cluttered scene captured on a digital image (105) is disclosed. The method (100) relies on a two-step process, the first being the detection of segments with a high probability of being human skin in the color image (105), followed by the determination of a boundary box, or other boundary indication, to border each of those segments. The second step (140) is the analysis of features within each of those boundary boxes to determine which of the segments are likely to be a human face. As human skin is not highly textured, in order to detect segments with a high probability of being human skin, a binary texture map (121) is formed from the image (105), and segments having high texture are discarded.

Patent
21 Aug 2002
TL;DR: In this article, a robot includes a face extraction unit for extracting a feature of a face contained in an image picked up by a CCD camera and a face recognition unit for recognizing a face according to the face extraction result obtained by the face extract unit.
Abstract: A robot includes a face extraction unit for extracting a feature of a face contained in an image picked up by a CCD camera and a face recognition unit for recognizing a face according to the face extraction result obtained by the face extraction unit. The face extraction unit is composed of a Gabor filter for filtering an image by using a plurality of filters having direction selectivity and different frequency components. The face recognition unit is composed of a support vector machine for mapping the face extraction result onto a non-linear space and obtaining a hyperplane for separation in the space, thereby distinguishing face from non-face. The robot can recognize a user face in a dynamically changing environment within a predetermined time.
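A Gabor filter of the kind the face extraction unit applies is a sinusoid windowed by a Gaussian; varying the orientation and wavelength yields a bank of direction-selective filters with different frequency components. A minimal sketch (the size, wavelength and sigma values here are illustrative, not the robot's actual parameters):

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """Real part of a Gabor filter: a plane wave at orientation theta,
    windowed by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

# A small bank: 4 orientations x 2 wavelengths. Responses of such a
# bank would be the features fed to the SVM face/non-face classifier.
bank = [gabor_kernel(15, t, lam, 4.0)
        for t in np.arange(4) * np.pi / 4
        for lam in (4.0, 8.0)]
```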

Proceedings ArticleDOI
20 May 2002
TL;DR: Experiments show that facial feature localization benefits significantly from the hierarchical approach, and results compare favorably with existing techniques for feature localization.
Abstract: We present a technique for facial feature localization using a two-level hierarchical wavelet network. The first level wavelet network is used for face matching, and yields an affine transformation used for a rough approximation of feature locations. Second level wavelet networks for each feature are then used to fine-tune the feature locations. Construction of a training database containing hierarchical wavelet networks of many faces allows features to be detected in most faces. Experiments show that facial feature localization benefits significantly from the hierarchical approach. Results compare favorably with existing techniques for feature localization.

Journal ArticleDOI
TL;DR: A new approach for estimating and tracking three-dimensional pose of a human face from the face images obtained from a single monocular view with full perspective projection, which is more robust than the existing feature-based approaches for face pose estimation.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: This work proposes a data-driven face analysis approach that is not only capable of extracting features relevant to a given face analysis task but is also robust with regard to face location changes and scale variations.
Abstract: Automatic face analysis has to cope with pose and lighting variations. Especially pose variations are difficult to tackle and many face analysis methods require the use of sophisticated normalization procedures. We propose a data-driven face analysis approach that is not only capable of extracting features relevant to a given face analysis task but is also robust with regard to face location changes and scale variations. This is achieved by deploying convolutional neural networks, which are either trained for facial expression recognition or face identity recognition. Combining the outputs of these networks allows us to obtain a subject dependent or personalized recognition of facial expressions.

Journal ArticleDOI
TL;DR: This work detects facial features and circumscribes each facial feature with the smallest rectangle possible by using vertical and horizontal gray value projections of pixels.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A set of orthogonal, binary, localized basis components are learned from a well-aligned face image database and leads to a Walsh function-based representation of the face images, which can be used to resolve the occlusion problem, improve the computing efficiency and compress the storage requirements of a face detection and recognition system.
Abstract: Proposes a novel method, called local non-negative matrix factorization (LNMF), for learning a spatially localized, parts-based subspace representation of visual patterns. An objective function is defined to impose the localization constraint, in addition to the non-negativity constraint in the standard non-negative matrix factorization (NMF). This gives a set of bases which not only allows a non-subtractive (part-based) representation of images but also manifests localized features. An algorithm is presented for the learning of such basis components. Experimental results are presented to compare LNMF with the NMF and principal component analysis (PCA) methods for face representation and recognition, which demonstrates the advantages of LNMF. Based on our LNMF approach, a set of orthogonal, binary, localized basis components are learned from a well-aligned face image database. It leads to a Walsh function-based representation of the face images. These properties can be used to resolve the occlusion problem, improve the computing efficiency and compress the storage requirements of a face detection and recognition system.
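LNMF extends standard NMF by adding locality and orthogonality terms to the objective. For orientation, the baseline it builds on, Lee and Seung's multiplicative updates for the Euclidean loss, can be sketched as follows (the LNMF-specific penalty terms are not implemented here):

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9, seed=0):
    """Standard NMF via multiplicative updates (Euclidean loss):
    factor a nonnegative V (n x m) as W (n x r) @ H (r x m),
    with W and H kept elementwise nonnegative throughout."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

For face images, the columns of W play the role of basis images; LNMF's extra constraints push those bases toward localized, parts-like components.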

Journal ArticleDOI
TL;DR: This work extends SVMs to model the appearance of human faces which undergo non-linear change across multiple views and uses inherent factors in the nature of the input images and the SVM classification algorithm to perform both multi-view face detection and pose estimation.

Patent
14 Nov 2002
TL;DR: In this paper, an image processing system is disclosed that provides automatic face or skin blurring for images, where faces are determined in an image, and face matching is performed to match a particular face to faces in the image.
Abstract: An image processing system is disclosed that provides automatic face or skin blurring for images. All faces or skin can be blurred, or specific faces can be blurred. In one aspect of the invention, a particular face is blurred on an image or on a series of images in a video. Faces are determined in an image, and face matching is performed to match a particular face to faces in the image. If a match is found, the face or a portion of the face is blurred in the image. The blurring is performed on a portion of the image containing the particular face. Blurring may be performed through a variety of techniques. In another aspect of the invention, voice processing is used as an adjunct to or in place of face analysis to determine if a face in an image or series of images should be blurred. In another aspect of the invention, all faces or human skin in an image or series of images is blurred.

Proceedings ArticleDOI
03 Jul 2002
TL;DR: A methodology for creating an annotated database is presented that employs a novel set of apparatus for the rapid capture of face images from a wide variety of pose angles and illumination angles; four different types of illumination are used.
Abstract: Face detection and recognition are becoming increasingly important in the contexts of surveillance, credit card fraud detection, assistive devices for the visually impaired, etc. A number of face recognition algorithms have been proposed in the literature. The availability of a comprehensive face database is crucial for testing the performance of these face recognition algorithms. However, while existing publicly available face databases contain face images with a wide variety of pose angles, illumination angles, gestures, face occlusions, and illuminant colors, these images have not been adequately annotated, thus limiting their usefulness for evaluating the relative performance of face detection algorithms. For example, many of the images in existing databases are not annotated with the exact pose angles at which they were taken. In order to compare the performance of various face recognition algorithms presented in the literature, there is a need for a comprehensive, systematically annotated database populated with face images that have been captured (1) at a variety of pose angles (to permit testing of pose invariance), (2) with a wide variety of illumination angles (to permit testing of illumination invariance), and (3) under a variety of commonly encountered illumination color temperatures (to permit testing of illumination color invariance). In this paper, we present a methodology for creating such an annotated database that employs a novel set of apparatus for the rapid capture of face images from a wide variety of pose angles and illumination angles. Four different types of illumination are used, including daylight, skylight, incandescent and fluorescent. The entire set of images, as well as the annotations and the experimental results, is being placed in the public domain and made available for download over the World Wide Web. © (2002) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.

Journal ArticleDOI
TL;DR: Experimental results show that a substantial speed-up ratio is achieved when applying a new approach that reduces the computation time taken by fast neural nets in the search process used to locate human faces automatically in cluttered scenes.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: An approach that explores the combined use of adaptive skin color segmentation and face detection for improved face tracking on a mobile robot to track faces that undergo changes in lighting conditions while at the same time providing information about the attention of the user.
Abstract: The visual tracking of human faces is a basic functionality needed for human-machine interfaces. This paper describes an approach that explores the combined use of adaptive skin color segmentation and face detection for improved face tracking on a mobile robot. To cope with inhomogeneous lighting within a single image, the color of each tracked image region is modeled with an individual, unimodal Gaussian. Face detection is performed locally on all segmented skin-colored regions. If a face is detected, the appropriate color model is updated with the image pixels in an elliptical area around the face position. Updating is restricted to pixels that are contained in a global skin color distribution obtained off-line. The presented method allows us to track faces that undergo changes in lighting conditions while at the same time providing information about the attention of the user, i.e. whether the user looks at the robot. This forms the basis for developing more sophisticated human-machine interfaces capable of dealing with unrestricted environments.
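The adaptation step described above, in which the unimodal Gaussian color model of a tracked region is updated from pixels in an elliptical area around a verified face detection, might be sketched as a running blend of the Gaussian parameters (the adaptation rate `alpha` is an assumption, not a value from the paper):

```python
import numpy as np

def update_color_model(mean, cov, pixels, alpha=0.1):
    """Blend a unimodal Gaussian skin color model toward pixel samples
    taken around a verified face detection. In the paper's scheme,
    only pixels inside a global off-line skin distribution would be
    passed in; that gating is assumed to happen before this call."""
    pixels = np.asarray(pixels, dtype=float)
    new_mean = pixels.mean(axis=0)
    new_cov = np.cov(pixels, rowvar=False)
    mean = (1 - alpha) * mean + alpha * new_mean
    cov = (1 - alpha) * cov + alpha * new_cov
    return mean, cov
```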

Proceedings ArticleDOI
10 Dec 2002
TL;DR: A convolutional neural network architecture designed to recognize strongly variable face patterns directly from pixel images with no preprocessing, by automatically synthesizing its own set of feature extractors from a large training set of faces.
Abstract: In this paper, we present a connectionist approach for detecting and precisely localizing semi-frontal human faces in complex images, making no assumption about the content or the lighting conditions of the scene, or about the size or the appearance of the faces. We propose a convolutional neural network architecture designed to recognize strongly variable face patterns directly from pixel images with no preprocessing, by automatically synthesizing its own set of feature extractors from a large training set of faces. We present in detail the optimized design of our architecture, our learning strategy and the resulting face detection process. We also provide experimental results to demonstrate the robustness of our approach and its capability to precisely detect extremely variable faces in uncontrolled environments.

Book ChapterDOI
22 Nov 2002
TL;DR: This work uses qualitative photometric measurements to construct a face signature ('ratio-template') that is largely invariant to illumination changes and renders the representations stable in the presence of sensor noise and significant changes in object appearance.
Abstract: The success of any object recognition system, whether biological or artificial, lies in using appropriate representation schemes. The schemes should efficiently encode object concepts while being tolerant to appearance variations induced by changes in viewing geometry and illumination. Here, we present a biologically plausible representation scheme wherein objects are encoded as sets of qualitative image measurements. Our emphasis on the use of qualitative measurements renders the representations stable in the presence of sensor noise and significant changes in object appearance. We develop our ideas in the context of the task of face-detection under varying illumination. Our approach uses qualitative photometric measurements to construct a face signature ('ratio-template') that is largely invariant to illumination changes.

Proceedings ArticleDOI
20 May 2002
TL;DR: A procedure to adaptively change the CSM throughout the processing of a video, which works in environments where the face moves through multi-positioned light sources with varying types of illumination.
Abstract: There are many studies that use color space models (CSMs) for the detection of faces in an image. Most researchers a priori select a given CSM and proceed to use the selected model for color segmentation of the face by constructing a color distribution model (CDM). There is limited work on finding the overall best CSM. We develop a procedure to adaptively change the CSM throughout the processing of a video. We show that this works in environments where the face moves through multi-positioned light sources with varying types of illumination. A test of the procedure using the 2D color space models RG, rg, HS, YQ and CbCr found that switching between the color spaces resulted in increased tracking performance. In addition, we have proposed a new performance measure for evaluating color-tracking algorithms which includes both the accuracy and the robustness of the tracking window. The methodology developed can be used to find the optimal CSM-CDM combination in adaptive color tracking systems.

Proceedings ArticleDOI
20 May 2002
TL;DR: This paper shows, theoretically and by experiments conducted with ordinary USB cameras, that by properly defining the nose as an extremum of the 3D curvature of the nose surface, the nose becomes the most robust facial feature: it can be seen in almost any position of the head and can be tracked very precisely, even with low-resolution cameras.
Abstract: The human nose, while in many cases being the only facial feature clearly visible during head motion, seems to be very undervalued in face tracking technology. This paper shows, theoretically and by experiments conducted with ordinary USB cameras, that by properly defining the nose as an extremum of the 3D curvature of the nose surface, the nose becomes the most robust feature, one which can be seen in almost any position of the head and which can be tracked very precisely, even with low-resolution cameras.

Proceedings ArticleDOI
20 May 2002
TL;DR: This work presents the first real-time multi-view face detection system, which runs at 5 frames per second on 320×240 image sequences; each detector is trained using a new meta boosting learning algorithm.
Abstract: We present a detector-pyramid architecture for real-time multi-view face detection. Using a coarse-to-fine strategy, the full view is partitioned into finer and finer views. Each face detector in the pyramid detects faces in its respective view range. Its training is performed using a new meta boosting learning algorithm. This results in the first real-time multi-view face detection system, which runs at 5 frames per second on 320×240 image sequences.