Showing papers on "Face detection published in 2006"


Journal ArticleDOI
TL;DR: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features that is assessed in the face recognition problem under different challenges.
Abstract: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed.
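As a rough illustration of the region-based LBP descriptor described above, the sketch below divides a grayscale face into a grid, histograms the LBP codes of each cell, and concatenates the histograms. It relies on scikit-image's local_binary_pattern; the grid size, radius, and uniform-pattern variant are illustrative defaults, not necessarily the paper's exact settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_face_descriptor(face_img, grid=(7, 7), n_points=8, radius=1):
    """Concatenate per-region LBP histograms into one face descriptor.

    face_img: 2-D 8-bit grayscale array. Grid size, radius, and the
    uniform-LBP variant are illustrative choices.
    """
    lbp = local_binary_pattern(face_img, n_points, radius, method="uniform")
    n_bins = n_points + 2                      # uniform patterns + "other"
    h, w = lbp.shape
    rows, cols = grid
    hists = []
    for i in range(rows):
        for j in range(cols):
            block = lbp[i * h // rows:(i + 1) * h // rows,
                        j * w // cols:(j + 1) * w // cols]
            hist, _ = np.histogram(block, bins=n_bins,
                                   range=(0, n_bins), density=True)
            hists.append(hist)
    return np.concatenate(hists)              # the enhanced feature vector
```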

5,563 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This work integrates the cascade-of-rejectors approach with the Histograms of Oriented Gradients features to achieve a fast and accurate human detection system that can process 5 to 30 frames per second depending on the density in which the image is scanned, while maintaining an accuracy level similar to existing methods.
Abstract: We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.
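A minimal sketch of the integral-image trick the abstract mentions: one integral image per orientation bin lets the gradient histogram of any variable-size block be computed with four lookups per bin. The bin count and unsigned-gradient convention below are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def orientation_integrals(gray, n_bins=9):
    """One integral image per orientation bin, so the gradient histogram of
    any rectangular block costs four lookups per bin regardless of size."""
    gy, gx = np.gradient(np.asarray(gray, float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    return [np.pad(np.where(bins == b, mag, 0.0).cumsum(0).cumsum(1),
                   ((1, 0), (1, 0)))                   # zero-pad for lookups
            for b in range(n_bins)]

def block_hog(integrals, top, left, height, width):
    """HoG of the block whose top-left corner is (top, left)."""
    return np.array([ii[top + height, left + width] - ii[top, left + width]
                     - ii[top + height, left] + ii[top, left]
                     for ii in integrals])
```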

1,626 citations


Journal ArticleDOI
01 Apr 2006
TL;DR: This paper presents a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences and introduces facial-action-dynamics recognition from continuous video input using temporal rules.
Abstract: Automatic analysis of human facial expression is a challenging problem with many applications. Most of the existing automated systems for facial expression analysis attempt to recognize a few prototypic emotional expressions, such as anger and happiness. Instead of representing another approach to machine analysis of prototypic facial expressions of emotion, the method presented in this paper attempts to handle a large range of human facial behavior by recognizing facial muscle actions that produce expressions. Virtually all of the existing vision systems for facial muscle action detection deal only with frontal-view face images and cannot handle temporal dynamics of facial actions. In this paper, we present a system for automatic recognition of facial action units (AUs) and their temporal models from long, profile-view face image sequences. We exploit particle filtering to track 15 facial points in an input face-profile sequence, and we introduce facial-action-dynamics recognition from continuous video input using temporal rules. The algorithm performs both automatic segmentation of an input video into the facial expressions pictured and recognition of temporal segments (i.e., onset, apex, offset) of 27 AUs occurring alone or in combination in the input face-profile video. A recognition rate of 87% is achieved.

604 citations


Journal ArticleDOI
TL;DR: The logarithmic total variation (LTV) model is presented, which has the ability to factorize a single face image and obtain the illumination invariant facial structure, which is then used for face recognition.
Abstract: In this paper, we present the logarithmic total variation (LTV) model for face recognition under varying illumination, including natural lighting conditions, where we rarely know the strength, direction, or number of light sources. The proposed LTV model has the ability to factorize a single face image and obtain the illumination invariant facial structure, which is then used for face recognition. Our model is inspired by the SQI model but has better edge-preserving ability and simpler parameter selection. The merit of this model is that neither does it require any lighting assumption nor does it need any training. The LTV model reaches very high recognition rates in the tests using both Yale and CMU PIE face databases as well as a face database containing 765 subjects under outdoor lighting conditions.
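The decomposition at the heart of the LTV model can be approximated in a few lines: take the logarithm of the image and use a total-variation term to separate a large-scale illumination component u from the small-scale facial structure v = log(I) - u. The sketch below substitutes an off-the-shelf TV denoiser (scikit-image's denoise_tv_chambolle) for the paper's exact TV minimization, so the weight parameter is illustrative only.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def ltv_structure(face_img, weight=0.4):
    """Split log(I) into a smooth, large-scale illumination component u
    (via total-variation smoothing) and the small-scale facial structure
    v = log(I) - u, which is what gets fed to the recognizer."""
    log_img = np.log1p(np.asarray(face_img, float))
    u = denoise_tv_chambolle(log_img, weight=weight)   # illumination part
    return log_img - u                                 # facial structure v
```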

468 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: Testing on 750 artificial and natural scenes shows that the model’s predictions are consistent with a large body of available literature on human psychophysics of visual search, suggesting that it may provide a good approximation of how humans combine bottom-up and top-down cues.
Abstract: Integration of goal-driven, top-down attention and image-driven, bottom-up attention is crucial for visual search. Yet, previous research has mostly focused on models that are purely top-down or bottom-up. Here, we propose a new model that combines both. The bottom-up component computes the visual salience of scene locations in different feature maps extracted at multiple spatial scales. The top-down component uses accumulated statistical knowledge of the visual features of the desired search target and background clutter, to optimally tune the bottom-up maps such that target detection speed is maximized. Testing on 750 artificial and natural scenes shows that the model’s predictions are consistent with a large body of available literature on human psychophysics of visual search. These results suggest that our model may provide a good approximation of how humans combine bottom-up and top-down cues so as to optimize target detection speed.
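The top-down tuning of bottom-up maps can be sketched very simply: weight each feature map by how strongly it responds to the target relative to background clutter before summing into a salience map. The SNR-style weight below is a simplified stand-in for the paper's optimal-gain derivation; all names are illustrative.

```python
import numpy as np

def topdown_salience(feature_maps, target_means, clutter_means, eps=1e-6):
    """Weight each bottom-up feature map by a target-to-clutter response
    ratio before summing, so features diagnostic of the search target are
    boosted; a simplified stand-in for the paper's optimal gains."""
    weights = np.asarray(target_means) / (np.asarray(clutter_means) + eps)
    salience = sum(w * m for w, m in zip(weights, feature_maps))
    return salience / (salience.max() + eps)   # normalized salience map
```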

435 citations


Journal Article
TL;DR: A novel method for real-time, simultaneous multi-view face detection and facial pose estimation that employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold is described.
Abstract: We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets - one for frontal pose, one for rotated faces, and one for profiles - and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.

403 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper investigates the application of the SIFT approach in the context of face authentication, and proposes and tests different matching schemes using the BANCA database and protocol, showing promising results.
Abstract: Several pattern recognition and classification techniques have been applied to the biometrics domain. Among them, an interesting technique is the Scale Invariant Feature Transform (SIFT), originally devised for object recognition. Even though SIFT features have emerged as very powerful image descriptors, their employment in the face analysis context has never been systematically investigated. This paper investigates the application of the SIFT approach in the context of face authentication. In order to determine the real potential and applicability of the method, different matching schemes are proposed and tested using the BANCA database and protocol, showing promising results.
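For concreteness, a minimal SIFT-based face-matching score in OpenCV using Lowe's ratio test is sketched below. The paper compares several more constrained matching schemes; this global scheme and its score normalization are assumptions for illustration.

```python
import cv2

def sift_match_score(img_a, img_b, ratio=0.75):
    """Fraction of SIFT keypoints in img_a that find a reliable match in
    img_b (Lowe's ratio test); inputs are 8-bit grayscale face crops."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    pairs = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), 1)

# authentication: accept the claimed identity if the score of the probe
# against the enrolled template exceeds a threshold tuned on the database
```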

386 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This work reports on the progress of building a system that enables fully automated fast and robust facial expression recognition from face video and analyses subtle changes in facial expression by recognizing facial muscle action units (AUs) and analysing their temporal behavior.
Abstract: In this work we report on the progress of building a system that enables fully automated fast and robust facial expression recognition from face video. We analyse subtle changes in facial expression by recognizing facial muscle action units (AUs) and analysing their temporal behavior. By detecting AUs from face video we enable the analysis of various facial communicative signals including facial expressions of emotion, attitude and mood. For an input video picturing a facial expression we detect per frame whether any of 15 different AUs is activated, whether that facial action is in the onset, apex, or offset phase, and what the total duration of the activation in question is. We base this process upon a set of spatio-temporal features calculated from tracking data for 20 facial fiducial points. To detect these 20 points of interest in the first frame of an input face video, we utilize a fully automatic, facial point localization method that uses individual feature GentleBoost templates built from Gabor wavelet features. Then, we exploit a particle filtering scheme that uses factorized likelihoods and a novel observation model that combines a rigid and a morphological model to track the facial points. The AUs displayed in the input video and their temporal segments are recognized finally by Support Vector Machines trained on a subset of most informative spatio-temporal features selected by AdaBoost. For the Cohn-Kanade and MMI databases, the proposed system classifies 15 AUs occurring alone or in combination with other AUs with a mean agreement rate of 90.2% with human FACS coders.

364 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper proposes a novel approach to extract primitive 3D facial expression features, and then applies the feature distribution to classify the prototypic facial expressions, and demonstrates the advantages of the 3D geometric based approach over 2D texture based approaches in terms of various head poses.
Abstract: The creation of facial range models by 3D imaging systems has led to extensive work on 3D face recognition [19]. However, little work has been done to study the usefulness of such data for recognizing and understanding facial expressions. Psychological research shows that the shape of a human face, a highly mobile facial surface, is critical to facial expression perception. In this paper, we investigate the importance and usefulness of 3D facial geometric shapes to represent and recognize facial expressions using 3D facial expression range data. We propose a novel approach to extract primitive 3D facial expression features, and then apply the feature distribution to classify the prototypic facial expressions. In order to validate our proposed approach, we have conducted experiments for person-independent facial expression recognition using our newly created 3D facial expression database. We also demonstrate the advantages of our 3D geometric based approach over 2D texture based approaches in terms of various head poses.

339 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper addresses the problem of detecting and segmenting partially occluded objects of a known category by defining a part labelling which densely covers the object and imposing asymmetric local spatial constraints on these labels to ensure the consistent layout of parts whilst allowing for object deformation.
Abstract: This paper addresses the problem of detecting and segmenting partially occluded objects of a known category. We first define a part labelling which densely covers the object. Our Layout Consistent Random Field (LayoutCRF) model then imposes asymmetric local spatial constraints on these labels to ensure the consistent layout of parts whilst allowing for object deformation. Arbitrary occlusions of the object are handled by avoiding the assumption that the whole object is visible. The resulting system is both efficient to train and to apply to novel images, due to a novel annealed layout-consistent expansion move algorithm paired with a randomised decision tree classifier. We apply our technique to images of cars and faces and demonstrate state-of-the-art detection and segmentation performance even in the presence of partial occlusion.

318 citations


Proceedings ArticleDOI
10 Apr 2006
TL;DR: In this paper, a user-independent fully automatic system for real-time recognition of facial actions from the Facial Action Coding System (FACS) was presented, which automatically detects frontal faces in the video stream and codes each frame with respect to 20 action units.
Abstract: We present results on a user-independent, fully automatic system for real-time recognition of facial actions from the Facial Action Coding System (FACS). The system automatically detects frontal faces in the video stream and codes each frame with respect to 20 action units. We present preliminary results on a task of facial action detection in spontaneous expressions during discourse. Support vector machines and AdaBoost classifiers are compared. For both classifiers, the output margin predicts action unit intensity.

Journal ArticleDOI
TL;DR: This work presents an innovative method that combines a feature-based approach with a holistic one for three-dimensional (3D) face detection, which has been tested, with good results, on some 150 3D faces acquired by a laser range scanner.

Journal ArticleDOI
TL;DR: A spatio-temporal approach to recognizing six universal facial expressions from visual data and using them to compute levels of interest was presented; the computed levels were found to be consistent with "ground truth" information in most cases.
Abstract: This paper presents a spatio-temporal approach to recognizing six universal facial expressions from visual data and using them to compute levels of interest. The classification approach relies on a two-step strategy on top of projected facial motion vectors obtained from video sequences of facial expressions. First, a linear classification bank was applied to projected optical flow vectors, and decisions made by the linear classifiers were coalesced to produce a characteristic signature for each universal facial expression. The signatures thus computed from the training data set were used to train discrete hidden Markov models (HMMs) to learn the underlying model for each facial expression. The performance of the proposed facial expression recognition was computed using fivefold cross-validation on the Cohn-Kanade facial expressions database, consisting of 488 video sequences covering 97 subjects. The proposed approach achieved an average recognition rate of 90.9% on the Cohn-Kanade database. Recognized facial expressions were mapped to levels of interest using the affect space and the intensity of motion around the apex frame. The computed levels of interest were subjectively analyzed and found to be consistent with "ground truth" information in most cases. To further illustrate the efficacy of the proposed approach, and to better understand the effects of a number of factors that are detrimental to facial expression recognition, additional experiments were conducted. The first empirical analysis was conducted on a database of 108 facial expressions collected from TV broadcasts and labeled by human coders for subsequent analysis. The second experiment (emotion elicitation) was conducted on facial expressions obtained from 21 subjects by showing the subjects six different movie clips chosen to arouse spontaneous emotional reactions that would produce natural facial expressions.
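The two-step strategy (per-expression signature sequences fed to per-expression HMMs, classification by maximum likelihood) can be sketched with hmmlearn. The paper trains discrete HMMs on linear-classifier signatures; the Gaussian-emission models and state count below are simplifications for illustration.

```python
import numpy as np
from hmmlearn import hmm   # pip install hmmlearn

def train_expression_hmms(sequences_by_label, n_states=4):
    """One HMM per universal expression; sequences_by_label maps an
    expression name to a list of (T x d) signature sequences."""
    models = {}
    for label, seqs in sequences_by_label.items():
        X, lengths = np.vstack(seqs), [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_expression(models, seq):
    """Pick the expression whose HMM gives the sequence the highest
    log-likelihood."""
    return max(models, key=lambda lbl: models[lbl].score(seq))
```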

Journal ArticleDOI
TL;DR: The proposed multistream HMM facial expression system, which utilizes stream reliability weights, achieves relative reduction of the facial expression recognition error of 44% compared to the single-stream HMM system.
Abstract: The performance of an automatic facial expression recognition system can be significantly improved by modeling the reliability of different streams of facial expression information utilizing multistream hidden Markov models (HMMs). In this paper, we present an automatic multistream HMM facial expression recognition system and analyze its performance. The proposed system utilizes facial animation parameters (FAPs), supported by the MPEG-4 standard, as features for facial expression classification. Specifically, the FAPs describing the movement of the outer-lip contours and eyebrows are used as observations. Experiments are first performed employing single-stream HMMs under several different scenarios, utilizing outer-lip and eyebrow FAPs individually and jointly. A multistream HMM approach is proposed for introducing facial expression and FAP group dependent stream reliability weights. The stream weights are determined based on the facial expression recognition results obtained when FAP streams are utilized individually. The proposed multistream HMM facial expression system, which utilizes stream reliability weights, achieves relative reduction of the facial expression recognition error of 44% compared to the single-stream HMM system.
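The multistream combination itself is a one-liner: per-stream HMM log-likelihoods (e.g. the outer-lip and eyebrow FAP streams) are summed with reliability weights, log P = sum_s w_s * log P_s. A minimal sketch, with the weights assumed to come from each stream's stand-alone recognition accuracy as the paper describes:

```python
def multistream_loglik(stream_logliks, stream_weights):
    """Reliability-weighted combination of per-stream HMM log-likelihoods,
    e.g. one stream for outer-lip FAPs and one for eyebrow FAPs."""
    return sum(w * ll for w, ll in zip(stream_weights, stream_logliks))

# classify by the expression whose weighted multistream score is highest
```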

Journal ArticleDOI
TL;DR: These ideas are demonstrated using a nearest-neighbor classifier on two 3D face databases, Florida State University and Notre Dame, highlighting good recognition performance.
Abstract: We study shapes of facial surfaces for the purpose of face recognition. The main idea is to 1) represent surfaces by unions of level curves, called facial curves, of the depth function and 2) compare shapes of surfaces implicitly using shapes of facial curves. The latter is performed using a differential geometric approach that computes geodesic lengths between closed curves on a shape manifold. These ideas are demonstrated using a nearest-neighbor classifier on two 3D face databases, Florida State University and Notre Dame, highlighting good recognition performance.
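The first step of this pipeline, representing a facial surface by level curves of the depth function, is easy to sketch with scikit-image; the geodesic comparison of curve shapes on a shape manifold is beyond this snippet, and the level count is an illustrative choice.

```python
import numpy as np
from skimage import measure

def facial_curves(depth_map, n_levels=10):
    """Level curves (facial curves) of the depth function at evenly spaced
    depths; returns {depth: list of contour arrays}."""
    levels = np.linspace(depth_map.min(), depth_map.max(), n_levels + 2)[1:-1]
    return {float(lv): measure.find_contours(depth_map, lv) for lv in levels}
```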

Journal ArticleDOI
TL;DR: The results show that DP is a heterogeneous condition and that impairment in recognizing faces cannot be predicted by poor performance on any one measure of face processing.

Journal ArticleDOI
TL;DR: It can be shown experimentally that smoothing the face trajectories leads to a significant reduction of false detections compared to the static detector without the presented tracking extension, which is useful for improving speed and accuracy of the system.

Proceedings ArticleDOI
10 Apr 2006
TL;DR: Preliminary results of the face recognition grand challenge indicate that significant progress has been made towards achieving the stated goals.
Abstract: The goal of the Face Recognition Grand Challenge (FRGC) is to improve the performance of face recognition algorithms by an order of magnitude over the best results in Face Recognition Vendor Test (FRVT) 2002. The FRGC is designed to achieve this performance goal by presenting to researchers a six-experiment challenge problem along with a data corpus of 50,000 images. The data consists of 3D scans and high resolution still imagery taken under controlled and uncontrolled conditions. This paper presents preliminary results of the FRGC for all six experiments. The preliminary results indicate that significant progress has been made towards achieving the stated goals.

Journal ArticleDOI
TL;DR: Findings provide direct evidence that individual face discrimination in humans can take place as early as 130 ms following stimulus onset, during the same time window as face detection.
Abstract: How fast does the human visual system discriminate individual faces? To address this question, we used a continuous-stimulation paradigm in which event-related potentials (ERPs) to a face stimulus are recorded with respect to another face stimulus, rather than to a preceding blank-screen baseline epoch. Following the shift between two face stimuli, posterior sites showed an early negative ERP deflection that started at 130 ms and peaked at 160 ms, the latency of the N170, an ERP component associated with discriminating faces from objects. The ERP we recorded was larger in amplitude when the preceding stimulus was perceived as a different individual face rather than the same individual face, although face pairs were of equal physical distance in the two conditions. These findings provide direct evidence that individual face discrimination in humans can take place as early as 130 ms following stimulus onset, during the same time window as face detection.

Journal ArticleDOI
TL;DR: This review focuses on three brain regions: the STS for its role in processing gaze and facial movements, the FFA in face detection and identification, and the amygdala in processing facial expressions of emotion; it also examines the available literature on the normal development of face processing.

Proceedings ArticleDOI
10 Apr 2006
TL;DR: It is concluded that there are gender-specific differences in the appearance of facial expressions that can be exploited for automated recognition, and that cascades are an efficient and effective way of performing multi-class recognition of facial expressions.
Abstract: This paper presents an approach to recognising the gender and expression of face images by means of active appearance models (AAM). Features extracted by a trained AAM are used to construct support vector machine (SVM) classifiers for 4 elementary emotional states (happy, angry, sad, neutral). These classifiers are arranged into a cascade structure in order to optimise overall recognition performance. Furthermore, it is shown how performance can be further improved by first classifying the gender of the face images using an SVM trained in a similar manner. Both gender-specific expression classification and expression-specific gender classification cascades are considered, with the former yielding better recognition performance. We conclude that there are gender-specific differences in the appearance of facial expressions that can be exploited for automated recognition, and that cascades are an efficient and effective way of performing multi-class recognition of facial expressions.
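A minimal sketch of the better-performing arrangement, gender-specific expression classification: a gender SVM dispatches each AAM feature vector to one of two expression SVMs. Feature extraction by the trained AAM is assumed to happen upstream; the kernels and labels below are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

class GenderSpecificExpressionCascade:
    """Gender SVM first, then a gender-specific expression SVM over the
    four elementary states (happy, angry, sad, neutral)."""

    def __init__(self):
        self.gender_svm = SVC(kernel="rbf")
        self.expr_svms = {g: SVC(kernel="rbf") for g in ("female", "male")}

    def fit(self, X, genders, expressions):
        X = np.asarray(X)
        genders, expressions = np.asarray(genders), np.asarray(expressions)
        self.gender_svm.fit(X, genders)
        for g, clf in self.expr_svms.items():
            mask = genders == g
            clf.fit(X[mask], expressions[mask])   # gender-specific training
        return self

    def predict(self, x):
        x = np.asarray(x).reshape(1, -1)
        g = self.gender_svm.predict(x)[0]
        return self.expr_svms[g].predict(x)[0]
```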

Journal ArticleDOI
TL;DR: A data-driven approach based on Markov chain Monte Carlo (DD-MCMC) is used, where component detection results generate state proposals for 3D pose estimation, and experimental results show that the method is able to estimate the human pose in static images of real scenes.
Abstract: Estimating human body poses in static images is important for many image understanding applications including semantic content extraction and image database query and retrieval. This problem is challenging due to the presence of clutter in the image, ambiguities in image observation, unknown human image boundary, and high-dimensional state space due to the complex articulated structure of the human body. Human pose estimation can be made more robust by integrating the detection of body components such as face and limbs, with the highly constrained structure of the articulated body. In this paper, a data-driven approach based on Markov chain Monte Carlo (DD-MCMC) is used, where component detection results generate state proposals for 3D pose estimation. To translate these observations into pose hypotheses, we introduce the use of "proposal maps," an efficient way of consolidating the evidence and generating 3D pose candidates during the MCMC search. Experimental results on a set of test images show that the method is able to estimate the human pose in static images of real scenes.

Proceedings ArticleDOI
10 Apr 2006
TL;DR: This paper presents an appearance-based strategy for head pose estimation using supervised graph embedding (GE) analysis, which achieves higher head pose estimation accuracy with more efficient dimensionality reduction than existing methods.
Abstract: Head pose is an important vision cue for scene interpretation and human computer interaction. To determine the head pose, one may consider the low-dimensional manifold structure of the face view points in image space. In this paper, we present an appearance-based strategy for head pose estimation using supervised graph embedding (GE) analysis. Thinking globally and fitting locally, we first construct the neighborhood weighted graph in the sense of supervised LLE. The unified projection is calculated in a closed-form solution based on the GE linearization. We then project new data (face view images) into the embedded low-dimensional subspace with the identical projection. The head pose is finally estimated by the K-nearest neighbor classification. We test the proposed method on 18,100 USF face view images. Experimental results show that, even using a very small training set (e.g. 10 subjects), GE achieves higher head pose estimation accuracy with more efficient dimensionality reduction than the existing methods.
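One simple instance of the supervised graph-embedding linearization is a locality-preserving projection whose neighborhood graph connects only samples sharing a pose label; head pose is then read off by K-nearest neighbors in the embedded subspace. The neighborhood rule, binary weights, and regularization below are assumptions and will differ from the authors' exact construction.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import KNeighborsClassifier

def supervised_lpp(X, labels, n_dims=5, k=7):
    """Locality-preserving projection over a supervised neighborhood graph.
    Returns a (d x n_dims) projection solved in closed form as a
    generalized eigenproblem."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        d = np.linalg.norm(X[same] - X[i], axis=1)
        for j in same[np.argsort(d)[1:k + 1]]:   # skip the point itself
            W[i, j] = W[j, i] = 1.0
    D = np.diag(W.sum(axis=1))
    L = D - W                                    # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # regularize for stability
    _, vecs = eigh(A, B)
    return vecs[:, :n_dims]

# head pose by K-NN in the embedded subspace:
# P = supervised_lpp(X_train, poses)
# knn = KNeighborsClassifier().fit(X_train @ P, poses)
# predicted = knn.predict(X_test @ P)
```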

Book ChapterDOI
07 May 2006
TL;DR: A protocol for running a face detection algorithm on a collection of sensitive surveillance images is proposed, such that the detector's owner learns nothing about the images, not even the result of the face detection operation.
Abstract: Alice would like to detect faces in a collection of sensitive surveillance images she owns. Bob has a face detection algorithm that he is willing to let Alice use, for a fee, as long as she learns nothing about his detector. Alice is willing to use Bob's detector provided that he will learn nothing about her images, not even the result of the face detection operation. Blind vision is about applying secure multi-party techniques to vision algorithms so that Bob will learn nothing about the images he operates on, not even the result of his own operation, and Alice will learn nothing about the detector. The proliferation of surveillance cameras raises privacy concerns that can be addressed by secure multi-party techniques and their adaptation to vision algorithms.

Journal ArticleDOI
TL;DR: The models and methods developed have applications to person recognition and image indexing and a multidimensional representation of hair appearance is presented and computational algorithms are described.
Abstract: We develop computational models for measuring hair appearance for comparing different people. The models and methods developed have applications to person recognition and image indexing. An automatic hair detection algorithm is described and results reported. A multidimensional representation of hair appearance is presented and computational algorithms are described. Results on a data set of 524 subjects are reported. Identification of people using hair attributes is compared to eigenface-based recognition along with a joint eigenface-hair-based identification.

Proceedings ArticleDOI
17 Jun 2006
TL;DR: The experimental results show the importance of using appropriate feature sets and of normalizing the feature vector, as well as the effects of feature selection and feature normalization on the performance of a local appearance-based face recognition scheme.
Abstract: In this paper, the effects of feature selection and feature normalization on the performance of a local appearance-based face recognition scheme are presented. From the local features that are extracted using a block-based discrete cosine transform, three feature sets are derived. These local feature vectors are normalized in two different ways: by making them unit norm, and by dividing each coefficient by its standard deviation as learned from the training set. The input test face images are then classified using four different distance measures: L1 norm, L2 norm, cosine angle, and covariance between feature vectors. Extensive experiments have been conducted on the AR and CMU PIE face databases. The experimental results show the importance of using appropriate feature sets and of normalizing the feature vector.
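A sketch of the local feature extraction with the unit-norm variant of the normalization: each 8x8 block is DCT-transformed, a few low-frequency coefficients are kept, and the local vector is normalized before concatenation. The raster-order coefficient selection stands in for the paper's zig-zag scan, and the cosine distance shown is one of the four measures compared.

```python
import numpy as np
from scipy.fft import dctn

def block_dct_features(face_img, block=8, n_coeffs=10):
    """Concatenated, unit-norm local DCT features from non-overlapping
    8x8 blocks of a grayscale face image."""
    h, w = face_img.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = dctn(face_img[i:i + block, j:j + block], norm="ortho")
            v = coeffs.flatten()[:n_coeffs]        # low-frequency subset
            v = v / (np.linalg.norm(v) + 1e-12)    # unit-norm variant
            feats.append(v)
    return np.concatenate(feats)

def cosine_distance(a, b):
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```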

Proceedings ArticleDOI
01 Jan 2006
TL;DR: The results show that people can be re-detected in images where they do not face the camera, and two extensions improving the pictorial structure detections are described.
Abstract: The goal of this work is to find all occurrences of a particular person in a sequence of photographs taken over a short period of time. For identification, we assume each individual’s hair and clothing stays the same throughout the sequence. Even with these assumptions, the task remains challenging as people can move around, change their pose and scale, and partially occlude each other. We propose a two stage method. First, individuals are identified by clustering frontal face detections using color clothing information. Second, a color based pictorial structure model is used to find occurrences of each person in images where their frontal face detection was missed. Two extensions improving the pictorial structure detections are also described. In the first extension, we obtain a better clothing segmentation to improve the accuracy of the clothing color model. In the second extension, we simultaneously consider multiple detection hypotheses of all people potentially present in the shot. Our results show that people can be re-detected in images where they do not face the camera. Results are presented on several sequences from a personal photo collection.

01 Jan 2006
TL;DR: A novel skin colour model, RGB-H-CbCr, for the detection of human faces is presented; it is able to achieve good detection success rates for near-frontal faces of varying orientations, skin colour, and background environment.
Abstract: While RGB, HSV and YUV (YCbCr) are standard models used in various colour imaging applications, not all of their information is necessary to classify skin colour. This paper presents a novel skin colour model, RGB-H-CbCr, for the detection of human faces. Skin regions are extracted using a set of bounding rules based on the skin colour distribution obtained from a training set. The segmented face regions are further classified using a parallel combination of simple morphological operations. Experimental results on a large photo data set have demonstrated that the proposed model is able to achieve good detection success rates for near-frontal faces of varying orientations, skin colour and background environment. The results are also comparable to those of the AdaBoost face classifier.
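In the spirit of the RGB-H-CbCr bounding rules, the sketch below combines a common rule-of-thumb RGB test with a CbCr box and the kind of simple morphological clean-up the abstract mentions. The thresholds are stock values from the skin-detection literature, not the bounds the authors fitted from their training set.

```python
import numpy as np
import cv2

def skin_mask(bgr):
    """Skin segmentation by bounding rules in two colour spaces, followed
    by a morphological opening to clean the mask."""
    r = bgr[..., 2].astype(int)
    g = bgr[..., 1].astype(int)
    b = bgr[..., 0].astype(int)
    rgb_rule = ((r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
                & (r - np.minimum(g, b) > 15) & (np.abs(r - g) > 15))
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    cr, cb = ycrcb[..., 1].astype(int), ycrcb[..., 2].astype(int)
    cbcr_rule = (cr >= 135) & (cr <= 180) & (cb >= 85) & (cb <= 135)
    mask = (rgb_rule & cbcr_rule).astype(np.uint8) * 255
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```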

Journal ArticleDOI
01 Aug 2006
TL;DR: A new technique for face detection and lip feature extraction is proposed; it uses the contrast around the lip contour to extract the height and width of the mouth, metrics that are useful for speech filtering.
Abstract: This paper proposes a new technique for face detection and lip feature extraction. A real-time field-programmable gate array (FPGA) implementation of the two proposed techniques is also presented. Face detection is based on a naive Bayes classifier that classifies an edge-extracted representation of an image. Using the edge representation significantly reduces the model's size to only 5184 bytes, which is 2417 times smaller than a comparable statistical modeling technique, while achieving an 86.6% correct detection rate under various lighting conditions. Lip feature extraction uses the contrast around the lip contour to extract the height and width of the mouth, metrics that are useful for speech filtering. The proposed FPGA system occupies only 15,050 logic cells, about six times fewer than a current comparable FPGA face detection system.
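The edge-based Bayes idea can be sketched compactly: binarize an edge map of each candidate window and fit a naive Bayes model over the resulting bits, which is what keeps the model footprint tiny. The Canny detector and scikit-learn classifier below are software stand-ins for the paper's FPGA-oriented edge extraction and hand-built classifier.

```python
import numpy as np
import cv2
from sklearn.naive_bayes import BernoulliNB

def edge_vector(window):
    """Binary edge map of a fixed-size 8-bit grayscale candidate window;
    the edge representation, not raw pixels, keeps the model small."""
    return (cv2.Canny(window, 100, 200).flatten() > 0).astype(np.uint8)

def train_face_classifier(face_windows, nonface_windows):
    """Naive Bayes over edge bits, trained on labelled window lists."""
    X = np.stack([edge_vector(w) for w in face_windows + nonface_windows])
    y = np.array([1] * len(face_windows) + [0] * len(nonface_windows))
    return BernoulliNB().fit(X, y)

# scan fixed-size windows over the image; keep those predicted as faces
```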

Proceedings ArticleDOI
20 Aug 2006
TL;DR: This paper presents an experimental study on automatic face gender classification by building a system that mainly consists of four parts: face detection, face alignment, texture normalization, and gender classification.
Abstract: This paper presents an experimental study on automatic face gender classification by building a system that mainly consists of four parts: face detection, face alignment, texture normalization, and gender classification. A comparative study of the effects of different texture normalization methods, including two kinds of affine mapping and a Delaunay-triangulation-based warping, as preprocessing for gender classification by SVM, LDA, and Real AdaBoost respectively, is reported through experiments on very large sets of snapshot images.