
Showing papers on "Face detection published in 2008"


01 Oct 2008
TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
Abstract: Most face databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, background, camera quality, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters. This database, Labeled Faces in the Wild, is provided as an aid in studying the latter, unconstrained recognition problem. The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life. The database exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background. In addition to describing the details of the database, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible. We provide baseline results, including results of a state-of-the-art face recognition system combined with a face alignment system. To facilitate experimentation on the database, we provide several parallel databases, including an aligned version.
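The LFW experimental paradigm scores face verification over ten folds of labeled same/different pairs. A minimal sketch of that scoring loop follows; the function names, the precomputed distance arrays, and the simple threshold search are illustrative, not the benchmark's reference code:

```python
import numpy as np

def fold_accuracy(distances, labels, threshold):
    """Verification accuracy on one fold: predict 'same' when distance < threshold."""
    preds = distances < threshold
    return float(np.mean(preds == labels))

def evaluate_folds(all_distances, all_labels):
    """Ten-fold protocol: pick the threshold on nine folds, test on the held-out fold."""
    accs = []
    n = len(all_distances)
    for i in range(n):
        train_d = np.concatenate([all_distances[j] for j in range(n) if j != i])
        train_l = np.concatenate([all_labels[j] for j in range(n) if j != i])
        # choose the threshold maximizing training accuracy over candidate distances
        best_t = max(train_d, key=lambda t: fold_accuracy(train_d, train_l, t))
        accs.append(fold_accuracy(all_distances[i], all_labels[i], best_t))
    return float(np.mean(accs)), float(np.std(accs))
```

Reporting the mean and spread over folds is what makes results on the database comparable across papers.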

5,742 citations


Journal ArticleDOI
TL;DR: It is proposed that what makes face processing special is that it is gated by an obligatory detection process, and this idea is clarified in concrete algorithmic terms and shown how it can explain a variety of phenomena associated with face processing.
Abstract: Faces are among the most informative stimuli we ever perceive: Even a split-second glimpse of a person's face tells us his identity, sex, mood, age, race, and direction of attention. The specialness of face processing is acknowledged in the artificial vision community, where contests for face-recognition algorithms abound. Neurological evidence strongly implicates a dedicated machinery for face processing in the human brain to explain the double dissociability of face- and object-recognition deficits. Furthermore, recent evidence shows that macaques too have specialized neural machinery for processing faces. Here we propose a unifying hypothesis, deduced from computational, neurological, fMRI, and single-unit experiments: that what makes face processing special is that it is gated by an obligatory detection process. We clarify this idea in concrete algorithmic terms and show how it can explain a variety of phenomena associated with face processing.

545 citations


Proceedings ArticleDOI
Lijun Yin1, Xiaochen Chen1, Yi Sun1, T. Worm1, Michael Reale1 
01 Sep 2008
TL;DR: This paper presents a newly created high-resolution 3D dynamic facial expression database, which is made available to the scientific research community and has been validated through the authors' facial expression recognition experiment using an HMM based 3D spatio-temporal facial descriptor.
Abstract: Face information processing relies on the quality of the data resource. From the data modality point of view, a face database can be 2D or 3D, and static or dynamic. From the task point of view, the data can be used for research on computer-based automatic face recognition, facial expression recognition, face detection, or cognitive and psychological investigation. With the advancement of 3D imaging technologies, 3D dynamic facial sequences (called 4D data) have been used for face information analysis. In this paper, we focus on the modality of 3D dynamic data for the task of facial expression recognition. We present a newly created high-resolution 3D dynamic facial expression database, which is made available to the scientific research community. The database contains 606 3D facial expression sequences captured from 101 subjects of various ethnic backgrounds. The database has been validated through our facial expression recognition experiment using an HMM-based 3D spatio-temporal facial descriptor. It is expected that such a database shall be used to facilitate facial expression analysis from a static 3D space to a dynamic 3D space, with a goal of scrutinizing facial behavior at a higher level of detail in a real 3D spatio-temporal domain.

537 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: A complete system for automatic face replacement in images that requires no 3D model, is fully automatic, and generates highly plausible results across a wide range of skin tones, lighting conditions, and viewpoints is presented.
Abstract: In this paper, we present a complete system for automatic face replacement in images. Our system uses a large library of face images created automatically by downloading images from the internet, extracting faces using face detection software, and aligning each extracted face to a common coordinate system. This library is constructed off-line, once, and can be efficiently accessed during face replacement. Our replacement algorithm has three main stages. First, given an input image, we detect all faces that are present, align them to the coordinate system used by our face library, and select candidate face images from our face library that are similar to the input face in appearance and pose. Second, we adjust the pose, lighting, and color of the candidate face images to match the appearance of those in the input image, and seamlessly blend in the results. Third, we rank the blended candidate replacements by computing a match distance over the overlap region. Our approach requires no 3D model, is fully automatic, and generates highly plausible results across a wide range of skin tones, lighting conditions, and viewpoints. We show how our approach can be used for a variety of applications including face de-identification and the creation of appealing group photographs from a set of images. We conclude with a user study that validates the high quality of our replacement results, and a discussion on the current limitations of our system.

390 citations


Journal ArticleDOI
TL;DR: One of the findings was that the automatic face alignment methods did not increase the gender classification rates, but manual alignment increased classification rates a little, which suggests that automatic alignment would be useful when the alignment methods are further improved.
Abstract: We present a systematic study on gender classification with automatically detected and aligned faces. We experimented with 120 combinations of automatic face detection, face alignment, and gender classification. One of the findings was that the automatic face alignment methods did not increase the gender classification rates. However, manual alignment increased classification rates slightly, which suggests that automatic alignment would be useful once the alignment methods are further improved. We also found that the gender classification methods performed almost equally well with different input image sizes. Overall, the best classification rate was achieved with a support vector machine. A neural network and AdaBoost achieved almost as good classification rates as the support vector machine and could be used in applications where classification speed is considered more important than maximum classification accuracy.

370 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: Recognition of blurred faces using the recently introduced Local Phase Quantization (LPQ) operator is proposed, and results show that the LPQ descriptor is highly tolerant to blur yet still very descriptive, outperforming LBP on both blurred and sharp images.
Abstract: In this paper, recognition of blurred faces using the recently introduced Local Phase Quantization (LPQ) operator is proposed. LPQ is based on quantizing the Fourier transform phase in local neighborhoods. The phase can be shown to be a blur-invariant property under certain commonly fulfilled conditions. In face image analysis, histograms of LPQ labels computed within local regions are used as a face descriptor, similarly to the widely used Local Binary Pattern (LBP) methodology for face image description. The experimental results on the CMU PIE and FRGC 1.0.4 datasets show that the LPQ descriptor is highly tolerant to blur yet still very descriptive, outperforming LBP on both blurred and sharp images.
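The LPQ construction described above can be sketched in NumPy: compute local Fourier coefficients at four low frequencies inside each window, quantize the signs of their real and imaginary parts into an 8-bit code, and histogram the codes. The window size and frequency layout below follow common LPQ practice and are assumptions, not the authors' exact settings:

```python
import numpy as np

def lpq_codes(img, m=7):
    """LPQ: sign-quantize 4 low-frequency STFT coefficients in m x m windows."""
    a = 1.0 / m
    n = np.arange(m) - m // 2
    w0 = np.ones(m)                        # zero frequency
    w1 = np.exp(-2j * np.pi * a * n)       # frequency +a
    w2 = np.conj(w1)                       # frequency -a
    patches = np.lib.stride_tricks.sliding_window_view(img.astype(float), (m, m))
    # separable bases for frequencies (a,0), (0,a), (a,a), (a,-a)
    freqs = [(w1, w0), (w0, w1), (w1, w1), (w1, w2)]
    bits = []
    for wy, wx in freqs:
        c = np.einsum('ijkl,k,l->ij', patches, wy, wx)  # local Fourier coefficient
        bits.append(c.real > 0)
        bits.append(c.imag > 0)
    codes = np.zeros(patches.shape[:2], dtype=np.uint8)
    for b, bit in enumerate(bits):
        codes |= (bit.astype(np.uint8) << b)
    return codes

def lpq_histogram(img, m=7):
    """256-bin normalized histogram of LPQ codes, used as the face descriptor."""
    h = np.bincount(lpq_codes(img, m).ravel(), minlength=256).astype(float)
    return h / h.sum()
```

Because centrally symmetric blur leaves the phase signs of low frequencies largely intact, these codes change little between sharp and blurred versions of the same face.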

321 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel human detection system in personal albums based on LBP (local binary pattern) descriptor is developed and carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection.
Abstract: In recent years, local pattern based object detection and recognition have attracted increasing interest in the computer vision research community. However, to the best of our knowledge, no previous work has focused on utilizing local patterns for the task of human detection. In this paper we develop a novel human detection system for personal albums based on the LBP (local binary pattern) descriptor. Firstly, we review the existing gradient-based local features widely used in human detection, analyze their limitations, and argue that LBP is more discriminative. Secondly, since the original LBP descriptor does not suit the human detection problem well due to its high complexity and lack of semantic consistency, we propose two variants of LBP: Semantic-LBP and Fourier-LBP. Carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection. In particular, we adopt a random ensemble algorithm for better comparison between different descriptors. All experiments are conducted on the INRIA human database.
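For reference, the original 3×3 LBP code that the paper's Semantic-LBP and Fourier-LBP variants start from can be sketched as follows (a generic implementation, not the authors' code):

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP: threshold the 8 neighbors at the center pixel value."""
    img = img.astype(float)
    c = img[1:-1, 1:-1]                     # interior (center) pixels
    # neighbor offsets, clockwise from top-left; each contributes one bit
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for b, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << b)
    return codes
```

Histograms of these codes over local cells form the descriptor; the paper's variants then reorganize the code space to add semantic consistency.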

319 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work proposes a new procedure for recognition of low-resolution faces when a high-resolution training set is available, and shows that recognition of faces as small as 6×6 pixels is considerably improved compared to matching using a super-resolution reconstruction followed by classification, and to matching with a low-resolution training set.
Abstract: Face recognition degrades when faces are of very low resolution, since many details about the difference between one person and another can only be captured in images of sufficient resolution. In this work, we propose a new procedure for recognition of low-resolution faces when a high-resolution training set is available. Most previous super-resolution approaches are aimed at reconstruction, with recognition only as an afterthought. In contrast, in the proposed method, face features, as they would be extracted for a face recognition algorithm (e.g., eigenfaces, Fisherfaces, etc.), are included in a super-resolution method as prior information. This approach simultaneously provides measures of fit of the super-resolution result, from both reconstruction and recognition perspectives. This is different from the conventional paradigms of matching in a low-resolution domain, or, alternatively, applying a super-resolution algorithm to a low-resolution face and then classifying the super-resolution result. We show, for example, that recognition of faces as small as 6×6 pixels is considerably improved compared to matching using a super-resolution reconstruction followed by classification, and to matching with a low-resolution training set.
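The idea of folding recognition features into the super-resolution objective can be sketched as a two-term least-squares fit. Everything below is an illustrative stand-in: average pooling plays the role of the imaging operator, a matrix `W` plays the role of the eigenface-like feature extractor, and the step sizes are not the paper's:

```python
import numpy as np

def downsample(x, s):
    """Average-pool by factor s: a stand-in for blur plus decimation."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def downsample_adjoint(g, s):
    """Adjoint of average pooling: replicate each value and divide by s*s."""
    return np.kron(g, np.ones((s, s))) / (s * s)

def feature_prior_sr(y, s, W, f_target, lam=0.1, steps=200, lr=1.0):
    """Gradient descent on ||D(x) - y||^2 + lam * ||W x - f_target||^2.

    D is the (assumed) imaging operator; W maps a vectorized face to
    recognition features, and f_target is the desired feature vector, so
    both reconstruction and recognition shape the super-resolved estimate.
    """
    h, w = y.shape[0] * s, y.shape[1] * s
    x = np.kron(y, np.ones((s, s)))          # initialize by pixel replication
    for _ in range(steps):
        r = downsample(x, s) - y             # reconstruction residual
        fr = W @ x.ravel() - f_target        # recognition-feature residual
        grad = downsample_adjoint(r, s) + lam * (W.T @ fr).reshape(h, w)
        x -= lr * grad
    return x
```

The two residuals give exactly the dual "measures of fit" the abstract mentions: one in image space, one in recognition-feature space.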

272 citations


Journal ArticleDOI
TL;DR: A linear asymmetric classifier (LAC) is presented, a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem and is demonstrated experimentally that LAC results in an improved ensemble classifier performance.
Abstract: A cascade face detector uses a sequence of node classifiers to distinguish faces from nonfaces. This paper presents a new approach to design node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper: The first is a categorization of asymmetries in the learning goal and why they make face detection hard. The second is the forward feature selection (FFS) algorithm and a fast precomputing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by approximately 100 and 50 times, in comparison to a naive implementation of the AdaBoost feature selection method. The last contribution is a linear asymmetric classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrated experimentally that LAC results in an improved ensemble classifier performance.
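The LAC direction has a closed form: unlike Fisher's discriminant, which whitens with the pooled covariance, it uses the positive-class (face) covariance only, matching the asymmetric goal of keeping nearly all faces while rejecting nonfaces. A small NumPy sketch (the choice of threshold at the negative-class mean projection is an illustrative assumption):

```python
import numpy as np

def lac_fit(X_pos, X_neg, reg=1e-6):
    """Linear asymmetric classifier direction: positive-class covariance only."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    cov_p = np.cov(X_pos, rowvar=False) + reg * np.eye(X_pos.shape[1])
    a = np.linalg.solve(cov_p, mu_p - mu_n)   # direction a ~ cov_pos^{-1} (mu_p - mu_n)
    b = a @ mu_n                              # threshold (assumed here for illustration)
    return a, b

def lac_predict(a, b, X):
    """Label a sample as positive (face) when its projection exceeds the threshold."""
    return X @ a > b
```

Because only the face-class scatter enters the whitening, the projected face distribution is tight around its mean, which is what lets a stage pass almost all faces at a fixed rejection rate for nonfaces.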

256 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A real-time algorithm to estimate the 3D pose of a previously unseen face from a single range image is presented, based on a novel shape signature to identify noses in range images and a novel error function that compares the input range image to precomputed pose images of an average face model.
Abstract: We present a real-time algorithm to estimate the 3D pose of a previously unseen face from a single range image. Based on a novel shape signature to identify noses in range images, we generate candidates for their positions, and then generate and evaluate many pose hypotheses in parallel using modern graphics processing units (GPUs). We developed a novel error function that compares the input range image to precomputed pose images of an average face model. The algorithm is robust to large pose variations of plusmn90deg yaw, plusmn45deg pitch and plusmn30deg roll rotation, facial expression, partial occlusion, and works for multiple faces in the field of view. It correctly estimates 97.8% of the poses within yaw and pitch error of 15deg at 55.8 fps. To evaluate the algorithm, we built a database of range images with large pose variations and developed a method for automatic ground truth annotation.

209 citations


Patent
30 Dec 2008
TL;DR: In this article, the problem of automatically recognizing multiple known faces in photos or videos on a local computer storage device (on a home computer) was solved by automatically selecting thumbnail images of people.
Abstract: The present invention solves the problem of automatically recognizing multiple known faces in photos or videos on a local computer storage device (on a home computer). It further allows for sophisticated organization and presentation of the photos or videos based on the graphical selection of known faces (by selecting thumbnail images of people). It also solves the problem of sharing or distributing photos or videos in an automated fashion between 'friends' who are also using the same software that enables the invention. It further solves the problem of allowing a user of the invention to review the results of the automatic face detection, eye detection, and face recognition methods and to correct any errors resulting from the automated process.

Journal ArticleDOI
TL;DR: The main contributions are comprehensive and comparable classification results for the gender classification methods combined with automatic real-time face detection and, in addition, with manual face normalization, and two new variants of the known methods.

Journal ArticleDOI
TL;DR: A novel multiclassifier scheme is proposed to boost the recognition performance of human emotional state from audiovisual signals based on a comparative study of different classification algorithms and specific characteristics of individual emotion.
Abstract: Machine recognition of human emotional state is an important component for efficient human-computer interaction. The majority of existing works address this problem by utilizing audio signals alone, or visual information only. In this paper, we explore a systematic approach for recognition of human emotional state from audiovisual signals. The audio characteristics of emotional speech are represented by the extracted prosodic, Mel-frequency Cepstral Coefficient (MFCC), and formant frequency features. A face detection scheme based on HSV color model is used to detect the face from the background. The visual information is represented by Gabor wavelet features. We perform feature selection by using a stepwise method based on Mahalanobis distance. The selected audiovisual features are used to classify the data into their corresponding emotions. Based on a comparative study of different classification algorithms and specific characteristics of individual emotion, a novel multiclassifier scheme is proposed to boost the recognition performance. The feasibility of the proposed system is tested over a database that incorporates human subjects from different languages and cultural backgrounds. Experimental results demonstrate the effectiveness of the proposed system. The multiclassifier scheme achieves the best overall recognition rate of 82.14%.
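The HSV-based face detection step can be illustrated with a crude skin rule: convert RGB to hue/saturation/value and threshold. The thresholds below are illustrative defaults, not the paper's values:

```python
import numpy as np

def rgb_to_hue(img):
    """Hue channel (degrees in [0, 360)) of an RGB image with values in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mx = img.max(-1)
    mn = img.min(-1)
    d = np.where(mx > mn, mx - mn, 1.0)           # avoid division by zero on gray pixels
    h = np.zeros_like(mx)
    h = np.where(mx == r, (g - b) / d % 6, h)
    h = np.where(mx == g, (b - r) / d + 2, h)
    h = np.where(mx == b, (r - g) / d + 4, h)
    return np.where(mx > mn, h * 60.0, 0.0)

def skin_mask(img, h_max=50.0, s_min=0.2, v_min=0.35):
    """Crude HSV skin rule: low hue, enough saturation and brightness."""
    mx, mn = img.max(-1), img.min(-1)
    s = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    h = rgb_to_hue(img)
    return (h <= h_max) & (s >= s_min) & (mx >= v_min)
```

The largest connected region of the resulting mask would then be taken as the face candidate before Gabor features are extracted from it.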

Journal ArticleDOI
TL;DR: A novel method is proposed for making decisions about how many hypotheses to include in an ensemble and the appropriate balance of detection and false positive rates in the individual stages, which exploits the shape of the stage ROC curves in ways that have been previously ignored.
Abstract: Cascades of boosted ensembles have become popular in the object detection community following their highly successful introduction in the face detector of Viola and Jones. Since then, researchers have sought to improve upon the original approach by incorporating new methods along a variety of axes (e.g. alternative boosting methods, feature sets, etc.). Nevertheless, key decisions about how many hypotheses to include in an ensemble and the appropriate balance of detection and false positive rates in the individual stages are often made by user intervention or by an automatic method that produces unnecessarily slow detectors. We propose a novel method for making these decisions, which exploits the shape of the stage ROC curves in ways that have been previously ignored. The result is a detector that is significantly faster than the one produced by the standard automatic method. When this algorithm is combined with a recycling method for reusing the outputs of early stages in later ones and with a retracing method that inserts new early rejection points in the cascade, the detection speed matches that of the best hand-crafted detector. We also exploit joint distributions over several features in weak learning to improve overall detector accuracy, and explore ways to improve training time by aggressively filtering features.
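Choosing one operating point per stage amounts to reading a threshold off the stage's score distributions, i.e. picking a point on its ROC curve. A minimal sketch follows; the fixed per-stage detection-rate target is an illustrative parameter, not the paper's automatic choice:

```python
import numpy as np

def stage_threshold(pos_scores, neg_scores, d_min=0.995):
    """Pick the highest threshold keeping at least d_min of positives.

    Returns (threshold, achieved detection rate, false positive rate), i.e.
    one operating point on the stage ROC curve.
    """
    t = np.quantile(pos_scores, 1.0 - d_min)   # keep the top d_min of positives
    d = float(np.mean(pos_scores >= t))
    f = float(np.mean(neg_scores >= t))
    return t, d, f
```

The paper's contribution is precisely to replace such a fixed `d_min` per stage with decisions driven by the shape of each stage's ROC curve.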

Journal ArticleDOI
TL;DR: An elastically deformable model algorithm that establishes correspondence among a set of faces is proposed first and then bilinear models that decouple the identity and facial expression factors are constructed, enabling face recognition invariant to facial expressions and facialexpression recognition with unknown identity.
Abstract: In this paper, we explore bilinear models for jointly addressing 3D face and facial expression recognition. An elastically deformable model algorithm that establishes correspondence among a set of faces is proposed first and then bilinear models that decouple the identity and facial expression factors are constructed. Fitting these models to unknown faces enables us to perform face recognition invariant to facial expressions and facial expression recognition with unknown identity. A quantitative evaluation of the proposed technique is conducted on the publicly available BU-3DFE face database in comparison with our previous work on face recognition and other state-of-the-art algorithms for facial expression recognition. Experimental results demonstrate an overall 90.5% facial expression recognition rate and an 86% rank-1 face recognition rate.

Journal Article
TL;DR: A new database of color, high-resolution face images acquired in partially controlled conditions and stored as 2048 × 1536 pixel images that can be used as training and testing material in developing various algorithms related to face detection, recognition, and analysis.
Abstract: In this paper we present a new database of color, high-resolution face images. The database contains almost 10,000 images of 100 people acquired in partially controlled conditions and stored as 2048 × 1536 pixel images. The database is publicly available for research purposes and can be used as training and testing material in developing various algorithms related to face detection, recognition, and analysis.

Patent
17 Jun 2008
TL;DR: In this paper, face detection is applied to at least a portion of the full resolution main image in a predicted location for candidate face regions having a predicted size as a function of the determined relative movement and the size and location of the one or more face regions within the reference images.
Abstract: A method of tracking a face in a reference image stream using a digital image acquisition device includes acquiring a full resolution main image and an image stream of relatively low resolution reference images each including one or more face regions. One or more face regions are identified within two or more of the reference images. A relative movement is determined between the two or more reference images. A size and location are determined of the one or more face regions within each of the two or more reference images. Concentrated face detection is applied to at least a portion of the full resolution main image in a predicted location for candidate face regions having a predicted size as a function of the determined relative movement and the size and location of the one or more face regions within the reference images, to provide a set of candidate face regions for the main image.

Book ChapterDOI
01 Dec 2008
TL;DR: A real-time liveness detection approach is presented against photograph spoofing in a non-intrusive manner for face recognition, which does not require any additional hardware except for a generic webcamera.
Abstract: Biometrics is an emerging technology that enables uniquely recognizing humans based upon one or more intrinsic physiological or behavioral characteristics, such as faces, fingerprints, irises, and voices (Ross et al., 2006). However, spoofing attack (or copy attack) is still a fatal threat for biometric authentication systems (Schuckers, 2002). Liveness detection, which aims at recognizing human physiological activities as the liveness indicator to prevent spoofing attacks, is becoming a very active topic in the fields of fingerprint recognition and iris recognition (Schuckers, 2002; Bigun et al., 2004; Parthasaradhi et al., 2005; Antonelli et al., 2006). In the face recognition community, although numerous recognition approaches have been presented, the effort on anti-spoofing is still very limited (Zhao et al., 2003). The most common spoofing method is to use a facial photograph of a valid user to fool face recognition systems. Nowadays, video of a valid user can also be easily captured by a needle camera for spoofing. Therefore, the anti-spoofing problem should be well solved before face recognition can be widely applied in everyday life. Most current face recognition systems with excellent performance are based on intensity images and equipped with a generic camera. Thus, an anti-spoofing method requiring no additional device is preferable, since it could be easily integrated into existing face recognition systems. In Section 2, we give a brief review of spoofing methods in face recognition and some related work; potential clues are also presented and commented on. In Section 3, a real-time liveness detection approach against photograph spoofing is presented for face recognition in a non-intrusive manner, requiring no additional hardware except a generic web camera. In Section 4, databases for eyeblink-based anti-spoofing are introduced. Section 5 presents an extensive set of experiments to show the effectiveness of our approach. Discussions are in Section 6.

Journal ArticleDOI
TL;DR: The proposed approach is orientation invariant under varying lighting conditions and invariant to natural transformations such as translation, rotation, and scaling, and is effective for face detection and tracking.
Abstract: The constructive need for robots to coexist with humans requires human-machine interaction. It is a challenge to operate these robots in such dynamic environments, which requires continuous decision-making and environment-attribute updates in real time. An autonomous robot guide is well suited to places such as museums, libraries, schools, and hospitals. This paper addresses a scenario where a robot tracks and follows a human. A neural network is utilized to learn the skin and nonskin colors. The skin-color probability map is utilized for skin classification and morphology-based preprocessing. A heuristic rule is used for face-ratio analysis, and Bayesian cost analysis for label classification. A face-detection module, based on a 2D color model in the YUV color space, is selected over the traditional skin-color model in a 3D color space. A modified continuously adaptive mean shift tracking mechanism in a 1D hue, saturation, and value color space is developed and implemented on the mobile robot. In addition to the visual cues, the tracking process considers 16 sonar scan and tactile sensor readings from the robot to generate a robust measure of the person's distance from the robot. The robot thus decides an appropriate action, namely, to follow the human subject and perform obstacle avoidance. The proposed approach is orientation invariant under varying lighting conditions and invariant to natural transformations such as translation, rotation, and scaling. Such a multimodal solution is effective for face detection and tracking.

Patent
07 Mar 2008
TL;DR: In this paper, a facial expression recognition system that uses a face detection apparatus realizing efficient learning and high-speed detection processing based on ensemble learning when detecting an area representing a detection target and that is robust against shifts of face position included in images and capable of highly accurate expression recognition, and a learning method for the system, are provided.
Abstract: A facial expression recognition system that uses a face detection apparatus realizing efficient learning and high-speed detection processing based on ensemble learning when detecting an area representing a detection target and that is robust against shifts of face position included in images and capable of highly accurate expression recognition, and a learning method for the system, are provided. When learning data to be used by the face detection apparatus by Adaboost, processing to select high-performance weak hypotheses from all weak hypotheses, then generate new weak hypotheses from these high-performance weak hypotheses on the basis of statistical characteristics, and select one weak hypothesis having the highest discrimination performance from these weak hypotheses, is repeated to sequentially generate a weak hypothesis, and a final hypothesis is thus acquired. In detection, using an abort threshold value that has been learned in advance, whether provided data can be obviously judged as a non-face is determined every time one weak hypothesis outputs the result of discrimination. If it can be judged so, processing is aborted. A predetermined Gabor filter is selected from the detected face image by an Adaboost technique, and a support vector for only a feature quantity extracted by the selected filter is learned, thus performing expression recognition.

Patent
06 Mar 2008
TL;DR: In this article, systems and methods for control of a personal computing device based on user face detection and recognition techniques are presented. However, the system is not designed for use on mobile phones.
Abstract: Systems and methods are provided for control of a personal computing device based on user face detection and recognition techniques.

Journal ArticleDOI
TL;DR: A recursive error back-projection method to compensate for residual errors, and a region-based reconstruction method to preserve characteristics of local facial regions to create an extended morphable face model.
Abstract: This paper proposes a face hallucination method for the reconstruction of high-resolution facial images from single-frame, low-resolution facial images. The proposed method has been derived from example-based hallucination methods and morphable face models. First, we propose a recursive error back-projection method to compensate for residual errors, and a region-based reconstruction method to preserve characteristics of local facial regions. Then, we define an extended morphable face model, in which an extended face is composed of the interpolated high-resolution face from a given low-resolution face, and its original high-resolution equivalent. Then, the extended face is separated into an extended shape and an extended texture. We performed various hallucination experiments using the MPI, XM2VTS, and KF databases, compared the reconstruction errors, structural similarity index, and recognition rates, and showed the effects of face detection errors and shape estimation errors. The encouraging results demonstrate that the proposed methods can improve the performance of face recognition systems. Especially the proposed method can enhance the resolution of single-frame, low-resolution facial images.
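The recursive error back-projection idea, in its simplest form, repeatedly pushes the low-resolution residual back into the high-resolution estimate. A sketch with average pooling standing in for the paper's imaging operator (the step size and iteration count are illustrative):

```python
import numpy as np

def downsample(x, s):
    """Average-pool by factor s: a stand-in for the imaging operator."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def back_project(x_hat, y_low, s, steps=20, lr=1.0):
    """Recursive error back-projection on an initial hallucination x_hat.

    Each iteration measures the residual in low-resolution space and
    replicates it back up, so the estimate stays consistent with y_low.
    """
    x = x_hat.copy()
    for _ in range(steps):
        err = y_low - downsample(x, s)             # residual in low-res space
        x += lr * np.kron(err, np.ones((s, s)))    # project the error back up
    return x
```

In the paper this correction is applied per region, so local facial characteristics are preserved while global consistency with the low-resolution input is enforced.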

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A framework to optimize the discrimination-efficiency tradeoff in integrating multiple, heterogeneous features for object detection and shows that this approach outperforms the state-of-the-art methods.
Abstract: A large variety of image features has been invented for detection of objects of a known class. We propose a framework to optimize the discrimination-efficiency tradeoff in integrating multiple, heterogeneous features for object detection. Cascade structured detectors are learned by boosting local feature based weak classifiers. Each weak classifier corresponds to a local image region, from which several different types of features are extracted. The weak classifier makes predictions by examining the features one by one; this classifier goes to the next feature only when the prediction from the already examined features is not confident enough. The order in which the features are evaluated is determined based on their computational cost normalized classification powers. We apply our approach to two object classes, pedestrians and cars. The experimental results show that our approach outperforms the state-of-the-art methods.
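The cost-aware evaluation order and early stopping described above can be sketched as follows; the scores, powers, costs, and confidence margin are illustrative inputs, not the paper's learned values:

```python
import numpy as np

def order_by_normalized_power(powers, costs):
    """Evaluate features in decreasing power-per-unit-cost order."""
    return np.argsort(-np.asarray(powers, dtype=float) / np.asarray(costs, dtype=float))

def staged_predict(feature_scores, order, margin=1.0):
    """Accumulate per-feature scores in the given order, stopping as soon as
    the running sum is confident (|sum| >= margin).

    Returns (predicted label, number of features actually evaluated)."""
    total = 0.0
    for k, idx in enumerate(order, start=1):
        total += feature_scores[idx]
        if abs(total) >= margin:
            return total > 0, k          # confident early: skip remaining features
    return total > 0, len(order)
```

Cheap-but-informative features are consulted first, and expensive ones only when the cheaper evidence is inconclusive, which is the discrimination-efficiency tradeoff the framework optimizes.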

Journal ArticleDOI
TL;DR: This paper proposes a fast and memory efficient method of live face detection for embedded face recognition system, based on the analysis of the movement of the eyes, which detects eyes in sequential input images and calculates variation of each eye region to determine whether the input face is a real face or not.
Abstract: To increase the reliability of a face recognition system, the system must be able to distinguish a real face from a copy of a face, such as a photograph. In this paper, we propose a fast and memory-efficient method of live face detection for embedded face recognition systems, based on the analysis of the movement of the eyes. We detect eyes in sequential input images and calculate the variation of each eye region to determine whether the input face is a real face or not. Experimental results show that the proposed approach is competitive and promising for live face detection.
Keywords: Liveness Detection, Eye Detection, SQI.
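The eye-region variation test can be sketched as a frame-differencing check over a detected eye box; the box coordinates and the decision threshold below are illustrative, not the paper's calibrated values:

```python
import numpy as np

def eye_region_variation(frames, box):
    """Mean absolute frame-to-frame change inside the eye box (y0, y1, x0, x1).

    Blinking eyes on a live face vary far more across frames than the
    frozen eye region of a photograph held up to the camera."""
    y0, y1, x0, x1 = box
    crops = np.stack([f[y0:y1, x0:x1] for f in frames]).astype(float)
    return float(np.mean(np.abs(np.diff(crops, axis=0))))

def is_live(frames, box, threshold=0.05):
    """Declare the face live when the eye region varies above the threshold."""
    return eye_region_variation(frames, box) > threshold
```

Since only a small crop per frame is differenced, the check fits the memory and speed budget of an embedded recognizer.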

Proceedings ArticleDOI
01 Sep 2008
TL;DR: This work builds on an earlier sparse-representation method to create a prototype access control system, capable of handling variations in illumination and expression, as well as significant occlusion or disguise, while offering a better understanding of the strengths and limitations of sparse representation as a tool for robust recognition.
Abstract: This work builds on an earlier sparse-representation method to create a prototype access control system, capable of handling variations in illumination and expression, as well as significant occlusion or disguise. Our demonstration will allow participants to interact with the algorithm, gaining a better understanding of the strengths and limitations of sparse representation as a tool for robust recognition.
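A minimal sketch of sparse-representation classification in the spirit of this demo: a test sample is coded as a sparse combination of training columns and assigned to the class with the smallest reconstruction residual. Greedy orthogonal matching pursuit stands in for the l1-minimization typically used, and the dictionary and data are toy values.

```python
import numpy as np

def omp(D, y, n_nonzero=2):
    """Greedy orthogonal matching pursuit: a simple stand-in for the
    l1-minimization used in sparse-representation classification."""
    residual, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def src_classify(D, labels, y):
    """Assign y to the class whose training columns best reconstruct it."""
    x = omp(D, y)
    residuals = {c: np.linalg.norm(y - D @ np.where(np.array(labels) == c, x, 0.0))
                 for c in set(labels)}
    return min(residuals, key=residuals.get)

# Toy dictionary: columns 0-1 belong to class 0, columns 2-3 to class 1.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.1, 0.0, 0.1]])
D = D / np.linalg.norm(D, axis=0)
labels = [0, 0, 1, 1]
predicted = src_classify(D, labels, np.array([1.0, 0.05, 0.0]))
```

The per-class residual test is what gives the approach its robustness to occlusion: corrupted pixels inflate every class's residual, but the correct class still reconstructs best.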

Journal ArticleDOI
TL;DR: An automatic sketch synthesis algorithm is proposed based on embedded hidden Markov model (E-HMM) and selective ensemble strategy and achieves satisfactory effect of sketch synthesis with a small set of face training samples.
Abstract: Sketch synthesis plays an important role in face sketch-photo recognition systems. In this manuscript, an automatic sketch synthesis algorithm is proposed based on an embedded hidden Markov model (E-HMM) and a selective ensemble strategy. First, the E-HMM is adopted to model the nonlinear relationship between a sketch and its corresponding photo. Then, based on several learned models, a series of pseudo-sketches are generated for a given photo. Finally, these pseudo-sketches are fused together with the selective ensemble strategy to synthesize a finer face pseudo-sketch. Experimental results illustrate that the proposed algorithm achieves satisfactory sketch-synthesis results with a small set of face training samples.
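The final fusion step might be sketched as below. The median-distance selection rule is an illustrative stand-in for the paper's learned selective-ensemble strategy; only the shape of the computation (select a subset of pseudo-sketches, then combine them) follows the abstract.

```python
import numpy as np

def fuse_pseudo_sketches(candidates, keep=3):
    """Keep the `keep` candidate sketches closest to the pixelwise median,
    then average them; outlying pseudo-sketches are excluded."""
    stack = np.stack([c.astype(float) for c in candidates])
    median = np.median(stack, axis=0)
    dists = np.array([np.linalg.norm(c - median) for c in stack])
    chosen = np.argsort(dists)[:keep]
    return stack[chosen].mean(axis=0)

# Three consistent candidates and one outlier (e.g. a badly matched model).
candidates = [np.full((4, 4), v, dtype=float) for v in (100, 102, 98, 0)]
fused = fuse_pseudo_sketches(candidates, keep=3)
```

Selecting before averaging is the point of a *selective* ensemble: a single bad pseudo-sketch would otherwise drag the plain average far from the consensus.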

Journal ArticleDOI
25 Jan 2008-Science
TL;DR: This work modeled human familiarity by using image averaging to derive stable face representations from naturally varying photographs, increasing the accuracy of an industry standard face-recognition algorithm from 54% to 100%, bringing the robust performance of a familiar human to an automated system.
Abstract: Accurate face recognition is critical for many security applications. Current automatic face-recognition systems are defeated by natural changes in lighting and pose, which often affect face images more profoundly than changes in identity. The only system that can reliably cope with such variability is a human observer who is familiar with the faces concerned. We modeled human familiarity by using image averaging to derive stable face representations from naturally varying photographs. This simple procedure increased the accuracy of an industry standard face-recognition algorithm from 54% to 100%, bringing the robust performance of a familiar human to an automated system.
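The image-averaging idea is simple enough to sketch directly; the synthetic "photos" below (a fixed image plus noise) are a stand-in for naturally varying, pre-aligned photographs of one person.

```python
import numpy as np

def average_face(images):
    """Pixelwise average of aligned, same-size grayscale face images."""
    return np.stack([img.astype(float) for img in images]).mean(axis=0)

rng = np.random.default_rng(0)
base = rng.uniform(0, 255, size=(16, 16))   # stand-in "identity" image
# Naturally varying photos, modeled here as the base image plus noise.
photos = [np.clip(base + rng.normal(0, 40, base.shape), 0, 255)
          for _ in range(50)]
avg = average_face(photos)
# The average lies much closer to the underlying face than any one photo,
# because variation that is not stable across photos cancels out.
```

This is the whole trick: lighting and pose changes average away, while what is stable across photographs of one person survives.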

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel type of feature for fast and accurate face detection called the Locally Assembled Binary (LAB) Haar feature, which is basically inspired by the success of the Haar feature and the Local Binary Pattern for face detection, but is far beyond a simple combination.
Abstract: In this paper, we describe a novel type of feature for fast and accurate face detection. The feature is called the Locally Assembled Binary (LAB) Haar feature. The LAB feature is basically inspired by the success of the Haar feature and the Local Binary Pattern (LBP) for face detection, but it is far beyond a simple combination. In our method, Haar features are modified to keep only the ordinal relationship (named the binary Haar feature) rather than the difference between the accumulated intensities. Several neighboring binary Haar features are then assembled to capture their co-occurrence, with an idea similar to LBP. We show that the feature is more efficient than the Haar feature and LBP in both discriminating power and computational cost. Furthermore, a novel efficient detection method called the feature-centric cascade is proposed to build an efficient detector, which is developed from the feature-centric method. Experimental results on the CMU+MIT frontal face test set and the CMU profile test set show that the proposed method can achieve very good results and amazing detection speed.
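A rough sketch of a binary Haar feature and an LBP-style assembly of neighboring responses follows. The 3x3 neighborhood layout and the left/right rectangle split are assumptions for illustration; the paper's exact construction may differ.

```python
import numpy as np

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in a rectangle, via a zero-padded integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def binary_haar(ii, x, y, w, h):
    """Two-rectangle Haar response reduced to its ordinal relationship:
    1 if the left half is at least as bright as the right half, else 0."""
    left = rect_sum(ii, x, y, w // 2, h)
    right = rect_sum(ii, x + w // 2, y, w // 2, h)
    return int(left >= right)

def lab_feature(img, x, y, w, h):
    """Assemble binary Haar responses over a 3x3 grid of neighboring
    positions into one 9-bit code, in the spirit of LBP."""
    ii = np.pad(np.cumsum(np.cumsum(img.astype(float), 0), 1),
                ((1, 0), (1, 0)))  # integral image with zero top/left border
    code = 0
    for dy in range(3):
        for dx in range(3):
            code = (code << 1) | binary_haar(ii, x + dx * w, y + dy * h, w, h)
    return code

code = lab_feature(np.ones((20, 20)), 0, 0, 4, 2)
```

Keeping only the sign of each rectangle difference makes the responses cheap to compare and robust to monotonic lighting changes, and packing nine of them into one integer code captures their co-occurrence in a single table lookup.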

Journal ArticleDOI
TL;DR: An object detection framework that learns the discriminative co-occurrence of multiple features; it is a generalization of the framework proposed by Viola and Jones, where each weak classifier depends only on a single feature.
Abstract: This paper describes an object detection framework that learns the discriminative co-occurrence of multiple features. Feature co-occurrences are automatically found by sequential forward selection at each stage of the boosting process. The selected feature co-occurrences are capable of extracting structural similarities of target objects leading to better performance. The proposed method is a generalization of the framework proposed by Viola and Jones, where each weak classifier depends only on a single feature. Experimental results obtained using four object detectors for finding faces and three different hand poses, respectively, show that detectors trained with the proposed algorithm yield consistently higher detection rates than those based on their framework while using the same number of features.
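A two-feature co-occurrence weak classifier, the simplest analogue of the idea, can be sketched as a lookup table over joint binary feature values; the XOR-style toy data below is a case no single-feature stump can separate. The construction is illustrative, not the paper's exact weak learner.

```python
import numpy as np

def train_cooccurrence_stump(f1, f2, labels, weights):
    """Weak classifier on the joint value of two binary features: for each
    (f1, f2) pair in {0,1}^2, predict the weighted-majority label."""
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            m = (f1 == a) & (f2 == b)
            pos = weights[m & (labels == 1)].sum()
            neg = weights[m & (labels == -1)].sum()
            table[(a, b)] = 1 if pos >= neg else -1
    return table

# XOR-like toy data: neither feature alone separates it; the pair does.
f1 = np.array([0, 0, 1, 1])
f2 = np.array([0, 1, 0, 1])
labels = np.array([-1, 1, 1, -1])
table = train_cooccurrence_stump(f1, f2, labels, np.full(4, 0.25))
preds = np.array([table[(a, b)] for a, b in zip(f1, f2)])
```

This is the structural-similarity point of the abstract: the joint table captures interactions between features that any single-feature Viola-Jones stump must miss.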

Journal ArticleDOI
TL;DR: The results suggest that the ability to rapidly saccade to faces in natural scenes depends, at least in part, on low-level information contained in the Fourier 2-D amplitude spectrum.
Abstract: Recent results show that humans can respond with a saccadic eye movement toward faces much faster and with less error than toward other objects. What feature information does your visual cortex need to distinguish between different objects so rapidly? In a first step, we replicated the "fast saccadic bias" toward faces. We simultaneously presented one vehicle and one face image with different contrasts and asked our subjects to saccade as fast as possible to the image with higher contrast. This was considerably easier when the target was the face. In a second step, we scrambled both images to the same extent. For one subject group, we scrambled the orientations of wavelet components (local orientations) while preserving their location. This manipulation completely abolished the face bias for the fastest saccades. For a second group, we scrambled the phases (i.e., the location) of Fourier components while preserving their orientation (i.e., the 2-D amplitude spectrum). Even when no face was visible (100% scrambling), the fastest saccades were still strongly biased toward the scrambled face image! These results suggest that the ability to rapidly saccade to faces in natural scenes depends, at least in part, on low-level information contained in the Fourier 2-D amplitude spectrum.
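The phase-scrambling manipulation (randomizing Fourier phases while preserving the 2-D amplitude spectrum) can be sketched with NumPy. Borrowing the phases of a transformed real-valued noise image is one standard way to keep the result real; the exact scrambling procedure used in the study may differ.

```python
import numpy as np

def phase_scramble(img, rng):
    """Randomize Fourier phases while preserving the 2-D amplitude
    spectrum. The phases of the FFT of real noise are conjugate-symmetric,
    so the inverse transform is real-valued (up to numerical error)."""
    amplitude = np.abs(np.fft.fft2(img))
    noise_phase = np.angle(np.fft.fft2(rng.normal(size=img.shape)))
    return np.fft.ifft2(amplitude * np.exp(1j * noise_phase)).real

rng = np.random.default_rng(1)
img = rng.uniform(size=(16, 16))
scrambled = phase_scramble(img, rng)  # 100% phase scrambling
```

After this manipulation no face is visible, yet the amplitude spectrum — the low-level cue the study implicates in the fast saccadic bias — is untouched.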