
Showing papers on "Face detection published in 2009"


Proceedings ArticleDOI
02 Sep 2009
TL;DR: This paper publishes a generative 3D shape and texture model, the Basel Face Model (BFM), demonstrates its application to several face recognition tasks, and releases a set of detailed recognition and reconstruction results on standard databases to allow complete algorithm comparisons.
Abstract: Generative 3D face models are a powerful tool in computer vision. They provide pose and illumination invariance by modeling the space of 3D faces and the imaging process. The power of these models comes at the cost of an expensive and tedious construction process, which has led the community to focus on more easily constructed but less powerful models. With this paper we publish a generative 3D shape and texture model, the Basel Face Model (BFM), and demonstrate its application to several face recognition tasks. We improve on previous models by offering higher shape and texture accuracy due to a better scanning device and fewer correspondence artifacts due to an improved registration algorithm. The same 3D face model can be fit to 2D or 3D images acquired under different situations and with different sensors using an analysis by synthesis method. The resulting model parameters separate pose, lighting, imaging and identity parameters, which facilitates invariant face recognition across sensors and data sets by comparing only the identity parameters. We hope that the availability of this registered face model will spur research in generative models. Together with the model we publish a set of detailed recognition and reconstruction results on standard databases to allow complete algorithm comparisons.

1,265 citations
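
The abstract above notes that recognition with the BFM reduces to comparing only the identity parameters of fitted models. A minimal sketch of that comparison, assuming each fit yields identity coefficient vectors for shape and texture and using cosine similarity as the distance measure (the variable names and the similarity measure are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def identity_similarity(alpha_a, beta_a, alpha_b, beta_b):
    """Compare two faces via their fitted identity parameters only.

    alpha_*: shape coefficient vectors, beta_*: texture coefficient vectors.
    Pose, lighting and imaging parameters are deliberately ignored.
    """
    a = np.concatenate([alpha_a, beta_a])
    b = np.concatenate([alpha_b, beta_b])
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy usage: two fits of the same person should score higher than fits of
# different people (random vectors stand in for real fitting results).
rng = np.random.default_rng(0)
fit1 = rng.normal(size=50), rng.normal(size=50)
fit2 = fit1[0] + 0.1 * rng.normal(size=50), fit1[1] + 0.1 * rng.normal(size=50)
print(identity_similarity(*fit1, *fit2))
```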


Proceedings ArticleDOI
01 Sep 2009
TL;DR: A new dataset, H3D, is built of annotations of humans in 2D photographs with 3D joint information, inferred using anthropometric constraints, to address the classic problems of detection, segmentation and pose estimation of people in images with a novel definition of a part, a poselet.
Abstract: We address the classic problems of detection, segmentation and pose estimation of people in images with a novel definition of a part, a poselet. We postulate two criteria: (1) it should be easy to find a poselet given an input image; (2) it should be easy to localize the 3D configuration of the person conditioned on the detection of a poselet. To permit this, we have built a new dataset, H3D, of annotations of humans in 2D photographs with 3D joint information, inferred using anthropometric constraints. This enables us to implement a data-driven search procedure for finding poselets that are tightly clustered in both 3D joint configuration space as well as 2D image appearance. The algorithm discovers poselets that correspond to frontal and profile faces, pedestrians, head and shoulder views, among others. Each poselet provides examples for training a linear SVM classifier which can then be run over the image in a multiscale scanning mode. The outputs of these poselet detectors can be thought of as an intermediate layer of nodes, on top of which one can run a second layer of classification or regression. We show how this permits detection and localization of torsos or keypoints such as left shoulder, nose, etc. Experimental results show that we obtain state of the art performance on people detection in the PASCAL VOC 2007 challenge, among other datasets. We are making publicly available both the H3D dataset as well as the poselet parameters for use by other researchers.

1,153 citations
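
The abstract above runs each poselet's linear SVM over the image in a multiscale scanning mode. A minimal sketch of that scanning step, assuming an already-trained weight vector `w` and bias `b` (hypothetical inputs), with raw normalized pixel windows standing in for the real descriptor:

```python
import numpy as np
import cv2

def scan_poselet(gray, w, b, win=(96, 64), step=8, scales=(1.0, 0.75, 0.5)):
    """Slide a linear classifier (score = w . x + b) over the image at
    several scales and return positively scored windows."""
    wh, ww = win
    hits = []
    for s in scales:
        img = cv2.resize(gray, None, fx=s, fy=s)
        for y in range(0, img.shape[0] - wh, step):
            for x in range(0, img.shape[1] - ww, step):
                patch = img[y:y + wh, x:x + ww].astype(np.float32).ravel()
                patch = (patch - patch.mean()) / (patch.std() + 1e-6)
                score = float(w @ patch + b)
                if score > 0:  # SVM decision threshold
                    hits.append((score, x / s, y / s, s))
    return sorted(hits, reverse=True)
```

The positive windows would then feed the "intermediate layer of nodes" on which the second classification or regression layer operates.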


Journal ArticleDOI
TL;DR: A discussion outlining the incentive for using face recognition, the applications of this technology, and some of the difficulties plaguing current systems with regard to this task has been provided.
Abstract: Face recognition presents a challenging problem in the field of image analysis and computer vision, and as such has received a great deal of attention over the last few years because of its many applications in various domains. Face recognition techniques can be broadly divided into three categories based on the face data acquisition methodology: methods that operate on intensity images; those that deal with video sequences; and those that require other sensory data such as 3D information or infra-red imagery. In this paper, an overview of some of the well-known methods in each of these categories is provided and some of the benefits and drawbacks of the schemes mentioned therein are examined. Furthermore, a discussion outlining the incentive for using face recognition, the applications of this technology, and some of the difficulties plaguing current systems with regard to this task has also been provided. This paper also mentions some of the most recent algorithms developed for this purpose and attempts to give an idea of the state of the art of face recognition technology.

751 citations


Journal ArticleDOI
TL;DR: The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources.
Abstract: Common benchmark data sets, standardized performance metrics, and baseline algorithms have demonstrated considerable impact on research and development in a variety of application domains. These resources provide both consumers and developers of technology with a common framework to objectively compare the performance of different algorithms and algorithmic improvements. In this paper, we present such a framework for evaluating object detection and tracking in video: specifically for face, text, and vehicle objects. This framework includes the source video data, ground-truth annotations (along with guidelines for annotation), performance metrics, evaluation protocols, and tools including scoring software and baseline algorithms. For each detection and tracking task and supported domain, we developed a 50-clip training set and a 50-clip test set. Each data clip is approximately 2.5 minutes long and has been completely spatially/temporally annotated at the I-frame level. Each task/domain, therefore, has an associated annotated corpus of approximately 450,000 frames. The scope of such annotation is unprecedented and was designed to begin to support the necessary quantities of data for robust machine learning approaches, as well as a statistically significant comparison of the performance of algorithms. The goal of this work was to systematically address the challenges of object detection and tracking through a common evaluation framework that permits a meaningful objective comparison of techniques, provides the research community with sufficient data for the exploration of automatic modeling techniques, encourages the incorporation of objective evaluation into the development process, and contributes useful lasting resources of a scale and magnitude that will prove to be extremely useful to the computer vision research community for years to come.

534 citations


Journal ArticleDOI
TL;DR: A critical survey of research on image-based face recognition across pose is provided, with existing methods classified into categories according to how they handle pose variations, and several promising directions for future research are suggested.

511 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper presents a unified framework for object detection, segmentation, and classification using regions using a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis.
Abstract: This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start by producing a robust bag of overlaid regions for each image using Arbeláez et al., CVPR 2009. Each region is represented by a rich set of image cues (shape, color and texture). We then learn region weights using a max-margin framework. In detection and segmentation, we apply a generalized Hough voting scheme to generate hypotheses of object locations, scales and support, followed by a verification classifier and a constrained segmenter on each hypothesis. The proposed approach significantly outperforms the state of the art on the ETHZ shape database (87.1% average detection rate compared to Ferrari et al.'s 67.2%), and achieves competitive performance on the Caltech 101 database.

433 citations
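
The abstract above generates object hypotheses with a generalized Hough voting scheme in which regions vote for object location and scale. A minimal sketch of such an accumulator, assuming each region carries a learned weight, a predicted offset from its centroid to the object center, and a coarse scale bin (all placeholder inputs, not the paper's exact parameterization):

```python
import numpy as np

def hough_vote(regions, img_shape, n_scales=4):
    """Accumulate weighted votes for (x, y, scale) object hypotheses.

    Each region is (cx, cy, dx, dy, scale_idx, weight): centroid, predicted
    offset to the object center, a scale bin, and a learned weight.
    """
    H, W = img_shape
    acc = np.zeros((n_scales, H, W), dtype=np.float32)
    for cx, cy, dx, dy, s, wgt in regions:
        x, y = int(round(cx + dx)), int(round(cy + dy))
        if 0 <= x < W and 0 <= y < H:
            acc[s, y, x] += wgt
    # The strongest peak becomes a hypothesis handed to the verification
    # classifier and constrained segmenter described in the abstract.
    s, y, x = np.unravel_index(np.argmax(acc), acc.shape)
    return (x, y, s), acc
```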


Proceedings ArticleDOI
29 Sep 2009
TL;DR: The implementation exploits the inherent parallelism of ConvNets and takes full advantage of multiple hardware multiply-accumulate units on the FPGA, and can be used for low-power, lightweight embedded vision systems for micro-UAVs and other small robots.
Abstract: Convolutional Networks (ConvNets) are biologically-inspired hierarchical architectures that can be trained to perform a variety of detection, recognition and segmentation tasks. ConvNets have a feed-forward architecture consisting of multiple linear convolution filters interspersed with pointwise non-linear squashing functions. This paper presents an efficient implementation of ConvNets on a low-end DSP-oriented Field Programmable Gate Array (FPGA). The implementation exploits the inherent parallelism of ConvNets and takes full advantage of multiple hardware multiply-accumulate units on the FPGA. The entire system uses a single FPGA with an external memory module, and no extra parts. A network compiler was implemented in software, which takes a description of a trained ConvNet and compiles it into a sequence of instructions for the ConvNet Processor (CNP). A ConvNet face detection system was implemented and tested. Face detection on a 512 × 384 frame takes 100 ms (10 frames per second), which corresponds to an average performance of 3.4×10⁹ connections per second for this 340-million-connection network. The design can be used for low-power, lightweight embedded vision systems for micro-UAVs and other small robots.

376 citations
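
A quick consistency check of the throughput figures quoted in the abstract above; only the connection count and the per-frame time enter the calculation:

```python
connections_per_frame = 340e6   # "340-million-connection network"
frame_time_s = 0.100            # 100 ms per 512 x 384 frame
print(connections_per_frame / frame_time_s)  # 3.4e9 connections per second
print(1.0 / frame_time_s)                    # 10 frames per second
```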


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper introduced contextual features that encapsulate the group structure locally (for each person in the group), and globally (the overall structure of the group) to accomplish a variety of tasks, such as demographic recognition, calculating scene and camera parameters, and even event recognition.
Abstract: In many social settings, images of groups of people are captured. The structure of this group provides meaningful context for reasoning about individuals in the group, and about the structure of the scene as a whole. For example, men are more likely to stand on the edge of an image than women. Instead of treating each face independently from all others, we introduce contextual features that encapsulate the group structure locally (for each person in the group) and globally (the overall structure of the group). This "social context" allows us to accomplish a variety of tasks, such as demographic recognition, calculating scene and camera parameters, and even event recognition. We perform human studies to show this context aids recognition of demographic information in images of strangers.

339 citations


Proceedings ArticleDOI
11 Apr 2009
TL;DR: A new liveness detection method for face recognition based on differences in optical flow fields generated by movements of two-dimensional planes and three-dimensional objects is proposed.
Abstract: It is a common spoof to use a photograph to fool a face recognition algorithm. In light of differences in optical flow fields generated by movements of two-dimensional planes and three-dimensional objects, we propose a new liveness detection method for face recognition. Under the assumption that the test region is a two-dimensional plane, we can obtain a reference field from the actual optical flow field data. The degree of difference between the two fields can then be used to distinguish between a three-dimensional face and a two-dimensional photograph. Empirical study shows that the proposed approach is both feasible and effective.

327 citations
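
The abstract above separates a planar photograph from a live face by how far the measured optical flow deviates from the flow a rigid 2-D plane would produce. A minimal sketch of that idea using OpenCV's dense Farnebäck flow and a least-squares affine fit as the planar reference field (the affine model and any decision threshold are simplifying assumptions, not the paper's exact formulation):

```python
import numpy as np
import cv2

def planarity_residual(prev_gray, next_gray):
    """Mean deviation of the measured flow from the best-fit planar
    (affine) flow field over the test region."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
    # Fit u(x, y) and v(x, y) as affine functions of pixel position.
    coef, *_ = np.linalg.lstsq(A, flow.reshape(-1, 2), rcond=None)
    residual = flow.reshape(-1, 2) - A @ coef
    return float(np.linalg.norm(residual, axis=1).mean())

# A large residual suggests 3-D (live face) motion; a small residual
# suggests a moving 2-D photograph. The threshold would be tuned on data.
```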


Journal ArticleDOI
01 Feb 2009
TL;DR: A solution for human tracking with a mobile robot is proposed that implements multisensor data fusion techniques based on the recognition of typical leg patterns extracted from laser scans; experiments show that robust human tracking can be performed within complex indoor environments.
Abstract: One of the fundamental issues for service robots is human-robot interaction. In order to perform such a task and provide the desired services, these robots need to detect and track people in the surroundings. In this paper, we propose a solution for human tracking with a mobile robot that implements multisensor data fusion techniques. The system utilizes a new algorithm for laser-based leg detection using the onboard laser range finder (LRF). The approach is based on the recognition of typical leg patterns extracted from laser scans, which are shown to also be very discriminative in cluttered environments. These patterns can be used to localize both static and walking persons, even when the robot moves. Furthermore, faces are detected using the robot's camera, and the information is fused to the legs' position using a sequential implementation of the unscented Kalman filter. The proposed solution is feasible for service robots with a similar device configuration and has been successfully implemented on two different mobile platforms. Several experiments illustrate the effectiveness of our approach, showing that robust human tracking can be performed within complex indoor environments.

304 citations


Patent
04 Sep 2009
TL;DR: In this article, a processor-based system operating according to digitally-embedded programming instructions performs a method including identifying a group of pixels corresponding to a face region within digital image data acquired by an image acquisition device.
Abstract: A processor-based system operating according to digitally-embedded programming instructions performs a method including identifying a group of pixels corresponding to a face region within digital image data acquired by an image acquisition device. A set of face analysis parameter values is extracted from said face region, including a faceprint associated with the face region. First and second reference faceprints are determined for a person using reference images captured respectively in predetermined face-portrait conditions and using ambient conditions. The faceprints are analyzed to determine a baseline faceprint and a range of variability from the baseline associated with the person. Results of the analyzing are stored and used in subsequent recognition of the person in a subsequent image acquired under ambient conditions.
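
The patent abstract above derives a baseline faceprint and a range of variability from two reference faceprints, then uses them for later recognition under ambient conditions. A minimal sketch under the simplifying assumptions that faceprints are fixed-length vectors, the baseline is their mean, and variability is measured with Euclidean distance (none of which the patent specifies):

```python
import numpy as np

def build_profile(faceprint_portrait, faceprint_ambient, margin=1.5):
    """Baseline faceprint plus an accepted range of variability."""
    baseline = (faceprint_portrait + faceprint_ambient) / 2.0
    variability = np.linalg.norm(faceprint_portrait - faceprint_ambient)
    return baseline, margin * variability

def matches_profile(faceprint, baseline, max_dist):
    """Accept a new faceprint if it falls within the stored variability."""
    return np.linalg.norm(faceprint - baseline) <= max_dist
```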

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work presents a system that combines a standard sliding-window detector tuned for a high recall, low-precision operating point with a fast post-processing stage that is able to remove additional false positives by incorporating domain-specific information not available to the sliding- window detector.
Abstract: The last two years have witnessed the introduction and rapid expansion of products based upon large, systematically-gathered, street-level image collections, such as Google Street View, EveryScape, and Mapjack. In the process of gathering images of public spaces, these projects also capture license plates, faces, and other information considered sensitive from a privacy standpoint. In this work, we present a system that addresses the challenge of automatically detecting and blurring faces and license plates for the purpose of privacy protection in Google Street View. Though some in the field would claim face detection is “solved”, we show that state-of-the-art face detectors alone are not sufficient to achieve the recall desired for large-scale privacy protection. In this paper we present a system that combines a standard sliding-window detector tuned for a high recall, low-precision operating point with a fast post-processing stage that is able to remove additional false positives by incorporating domain-specific information not available to the sliding-window detector. Using a completely automatic system, we are able to sufficiently blur more than 89% of faces and 94 – 96% of license plates in evaluation sets sampled from Google Street View imagery.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A character specific multiple kernel classifier which is able to learn the features best able to discriminate between the characters is reported, demonstrating significantly increased coverage and performance with respect to previous methods on this material.
Abstract: We investigate the problem of automatically labelling faces of characters in TV or movie material with their names, using only weak supervision from automatically-aligned subtitle and script text. Our previous work (Everingham et al. [8]) demonstrated promising results on the task, but the coverage of the method (proportion of video labelled) and generalization was limited by a restriction to frontal faces and nearest neighbour classification. In this paper we build on that method, extending the coverage greatly by the detection and recognition of characters in profile views. In addition, we make the following contributions: (i) seamless tracking, integration and recognition of profile and frontal detections, and (ii) a character specific multiple kernel classifier which is able to learn the features best able to discriminate between the characters. We report results on seven episodes of the TV series "Buffy the Vampire Slayer", demonstrating significantly increased coverage and performance with respect to previous methods on this material.

Book ChapterDOI
20 Jul 2009
TL;DR: Facial expression recognition is a process performed by humans or computers that consists of analyzing the motion of facial features and/or the changes in the appearance of facial features and classifying this information into some facial-expression-interpretative categories such as facial muscle activations.
Abstract: Facial expression recognition is a process performed by humans or computers, which consists of: 1. Locating faces in the scene (e.g., in an image; this step is also referred to as face detection), 2. Extracting facial features from the detected face region (e.g., detecting the shape of facial components or describing the texture of the skin in a facial area; this step is referred to as facial feature extraction), 3. Analyzing the motion of facial features and/or the changes in the appearance of facial features and classifying this information into some facial-expression-interpretative categories such as facial muscle activations like smile or frown, emotion (affect) categories like happiness or anger, attitude categories like (dis)liking or ambivalence, etc. (this step is also referred to as facial expression interpretation).
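
The three steps enumerated above (face detection, facial feature extraction, expression classification) map onto a conventional pipeline. A minimal sketch using OpenCV's bundled Haar-cascade frontal-face detector for step 1; the resized-pixel features and the SVM classifier are illustrative stand-ins, not a specific published system:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(gray):
    """Step 1: locate faces in the scene."""
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def extract_features(gray, box, size=(48, 48)):
    """Step 2: describe the detected face region (here: raw resized pixels)."""
    x, y, w, h = box
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    return face.astype(np.float32).ravel() / 255.0

# Step 3: classify features into expression categories (e.g. happiness, anger).
# train_X / train_y would come from a labelled facial-expression dataset.
clf = SVC(kernel="linear")
# clf.fit(train_X, train_y)
# label = clf.predict([extract_features(gray, box)])
```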

Journal ArticleDOI
01 Feb 2009
TL;DR: This paper focuses on affective face and body display, proposes a method to automatically detect their temporal segments or phases, explores whether the detection of the temporal phases can effectively support recognition of affective states, and recognizes affective states based on phase synchronization/alignment.
Abstract: Psychologists have long explored mechanisms with which humans recognize other humans' affective states from modalities, such as voice and face display. This exploration has led to the identification of the main mechanisms, including the important role played in the recognition process by the modalities' dynamics. Constrained by the human physiology, the temporal evolution of a modality appears to be well approximated by a sequence of temporal segments called onset, apex, and offset. Stemming from these findings, computer scientists, over the past 15 years, have proposed various methodologies to automate the recognition process. We note, however, two main limitations to date. The first is that much of the past research has focused on affect recognition from single modalities. The second is that even the few multimodal systems have not paid sufficient attention to the modalities' dynamics: The automatic determination of their temporal segments, their synchronization to the purpose of modality fusion, and their role in affect recognition are yet to be adequately explored. To address this issue, this paper focuses on affective face and body display, proposes a method to automatically detect their temporal segments or phases, explores whether the detection of the temporal phases can effectively support recognition of affective states, and recognizes affective states based on phase synchronization/alignment. The experimental results obtained show the following: 1) affective face and body displays are simultaneous but not strictly synchronous; 2) explicit detection of the temporal phases can improve the accuracy of affect recognition; 3) recognition from fused face and body modalities performs better than that from the face or the body modality alone; and 4) synchronized feature-level fusion achieves better performance than decision-level fusion.

Patent
20 Jul 2009
TL;DR: In this article, a processor-based system operating according to digitally-embedded programming instructions includes a face detection module for identifying face regions within digital images, a normalization module generates a normalized version of the face region, and a face recognition module automatically extracts a set of face classifier parameter values from the normalized face region.
Abstract: A processor-based system operating according to digitally-embedded programming instructions includes a face detection module for identifying face regions within digital images. A normalization module generates a normalized version of the face region. A face recognition module automatically extracts a set of face classifier parameter values from the normalized face region that are referred to as a faceprint. A workflow module automatically compares the extracted faceprint to a database of archived faceprints previously determined to correspond to known identities. The workflow module determines based on the comparing whether the new faceprint corresponds to any of the known identities, and associates the new faceprint and normalized face region with a new or known identity within a database. A database module serves to archive data corresponding to the new faceprint and its associated parent image according to the associating by the workflow module within one or more digital data storage media.

Patent
Sergey Ioffe, Lance Williams, Dennis Strelow, Andrea Frome, Luc Vincent
31 Mar 2009
TL;DR: In this paper, a face detector is applied to detect a set of possible face regions in the image and an identity masker is used to process the detected face regions by identity masking techniques in order to obscure identities corresponding to the regions.
Abstract: A method and system of identity masking to obscure identities corresponding to face regions in an image is disclosed. A face detector is applied to detect a set of possible face regions in the image. Then an identity masker is used to process the detected face regions by identity masking techniques in order to obscure identities corresponding to the regions. For example, a detected face region can be blurred as if it is in motion by a motion blur algorithm, such that the blurred region can not be recognized as the original identity. Or the detected face region can be replaced by a substitute facial image by a face replacement algorithm to obscure the corresponding identity.

Journal ArticleDOI
TL;DR: This work shows empirically that facial identity information is conveyed largely via mechanisms tuned to horizontal visual structure, and shows that such structure affords computational advantages for face detection and decoding, including robustness to normal environmental image degradation.
Abstract: The structure of the human face allows it to signal a wide range of useful information about a person's gender, identity, mood, etc. We show empirically that facial identity information is conveyed largely via mechanisms tuned to horizontal visual structure. Specifically, observers perform substantially better at identifying faces that have been filtered to contain just horizontal information compared to any other orientation band. We then show, computationally, that horizontal structures within faces have an unusual tendency to fall into vertically co-aligned clusters compared with images of natural scenes. We call these clusters "bar codes" and propose that they have important computational properties. We propose that it is this property that makes faces "special" visual stimuli, because they are able to transmit information as a reliable spatial sequence: a highly constrained one-dimensional code. We show that such structure affords computational advantages for face detection and decoding, including robustness to normal environmental image degradation, but makes faces vulnerable to certain classes of transformation that change the sequence of bars, such as spatial inversion or contrast-polarity reversal.
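
The study above filters face images to retain a single orientation band, with horizontal structure carrying the most identity information. A minimal sketch of such an orientation-band filter in the Fourier domain; horizontally oriented image structure corresponds to energy near the vertical frequency axis, and the 20° half-bandwidth is an arbitrary illustrative choice rather than the study's filter:

```python
import numpy as np

def orientation_band_filter(gray, center_deg=90.0, half_width_deg=20.0):
    """Keep only Fourier components whose orientation lies within a band
    around center_deg (90 degrees ~ horizontal image structure)."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    h, w = gray.shape
    fy, fx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    angle = np.rad2deg(np.arctan2(fy, fx)) % 180.0   # orientation of each frequency
    keep = np.abs(angle - center_deg) <= half_width_deg
    keep[h // 2, w // 2] = True                      # keep the DC term
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * keep)))
```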

Patent
10 Jun 2009
TL;DR: In this paper, a method of automatically establishing the correct orientation of an image using facial information is proposed, which is based on the exploitation of the inherent property of image recognition algorithms in general and face detection in particular.
Abstract: A method of automatically establishing the correct orientation of an image using facial information. This method is based on exploiting an inherent property of image recognition algorithms in general and face detection in particular: the recognition is based on criteria that are highly orientation-sensitive. By applying a detection algorithm to images in various orientations, or alternatively by rotating the classifiers, and comparing the number of faces successfully detected in each orientation, one may infer the most likely correct orientation. Such a method can be implemented as an automated or semi-automatic method to guide users in viewing, capturing, or printing images.
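
The patent abstract above determines image orientation by running a face detector at each candidate rotation and keeping the rotation that yields the most detections. A minimal sketch with OpenCV's stock frontal-face cascade; the particular detector and the simple "most faces wins" rule are illustrative assumptions:

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

ROTATIONS = {
    0: None,
    90: cv2.ROTATE_90_CLOCKWISE,
    180: cv2.ROTATE_180,
    270: cv2.ROTATE_90_COUNTERCLOCKWISE,
}

def likely_orientation(gray):
    """Return the rotation (degrees) under which the most faces are found."""
    counts = {}
    for deg, flag in ROTATIONS.items():
        img = gray if flag is None else cv2.rotate(gray, flag)
        faces = face_cascade.detectMultiScale(img, 1.1, 5)
        counts[deg] = len(faces)
    return max(counts, key=counts.get), counts
```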

Journal ArticleDOI
TL;DR: An accurate and robust framework for detecting and segmenting faces, localizing landmarks, and achieving fine registration of face meshes based on the fitting of a facial model based on a 3-D Point Distribution Model that is fitted without relying on texture, pose, or orientation information is presented.
Abstract: We present an accurate and robust framework for detecting and segmenting faces, localizing landmarks, and achieving fine registration of face meshes based on the fitting of a facial model. This model is based on a 3-D Point Distribution Model (PDM) that is fitted without relying on texture, pose, or orientation information. Fitting is initialized using candidate locations on the mesh, which are extracted from low-level curvature-based feature maps. Face detection is performed by classifying the transformations between model points and candidate vertices based on the upper-bound of the deviation of the parameters from the mean model. Landmark localization is performed on the segmented face by finding the transformation that minimizes the deviation of the model from the mean shape. Face registration is obtained using prior anthropometric knowledge and the localized landmarks. The performance of face detection is evaluated on a database of faces and non-face objects where we achieve an accuracy of 99.6%. We also demonstrate face detection and segmentation on objects with different scale and pose. The robustness of landmark localization is evaluated with noisy data and by varying the number of shapes and model points used in the model learning phase. Finally, face registration is compared with the traditional Iterative Closest Point (ICP) method and evaluated through a face retrieval and recognition framework on the GavabDB dataset, where we achieve a recognition rate of 87.4% and a retrieval rate of 83.9%.

Proceedings ArticleDOI
22 Feb 2009
TL;DR: Hardware design techniques are described, including image scaling, integral image generation, a pipelined classifier, and parallel processing of multiple classifiers, which accelerate the processing speed of the face detection system.
Abstract: This paper presents a hardware architecture for a face detection system based on the AdaBoost algorithm using Haar features. We describe the hardware design techniques, including image scaling, integral image generation, a pipelined classifier, and parallel processing of multiple classifiers, used to accelerate the processing speed of the face detection system. We also discuss optimizations of the proposed architecture so that it can scale to configurable devices with varying resources. The proposed architecture for face detection has been designed using Verilog HDL and implemented in a Xilinx Virtex-5 FPGA. Its performance has been measured and compared with an equivalent software implementation. We show an approximately 35-fold increase in system performance over the equivalent software implementation.
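
A software reference for two of the stages listed above, integral-image generation and Haar-feature evaluation, which the hardware pipelines and parallelizes. This is the standard Viola-Jones construction shown for illustration, not the specific Verilog design:

```python
import numpy as np

def integral_image(gray):
    """ii[y, x] = sum of all pixels above and to the left (inclusive).
    A leading row/column of zeros makes box sums index cleanly."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0), axis=1)
    return ii

def box_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def two_rect_haar(ii, x, y, w, h):
    """A two-rectangle (left minus right) Haar feature, the kind evaluated
    by the AdaBoost weak classifiers."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)
```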

Journal ArticleDOI
TL;DR: A graph matching method is utilized to build face-name association between a face affinity network and a name affinity network which are, respectively, derived from their own domains (video and script) and mined using social network analysis.
Abstract: Identification of characters in films, although very intuitive to humans, still poses a significant challenge to computer methods. In this paper, we investigate the problem of identifying characters in feature-length films using video and film script. Different from the state-of-the-art methods on naming faces in the videos, most of which used the local matching between a visible face and one of the names extracted from the temporally local video transcript, we attempt to do a global matching between names and clustered face tracks under the circumstances that there are not enough local name cues that can be found. The contributions of our work include: 1) A graph matching method is utilized to build face-name association between a face affinity network and a name affinity network which are, respectively, derived from their own domains (video and script). 2) An effective measure of face track distance is presented for face track clustering. 3) As an application, the relationship between characters is mined using social network analysis. The proposed framework is able to create a new experience on character-centered film browsing. Experiments are conducted on ten feature-length films and give encouraging results.

Patent
30 Jul 2009
TL;DR: In this article, a localized smoothing kernel is applied to luminance data corresponding to the sub-regions of the face image to generate an enhanced face image, which includes the original pixels in combination with pixels corresponding to one or more enhanced subregions.
Abstract: Sub-regions within a face image are identified to be enhanced by applying a localized smoothing kernel to luminance data corresponding to the sub-regions of the face image. An enhanced face image is generated including an enhanced version of the face that includes certain original pixels in combination with pixels corresponding to the one or more enhanced sub-regions of the face.

Journal ArticleDOI
TL;DR: A subregion-based framework that uses a Markov random field to model the statistical distribution and spatial coherence of face texture, which makes the approach not only robust to extreme lighting conditions, but also insensitive to partial occlusions.
Abstract: In this paper, we present a new method to modify the appearance of a face image by manipulating the illumination condition, when the face geometry and albedo information is unknown. This problem is particularly difficult when there is only a single image of the subject available. Recent research demonstrates that the set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately by a low-dimensional linear subspace using a spherical harmonic representation. Moreover, morphable models are statistical ensembles of facial properties such as shape and texture. In this paper, we integrate spherical harmonics into the morphable model framework by proposing a 3D spherical harmonic basis morphable model (SHBMM). The proposed method can represent a face under arbitrary unknown lighting and pose simply by three low-dimensional vectors, i.e., shape parameters, spherical harmonic basis parameters, and illumination coefficients, which are called the SHBMM parameters. However, when the image was taken under an extreme lighting condition, the approximation error can be large, thus making it difficult to recover albedo information. In order to address this problem, we propose a subregion-based framework that uses a Markov random field to model the statistical distribution and spatial coherence of face texture, which makes our approach not only robust to extreme lighting conditions, but also insensitive to partial occlusions. The performance of our framework is demonstrated through various experimental results, including the improved rates for face recognition under extreme lighting conditions.
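
The low-dimensional illumination model underlying the SHBMM above is the standard second-order spherical-harmonic approximation for a convex Lambertian surface: the intensity at a surface point p with unit normal n(p) and albedo ρ(p) is approximated by nine harmonic terms (shown schematically; the basis normalization constants are omitted here):

```latex
I(p) \;\approx\; \rho(p)\sum_{l=0}^{2}\;\sum_{m=-l}^{l} \ell_{lm}\, Y_{lm}\!\big(\mathbf{n}(p)\big)
```

The nine lighting coefficients ℓ_lm correspond to the "illumination coefficients" in the abstract, which together with the shape and spherical harmonic basis parameters form the SHBMM parameters.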

Patent
01 Apr 2009
TL;DR: In this article, a method and apparatus for creating and updating a facial image database from a collection of digital images is disclosed, where a set of detected faces from a digital image collection is stored in a database, along with data pertaining to them.
Abstract: A method and apparatus for creating and updating a facial image database from a collection of digital images is disclosed. A set of detected faces from a digital image collection is stored in a facial image database, along with data pertaining to them. At least one facial recognition template for each face in the first set is computed, and the images in the set are grouped according to the facial recognition template into similarity groups. Another embodiment is a naming tool for assigning names to a plurality of faces detected in a digital image collection. A facial image database stores data pertaining to facial images detected in images of a digital image collection. In addition, the naming tool may include a graphical user interface, a face detection module that detects faces in images of the digital image collection and stores data pertaining to the detected faces in the facial image database, a face recognition module that computes at least one facial recognition template for each facial image in the facial image database, and a similarity grouping module that groups facial images in the facial image database according to the respective templates such that similar facial images belong to one similarity group.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A novel method for synthesizing VIS images from NIR images based on learning the mappings between images of different spectra is proposed, which reduces the inter-spectral differences significantly, thus allowing effective matching between faces taken under different imaging conditions.
Abstract: This paper deals with a new problem in face recognition research, in which the enrollment and query face samples are captured under different lighting conditions. In our case, the enrollment samples are visual light (VIS) images, whereas the query samples are taken under near infrared (NIR) condition. It is very difficult to directly match the face samples captured under these two lighting conditions due to their different visual appearances. In this paper, we propose a novel method for synthesizing VIS images from NIR images based on learning the mappings between images of different spectra (i.e., NIR and VIS). In our approach, we reduce the inter-spectral differences significantly, thus allowing effective matching between faces taken under different imaging conditions. Face recognition experiments clearly show the efficacy of the proposed approach.

Patent
05 Jun 2009
Abstract: A method of tracking faces in an image stream with a digital image acquisition device includes receiving images from an image stream including faces, calculating corresponding integral images, and applying different subsets of face detection rectangles to the integral images to provide sets of candidate regions. The different subsets include candidate face regions of different sizes and/or locations within the images. The different candidate face regions from different images of the image stream are each tracked.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper develops a framework to measure the intensity of AU12 and AU6 in videos captured from infant-mother live face-to-face communications and shows significant agreement between a human FACS coder and the approach, making it an efficient method for automated measurement of the intensity of non-posed facial action units.
Abstract: This paper presents a framework to automatically measure the intensity of naturally occurring facial actions. Naturalistic expressions are non-posed spontaneous actions. The facial action coding system (FACS) is the gold standard technique for describing facial expressions, which are parsed as comprehensive, non-overlapping action units (AUs). AUs have intensities ranging from absent to maximal on a six-point metric (i.e., 0 to 5). Despite the efforts in recognizing the presence of non-posed action units, measuring their intensity has not been studied comprehensively. In this paper, we develop a framework to measure the intensity of AU12 (lip corner puller) and AU6 (cheek raising) in videos captured from infant-mother live face-to-face communications. AU12 and AU6 are among the most challenging cases of infants' expressions (e.g., low facial texture in an infant's face). One of the problems in facial image analysis is the large dimensionality of the visual data. Our approach for solving this problem is to utilize the spectral regression technique to project high-dimensional facial images into a low-dimensional space. The facial images represented in the low-dimensional space are then used to train support vector machine classifiers to predict the intensity of action units. Analysis of 18 minutes of captured video of non-posed facial expressions of several infants and mothers shows significant agreement between a human FACS coder and our approach, which makes it an efficient approach for automated measurement of the intensity of non-posed facial action units.
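
The pipeline described above projects face images to a low-dimensional space and trains SVMs to predict AU intensity on the 0-5 scale. A minimal sketch with scikit-learn, using PCA as a stand-in for the paper's spectral regression step (a simplifying substitution) and a support-vector classifier over the six intensity levels:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# X: flattened face images (n_samples, n_pixels); y: AU intensity labels 0..5.
# Random placeholders stand in for a labelled AU-intensity dataset.
rng = np.random.default_rng(0)
X = rng.random((120, 48 * 48))
y = rng.integers(0, 6, size=120)

model = make_pipeline(PCA(n_components=30), SVC(kernel="rbf"))
model.fit(X, y)
print(model.predict(X[:5]))   # predicted AU intensities for five samples
```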

Journal ArticleDOI
TL;DR: A comparison with the geometry-free bag-of-words model shows that geometrical information provided by the framework improves classification, and a comparison with support vector machines demonstrates that Bayesian classification results in superior performance.
Abstract: This paper presents a novel framework for detecting, localizing, and classifying faces in terms of visual traits, e.g., sex or age, from arbitrary viewpoints and in the presence of occlusion. All three tasks are embedded in a general viewpoint-invariant model of object class appearance derived from local scale-invariant features, where features are probabilistically quantified in terms of their occurrence, appearance, geometry, and association with visual traits of interest. An appearance model is first learned for the object class, after which a Bayesian classifier is trained to identify the model features indicative of visual traits. The framework can be applied in realistic scenarios in the presence of viewpoint changes and partial occlusion, unlike other techniques assuming data that are single viewpoint, upright, prealigned, and cropped from background distraction. Experimentation establishes the first result for sex classification from arbitrary viewpoints, an equal error rate of 16.3 percent, based on the color FERET database. The method is also shown to work robustly on faces in cluttered imagery from the CMU profile database. A comparison with the geometry-free bag-of-words model shows that geometrical information provided by our framework improves classification. A comparison with support vector machines demonstrates that Bayesian classification results in superior performance.

Journal ArticleDOI
TL;DR: This paper proposes and studies an approach for spatiotemporal face and gender recognition from videos using an extended set of volume LBP features and a boosting scheme, and assesses the promising performance of the LBP-based spatiotemporal representations for describing and analyzing faces in videos.