
Showing papers on "Face detection" published in 2014


Journal ArticleDOI
TL;DR: This paper presents a robust face alignment technique, which explicitly considers the uncertainties of facial feature detectors, and describes the dropout-support vector machine approach used by the system for face attribute estimation, in order to avoid over-fitting.
Abstract: This paper concerns the estimation of facial attributes, namely age and gender, from images of faces acquired in challenging, in-the-wild conditions. This problem has received far less attention than the related problem of face recognition, and in particular has not enjoyed the same dramatic improvement in capabilities demonstrated by contemporary face recognition systems. Here, we address this problem by making the following contributions. First, in answer to one of the key problems of age estimation research, the absence of data, we offer a unique data set of face images, labeled for age and gender, acquired by smart-phones and other mobile devices, and uploaded without manual filtering to online image repositories. We show the images in our collection to be more challenging than those offered by other face-photo benchmarks. Second, we describe the dropout-support vector machine approach used by our system for face attribute estimation, in order to avoid over-fitting. This method, inspired by the dropout learning techniques now popular with deep belief networks, is applied here for training support vector machines, to the best of our knowledge, for the first time. Finally, we present a robust face alignment technique, which explicitly considers the uncertainties of facial feature detectors. We report extensive tests analyzing both the difficulty levels of contemporary benchmarks as well as the capabilities of our own system. These show our method to outperform the state-of-the-art by a wide margin.
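As a rough illustration of the dropout-SVM idea described above (not the authors' exact formulation), one can randomly zero out input features on each training round and average the resulting linear SVMs; the data shapes, dropout rate, and use of scikit-learn's LinearSVC are all assumptions for this sketch.

```python
# Dropout-style SVM training sketch: fit linear SVMs on feature vectors
# whose entries are randomly zeroed, then average the learned weights,
# roughly emulating marginalisation over dropped features.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))        # hypothetical aligned-face descriptors
y = rng.integers(0, 2, size=500)       # hypothetical gender labels

def fit_dropout_svm(X, y, drop_rate=0.5, rounds=10):
    w_sum, b_sum = np.zeros(X.shape[1]), 0.0
    for _ in range(rounds):
        mask = rng.random(X.shape) >= drop_rate    # keep each feature w.p. 1-p
        clf = LinearSVC(C=1.0).fit(X * mask, y)
        w_sum += clf.coef_.ravel()
        b_sum += clf.intercept_[0]
    return w_sum / rounds, b_sum / rounds

w, b = fit_dropout_svm(X, y)
pred = (X @ w + b > 0).astype(int)     # averaged linear decision rule
```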

710 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: An approach to building face datasets that starts with detecting faces in images returned from searches for public figures on the Internet, followed by discarding those not belonging to each queried person; the resulting FaceScrub dataset is released publicly.
Abstract: Large face datasets are important for advancing face recognition research, but they are tedious to build, because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we describe an approach to building face datasets that starts with detecting faces in images returned from searches for public figures on the Internet, followed by discarding those not belonging to each queried person. We formulate the problem of identifying the faces to be removed as a quadratic programming problem, which exploits the observations that faces of the same person should look similar, have the same gender, and normally appear at most once per image. Our results show that this method can reliably clean a large dataset, leading to a considerable reduction in the work needed to build it. Finally, we are releasing the FaceScrub dataset that was created using this approach. It consists of 141,130 faces of 695 public figures and can be obtained from http://vintage.winklerbros.net/facescrub.html.
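The abstract does not spell out the quadratic program itself; the sketch below shows one plausible continuous relaxation of the stated observations, rewarding within-identity similarity while penalising keeping two faces from the same image. The similarity matrix, penalty weight, and threshold are illustrative assumptions, not the paper's formulation.

```python
# Relaxed QP sketch for cleaning one queried identity's face set:
# x_i in [0, 1] indicates keeping face i.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 40
S = rng.random((n, n)); S = (S + S.T) / 2     # assumed pairwise face similarity
img_id = rng.integers(0, 30, size=n)          # source image of each face
same_img = (img_id[:, None] == img_id[None, :]) & ~np.eye(n, dtype=bool)

def objective(x, lam=5.0):
    keep = np.outer(x, x)
    # Reward similarity among kept faces; penalise two picks per image.
    return -(S * keep).sum() + lam * (same_img * keep).sum()

res = minimize(objective, x0=np.full(n, 0.5),
               bounds=[(0.0, 1.0)] * n, method="L-BFGS-B")
keep_mask = res.x > 0.5                       # threshold the relaxed solution
```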

622 citations


Book ChapterDOI
06 Sep 2014
TL;DR: It is shown that a properly trained vanilla DPM reaches top performance, improving over commercial and research systems, and that a detector based on rigid templates, similar in structure to the Viola & Jones detector, can reach similar top performance on this task.
Abstract: Face detection is a mature problem in computer vision. While diverse high-performing face detectors have been proposed in the past, we present two surprising new top-performance results. First, we show that a properly trained vanilla DPM reaches top performance, improving over commercial and research systems. Second, we show that a detector based on rigid templates, similar in structure to the Viola & Jones detector, can reach similar top performance on this task. Importantly, we discuss issues with the existing evaluation benchmark and propose an improved procedure.

588 citations


Book ChapterDOI
06 Sep 2014
TL;DR: This paper proposes a Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs), so that the first SAN predicts the landmarks quickly but accurately enough as a preliminary, by taking as input a low-resolution version of the detected face holistically.
Abstract: Accurate face alignment is a vital prerequisite step for most face perception tasks such as face recognition, facial expression analysis and non-realistic face re-rendering. It can be formulated as the nonlinear inference of the facial landmarks from the detected face region. A deep network seems a good choice to model the nonlinearity, but it is nontrivial to apply it directly. In this paper, instead of a straightforward application of a deep network, we propose a Coarse-to-Fine Auto-encoder Networks (CFAN) approach, which cascades a few successive Stacked Auto-encoder Networks (SANs). Specifically, the first SAN predicts the landmarks quickly but accurately enough as a preliminary, by taking as input a low-resolution version of the detected face holistically. The following SANs then progressively refine the landmarks by taking as input the local features extracted around the current landmarks (output of the previous SAN) with higher and higher resolution. Extensive experiments conducted on three challenging datasets demonstrate that our CFAN outperforms the state-of-the-art methods and performs in real time (40+ fps excluding face detection, on a desktop).
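The coarse-to-fine control flow can be sketched independently of the network internals; below, the SANs are stand-in callables, and the patch size and predictor interface are assumptions rather than the paper's architecture.

```python
# Coarse-to-fine landmark cascade: a holistic low-resolution prediction,
# then per-stage corrections regressed from local patches around the
# current landmark estimates.
import numpy as np

def extract_patches(image, landmarks, size=16):
    # Crop a (size x size) patch centred on each (x, y) landmark.
    h = size // 2
    return np.stack([image[int(y) - h:int(y) + h, int(x) - h:int(x) + h]
                     for x, y in landmarks])

def cfan_predict(face_image, coarse_net, refine_nets, downscale):
    # Stage 1: predict all landmarks at once from a low-res face.
    landmarks = coarse_net(downscale(face_image)).reshape(-1, 2)
    # Stages 2..K: each network refines by regressing a residual
    # from patches taken at progressively higher resolution.
    for net in refine_nets:
        patches = extract_patches(face_image, landmarks)
        landmarks = landmarks + net(patches).reshape(-1, 2)
    return landmarks
```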

548 citations


Book ChapterDOI
06 Sep 2014
TL;DR: The key idea is to combine face alignment with detection, observing that aligned face shapes provide better features for face classification; the two tasks are learned jointly in the same cascade framework by exploiting recent advances in face alignment.
Abstract: We present a new state-of-the-art approach for face detection. The key idea is to combine face alignment with detection, observing that aligned face shapes provide better features for face classification. To make this combination more effective, our approach learns the two tasks jointly in the same cascade framework, by exploiting recent advances in face alignment. Such joint learning greatly enhances the capability of cascade detection and still retains its realtime performance. Extensive experiments show that our approach achieves the best accuracy on challenging datasets, where all existing solutions are either inaccurate or too slow.

462 citations


Journal ArticleDOI
TL;DR: This paper comprehensively surveys the development of face hallucination, including both face super-resolution and face sketch-photo synthesis techniques, and presents a comparative analysis of representative methods and promising future directions.
Abstract: This paper comprehensively surveys the development of face hallucination (FH), including both face super-resolution and face sketch-photo synthesis techniques. Indeed, these two techniques share the same objective of inferring a target face image (e.g. high-resolution face image, face sketch and face photo) from a corresponding source input (e.g. low-resolution face image, face photo and face sketch). Considering the critical role of image interpretation in modern intelligent systems for authentication, surveillance, law enforcement, security control, and entertainment, FH has attracted growing attention in recent years. Existing FH methods can be grouped into four categories: Bayesian inference approaches, subspace learning approaches, a combination of Bayesian inference and subspace learning approaches, and sparse representation-based approaches. In spite of achieving a certain level of development, FH is limited in its success by complex application conditions such as variant illuminations, poses, or views. This paper provides a holistic understanding and deep insight into FH, and presents a comparative analysis of representative methods and promising future directions.

365 citations


Proceedings ArticleDOI
TL;DR: In this paper, a multi-view face detector using aggregate channel features is proposed, which extends the image channel to diverse types like gradient magnitude and oriented gradient histograms and therefore encodes rich information in a simple form.
Abstract: Face detection has drawn much attention in recent decades since the seminal work by Viola and Jones. While many subsequent works have improved upon it with more powerful learning algorithms, the feature representation used for face detection still can't meet the demand for effectively and efficiently handling faces with large appearance variance in the wild. To solve this bottleneck, we borrow the concept of channel features to the face detection domain, which extends the image channel to diverse types like gradient magnitude and oriented gradient histograms and therefore encodes rich information in a simple form. We adopt a novel variant called aggregate channel features, make a full exploration of feature design, and discover a multi-scale version of features with better performance. To deal with poses of faces in the wild, we propose a multi-view detection approach featuring score re-ranking and detection adjustment. Following the learning pipelines in the Viola-Jones framework, the multi-view face detector using aggregate channel features surpasses current state-of-the-art detectors on the AFW and FDDB test sets, while running at 42 FPS.
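The channel computation itself is compact; the sketch below derives a gradient-magnitude channel and orientation-binned gradient channels from a gray image and aggregates each into 4x4 blocks, following the general ACF recipe rather than the paper's exact settings (which also include color channels).

```python
# Simplified aggregate channel features: per-pixel channels, then
# aggregation by summing over non-overlapping 4x4 blocks.
import numpy as np

def aggregate(channel, block=4):
    h, w = (s - s % block for s in channel.shape)
    c = channel[:h, :w].reshape(h // block, block, w // block, block)
    return c.sum(axis=(1, 3))

def channel_features(gray, n_orient=6):
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation
    channels = [aggregate(mag)]
    for k in range(n_orient):                     # orientation-binned gradients
        lo, hi = k * np.pi / n_orient, (k + 1) * np.pi / n_orient
        channels.append(aggregate(mag * ((ang >= lo) & (ang < hi))))
    return np.concatenate([c.ravel() for c in channels])

feat = channel_features(np.random.default_rng(2).random((64, 64)))
```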

288 citations


Journal ArticleDOI
TL;DR: A complete algorithmic description, learning code, and a learned face detector that can be applied to any color image are provided, along with a post-processing step that reduces detection redundancy using a robustness argument.
Abstract: In this article, we decipher the Viola-Jones algorithm, the first ever real-time face detection system. There are three ingredients working in concert to enable a fast and accurate detection: the integral image for feature computation, AdaBoost for feature selection, and an attentional cascade for efficient computational resource allocation. Here we propose a complete algorithmic description, a learning code and a learned face detector that can be applied to any color image. Since the Viola-Jones algorithm typically gives multiple detections, a post-processing step is also proposed to reduce detection redundancy using a robustness argument. The source code and an online demo are accessible at the IPOL web page of this article.
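The integral image the article deciphers is simple enough to show directly; this is a generic sketch, not the IPOL reference code.

```python
# Integral image: ii[y, x] holds the sum of all pixels above and to the
# left of (y, x), inclusive, so any box sum costs four lookups. This is
# what makes Haar-like feature evaluation constant-time.
import numpy as np

def integral_image(img):
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1+1, x0:x1+1] via the four-corner identity.
    total = ii[y1, x1]
    if y0 > 0: total -= ii[y0 - 1, x1]
    if x0 > 0: total -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0: total += ii[y0 - 1, x0 - 1]
    return total

img = np.arange(25.0).reshape(5, 5)
ii = integral_image(img)
assert box_sum(ii, 1, 1, 3, 3) == img[1:4, 1:4].sum()
```

A Haar-like feature is then just a signed combination of two or three such box sums, which is why the attentional cascade can reject non-face windows so cheaply.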

259 citations


Journal ArticleDOI
28 Jul 2014
TL;DR: This paper presents the first publicly available face database based on the Kinect sensor, and conducts benchmark evaluations on the proposed database using standard face recognition methods, and demonstrates the gain in performance when integrating the depth data with the RGB data via score-level fusion.
Abstract: The recent success of emerging RGB-D cameras such as the Kinect sensor depicts a broad prospect of 3-D data-based computer applications. However, due to the lack of a standard testing database, it is difficult to evaluate how face recognition technology can benefit from this up-to-date imaging sensor. In order to establish the connection between the Kinect and face recognition research, in this paper, we present the first publicly available face database (i.e., KinectFaceDB, online at http://rgb-d.eurecom.fr) based on the Kinect sensor. The database consists of different data modalities (well-aligned and processed 2-D, 2.5-D, 3-D, and video-based face data) and multiple facial variations. We conducted benchmark evaluations on the proposed database using standard face recognition methods, and demonstrated the gain in performance when integrating the depth data with the RGB data via score-level fusion. We also compared the 3-D images of the Kinect (from the KinectFaceDB) with traditional high-quality 3-D scans (from the FRGC database) in the context of face biometrics, which reveals the imperative need for the proposed database in face recognition research.
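Score-level fusion of the RGB and depth matchers usually amounts to normalising each modality's match scores and combining them; the min-max normalisation and weighted-sum rule below are a common choice, not necessarily the paper's exact scheme.

```python
# Score-level fusion sketch: normalise per-modality scores to [0, 1],
# combine with a weighted sum, then rank gallery candidates.
import numpy as np

def min_max(scores):
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + 1e-12)

def fuse(rgb_scores, depth_scores, w_rgb=0.6):
    return w_rgb * min_max(rgb_scores) + (1 - w_rgb) * min_max(depth_scores)

rgb = np.array([0.91, 0.42, 0.77])    # hypothetical per-candidate RGB scores
depth = np.array([12.0, 30.5, 8.1])   # hypothetical depth scores, larger = better
best_match = int(np.argmax(fuse(rgb, depth)))
```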

257 citations


Journal ArticleDOI
TL;DR: This paper inspects the spoofing potential of subject-specific 3D facial masks for different recognition systems and addresses the detection problem of this more complex attack type.
Abstract: Spoofing is the act of masquerading as a valid user by falsifying data to gain illegitimate access. The vulnerability of recognition systems to spoofing attacks (presentation attacks) is still an open security issue in the biometrics domain, and among all biometric traits, the face is exposed to the most serious threat, since it is particularly easy to access and reproduce. In the literature, many different types of face spoofing attacks have been examined and various algorithms have been proposed to detect them. Mainly focusing on 2D attacks forged by displaying printed photos or replaying recorded videos on mobile devices, a significant portion of these studies ground their arguments on the flatness of the spoofing material in front of the sensor. However, with the advancements in 3D reconstruction and printing technologies, this assumption can no longer be maintained. In this paper, we aim to inspect the spoofing potential of subject-specific 3D facial masks for different recognition systems and address the detection problem of this more complex attack type. In order to assess the spoofing performance of 3D masks against 2D, 2.5D, and 3D face recognition and to analyze various texture-based countermeasures using both 2D and 2.5D data, a parallel study with comprehensive experiments is performed on two data sets: the Morpho database, which is not publicly available, and the newly distributed 3D mask attack database.

249 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed approach boosts the likelihood of correctly identifying the person of interest through the use of different fusion schemes, 3-D face models, and incorporation of quality measures for fusion and video frame selection.
Abstract: As face recognition applications progress from constrained sensing and cooperative subjects scenarios (e.g., driver’s license and passport photos) to unconstrained scenarios with uncooperative subjects (e.g., video surveillance), new challenges are encountered. These challenges are due to variations in ambient illumination, image resolution, background clutter, facial pose, expression, and occlusion. In forensic investigations where the goal is to identify a person of interest, often based on low quality face images and videos, we need to utilize whatever source of information is available about the person. This could include one or more video tracks, multiple still images captured by bystanders (using, for example, their mobile phones), 3-D face models constructed from image(s) and video(s), and verbal descriptions of the subject provided by witnesses. These verbal descriptions can be used to generate a face sketch and provide ancillary information about the person of interest (e.g., gender, race, and age). While traditional face matching methods generally take a single media (i.e., a still face image, video track, or face sketch) as input, this paper considers using the entire gamut of media as a probe to generate a single candidate list for the person of interest. We show that the proposed approach boosts the likelihood of correctly identifying the person of interest through the use of different fusion schemes, 3-D face models, and incorporation of quality measures for fusion and video frame selection.

Journal ArticleDOI
TL;DR: The results show that the approach to detect face spoofing using the spatiotemporal extensions of the highly popular local binary pattern operator performs better than state-of-the-art techniques following the provided evaluation protocols of each database.
Abstract: User authentication is an important step to protect information, and in this context, face biometrics is potentially advantageous. Face biometrics is natural, intuitive, easy to use, and less human-invasive. Unfortunately, recent work has revealed that face biometrics is vulnerable to spoofing attacks using cheap low-tech equipment. This paper introduces a novel and appealing approach to detect face spoofing using the spatiotemporal (dynamic texture) extensions of the highly popular local binary pattern operator. The key idea of the approach is to learn and detect the structure and the dynamics of the facial micro-textures that characterise real faces but not fake ones. We evaluated the approach with two publicly available databases (Replay-Attack Database and CASIA Face Anti-Spoofing Database). The results show that our approach performs better than state-of-the-art techniques following the provided evaluation protocols of each database.
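The spatial LBP operator underlying the dynamic-texture extension is easy to sketch; the 8-neighbour, radius-1 variant below is generic, and the spatiotemporal version (LBP-TOP) applies the same idea to the XY, XT, and YT slices of the video volume and concatenates the three histograms.

```python
# Basic LBP-8,1: threshold each pixel's 8 neighbours at the centre value,
# read the bits as a code in [0, 255], and histogram the codes.
import numpy as np

def lbp_histogram(gray):
    g = gray.astype(int)
    c = g[1:-1, 1:-1]                               # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= (nb >= c).astype(int) << bit       # set bit if neighbour >= centre
    return np.bincount(codes.ravel(), minlength=256)

hist = lbp_histogram(np.random.default_rng(3).integers(0, 256, size=(64, 64)))
```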

Proceedings ArticleDOI
Cha Zhang1, Zhengyou Zhang1
24 Mar 2014
TL;DR: A deep convolutional neural network is built that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem and it is shown that such a multi-task learning scheme can further improve the classifier's accuracy.
Abstract: Multiview face detection is a challenging problem due to dramatic appearance changes under various pose, illumination and expression conditions. In this paper, we present a multi-task deep learning scheme to enhance the detection performance. More specifically, we build a deep convolutional neural network that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem. We show that such a multi-task learning scheme can further improve the classifier's accuracy. On the challenging FDDB data set, our detector achieves over 3% improvement in detection rate at the same false positive rate compared with other state-of-the-art methods.
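The multi-task scheme reduces to summing per-task losses computed on shared features; a minimal numpy rendering is given below, where the task weights and the discretisation of pose into bins are assumptions and the paper's network itself is not reproduced.

```python
# Multi-task detection loss sketch: face/non-face (binary cross-entropy)
# + pose bin (softmax cross-entropy) + landmarks (mean squared error).
import numpy as np

def multi_task_loss(face_logit, pose_logits, lmk_pred,
                    face_label, pose_label, lmk_true,
                    w_pose=0.5, w_lmk=0.5):
    p = np.clip(1.0 / (1.0 + np.exp(-face_logit)), 1e-7, 1 - 1e-7)
    l_face = -(face_label * np.log(p) + (1 - face_label) * np.log(1 - p))
    e = np.exp(pose_logits - pose_logits.max())     # stable softmax
    l_pose = -np.log(e[pose_label] / e.sum())
    l_lmk = np.mean((lmk_pred - lmk_true) ** 2)
    return l_face + w_pose * l_pose + w_lmk * l_lmk

loss = multi_task_loss(2.0, np.array([0.1, 1.2, -0.3]), np.zeros(10),
                       face_label=1, pose_label=1, lmk_true=np.full(10, 0.1))
```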

Posted Content
TL;DR: Following the learning pipelines in the Viola-Jones framework, the multi-view face detector using aggregate channel features shows competitive performance against state-of-the-art algorithms on the AFW and FDDB test sets, while running at 42 FPS on VGA images.
Abstract: Face detection has drawn much attention in recent decades since the seminal work by Viola and Jones. While many subsequent works have improved upon it with more powerful learning algorithms, the feature representation used for face detection still can't meet the demand for effectively and efficiently handling faces with large appearance variance in the wild. To solve this bottleneck, we borrow the concept of channel features to the face detection domain, which extends the image channel to diverse types like gradient magnitude and oriented gradient histograms and therefore encodes rich information in a simple form. We adopt a novel variant called aggregate channel features, make a full exploration of feature design, and discover a multi-scale version of features with better performance. To deal with poses of faces in the wild, we propose a multi-view detection approach featuring score re-ranking and detection adjustment. Following the learning pipelines in the Viola-Jones framework, the multi-view face detector using aggregate channel features shows competitive performance against state-of-the-art algorithms on the AFW and FDDB test sets, while running at 42 FPS on VGA images.

Journal ArticleDOI
TL;DR: The co-occurrence of face and body helps to handle large variations such as heavy occlusions, and a hierarchical part-based structural model is proposed to explicitly capture it, further boosting face detection performance.

Journal ArticleDOI
TL;DR: This paper reduces the uncertainty of the face representation by synthesizing virtual training samples and devises a representation approach based on selected useful training samples to perform face recognition; the approach not only obtains high face recognition accuracy but also has lower computational complexity than other state-of-the-art approaches.
Abstract: The image of a face varies with the illumination, pose, and facial expression, thus we say that a single face image is of high uncertainty for representing the face. In this sense, a face image is just an observation and it should not be considered as the absolutely accurate representation of the face. As more face images from the same person provide more observations of the face, more face images may be useful for reducing the uncertainty of the representation of the face and improving the accuracy of face recognition. However, in a real-world face recognition system, a subject usually has only a limited number of available face images and thus there is high uncertainty. In this paper, we attempt to improve the face recognition accuracy by reducing the uncertainty. First, we reduce the uncertainty of the face representation by synthesizing the virtual training samples. Then, we select useful training samples that are similar to the test sample from the set of all the original and synthesized virtual training samples. Moreover, we state a theorem that determines the upper bound of the number of useful training samples. Finally, we devise a representation approach based on the selected useful training samples to perform face recognition. Experimental results on five widely used face databases demonstrate that our proposed approach can not only obtain a high face recognition accuracy, but also has a lower computational complexity than the other state-of-the-art approaches.
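A compressed sketch of that pipeline: mirrored faces stand in for the synthesized virtual samples, the k training samples nearest the probe are the "useful" ones, and the probe is coded over them by least squares with classification by per-class residual. The mirroring choice and k are illustrative, and the paper's theorem bounding the number of useful samples is not reproduced here.

```python
# Virtual-sample + selected-sample representation sketch.
import numpy as np

def recognise(train_imgs, labels, probe, k=30):
    # Virtual samples: horizontally mirrored copies of each training face.
    X = np.vstack([t.ravel() for t in train_imgs] +
                  [t[:, ::-1].ravel() for t in train_imgs]).astype(float)
    y = np.concatenate([labels, labels])
    q = probe.ravel().astype(float)
    # Keep the k training samples most similar to the probe.
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    Xs, ys = X[idx], y[idx]
    beta, *_ = np.linalg.lstsq(Xs.T, q, rcond=None)   # q ~ Xs.T @ beta
    # Assign the class whose selected samples best reconstruct the probe.
    residuals = {c: np.linalg.norm(q - Xs[ys == c].T @ beta[ys == c])
                 for c in np.unique(ys)}
    return min(residuals, key=residuals.get)
```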

Proceedings ArticleDOI
02 Oct 2014
TL;DR: The Support Vector Machine is one of the most efficient machine learning algorithms and has been widely used for pattern recognition since its introduction in the 1990s; statistics were collected from journals and electronic sources published between 2000 and 2013.
Abstract: The Support Vector Machine (SVM) is one of the most efficient machine learning algorithms, and has been mostly used for pattern recognition since its introduction in the 1990s. The SVM's wide variety of uses, such as face and speech recognition, face detection, and image recognition, has turned it into a very useful algorithm. It has been applied to many pattern classification problems such as image recognition, speech recognition, text categorization, face detection, and faulty card detection. Statistics were collected from journals and electronic sources published in the period of 2000 to 2013. Pattern recognition aims to classify data based on either a priori knowledge or statistical information extracted from raw data, and is a powerful tool for data separation in many disciplines. The SVM is a kind of algorithm used in biometrics. It is a statistical technique that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables.

Journal ArticleDOI
TL;DR: Experimental results based on the Southampton multibiometric tunnel database show that the use of soft biometric traits is able to improve the performance of face recognition based on sparse representation on real and ideal scenarios by adaptive fusion rules.
Abstract: Soft biometric information extracted from a human body (e.g., height, gender, skin color, hair color, and so on) is ancillary information easily distinguished at a distance but it is not fully distinctive by itself in recognition tasks. However, this soft information can be explicitly fused with biometric recognition systems to improve the overall recognition when confronting high variability conditions. One significant example is visual surveillance, where face images are usually captured in poor quality conditions with high variability and automatic face recognition systems do not work properly. In this scenario, the soft biometric information can provide very valuable information for person recognition. This paper presents an experimental study of the benefits of soft biometric labels as ancillary information based on the description of human physical features to improve challenging person recognition scenarios at a distance. In addition, we analyze the available soft biometric information in scenarios of varying distance between camera and subject. Experimental results based on the Southampton multibiometric tunnel database show that the use of soft biometric traits is able to improve the performance of face recognition based on sparse representation on real and ideal scenarios by adaptive fusion rules.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work proposes a novel face track descriptor, based on the Fisher Vector representation, and demonstrates that it has a number of favourable properties, including compact size and fast computation, which render it very suitable for large scale visual repositories.
Abstract: Our goal is to learn a compact, discriminative vector representation of a face track, suitable for the face recognition tasks of verification and classification. To this end, we propose a novel face track descriptor, based on the Fisher Vector representation, and demonstrate that it has a number of favourable properties. First, the descriptor is suitable for tracks of both frontal and profile faces, and is insensitive to their pose. Second, the descriptor is compact due to discriminative dimensionality reduction, and it can be further compressed using binarization. Third, the descriptor can be computed quickly (using hard quantization) and its compact size and fast computation render it very suitable for large scale visual repositories. Finally, the descriptor demonstrates good generalization when trained on one dataset and tested on another, reflecting its tolerance to the dataset bias. In the experiments we show that the descriptor exceeds the state of the art on both face verification task (YouTube Faces without outside training data, and INRIA-Buffy benchmarks), and face classification task (using the Oxford-Buffy dataset).
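The Fisher Vector encoding at the core of the descriptor can be sketched with a Gaussian mixture; below only the gradients with respect to the component means are kept, whereas the full descriptor also uses variance gradients plus the discriminative dimensionality reduction and binarization the paper describes.

```python
# Means-only Fisher Vector sketch: soft-assign local descriptors to GMM
# components and accumulate normalised deviations from each mean.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    q = gmm.predict_proba(descriptors)            # (N, K) soft assignments
    n, k = q.shape
    parts = []
    for j in range(k):
        diff = (descriptors - gmm.means_[j]) / np.sqrt(gmm.covariances_[j])
        parts.append((q[:, j:j + 1] * diff).sum(axis=0) /
                     (n * np.sqrt(gmm.weights_[j])))
    return np.concatenate(parts)

rng = np.random.default_rng(4)
local_feats = rng.normal(size=(200, 64))          # hypothetical per-frame features
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(local_feats)
fv = fisher_vector(local_feats, gmm)              # length 8 * 64
```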

Journal ArticleDOI
TL;DR: The study improves the expression recognition rate and execution time of a facial expression recognition system using an AdaBoost-based hypothesis; various techniques were utilized to achieve this.
Abstract: The study improves expression recognition rate and execution time. Average recognition rates on the JAFFE and Yale databases are 96.83% and 92.22%, respectively. The execution time for processing a 100×100 pixel image is 14.5 ms. The best-recognized expressions are happy, surprise, and disgust, and the poorest is neutral. The general results are very encouraging when compared with others. This study improves the recognition accuracy and execution time of a facial expression recognition system. Various techniques were utilized to achieve this. The face detection component is implemented by the adoption of the Viola-Jones descriptor. The detected face is then down-sampled by a Bessel transform to reduce the feature extraction space and improve processing time. Gabor feature extraction techniques were employed to extract thousands of facial features which represent various facial deformation patterns. An AdaBoost-based hypothesis is formulated to select a few hundred of the numerous extracted features to speed up classification. The selected features were fed into a well-designed 3-layer neural network classifier that is trained by a back-propagation algorithm. The system is trained and tested with datasets from the JAFFE and Yale facial expression databases. Average recognition rates of 96.83% and 92.22% are registered on the JAFFE and Yale databases, respectively. The execution time for a 100×100 pixel image is 14.5 ms. The general results of the proposed techniques are very encouraging when compared with others.
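A single kernel from the Gabor bank used for feature extraction follows directly from the standard formula; the 5-scale by 8-orientation grid below is a common convention and an assumption here, not necessarily the paper's settings.

```python
# Gabor kernel: a sinusoidal carrier windowed by a Gaussian envelope.
# Convolving the face with a bank of these yields deformation-sensitive
# features at several scales and orientations.
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma=4.0, gamma=0.5):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

bank = [gabor_kernel(21, wl, th)
        for wl in (4, 6, 8, 11, 16)                 # 5 assumed scales
        for th in np.arange(8) * np.pi / 8]         # 8 assumed orientations
```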

Book ChapterDOI
01 Nov 2014
TL;DR: The Eigen-PEP model is presented, built upon the recent success of the probabilistic elastic part (PEP) model, which produces an intermediate high dimensional, part-based, and pose-invariant representation of a face subject.
Abstract: To effectively solve the problem of large scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It shall also be flexible to be incrementally updated, incorporating new or retiring obsolete observations. In search for such a representation, we present the Eigen-PEP that is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by a part-based average pooling through the PEP model, which produces an intermediate high dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept (as small as 100). We evaluate the Eigen-PEP representation both for video-based face verification and identification on the YouTube Faces Dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces, we further improve the state-of-the-art recognition accuracy. On Celebrity-1000, we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects.
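The compression step is ordinary PCA; the sketch keeps 100 eigen-dimensions of pooled per-subject vectors, with a simple average standing in for the PEP part-based pooling.

```python
# Eigen-PEP-style compression sketch: pool per-frame part descriptors
# into one vector per subject, then keep the leading 100 principal
# components of the gallery.
import numpy as np

def compress(pooled_vectors, n_keep=100):
    X = np.asarray(pooled_vectors, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_keep]                           # top principal directions
    return (X - mean) @ basis.T, mean, basis      # codes + projection model

rng = np.random.default_rng(5)
per_frame = rng.normal(size=(50, 4096))           # hypothetical PEP outputs
subject_vec = per_frame.mean(axis=0)              # average-pooling stand-in
gallery = rng.normal(size=(300, 4096))            # pooled vectors, 300 subjects
codes, mean, basis = compress(gallery)            # each subject -> 100 dims
```

Incremental updates then reduce to re-pooling with the new frames and re-projecting onto the stored basis, which is what keeps the representation cheap to maintain.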

Journal ArticleDOI
TL;DR: A comprehensive survey on low-resolution face recognition methods, including concept description, system architecture, and method categorization is given and promising trends and crucial issues for future research are discussed.
Abstract: Low-resolution face recognition (LR FR) aims to recognize faces from small size or poor quality images with varying pose, illumination, expression, etc. It has received much attention with increasing demands for long distance surveillance applications, and extensive efforts have been made on LR FR research in recent years. However, many issues in LR FR are still unsolved, such as super-resolution (SR) for face recognition, resolution-robust features, unified feature spaces, and face detection at a distance, although many methods have been developed for that. This paper provides a comprehensive survey on these methods and discusses many related issues. First, it gives an overview on LR FR, including concept description, system architecture, and method categorization. Second, many representative methods are broadly reviewed and discussed. They are classified into two different categories, super-resolution for LR FR and resolution-robust feature representation for LR FR. Their strategies and advantages/disadvantages are elaborated. Some relevant issues such as databases and evaluations for LR FR are also presented. By generalizing their performances and limitations, promising trends and crucial issues for future research are finally discussed.

Journal ArticleDOI
TL;DR: The analysis shows that for matching frontal faces in still images, algorithms are consistently superior to humans, and for video and difficult still face pairs, humans are superior.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work constructs an efficient boosted exemplar-based face detector which overcomes the defect of the previous work by being faster, more memory efficient, and more accurate.
Abstract: Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved. Challenging conditions, such as extreme pose, lighting, and occlusion, have historically hampered traditional, model-based methods. In contrast, exemplar-based face detection has been shown to be effective, even under these challenging conditions, primarily because a large exemplar database is leveraged to cover all possible visual variations. However, relying heavily on a large exemplar database to deal with the face appearance variations makes the detector impractical due to the high space and time complexity. We construct an efficient boosted exemplar-based face detector which overcomes the defect of the previous work by being faster, more memory efficient, and more accurate. In our method, exemplars as weak detectors are discriminatively trained and selectively assembled in the boosting framework which largely reduces the number of required exemplars. Notably, we propose to include non-face images as negative exemplars to actively suppress false detections to further improve the detection accuracy. We verify our approach over two public face detection benchmarks and one personal photo album, and achieve significant improvement over the state-of-the-art algorithms in terms of both accuracy and efficiency.

Posted Content
TL;DR: This paper describes common face recognition methods, such as the holistic matching, feature extraction, and hybrid methods, together with applications and future research directions.
Abstract: Face recognition presents a challenging problem in the field of image analysis and computer vision. The security of information is becoming very significant and difficult. Security cameras are presently common in airports, offices, universities, ATMs, banks, and any location with a security system. Face recognition is a biometric system used to identify or verify a person from a digital image, and it is used in security. A face recognition system should be able to automatically detect a face in an image. This involves extracting its features and then recognizing it, regardless of lighting, expression, illumination, ageing, transformations (translation, rotation, and scaling), and pose, which is a difficult task. This paper contains three sections. The first section describes common methods such as the holistic matching method, the feature extraction method, and hybrid methods. The second section describes applications with examples, and the third section describes future research directions for face recognition.

Journal ArticleDOI
01 May 2014
TL;DR: This paper presents a non-intrusive fatigue detection system based on the video analysis of drivers; it relies on multiple visual cues to characterize the level of alertness of the driver and yielded an average accuracy of 100% on all the videos on which it was tested.
Abstract: A non-intrusive fatigue detection system based on the video analysis of drivers. Eye closure duration is measured through eye state information, and yawning is analyzed through mouth state information. Lips are searched through spatial fuzzy c-means (s-FCM) clustering. Pupils are also detected in the upper part of the face window on the basis of radii, inter-pupil distance, and angle. The monitored information of eyes and mouth is passed to a Fuzzy Expert System (FES) that classifies the true state of the driver. This paper presents a non-intrusive fatigue detection system based on the video analysis of drivers. The system relies on multiple visual cues to characterize the level of alertness of the driver. The parameters used for detecting fatigue are: eye closure duration measured through eye state information, and yawning analyzed through mouth state information. Initially, the face is located through the Viola-Jones face detection method to ensure the presence of the driver in the video frame. Then, a mouth window is extracted from the face region, in which lips are searched through spatial fuzzy c-means (s-FCM) clustering. Simultaneously, the pupils are also detected in the upper part of the face window on the basis of radii, inter-pupil distance, and angle. The monitored information of the eyes and mouth is further passed to a Fuzzy Expert System (FES) that classifies the true state of the driver. The system has been tested using real data, with different sequences recorded in day and night driving conditions, and with users of different races and genders. The system yielded an average accuracy of 100% on all the videos on which it was tested.
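The eye-closure cue feeding the fuzzy expert system is commonly summarised as the fraction of recent frames with closed eyes (a PERCLOS-style measure); the membership functions and the single rule below are illustrative stand-ins for the paper's FES, not its actual rule base.

```python
# Fatigue cue sketch: PERCLOS-style eye-closure ratio over a window of
# frames, fuzzified and combined with yawning frequency.
import numpy as np

def perclos(eye_closed_flags):
    # Fraction of frames (0..1) in which the eyes were judged closed.
    return float(np.mean(eye_closed_flags))

def ramp(x, lo, hi):
    # Linear fuzzy membership: 0 below lo, rising to 1 at hi.
    return float(np.clip((x - lo) / (hi - lo), 0.0, 1.0))

def fatigue_degree(eye_closed_flags, yawns_per_minute):
    drowsy_eyes = ramp(perclos(eye_closed_flags), 0.15, 0.40)
    yawning = ramp(yawns_per_minute, 1.0, 4.0)
    return max(drowsy_eyes, yawning)              # fuzzy OR of the two cues

flags = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1]            # hypothetical per-frame state
state = "fatigued" if fatigue_degree(flags, 2.5) > 0.5 else "alert"
```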

Posted Content
TL;DR: This paper proposes a new deep learning framework that can recover the canonical view of face images, which dramatically reduces the intra-person variances, while maintaining the inter-person discriminativeness.
Abstract: Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications. This paper addresses this challenge by proposing a new deep learning framework that can recover the canonical view of face images. It dramatically reduces the intra-person variances, while maintaining the inter-person discriminativeness. Unlike the existing face reconstruction methods that were either evaluated in controlled 2D environment or employed 3D information, our approach directly learns the transformation from the face images with a complex set of variations to their canonical views. At the training stage, to avoid the costly process of labeling canonical-view images from the training set by hand, we have devised a new measurement to automatically select or synthesize a canonical-view image for each identity. As an application, this face recovery approach is used for face verification. Facial features are learned from the recovered canonical-view face images by using a facial component-based convolutional neural network. Our approach achieves the state-of-the-art performance on the LFW dataset.

Patent
03 Nov 2014
TL;DR: A 3D-aligned face image can be generated from a 2D face image and provided to a deep neural network (DNN) to align, classify, and verify face images.
Abstract: Systems, methods, and non-transitory computer readable media can align face images, classify face images, and verify face images by employing a deep neural network (DNN). A 3D-aligned face image can be generated from a 2D face image. An identity of the 2D face image can be classified based on provision of the 3D-aligned face image to the DNN. The identity of the 2D face image can comprise a feature vector.

Journal ArticleDOI
TL;DR: A multivariate pattern analysis conducted across all EEG channels revealed that face category could be read out very early, under 100 ms post-stimulus onset; decoding accuracy did not increase monotonically, showing an increase during an initial 95-140 ms period followed by a plateau at ∼140-185 ms.
Abstract: Previous magnetoencephalography/electroencephalography (M/EEG) studies have suggested that face processing is extremely rapid, indeed faster than any other object category. Most studies, however, have been performed using centered, cropped stimuli presented on a blank background, resulting in artificially low interstimulus variability. In contrast, the aim of the present study was to assess the underlying temporal dynamics of face detection in complex natural scenes. We recorded EEG activity while participants performed a rapid go/no-go categorization task in which they had to detect the presence of a human face. Subjects performed at ceiling (94.8% accuracy), and traditional event-related potential analyses revealed only modest modulations of the two main components classically associated with face processing (P100 and N170). A multivariate pattern analysis conducted across all EEG channels revealed that face category could, however, be read out very early, under 100 ms post-stimulus onset. Decoding was linked to reaction time as early as 125 ms. Decoding accuracy did not increase monotonically; we report an increase during an initial 95-140 ms period, followed by a plateau at ∼140-185 ms, perhaps reflecting a transitory stabilization of the available face information, and a strong increase afterward. Further analyses conducted on individual images confirmed these phases, further suggesting that decoding accuracy may be initially driven by low-level stimulus properties. Such latencies appear surprisingly short given the complexity of the natural scenes and the large intraclass variability of the face stimuli used, suggesting that the visual system is highly optimized for the processing of natural scenes.
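The multivariate readout amounts to training a classifier on the across-channel voltage pattern at each time sample and tracing accuracy over time; the sketch below uses scikit-learn with assumed data shapes and classifier choice, not the study's actual pipeline.

```python
# Time-resolved decoding sketch: cross-validated accuracy of a linear
# classifier at every time sample, revealing when face vs. non-face
# information becomes linearly separable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
epochs = rng.normal(size=(120, 32, 60))    # trials x channels x time samples
labels = rng.integers(0, 2, size=120)      # face / non-face per trial

accuracy = np.array([
    cross_val_score(LogisticRegression(max_iter=1000),
                    epochs[:, :, t], labels, cv=5).mean()
    for t in range(epochs.shape[2])
])
onset = int(np.argmax(accuracy > 0.6))     # first bin above a chosen threshold
```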

Patent
12 Nov 2014
TL;DR: Methods and systems are presented that detect at least one face in a digital image, determine and store the area coordinates of the detected face's location, apply at least one transformation to create a portrait of the detected face, and rotate the portrait until it is shown in a vertical orientation with the pair of eyes positioned on a horizontal plane.
Abstract: The present invention provides, in at least one aspect, methods and systems that detect at least one face in at least one digital image, determine and store area co-ordinates of a location of the at least one detected face in the at least one digital image, apply at least one transformation to the at least one detected face to create at least one portrait of the at least one detected face, rotate the at least one portrait at least until the at least one portrait is shown in a vertical orientation and a pair of eyes of the at least one face shown in the at least one portrait are positioned on a horizontal plane; and store the rotated at least one portrait.
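The claimed rotation until the eyes lie on a horizontal plane reduces to measuring the tilt of the inter-ocular line and rotating by that angle; a generic sketch with scipy follows, where the eye coordinates are assumed to come from a prior detector and the rotation sign may need flipping depending on the image coordinate convention.

```python
# Portrait-levelling sketch: rotate the crop so the line through the two
# detected eye centres becomes horizontal.
import numpy as np
from scipy.ndimage import rotate

def level_portrait(portrait, left_eye, right_eye):
    (x1, y1), (x2, y2) = left_eye, right_eye
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))   # tilt of the eye line
    # scipy rotates counter-clockwise in array coordinates; with y pointing
    # down this levels the eye line (flip the sign if your convention differs).
    return rotate(portrait, angle, reshape=False, mode="nearest")

img = np.zeros((128, 128))
aligned = level_portrait(img, left_eye=(40, 60), right_eye=(88, 52))
```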