Proceedings ArticleDOI

Disguised Faces in the Wild

TL;DR: A novel Disguised Faces in the Wild (DFW) dataset, consisting of over 11,000 images for understanding and pushing the current state-of-the-art for disguised face recognition, along with the phase-I results of the CVPR2018 competition.
Abstract: Existing research in the field of face recognition with variations due to disguises focuses primarily on images captured in controlled settings. Limited research has been performed on images captured in unconstrained environments, primarily due to the lack of corresponding disguised face datasets. In order to overcome this limitation, this work presents a novel Disguised Faces in the Wild (DFW) dataset, consisting of over 11,000 images for understanding and pushing the current state-of-the-art for disguised face recognition. To the best of our knowledge, DFW is a first-of-a-kind dataset containing images pertaining to both obfuscation and impersonation for understanding the effect of disguise variations. A major portion of the dataset has been collected from the Internet, thereby encompassing a wide variety of disguise accessories and variations across other covariates. As part of CVPR2018, a competition and workshop are organized to facilitate research in this direction. This paper presents a description of the dataset, the baseline protocols and performance, along with the phase-I results of the competition.


Citations
Journal ArticleDOI
TL;DR: A comprehensive review of the recent developments on deep face recognition can be found in this paper, covering broad topics on algorithm designs, databases, protocols, and application scenes, as well as the technical challenges and several promising directions.

353 citations

Journal ArticleDOI
TL;DR: It is shown that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content.
Abstract: The recent proliferation of fake portrait videos poses direct threats to society, law, and privacy [1]. Believing the fake video of a politician, distributing fake pornographic content of celebrities, and fabricating impersonated fake videos as evidence in court are just a few real-world consequences of deep fakes. We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes; in other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion is that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first engage several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content by analyzing the proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an "in the wild" dataset of fake portrait videos collected as part of our evaluation process. We evaluate FakeCatcher on several datasets, achieving 96%, 94.65%, 91.50%, and 91.07% accuracies on Face Forensics [2], Face Forensics++ [3], Celeb-DF [4], and our new Deep Fakes Dataset, respectively. In addition, our approach produces a significantly superior detection rate over baselines and does not depend on the source, generator, or properties of the fake content. We also analyze signals from various facial regions, under image distortions, with varying segment durations, from different generators, against unseen datasets, and under several dimensionality reduction techniques.
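The biological-signal premise above can be illustrated with a minimal rPPG (remote photoplethysmography) sketch: average the green channel over the face region in each frame, then pick the dominant frequency in the physiologically plausible heart-rate band. The function below is an illustrative simplification under those assumptions, not FakeCatcher's actual pipeline:

```python
import numpy as np

def estimate_heart_rate_bpm(frames, fps):
    """Naive rPPG: estimate pulse from a stack of video frames.

    frames: array of shape (T, H, W, 3), RGB, assumed to be a face crop.
    Returns the dominant frequency in the 0.7-4 Hz band, in beats/minute.
    """
    # Blood-volume changes modulate skin color most visibly in the
    # green channel, so track its spatial mean over time.
    signal = frames[:, :, :, 1].mean(axis=(1, 2))
    signal = signal - signal.mean()          # drop the DC component
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)   # plausible heart-rate band
    return 60.0 * freqs[band][np.argmax(power[band])]
```

Feeding it ten seconds of synthetic 30 fps frames whose green channel oscillates at 1.2 Hz recovers roughly 72 bpm; on fake content, the claim above is that no such coherent periodicity survives.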

245 citations


Cites background from "Disguised Faces in the Wild"

  • ...Due to the fact that rPPG is mostly evaluated by the accuracy in heart rate, we researched other features for image authenticity [34], classification of EEG signals [28, 41], statistical analysis [60, 50, 25], and emotion recognition [39, 41]....


  • ...However, for synthetic images in our context, the noise and distortions are harder to detect due to the non-linearity and complexity of the learning process [34]....


Journal ArticleDOI
TL;DR: The history of face recognition technology, the current state-of-the-art methodologies, and future directions are presented, specifically on the most recent databases, 2D and 3D face recognition methods.
Abstract: Face recognition is one of the most active research fields of computer vision and pattern recognition, with many practical and commercial applications including identification, access control, forensics, and human-computer interactions. However, identifying a face in a crowd raises serious questions about individual freedoms and poses ethical issues. Significant methods, algorithms, approaches, and databases have been proposed over recent years to study constrained and unconstrained face recognition. 2D approaches reached some degree of maturity and reported very high rates of recognition. This performance is achieved in controlled environments where the acquisition parameters are controlled, such as lighting, angle of view, and distance between the camera–subject. However, if the ambient conditions (e.g., lighting) or the facial appearance (e.g., pose or facial expression) change, this performance will degrade dramatically. 3D approaches were proposed as an alternative solution to the problems mentioned above. The advantage of 3D data lies in its invariance to pose and lighting conditions, which has enhanced recognition systems efficiency. 3D data, however, is somewhat sensitive to changes in facial expressions. This review presents the history of face recognition technology, the current state-of-the-art methodologies, and future directions. We specifically concentrate on the most recent databases, 2D and 3D face recognition methods. Besides, we pay particular attention to deep learning approach as it presents the actuality in this field. Open issues are examined and potential directions for research in facial recognition are proposed in order to provide the reader with a point of reference for topics that deserve consideration.

155 citations


Cites background from "Disguised Faces in the Wild"

  • ...[42] created a novel Disguised Faces in the Wild (DFW) dataset, consisting of 11,157 images of 1,000 subjects with both obfuscated and impersonated faces, to push the state-of-the-art for disguised face recognition....


  • ...Comparison of face databases (reconstructed from the citing survey, where it appears as Figure 17; "-" marks values not reported):

        Database            Year  Images    Subjects  Images/Subject
        ORL [23]            1994  400       40        10
        FERET [13]          1996  14,126    1199      -
        AR [24]             1998  3016      116       26
        XM2VTS [25]         1999  -         295       -
        BANCA [26]          2003  -         208       -
        FRGC [14]           2006  50,000    -         7
        LFW [10]            2007  13,233    5749      ≈2.3
        CMU Multi-PIE [29]  2009  >750,000  337       N/A
        IJB-A [31]          2015  5712      500       ≈11.4
        CFP [35]            2016  7000      500       >14
        DMFD [37]           2016  2460      410       6
        IJB-B [40]          2017  21,798    1845      ≈36.2
        MF2 [41]            2017  4.7M      672,057   ≈7
        DFW [42]            2018  11,157    1000      ≈5.26
        IJB-C [43]          2018  31,334    3531      ≈6
        LFR [44]            2020  30,000    542       10–260
        RMFRD [45]          2020  95,000    525       -
        SMFRD [45]          2020  500,000   10,000    -


  • ...In 2018, Kushwaha et al. [42] created a novel Disguised Faces in the Wild (DFW) dataset, consisting of 11,157 images of 1,000 subjects with both obfuscated and impersonated faces, to push the state-of-the-art for disguised face recognition....


Posted Content
TL;DR: This work provides a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition), which deploy deep learning models, and show their strengths and potentials in different applications.
Abstract: Deep learning-based models have been very successful in achieving state-of-the-art results in many of the computer vision, speech recognition, and natural language processing tasks in the last few years. These models seem a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems. Deep learning-based models have increasingly been leveraged to improve the accuracy of different biometric recognition systems in recent years. In this work, we provide a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition), which deploy deep learning models, and show their strengths and potentials in different applications. For each biometric, we first introduce the available datasets that are widely used in the literature and their characteristics. We will then talk about several promising deep learning works developed for that biometric, and show their performance on popular public benchmarks. We will also discuss some of the main challenges while using these models for biometric recognition, and possible future directions to which research in this area is headed.

88 citations


Cites methods from "Disguised Faces in the Wild"


  • ...Other Datasets: It is worth mentioning that there are several other datasets whose details we skipped due to their being private or less popular, such as DeepFace (Facebook private dataset of 4.4M photos of 4k subjects), NTechLab (a private dataset of 18.4M photos of 200k subjects), FaceNet (Google private dataset of more than 500M photos of more than 10M subjects), WebFaces (a dataset of 80M photos crawled from the web) [93], and Disguised Faces in the Wild (DFW) [95], which contains over 11,000 images of 1,000 identities with variations across different types of disguise accessories....


Proceedings ArticleDOI
01 Oct 2019
TL;DR: This is the first attempt to train a Neural-ODE on original videos to predict the heart rate of fake videos, and it is shown that the heart rate of fake videos can be used to distinguish original and fake videos.
Abstract: Deepfake is a technique used to manipulate videos using computer code. It involves replacing the face of a person in a video with the face of another person. The automation of video manipulation means that deepfakes are becoming more prevalent and easier to implement. This can be credited to the emergence of apps like FaceApp and FakeApp, which allow users to create their own deepfake videos using their smartphones. It has hence become essential to detect fake videos to avoid the spread of false information. A recent study shows that the heart rate of fake videos can be used to distinguish original and fake videos. In the study presented, we obtained the heart rate of original videos and trained the state-of-the-art Neural Ordinary Differential Equations (Neural-ODE) model. We then created deepfake videos using commercial software. The average loss obtained for ten original videos is 0.010927, and for ten donor videos 0.010041. The trained Neural-ODE was able to predict the heart rate of our ten deepfake videos generated using commercial software and 320 deepfake videos of the DeepfakeTIMIT database. To the best of our knowledge, this is the first attempt to train a Neural-ODE on original videos to predict the heart rate of fake videos.
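For context, the Neural-ODE family used above models a hidden state that evolves continuously, dz/dt = f(z, t), integrated by a numerical solver; in the actual model, f is a trained neural network. A minimal forward-Euler sketch with a hand-picked f (exponential decay, not the paper's trained network):

```python
import numpy as np

def odeint_euler(f, z0, t):
    """Integrate dz/dt = f(z, t) with the forward Euler method.

    In a Neural-ODE, f would be a small neural network whose parameters
    are trained by backpropagating through (or around) this solver.
    """
    z = np.asarray(z0, dtype=float)
    trajectory = [z.copy()]
    for t0, t1 in zip(t[:-1], t[1:]):
        z = z + (t1 - t0) * f(z, t0)  # one explicit Euler step
        trajectory.append(z.copy())
    return np.stack(trajectory)

# Example dynamics dz/dt = -z, whose exact solution is z(t) = z0 * exp(-t).
t = np.linspace(0.0, 1.0, 1001)
traj = odeint_euler(lambda z, t: -z, np.array([1.0]), t)
```

With 1000 steps over [0, 1], the endpoint agrees with exp(-1) to about three decimal places; higher-order solvers (e.g. Runge-Kutta) trade more function evaluations for better accuracy per step.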

71 citations


Cites background from "Disguised Faces in the Wild"

  • ...However, it is hard to find distortion, compression artifacts, and noises in synthetic images due to non-linearity [22]....


References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
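The residual reformulation can be made concrete with a toy sketch: a block computes a residual F(x) and outputs relu(F(x) + x), so zeroed weights reduce it to an identity mapping. The NumPy block below is a fully-connected illustration of that idea, not the paper's convolutional architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is a small two-layer transform.

    If w1 and w2 are zero, F(x) = 0 and the block reduces to the
    identity (for non-negative x), which is what makes very deep
    stacks of such blocks easy to optimize.
    """
    fx = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(fx + x)      # skip connection adds the input back

x = np.array([[1.0, 2.0]])
zero = np.zeros((2, 2))
assert np.allclose(residual_block(x, zero, zero), x)  # identity when F = 0
```

The skip connection also gives gradients a direct additive path to earlier layers, which is the mechanism behind the optimization gains the abstract reports.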

123,388 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

40,257 citations

Journal ArticleDOI
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [1] and Fast R-CNN [2] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features—using the recently popular terminology of neural networks with ’attention’ mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model [3] , our detection system has a frame rate of 5 fps ( including all steps ) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
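The anchor mechanism behind the RPN can be sketched as follows: each feature-map position maps back to the image via the stride and receives one reference box per (scale, ratio) pair. The scales and ratios below mirror the commonly used 3 × 3 configuration (k = 9 anchors per position), but the exact values and box parameterization here are illustrative, not the paper's implementation:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return (feat_h * feat_w * len(scales) * len(ratios), 4) anchor
    boxes as (x1, y1, x2, y2) in image coordinates.

    Each feature-map cell maps to an image location via the stride;
    every location gets one anchor per (scale, ratio) combination.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # anchor center
            for s in scales:
                for r in ratios:
                    # Keep area ~ s*s while setting aspect ratio w/h = r.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

# 3 scales x 3 ratios = 9 anchors per position, as in Faster R-CNN.
a = generate_anchors(2, 2, stride=16, scales=[128, 256, 512], ratios=[0.5, 1.0, 2.0])
```

The RPN head then predicts, for every such anchor, an objectness score plus four box-regression offsets; only the top-scoring refined anchors survive as region proposals for the detector.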

26,458 citations


"Disguised Faces in the Wild" refers methods in this paper

  • ...To address this issue, face coordinates obtained via Faster RCNN [17] are also provided with the dataset for both training and testing partitions....


Posted Content
TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.

23,183 citations

01 Oct 2008
TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
Abstract: Most face databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, background, camera quality, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters. This database, Labeled Faces in the Wild, is provided as an aid in studying the latter, unconstrained, recognition problem. The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life. The database exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background. In addition to describing the details of the database, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible. We provide baseline results, including results of a state of the art face recognition system combined with a face alignment system. To facilitate experimentation on the database, we provide several parallel databases, including an aligned version.

5,742 citations


"Disguised Faces in the Wild" refers background in this paper

  • ...Recently, researchers have proposed large scale datasets captured in uncontrolled scenarios for performing face recognition [7, 8, 24]....
