Proceedings ArticleDOI

Detecting GANs and Retouching Based Digital Alterations via DAD-HCNN

TL;DR: A hierarchical approach termed DAD-HCNN performs a two-fold task: it differentiates digitally generated and digitally retouched images from original unaltered images, and, to increase the explainability of the decision, it also identifies the GAN architecture used to create the image.
Abstract: While image generation and editing technologies such as Generative Adversarial Networks and Photoshop are being used for creative and positive applications, the misuse of these technologies for negative applications, including DeepNude and fake news, is also increasing at a rampant pace. Therefore, detecting digitally created and digitally altered images is of paramount importance. This paper proposes a hierarchical approach termed DAD-HCNN which performs a two-fold task: (i) it differentiates digitally generated and digitally retouched images from original unaltered images, and (ii) to increase the explainability of the decision, it also identifies the GAN architecture used to create the image. The effectiveness of the model is demonstrated on a database generated by combining face images generated by four different GAN architectures with retouched images and original images from existing benchmark databases.
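The hierarchy described in the abstract lends itself to a simple chained decision rule. Below is a minimal, hypothetical sketch of such a three-level pipeline, assuming three separately trained classifiers (level 1: real vs. altered, level 2: retouched vs. GAN-generated, level 3: GAN attribution). It illustrates the structure only; it is not the authors' DAD-HCNN implementation, and the class-index conventions are assumptions.

```python
# Illustrative three-level hierarchical decision (not the authors' code).
import torch
import torch.nn as nn

class HierarchicalAlterationDetector(nn.Module):
    def __init__(self, level1: nn.Module, level2: nn.Module, level3: nn.Module, gan_names):
        super().__init__()
        self.level1 = level1            # real vs. altered
        self.level2 = level2            # retouched vs. GAN-generated
        self.level3 = level3            # which GAN architecture produced the image
        self.gan_names = list(gan_names)

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> str:
        # x: a single preprocessed face image, shape (1, C, H, W).
        if self.level1(x).argmax(dim=1).item() == 0:   # assumed index 0 = "real"
            return "real"
        if self.level2(x).argmax(dim=1).item() == 0:   # assumed index 0 = "retouched"
            return "retouched"
        return self.gan_names[self.level3(x).argmax(dim=1).item()]
```

Chaining the classifiers this way is what gives the decision its explainability: every image that reaches level 3 already carries the level 1 and level 2 decisions.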


Citations
Journal ArticleDOI
TL;DR: This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, with special attention to the latest generation of DeepFakes.

502 citations


Cites background from "Detecting GANs and Retouching Based..."

  • ...In fact, these fingerprints seem to be dependent not only on the GAN architecture, but also on the different instances of it [49]–[51]....


Journal ArticleDOI
14 Apr 2021
TL;DR: The proposed MIPGAN is derived from StyleGAN with a newly formulated loss function that exploits perceptual quality and an identity prior to generate high-quality, high-resolution morphed facial images with minimal artefacts.
Abstract: Face morphing attacks aim to circumvent Face Recognition Systems (FRS) by employing face images derived from multiple data subjects (e.g., accomplices and malicious actors). Morphed images can be verified against contributing data subjects with a reasonable success rate, given they have a high degree of facial resemblance. The success of morphing attacks is directly dependent on the quality of the generated morph images. We present a new approach for generating strong attacks, extending our earlier framework for generating face morphs, using an Identity Prior Driven Generative Adversarial Network, which we refer to as MIPGAN (Morphing through Identity Prior driven GAN). The proposed MIPGAN is derived from StyleGAN with a newly formulated loss function exploiting perceptual quality and an identity factor to generate high-quality, high-resolution morphed facial images with minimal artefacts. We demonstrate the applicability of the proposed approach for generating strong morphing attacks by evaluating the vulnerability of both commercial and deep learning based Face Recognition Systems (FRS) and demonstrating the success rate of the attacks. Extensive experiments are carried out to assess the FRS's vulnerability against the proposed morphed face generation technique on three types of data, namely digital images, re-digitized (printed and scanned) images, and compressed images after re-digitization, from the newly generated MIPGAN Face Morph Dataset. The obtained results demonstrate that the proposed approach to morph generation poses a high threat to FRS.
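As a rough illustration of a loss that balances an identity prior against perceptual quality, the sketch below combines a cosine-similarity identity term (pulling the morph toward both contributing subjects) with a precomputed perceptual distance. The term weights, the use of cosine similarity, and the helper names are assumptions for illustration, not MIPGAN's published formulation.

```python
# Hypothetical identity + perceptual loss sketch (not MIPGAN's exact loss).
import torch
import torch.nn.functional as F

def morph_loss(morph_emb: torch.Tensor,
               subj1_emb: torch.Tensor,
               subj2_emb: torch.Tensor,
               perceptual_dist: torch.Tensor,
               w_id: float = 1.0,
               w_percep: float = 1.0) -> torch.Tensor:
    # Identity term: keep the morph's face embedding close to both contributing subjects.
    id_loss = (1 - F.cosine_similarity(morph_emb, subj1_emb, dim=-1)).mean() + \
              (1 - F.cosine_similarity(morph_emb, subj2_emb, dim=-1)).mean()
    # Perceptual term: a precomputed feature-space distance (e.g. from a VGG/LPIPS-style net).
    return w_id * id_loss + w_percep * perceptual_dist
```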

73 citations


Cites background from "Detecting GANs and Retouching Based..."

  • ...Additionally, we investigate recent works about general face manipulation detection [54] [55] [56] and some results are shown in the supplementary material....


Journal ArticleDOI
TL;DR: A new approach aimed at extracting a Deepfake fingerprint from images is proposed, based on the Expectation-Maximization algorithm trained to detect and extract a fingerprint that represents the Convolutional Traces left by GANs during image generation.
Abstract: Advances in Artificial Intelligence and Image Processing are changing the way people interact with digital images and video. Widespread mobile apps like FACEAPP make use of the most advanced Generative Adversarial Networks (GAN) to produce extreme transformations on human face photos, such as gender swap and aging. The results are utterly realistic and extremely easy to exploit, even for inexperienced users. This kind of media object took the name of Deepfake and raised a new challenge in the multimedia forensics field: the Deepfake detection challenge. Indeed, discriminating a Deepfake from a real image can be a difficult task even for human eyes, but recent works have tried to apply the same technology used for generating images to discriminating them, with preliminary good results but many limitations: the employed Convolutional Neural Networks are not robust, prove to be specific to the context, and tend to extract semantics from images. In this paper, a new approach aimed at extracting a Deepfake fingerprint from images is proposed. The method is based on the Expectation-Maximization algorithm, trained to detect and extract a fingerprint that represents the Convolutional Traces (CT) left by GANs during image generation. The CT shows high discriminative power, achieving better results than the state of the art in the Deepfake detection task and proving to be robust to different attacks. Achieving an overall classification accuracy of over 98% on Deepfakes from 10 different GAN architectures, not limited to face images, the CT proves to be reliable and independent of image semantics. Finally, tests carried out on Deepfakes generated by FACEAPP, achieving 93% accuracy in the fake detection task, demonstrate the effectiveness of the proposed technique in a real-case scenario.
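The following toy sketch shows the general idea of deriving a per-image fingerprint with Expectation-Maximization, here by fitting a two-component Gaussian mixture to simple neighbour-prediction residuals and using the fitted parameters as the feature. It is a strong simplification for illustration, not the paper's Convolutional Traces algorithm.

```python
# Toy EM-based per-image fingerprint (a simplification, not the Convolutional Traces method).
import numpy as np
from sklearn.mixture import GaussianMixture

def em_fingerprint(gray_image: np.ndarray) -> np.ndarray:
    img = gray_image.astype(np.float64)
    residual = img[:, 1:] - img[:, :-1]        # simple horizontal-neighbour prediction residual
    gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
    gmm.fit(residual.reshape(-1, 1))           # EM fit of a two-component mixture
    # Use the fitted means, variances and weights as a compact feature vector.
    return np.concatenate([gmm.means_.ravel(),
                           gmm.covariances_.ravel(),
                           gmm.weights_])
```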

49 citations


Cites methods from "Detecting GANs and Retouching Based..."

  • ...[21] proposed a work known as DAD-HCNN, a new framework based on a hierarchical classification pipeline composed of three levels to distinguish respectively real Vs altered images (first level), retouched Vs GAN’s generated images (second level) and finally, the specific GAN architecture (third level)....


Posted Content
TL;DR: In this paper, the authors provide a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations, including entire face synthesis, identity swap (DeepFakes), attribute manipulation and expression swap.
Abstract: The free access to large-scale public databases, together with the fast progress of deep learning techniques, in particular Generative Adversarial Networks, have led to the generation of very realistic fake content with its corresponding implications towards society in this era of fake news. This survey provides a thorough review of techniques for manipulating face images including DeepFake methods, and methods to detect such manipulations. In particular, four types of facial manipulation are reviewed: i) entire face synthesis, ii) identity swap (DeepFakes), iii) attribute manipulation, and iv) expression swap. For each manipulation group, we provide details regarding manipulation techniques, existing public databases, and key benchmarks for technology evaluation of fake detection methods, including a summary of results from those evaluations. Among all the aspects discussed in the survey, we pay special attention to the latest generation of DeepFakes, highlighting its improvements and challenges for fake detection. In addition to the survey information, we also discuss open issues and future trends that should be considered to advance in the field.

42 citations

Posted Content
TL;DR: A novel approach to detect, attribute and localize GAN generated images that combines image features with deep learning methods is proposed.
Abstract: Recent advances in Generative Adversarial Networks (GANs) have led to the creation of realistic-looking digital images that pose a major challenge to their detection by humans or computers. GANs are used in a wide range of tasks, from modifying small attributes of an image (StarGAN [14]) and transferring attributes between image pairs (CycleGAN [91]) to generating entirely new images (ProGAN [36], StyleGAN [37], SPADE/GauGAN [64]). In this paper, we propose a novel approach to detect, attribute and localize GAN generated images that combines image features with deep learning methods. For every image, co-occurrence matrices are computed on neighborhood pixels of RGB channels in different directions (horizontal, vertical and diagonal). A deep learning network is then trained on these features to detect, attribute and localize these GAN generated/manipulated images. A large scale evaluation of our approach on 5 GAN datasets comprising over 2.76 million images (ProGAN, StarGAN, CycleGAN, StyleGAN and SPADE/GauGAN) shows promising results in detecting GAN generated images.
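A rough sketch of the kind of co-occurrence features described above is given below: intensity pair counts at horizontal, vertical, and diagonal offsets, computed per RGB channel. The bin count and offsets are assumptions rather than the paper's exact configuration; in the cited approach the resulting matrices are then fed to a CNN.

```python
# Illustrative co-occurrence features on RGB channels (offsets and bins are assumptions).
import numpy as np

def cooccurrence_matrix(channel: np.ndarray, dy: int, dx: int, levels: int = 256) -> np.ndarray:
    """Count how often intensity pairs (p, q) occur at the given pixel offset."""
    h, w = channel.shape
    a = channel[:h - dy, :w - dx].ravel().astype(np.intp)
    b = channel[dy:, dx:].ravel().astype(np.intp)
    mat = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(mat, (a, b), 1)
    return mat

def rgb_cooccurrence_features(rgb: np.ndarray) -> np.ndarray:
    # Horizontal, vertical and diagonal offsets, computed for each RGB channel.
    offsets = [(0, 1), (1, 0), (1, 1)]
    feats = [cooccurrence_matrix(rgb[..., c], dy, dx)
             for c in range(3) for dy, dx in offsets]
    return np.stack(feats)   # 9 x 256 x 256 array, used as CNN input in the cited approach
```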

23 citations

References
Proceedings ArticleDOI
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
07 Aug 2017
TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Abstract: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.
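For reference, a minimal sketch of the focal-loss idea in the binary case is shown below; alpha = 0.25 and gamma = 2 are the default values reported in the paper, while the function itself is an illustrative implementation rather than the authors' code.

```python
# Minimal binary focal-loss sketch (illustrative implementation).
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    # Per-example binary cross entropy.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    # (1 - p_t)**gamma down-weights easy, well-classified examples.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```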

12,161 citations

Posted Content
TL;DR: Conditional adversarial networks, as discussed by the authors, are a general-purpose solution to image-to-image translation problems that can be used to synthesize photos from label maps, reconstruct objects from edge maps, and colorize images, among other tasks.
Abstract: We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either.
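A compact sketch of a conditional-GAN objective in the spirit of this work is shown below: an adversarial term plus an L1 reconstruction term weighted by lambda. The function layout is illustrative, and the default weight is a commonly used value rather than a prescription.

```python
# Illustrative pix2pix-style generator/discriminator losses (layout is a sketch).
import torch
import torch.nn.functional as F

def generator_loss(d_fake_logits, fake_img, target_img, lam=100.0):
    # Adversarial term: the generator tries to make the discriminator predict "real".
    adv = F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))
    # Reconstruction term: L1 distance to the ground-truth output image.
    return adv + lam * F.l1_loss(fake_img, target_img)

def discriminator_loss(d_real_logits, d_fake_logits):
    real = F.binary_cross_entropy_with_logits(d_real_logits, torch.ones_like(d_real_logits))
    fake = F.binary_cross_entropy_with_logits(d_fake_logits, torch.zeros_like(d_fake_logits))
    return real + fake
```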

11,127 citations

Proceedings Article
23 Feb 2016
TL;DR: In this paper, the authors show that training with residual connections accelerates the training of Inception networks significantly, and they also present several new streamlined architectures for both residual and non-residual Inception Networks.
Abstract: Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin. We also present several new streamlined architectures for both residual and non-residual Inception networks. These variations improve the single-frame recognition performance on the ILSVRC 2012 classification task significantly. We further demonstrate how proper activation scaling stabilizes the training of very wide residual Inception networks. With an ensemble of three residual networks and one Inception-v4 network, we achieve 3.08 percent top-5 error on the test set of the ImageNet classification (CLS) challenge.
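The activation scaling mentioned above can be pictured as multiplying the residual branch by a small constant before the skip-connection sum. The wrapper below is an illustrative sketch, not the authors' code; the 0.2 default is simply a value within the small-constant range the paper discusses.

```python
# Sketch of residual activation scaling: x + scale * branch(x).
import torch.nn as nn

class ScaledResidual(nn.Module):
    def __init__(self, branch: nn.Module, scale: float = 0.2):
        super().__init__()
        self.branch = branch   # e.g. an Inception-style residual block
        self.scale = scale     # small constant that stabilizes very wide residual nets

    def forward(self, x):
        return x + self.scale * self.branch(x)
```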

6,761 citations

Posted Content
TL;DR: This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
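The architectural constraints the paper describes (fractionally-strided convolutions instead of pooling, batch normalization, ReLU activations with a tanh output, no fully connected hidden layers) can be sketched as a small generator like the one below. Layer counts and feature sizes here are illustrative, not the exact DCGAN configuration.

```python
# Illustrative DCGAN-style generator reflecting the stated architectural constraints.
import torch.nn as nn

def dcgan_generator(z_dim: int = 100, ngf: int = 64, channels: int = 3) -> nn.Sequential:
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, ngf * 4, 4, 1, 0, bias=False),   # 1x1 -> 4x4
        nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False), # 4x4 -> 8x8
        nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
        nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),     # 8x8 -> 16x16
        nn.BatchNorm2d(ngf), nn.ReLU(True),
        nn.ConvTranspose2d(ngf, channels, 4, 2, 1, bias=False),    # 16x16 -> 32x32
        nn.Tanh(),                                                  # image in [-1, 1]
    )
```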

6,759 citations


"Detecting GANs and Retouching Based..." refers background or methods in this paper

  • ...Along with these “handcrafted” tools, Generative Adversarial Networks (GANs) based tools are also becoming popular [14, 19, 30]....


  • ...CMU Multi-PIE [12] and ND-IIITD datasets [5] along with the images generated using different models of GANs [8, 19, 28, 30]....


  • ...DCGAN [30] takes a random noise as input to generate realistic images....


  • ...The next nine columns contain images generated using StarGAN [8] by changing different attributes, (b) Original (first row) and generated (second row) images using SRGAN [19], (c) Images generated using DCGAN [30], and (d) First row contains original images with mask showing the region to be reconstructed using Context Encoders [28], second row shows the generated images and the third row contains the original images....


  • ...The digitally altered class contains images of the retouched class and the images generated using four different models of GANs, namely, StarGAN [8], SRGAN [19], DCGAN [30], and Context Encoder [28]....


Proceedings ArticleDOI
07 Dec 2015
TL;DR: A novel deep learning framework for attribute prediction in the wild is proposed that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.
Abstract: Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction. This framework not only outperforms the state-of-the-art with a large margin, but also reveals valuable facts on learning face representation. (1) It shows how the performances of face localization (LNet) and attribute prediction (ANet) can be improved by different pre-training strategies. (2) It reveals that although the filters of LNet are fine-tuned only with image-level attribute tags, their response maps over entire images have strong indication of face locations. This fact enables training LNet for face localization with only image-level annotations, but without face bounding boxes or landmarks, which are required by all attribute recognition works. (3) It also demonstrates that the high-level hidden neurons of ANet automatically discover semantic concepts after pre-training with massive face identities, and such concepts are significantly enriched after fine-tuning with attribute tags. Each attribute can be well explained with a sparse linear combination of these concepts.
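A hypothetical sketch of the two-stage cascade is shown below: a localization network produces face-location evidence, a crop is taken, and an attribute network outputs per-attribute probabilities. The module internals and the crop_fn helper are placeholders, not the paper's LNet/ANet implementation.

```python
# Hypothetical two-stage attribute-prediction cascade (placeholders, not LNet/ANet).
import torch
import torch.nn as nn

class AttributeCascade(nn.Module):
    def __init__(self, lnet: nn.Module, anet: nn.Module, crop_fn):
        super().__init__()
        self.lnet = lnet        # localization net (pre-trained on general object categories)
        self.anet = anet        # attribute net (pre-trained on face identities)
        self.crop_fn = crop_fn  # turns a response map into a face crop

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        response_map = self.lnet(image)                 # coarse face-location evidence
        face_crop = self.crop_fn(image, response_map)   # crop the located face region
        return torch.sigmoid(self.anet(face_crop))      # per-attribute probabilities
```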

6,273 citations


"Detecting GANs and Retouching Based..." refers methods in this paper

  • ...StarGAN [8] is trained on the CelebA dataset [24] to learn the transfer of attributes....
