Posted Content

Generalized Facial Manipulation Detection with Edge Region Feature Extraction.

TL;DR: Zhang et al. as mentioned in this paper proposed a facial forensic framework that utilizes pixel-level color features appearing in the edge region of the whole image and includes a 3D-CNN classification model that interprets the extracted color features spatially and temporally.
Abstract: This paper presents a generalized and robust face manipulation detection method based on the edge region features appearing in images. Most contemporary face synthesis processes include color awkwardness reduction but damage the natural fingerprint in the edge region. In addition, these color correction processes do not proceed in the non-face background region. We also observe that the synthesis process does not consider the natural properties of the image appearing in the time domain. Considering these observations, we propose a facial forensic framework that utilizes pixel-level color features appearing in the edge region of the whole image. Furthermore, our framework includes a 3D-CNN classification model that interprets the extracted color features spatially and temporally. Unlike other existing studies, we conduct authenticity determination by considering all features extracted from multiple frames within one video. Through extensive experiments, including real-world scenarios to evaluate generalized detection ability, we show that our framework outperforms state-of-the-art facial manipulation detection technologies in terms of accuracy and robustness.
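As a rough illustration of the kind of edge-region, pixel-level color feature the abstract describes, the sketch below selects a band around Canny edges and summarizes per-channel color statistics inside it. The feature definition, thresholds, and band width are assumptions for illustration, not the authors' published formulation.

```python
import cv2
import numpy as np

def edge_region_color_features(frame_bgr, low=100, high=200, dilate_px=3):
    """Sample per-channel color statistics around edge pixels of one frame.
    Illustrative assumption only, not the paper's exact feature definition."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)                 # binary edge map
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    region = cv2.dilate(edges, kernel) > 0             # widen edges into a band
    pixels = frame_bgr[region].astype(np.float32)      # (N, 3) BGR values on the band
    # simple pixel-level color summary of the edge region
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

# Stacking such features over consecutive frames would give the kind of
# spatio-temporal input a 3D-CNN classifier could consume.
```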
Citations
Proceedings ArticleDOI
01 Jun 2022
TL;DR: In this paper, a key point-based activity recognition framework is presented that extracts complex static and movement-based features from key frames in videos; these features are used to predict a sequence of key-frame activities.
Abstract: We present a key point-based activity recognition framework, built upon pre-trained human pose estimation and facial feature detection models. Our method extracts complex static and movement-based features from key frames in videos, which are used to predict a sequence of key-frame activities. Finally, a merge procedure is employed to identify robust activity segments while ignoring outlier frame activity predictions. We analyze the different components of our framework via a wide array of experiments and draw conclusions with regards to the utility of the model and ways it can be improved. Results show our model is competitive, taking the 11th place out of 27 teams submitting to Track 3 of the 2022 AI City Challenge.
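The merge procedure the abstract mentions could, in a minimal form, group consecutive identical key-frame predictions into segments and discard short runs as outliers; the grouping rule and minimum run length below are illustrative assumptions, not the paper's exact procedure.

```python
from itertools import groupby

def merge_keyframe_predictions(labels, min_run=3):
    """Group consecutive identical key-frame activity labels into segments,
    dropping runs shorter than min_run as outlier predictions."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if length >= min_run:
            segments.append((label, start, start + length - 1))
        start += length
    return segments

# Example: a single-frame outlier between two "walk" runs is ignored.
print(merge_keyframe_predictions(["walk"] * 5 + ["run"] + ["walk"] * 4))
```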

1 citation

Journal ArticleDOI
TL;DR: A novel triple complementary streams detector, TCSD, is proposed; it is designed to perceive depth information (DI), which is not utilized by previous methods, and includes two attention-based feature fusion modules that adaptively fuse information.
Abstract: Advancements in computer vision and deep learning have made it difficult to visually distinguish generated Deepfake media. While existing detection frameworks have achieved significant performance on challenging Deepfake datasets, these approaches consider only a single perspective. Moreover, a single view can neither cover complex scenarios nor fully exploit the correlation between multiple sources of information. In this paper, to mine a new view for Deepfake detection and exploit the correlation of the multi-view information contained in images, we propose a novel triple complementary streams detector, TCSD. Specifically, a novel depth estimator is first designed to perceive depth information (DI), which is not utilized by previous methods. Then, to supplement the depth information and obtain comprehensive forgery clues, we consider the incoherence between image foreground and background information (FBI) and the inconsistency between local and global information (LGI). In addition, an attention-based multi-scale feature extraction (MsFE) module is designed to perceive more complementary features from DI, FBI, and LGI. Finally, two attention-based feature fusion modules are proposed to adaptively fuse information. Extensive experimental results show that the proposed approach achieves state-of-the-art performance in detecting Deepfakes.
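A minimal sketch of adaptive, attention-based fusion over the three streams (DI, FBI, LGI) is shown below, assuming each stream has already been reduced to a fixed-length feature vector and is fused with learned softmax weights; the actual TCSD fusion modules are more elaborate.

```python
import torch
import torch.nn as nn

class StreamAttentionFusion(nn.Module):
    """Fuse three feature streams with learned attention weights.
    A generic illustration of adaptive fusion, not the paper's exact module."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scores each stream's feature vector

    def forward(self, di, fbi, lgi):     # each input: (batch, dim)
        streams = torch.stack([di, fbi, lgi], dim=1)          # (batch, 3, dim)
        weights = torch.softmax(self.score(streams), dim=1)   # (batch, 3, 1)
        return (weights * streams).sum(dim=1)                 # (batch, dim)
```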

1 citation

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a threshold classifier based on similarity scores obtained from a Deep Convolutional Neural Network (DCNN) trained for facial recognition; the scores are computed between faces extracted from questioned videos and reference material of the person depicted.
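A minimal sketch of such a threshold classifier, assuming L2-normalizable face embeddings are already available from a recognition DCNN; the cosine-similarity scoring, mean aggregation, and threshold value are illustrative assumptions.

```python
import numpy as np

def similarity_scores(query_embs, reference_embs):
    """Cosine similarities between face embeddings from a questioned video
    and reference embeddings of the depicted person."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    r = reference_embs / np.linalg.norm(reference_embs, axis=1, keepdims=True)
    return q @ r.T                      # (num_query, num_reference)

def is_manipulated(query_embs, reference_embs, threshold=0.4):
    """Flag the video when the mean similarity falls below a threshold.
    The decision rule and threshold are illustrative, not the paper's values."""
    return similarity_scores(query_embs, reference_embs).mean() < threshold
```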
Proceedings ArticleDOI
01 Jan 2023
TL;DR: Wang et al. as discussed by the authors proposed the Temporal Identity Inconsistency Network (TI2Net), a Deepfake detector that focuses on temporal identity inconsistency.
Abstract: In this paper, we propose the Temporal Identity Inconsistency Network (TI2Net), a Deepfake detector that focuses on temporal identity inconsistency. Specifically, TI2Net recognizes fake videos by capturing the dissimilarities of human faces among video frames of the same identity. Therefore, TI2Net is a reference-agnostic detector and can be used on unseen datasets. For a video clip of a given identity, identity information in all frames will first be encoded to identity vectors. TI2Net learns the temporal identity embedding from the temporal difference of the identity vectors. The temporal embedding, representing the identity inconsistency in the video clip, is finally used to determine the authenticity of the video clip. During training, TI2Net incorporates triplet loss to learn more discriminative temporal embeddings. We conduct comprehensive experiments to evaluate the performance of the proposed TI2Net. Experimental results indicate that TI2Net generalizes well to unseen manipulations and datasets with unseen identities. Besides, TI2Net also shows robust performance against compression and additive noise.
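A minimal sketch in the spirit of this design, assuming per-frame identity vectors from a face recognizer are already available: consecutive differences of the identity vectors are summarized by a recurrent encoder, and a triplet loss can be applied to the resulting temporal embedding. The layer sizes and encoder choice are assumptions, not the published TI2Net architecture.

```python
import torch
import torch.nn as nn

class TemporalIdentityEncoder(nn.Module):
    """Encode temporal identity inconsistency from per-frame identity vectors.
    A generic sketch, not the published TI2Net architecture."""
    def __init__(self, id_dim=512, emb_dim=128):
        super().__init__()
        self.rnn = nn.GRU(id_dim, emb_dim, batch_first=True)

    def forward(self, id_vectors):                      # (batch, frames, id_dim)
        diffs = id_vectors[:, 1:] - id_vectors[:, :-1]  # temporal identity differences
        _, hidden = self.rnn(diffs)                     # summarize the difference sequence
        return hidden[-1]                               # (batch, emb_dim) temporal embedding

# During training, a triplet loss can pull embeddings of same-label clips together.
triplet = nn.TripletMarginLoss(margin=1.0)
```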
References
Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
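The two-player game described above is commonly written as the following minimax objective, where p_data is the data distribution and p_z is the generator's input noise distribution:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

At the unique optimum referenced in the abstract, G reproduces the training data distribution and D(x) = 1/2 everywhere.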

38,211 citations

Journal ArticleDOI
TL;DR: There is a natural uncertainty principle between detection and localization performance, which are the two main goals, and with this principle a single operator shape is derived which is optimal at any scale.
Abstract: This paper describes a computational approach to edge detection. The success of the approach depends on the definition of a comprehensive set of goals for the computation of edge points. These goals must be precise enough to delimit the desired behavior of the detector while making minimal assumptions about the form of the solution. We define detection and localization criteria for a class of edges, and present mathematical forms for these criteria as functionals on the operator impulse response. A third criterion is then added to ensure that the detector has only one response to a single edge. We use the criteria in numerical optimization to derive detectors for several common image features, including step edges. On specializing the analysis to step edges, we find that there is a natural uncertainty principle between detection and localization performance, which are the two main goals. With this principle we derive a single operator shape which is optimal at any scale. The optimal detector has a simple approximate implementation in which edges are marked at maxima in gradient magnitude of a Gaussian-smoothed image. We extend this simple detector using operators of several widths to cope with different signal-to-noise ratios in the image. We present a general method, called feature synthesis, for the fine-to-coarse integration of information from operators at different scales. Finally we show that step edge detector performance improves considerably as the operator point spread function is extended along the edge.
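The simple approximate implementation mentioned in the abstract (edges marked at maxima of the gradient magnitude of a Gaussian-smoothed image, followed by hysteresis thresholding) corresponds to the standard Canny detector available in OpenCV; the blur size and thresholds below are arbitrary example values.

```python
import cv2

# Canny pipeline: Gaussian smoothing, gradient magnitude, non-maximum
# suppression along the gradient direction, and hysteresis thresholding.
image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(image, (5, 5), sigmaX=1.4)
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)  # low/high hysteresis thresholds
cv2.imwrite("edges.png", edges)
```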

28,073 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
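The L(L+1)/2 connectivity can be seen in a minimal dense block, where each layer receives the concatenation of all preceding feature maps. The sketch below is a generic PyTorch illustration of that connectivity, not the full DenseNet with transition layers.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal dense block: each layer takes the concatenation of all
    preceding feature maps as input and contributes growth_rate new maps."""
    def __init__(self, in_channels, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # reuse all earlier feature maps
            features.append(out)
        return torch.cat(features, dim=1)
```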

27,821 citations

Posted Content
TL;DR: In this article, Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments, is introduced.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
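In standard notation, with g_t the stochastic gradient at step t, the adaptive moment estimates and the resulting parameter update are:

```latex
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^{2}

\hat{m}_t = \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}}, \qquad
\theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

The bias-corrected estimates compensate for the zero initialization of the moment accumulators, and the step size alpha, decay rates beta_1 and beta_2, and epsilon are the hyper-parameters noted above as typically requiring little tuning.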

23,486 citations

Proceedings ArticleDOI
François Chollet
21 Jul 2017
TL;DR: This work proposes a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions, and shows that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset, and significantly outperforms it on a larger image classification dataset.
Abstract: We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
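The depthwise separable convolution described above (a per-channel depthwise convolution followed by a pointwise 1x1 convolution) can be written as a small PyTorch module; this is a generic sketch of the operation, not the exact layer configuration used in Xception.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution (one filter per input channel) followed by a
    pointwise 1x1 convolution that mixes channels."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```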

10,422 citations
