Author

Honggu Liu

Bio: Honggu Liu is an academic researcher from the University of Science and Technology of China. The author has contributed to research in the topics Computer science & Leverage (statistics), has an h-index of 2, and has co-authored 4 publications receiving 8 citations.

Papers
Proceedings ArticleDOI
02 Mar 2021
TL;DR: This paper proposes a Spatial-Phase Shallow Learning (SPSL) method that combines the spatial image and the phase spectrum to capture the up-sampling artifacts of face forgery and improve transferability.
Abstract: The remarkable success of face forgery techniques has received considerable attention in computer vision due to security concerns. We observe that up-sampling is a necessary step of most face forgery techniques, and cumulative up-sampling results in obvious changes in the frequency domain, especially in the phase spectrum. According to the properties of natural images, the phase spectrum preserves abundant frequency components that provide extra information and complement the loss of the amplitude spectrum. To this end, we present a novel Spatial-Phase Shallow Learning (SPSL) method for face forgery detection, which combines the spatial image and the phase spectrum to capture the up-sampling artifacts of face forgery and improve transferability. We also theoretically analyze the validity of utilizing the phase spectrum. Moreover, we notice that local texture information is more crucial than high-level semantic information for the face forgery detection task, so we reduce the receptive fields by shallowing the network to suppress high-level features and focus on local regions. Extensive experiments show that SPSL achieves state-of-the-art performance in cross-dataset evaluation and multi-class classification and obtains comparable results in single-dataset evaluation.
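As a rough illustration of the spatial-plus-phase input described above, the sketch below (not the authors' code; the function names and the 4-channel layout are assumptions) extracts a phase-spectrum channel with a 2-D FFT and stacks it with the RGB image before it would be fed to a shallow CNN.

```python
import numpy as np

def phase_channel(gray: np.ndarray) -> np.ndarray:
    """Map the phase spectrum of a grayscale image back to image space."""
    spec = np.fft.fft2(gray)
    phase = np.angle(spec)                          # keep phase, discard amplitude
    recon = np.fft.ifft2(np.exp(1j * phase)).real   # unit-amplitude reconstruction
    recon = (recon - recon.min()) / (recon.max() - recon.min() + 1e-8)
    return recon.astype(np.float32)

def spatial_phase_input(rgb: np.ndarray) -> np.ndarray:
    """Stack an RGB image (H, W, 3) with its phase channel -> (H, W, 4)."""
    gray = rgb.mean(axis=2)
    return np.concatenate([rgb, phase_channel(gray)[..., None]], axis=2)

# Usage: x = spatial_phase_input(np.random.rand(299, 299, 3).astype(np.float32))
```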

183 citations

Journal ArticleDOI
TL;DR: A screen-to-camera image code dubbed "TERA" (Transparency, Efficiency, Robustness and Adaptability) is proposed, making it possible to circumvent the contradiction among these four properties for the first time.
Abstract: With the rapid development of digital devices, how to transmit information among different devices using a multimedia carrier has drawn much attention from the research community. This paper focuses on the important user scenario of screen-to-camera information transmission. Along this direction, image-coding-based techniques have been shown to be the most popular and effective approach over the past decades. However, after careful study, we find that none of the existing methods satisfies four important properties simultaneously, i.e., high transparency, high embedding efficiency, strong transmission robustness, and high adaptability to device types, mainly because these properties contradict each other. In this paper, we therefore propose a screen-to-camera image code dubbed "TERA" (Transparency, Efficiency, Robustness and Adaptability), which makes it possible to circumvent the contradiction among the above four properties for the first time. Generally, it adopts the color decomposition principle to ensure visual quality and a superposition-based scheme to ensure embedding efficiency. BCH-coding-based information arrangement and a powerful attention-guided information decoding network are further designed to guarantee robustness and adaptability. Through extensive experiments, the superiority and broad applications of our method are demonstrated.
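The superposition-based embedding idea can be pictured with the toy sketch below; it is not the TERA implementation (the choice of channel, the cell size, and the strength are arbitrary assumptions, and BCH coding and the decoding network are omitted). Each message bit slightly raises or lowers one color channel inside a small cell, keeping the code nearly invisible on screen while leaving a pattern a camera-side decoder could recover.

```python
import numpy as np

def embed_bits(img: np.ndarray, bits: np.ndarray, block: int = 8, strength: float = 3.0) -> np.ndarray:
    """img: uint8 (H, W, 3); bits: flat array of 0/1 laid out over block-sized cells."""
    out = img.astype(np.int16)                 # widen to avoid overflow while adding
    h, w = img.shape[:2]
    cells_per_row = w // block
    for i, b in enumerate(bits):
        r, c = divmod(i, cells_per_row)
        y, x = r * block, c * block
        if y + block > h:
            break                              # ran out of image area
        sign = 1 if b else -1
        out[y:y + block, x:x + block, 2] += int(sign * strength)   # perturb one channel
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: coded = embed_bits(frame, np.random.randint(0, 2, 256))
```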

15 citations

Proceedings ArticleDOI
23 May 2022
TL;DR: This paper proposes a novel transformer-based framework that models both global and local information and analyzes anomalies of face images; it designs an attention-leading module, a multi-forensics module, and variant residual connections for deepfake detection, and leverages a token-level contrast loss for more detailed supervision.
Abstract: Recently, almost all mainstream deepfake detection methods use Convolutional Neural Networks (CNNs) as their backbone. However, due to over-reliance on local texture information, which is usually determined by the forgery methods of the training data, these CNN-based methods cannot generalize well to unseen data. To move beyond the limitations of prior methods, in this paper we propose a novel transformer-based framework to model both global and local information and analyze anomalies of face images. In particular, we design an attention-leading module, a multi-forensics module, and variant residual connections for deepfake detection, and leverage a token-level contrast loss for more detailed supervision. Experiments on almost all popular public deepfake datasets demonstrate that our method achieves state-of-the-art performance in cross-dataset evaluation and comparable performance in intra-dataset evaluation.
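For the token-level contrast loss mentioned above, one plausible formulation (an assumption; the paper's exact loss may differ) is a supervised contrastive term over patch-token embeddings, pulling together tokens that share the same real/fake label:

```python
import torch
import torch.nn.functional as F

def token_contrast_loss(tokens: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """tokens: (N, D) patch-token embeddings; labels: (N,) with 0 = real, 1 = fake."""
    z = F.normalize(tokens, dim=1)
    sim = z @ z.t() / tau                                   # (N, N) scaled cosine similarities
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1))      # same-label pairs
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float('-inf'))               # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (same & ~eye).float()
    per_token = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_token.mean()

# Usage: loss = token_contrast_loss(torch.randn(64, 256), torch.randint(0, 2, (64,)))
```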

3 citations

Posted Content
TL;DR: This paper proposes a Spatial-Phase Shallow Learning (SPSL) method that combines the spatial image and the phase spectrum to capture the up-sampling artifacts of face forgery and improve transferability.
Abstract: The remarkable success of face forgery techniques has received considerable attention in computer vision due to security concerns. We observe that up-sampling is a necessary step of most face forgery techniques, and cumulative up-sampling results in obvious changes in the frequency domain, especially in the phase spectrum. According to the properties of natural images, the phase spectrum preserves abundant frequency components that provide extra information and complement the loss of the amplitude spectrum. To this end, we present a novel Spatial-Phase Shallow Learning (SPSL) method for face forgery detection, which combines the spatial image and the phase spectrum to capture the up-sampling artifacts of face forgery and improve transferability. We also theoretically analyze the validity of utilizing the phase spectrum. Moreover, we notice that local texture information is more crucial than high-level semantic information for the face forgery detection task, so we reduce the receptive fields by shallowing the network to suppress high-level features and focus on local regions. Extensive experiments show that SPSL achieves state-of-the-art performance in cross-dataset evaluation and multi-class classification and obtains comparable results in single-dataset evaluation.

2 citations

Journal ArticleDOI
TL;DR: NICe (Neural Identity Carrier) learns identity transformation from an arbitrary face-swapping proxy via a U-Net and, through uncertainty prediction, filters out temporal outliers while maintaining the target content.
Abstract: Deepfake aims to swap a face in an image with someone else's likeness in a plausible manner. Existing methods usually perform deepfake frame by frame, thus ignoring video consistency and producing incoherent results. To address this problem, we propose a novel framework, Neural Identity Carrier (NICe), which learns identity transformation from an arbitrary face-swapping proxy via a U-Net. By modeling the incoherence between frames as noise, NICe naturally suppresses its disturbance and preserves primary identity information. Concretely, NICe takes the original frame as input and learns the transformation supervised by swapped pseudo labels. As the temporal incoherence has an uncertain or stochastic pattern, NICe can filter out such outliers and maintain the target content well through uncertainty prediction. With the predicted temporally stable appearance, NICe enhances its details by constraining 3D geometry consistency, enabling it to learn fine-grained facial structure across poses. In this way, NICe guarantees the temporal stability of deepfake approaches and predicts detailed results that resist over-smoothness. Extensive experiments on benchmarks demonstrate that NICe significantly improves the quality of existing deepfake methods at the video level. Moreover, data generated by our method can benefit video-level deepfake detection methods.
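A minimal sketch of the uncertainty-prediction idea referred to above (the exact objective used by NICe is not given here; the two-headed U-Net in the usage comment is hypothetical): the network predicts a per-pixel log-uncertainty alongside the swapped frame, so pixels dominated by temporal incoherence are down-weighted in the reconstruction loss.

```python
import torch

def uncertainty_l1(pred: torch.Tensor, target: torch.Tensor, log_b: torch.Tensor) -> torch.Tensor:
    """pred, target: (B, 3, H, W); log_b: (B, 1, H, W) predicted log-uncertainty."""
    # High log_b shrinks the residual term but pays a regularizing penalty.
    return (torch.abs(pred - target) * torch.exp(-log_b) + log_b).mean()

# Usage (hypothetical two-headed U-Net):
# pred, log_b = unet(frame)
# loss = uncertainty_l1(pred, swapped_pseudo_label, log_b)
```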

1 citation


Cited by
Proceedings ArticleDOI
18 Apr 2022
TL;DR: Novel synthetic training data called self-blended images (SBIs) are presented for deepfake detection, and extensive experiments show that the method improves model generalization to unknown manipulations and scenes.
Abstract: In this paper, we present novel synthetic training data called self-blended images (SBIs) to detect deepfakes. SBIs are generated by blending pseudo source and target images from single pristine images, reproducing common forgery artifacts (e.g., blending boundaries and statistical inconsistencies between source and target images). The key idea behind SBIs is that more general and hardly recognizable fake samples encourage classifiers to learn generic and robust representations without overfitting to manipulation-specific artifacts. We compare our approach with state-of-the-art methods on FF++, CDF, DFD, DFDC, DFDCP, and FFIW datasets by following the standard cross-dataset and cross-manipulation protocols. Extensive experiments show that our method improves the model generalization to unknown manipulations and scenes. In particular, on DFDC and DFDCP where existing methods suffer from the domain gap between the training and test sets, our approach outperforms the baseline by 4.90% and 11.78% points in the cross-dataset evaluation, respectively. Code is available at https://github.com/mapooon/SelfBlendedImages.
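The self-blending recipe can be sketched as follows (a simplified assumption: the real SBI pipeline uses richer augmentations and landmark-based masks, whereas this toy version only color-jitters the source and blends inside a fixed ellipse):

```python
import numpy as np

def self_blend(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """img: float32 (H, W, 3) in [0, 1]; returns a pseudo-fake with a blending boundary."""
    h, w = img.shape[:2]
    # Pseudo source: mild color jitter of the same pristine image.
    source = np.clip(img * rng.uniform(0.9, 1.1) + rng.uniform(-0.03, 0.03), 0.0, 1.0)
    # Elliptical mask standing in for the face region.
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx, ry, rx = h / 2, w / 2, 0.35 * h, 0.25 * w
    mask = ((((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2) <= 1).astype(np.float32)[..., None]
    # Target is the pristine image itself; blending introduces the forgery artifacts.
    return mask * source + (1.0 - mask) * img

# Usage: fake = self_blend(image, np.random.default_rng(0))
```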

37 citations

Proceedings ArticleDOI
23 Mar 2022
TL;DR: This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. It synthesizes augmented forgeries with a pool of forgery configurations and strengthens this "sensitivity" by enforcing the model to predict the forgery configuration.
Abstract: Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries come from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by methods unseen during training. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and to strengthen the "sensitivity" to forgeries by enforcing the model to predict the forgery configurations. To effectively explore the large forgery augmentation space, we further propose an adversarial training strategy to dynamically synthesize the most challenging forgeries for the current model. Through extensive experiments, we show that the proposed strategies are surprisingly effective (see Figure 1 of the paper) and achieve superior performance compared with current state-of-the-art methods. Code is available at https://github.com/liangchen527/SLADD.
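One way to picture the configuration-prediction objective described above is with auxiliary heads on top of the detector backbone; the attribute set below (blend type, region, blend ratio) is an illustrative assumption rather than the paper's exact pool:

```python
import torch
import torch.nn as nn

class ForgeryConfigHead(nn.Module):
    """Real/fake head plus auxiliary heads that predict the forgery configuration."""

    def __init__(self, feat_dim: int = 512, n_blend_types: int = 4, n_regions: int = 3):
        super().__init__()
        self.real_fake = nn.Linear(feat_dim, 2)
        self.blend_type = nn.Linear(feat_dim, n_blend_types)    # which blending was applied
        self.region = nn.Linear(feat_dim, n_regions)            # which facial region was forged
        self.blend_ratio = nn.Linear(feat_dim, 1)                # regression of the mixing ratio

    def forward(self, feat: torch.Tensor):
        return self.real_fake(feat), self.blend_type(feat), self.region(feat), self.blend_ratio(feat)

# Usage: rf, bt, rg, br = ForgeryConfigHead()(backbone_features)  # backbone_features: (B, 512)
```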

28 citations

Proceedings ArticleDOI
01 Jun 2022
TL;DR: This paper proposes a forgery detection framework emphasizing the common compact representations of genuine faces based on reconstruction-classification learning, and builds bipartite graphs over the encoder and decoder features in a multi-scale fashion.
Abstract: Existing face forgery detectors mainly focus on specific forgery patterns such as noise characteristics, local textures, or frequency statistics for forgery detection. This causes the learned representations to specialize to the known forgery patterns presented in the training set and makes it difficult to detect forgeries with unknown patterns. In this paper, from a new perspective, we propose a forgery detection framework emphasizing the common compact representations of genuine faces based on reconstruction-classification learning. Reconstruction learning over real images encourages the learned representations to be aware of even unknown forgery patterns, while classification learning takes charge of mining the essential discrepancy between real and fake images, facilitating the understanding of forgeries. To achieve better representations, instead of only using the encoder in reconstruction learning, we build bipartite graphs over the encoder and decoder features in a multi-scale fashion. We further exploit the reconstruction difference as guidance for forgery traces on the graph output, which serves as the final representation fed into the classifier for forgery detection. The reconstruction and classification learning is optimized end-to-end. Extensive experiments on large-scale benchmark datasets demonstrate the superiority of the proposed method over the state of the art.
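Stripped of the graph reasoning, the reconstruction-classification objective sketched in the abstract might look like the toy loss below (an assumption for illustration): reconstruction is supervised only on real faces, so forged faces reconstruct poorly and the classifier learns from that discrepancy.

```python
import torch
import torch.nn.functional as F

def reconstruction_classification_loss(recon, img, logits, labels, w_rec: float = 1.0):
    """recon, img: (B, 3, H, W); logits: (B, 2); labels: (B,) with 0 = real, 1 = fake."""
    real = (labels == 0).float().view(-1, 1, 1, 1)
    rec = (F.l1_loss(recon, img, reduction='none') * real).mean()   # real-only reconstruction
    cls = F.cross_entropy(logits, labels)                            # real/fake discrimination
    return cls + w_rec * rec

# Usage: loss = reconstruction_classification_loss(decoder_out, imgs, classifier_out, y)
```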

23 citations

Proceedings ArticleDOI
18 Jan 2022
TL;DR: This paper harnesses the natural correspondence between the visual and auditory modalities in real videos to learn temporally dense video representations that capture factors such as facial movements, expression, and identity, and suggests that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
Abstract: One of the most pressing challenges for the detection of face-manipulated videos is generalising to forgery methods not seen during training while remaining effective under common corruptions such as compression. In this paper, we examine whether we can tackle this issue by harnessing videos of real talking faces, which contain rich information on natural facial appearance and behaviour and are readily available in large quantities online. Our method, termed RealForensics, consists of two stages. First, we exploit the natural correspondence between the visual and auditory modalities in real videos to learn, in a self-supervised cross-modal manner, temporally dense video representations that capture factors such as facial movements, expression, and identity. Second, we use these learned representations as targets to be predicted by our forgery detector along with the usual binary forgery classification task; this encourages it to base its real/fake decision on said factors. We show that our method achieves state-of-the-art performance on cross-manipulation generalisation and robustness experiments, and examine the factors that contribute to its performance. Our results suggest that leveraging natural and unlabelled videos is a promising direction for the development of more robust face forgery detectors.
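A hedged sketch of the second-stage objective described above (the loss weight and the cosine distance are assumptions): the detector is trained both to classify real/fake and to regress the self-supervised representations learned in the first stage.

```python
import torch
import torch.nn.functional as F

def detector_loss(cls_logits, labels, pred_repr, target_repr, w_aux: float = 1.0):
    """cls_logits: (B,); labels: (B,) floats in {0, 1}; *_repr: (B, T, D)."""
    bce = F.binary_cross_entropy_with_logits(cls_logits, labels)
    # Auxiliary task: match the frozen stage-one representations.
    aux = 1.0 - F.cosine_similarity(pred_repr, target_repr.detach(), dim=-1).mean()
    return bce + w_aux * aux

# Usage: loss = detector_loss(det_logits, y.float(), det_repr, teacher_repr)
```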

20 citations

Journal ArticleDOI
TL;DR: A frame-inference-based detection framework (FInfer) is proposed to solve the problem of high-visual-quality Deepfake detection by first learning the referenced representations of the current and future frames' faces and then utilizing an autoregressive model.
Abstract: Deepfake has ignited hot research interest in both academia and industry due to its potential security threats. Many countermeasures have been proposed to mitigate such risks. Current Deepfake detection methods achieve superior performance in dealing with low-visual-quality Deepfake media, which can be distinguished by obvious visual artifacts. However, with the development of deep generative models, the realism of Deepfake media has been significantly improved, posing a tough challenge to current detection models. In this paper, we propose a frame-inference-based detection framework (FInfer) to solve the problem of high-visual-quality Deepfake detection. Specifically, we first learn the referenced representations of the current and future frames' faces. Then, the current frames' facial representations are utilized to predict the future frames' facial representations using an autoregressive model. Finally, a representation-prediction loss is devised to maximize the discriminability of real videos and fake videos. We demonstrate the effectiveness of our FInfer framework through information-theoretic analyses. The entropy and mutual information analyses indicate that the correlation between the predicted and referenced representations is higher in real videos than in high-visual-quality Deepfake videos. Extensive experiments demonstrate that the performance of our method is promising in terms of in-dataset detection performance, detection efficiency, and cross-dataset detection performance on high-visual-quality Deepfake videos.
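The frame-inference idea can be illustrated as below; the GRU predictor and cosine distance are assumptions standing in for the paper's autoregressive model and representation-prediction loss. Current-frame facial representations are rolled forward to predict future-frame representations, and the mismatch between predicted and referenced representations serves as the detection cue.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepresentationPredictor(nn.Module):
    """Autoregressively predict future-frame facial representations from current ones."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.ar = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, dim)

    def forward(self, cur_repr: torch.Tensor) -> torch.Tensor:
        """cur_repr: (B, T, D) current-frame representations -> predicted future representations."""
        out, _ = self.ar(cur_repr)
        return self.head(out)

def prediction_distance(pred: torch.Tensor, future_repr: torch.Tensor) -> torch.Tensor:
    """Mean cosine distance; expected small for real videos, larger for Deepfakes."""
    return 1.0 - F.cosine_similarity(pred, future_repr, dim=-1).mean()

# Usage:
# pred = RepresentationPredictor()(cur_reps)       # cur_reps: (B, T, 256)
# score = prediction_distance(pred, future_reps)   # higher -> more likely fake
```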

20 citations