Author

Pavel Korshunov

Bio: Pavel Korshunov is an academic researcher from the Idiap Research Institute. The author has contributed to research in topics: Crowdsourcing & JPEG. The author has an h-index of 30 and has co-authored 84 publications receiving 2,009 citations. Previous affiliations of Pavel Korshunov include École Polytechnique Fédérale de Lausanne & École Normale Supérieure.


Papers
Posted Content
TL;DR: This paper presents the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database, and demonstrates that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods.
Abstract: It is becoming increasingly easy to automatically replace the face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help develop such methods, in this paper we present the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database. We used open-source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. To demonstrate this impact, we generated videos of low and high visual quality (320 videos each) using differently tuned parameter sets. We show that state-of-the-art face recognition systems based on the VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates respectively, which means that methods for detecting Deepfake videos are necessary. Considering several baseline approaches, we found that an audio-visual approach based on lip-sync inconsistency detection was unable to distinguish Deepfake videos. The best-performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, achieved an 8.97% equal error rate on high-quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and further development of face swapping technology will make them even more so.
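The detection results above are reported as an equal error rate (EER). Below is a minimal sketch of how that metric is computed, assuming hypothetical detector scores where a higher score means "more likely genuine":

```python
# Minimal EER sketch. The score arrays are hypothetical stand-ins for a
# Deepfake detector's outputs, not the paper's data.
import numpy as np

def compute_eer(genuine_scores, attack_scores):
    """EER: the operating point where the false acceptance rate (attacks
    scored above the threshold) equals the false rejection rate (genuine
    samples scored below it)."""
    thresholds = np.sort(np.concatenate([genuine_scores, attack_scores]))
    far = np.array([(attack_scores >= t).mean() for t in thresholds])
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

genuine = np.random.normal(1.0, 0.3, 1000)  # hypothetical real-video scores
attacks = np.random.normal(0.0, 0.3, 1000)  # hypothetical Deepfake scores
print(f"EER: {compute_eer(genuine, attacks):.2%}")
```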

369 citations

Proceedings ArticleDOI
01 May 2020
TL;DR: This work introduces pyannote.audio, an open-source toolkit written in Python for speaker diarization, which provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.
Abstract: We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding – reaching state-of-the-art performance for most of them.
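As a usage illustration, here is a minimal sketch of running a pre-trained diarization pipeline; it assumes a recent pyannote.audio release (the Pipeline.from_pretrained API postdates the version described in the paper), and the model identifier and audio file are placeholders:

```python
# Sketch: pre-trained speaker diarization with a recent pyannote.audio
# release. Newer releases may additionally require a Hugging Face token.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("meeting.wav")  # hypothetical audio file

# Iterate over speech turns: (segment, track id, speaker label).
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```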

179 citations

Proceedings ArticleDOI
01 Jul 2013
TL;DR: This paper proposes an algorithm based on well-known warping techniques (common for animation and artistic purposes) to obfuscate faces in video surveillance, aiming to overcome shortcomings in tools for protection of visual privacy.
Abstract: The widespread use of digital video surveillance systems has also increased concerns about the violation of privacy rights. Since video surveillance systems are invasive, it is a challenge to find an acceptable balance between the privacy of the public under surveillance and the functionalities of the systems. Tools for the protection of visual privacy available today lack all or some important properties such as security of the protected visual data, reversibility (the ability to undo privacy protection), simplicity, and independence from the video encoding used. In this paper, we propose an algorithm based on well-known warping techniques (common for animation and artistic purposes) to obfuscate faces in video surveillance, aiming to overcome these shortcomings. To demonstrate the feasibility of this approach, we apply the warping algorithm to faces in the standard Yale dataset and run face detection and recognition algorithms on the resulting images. Experiments demonstrate the tradeoff between warping strength and accuracy for both detection and recognition.
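The paper's exact warping transform is not reproduced here; as a generic illustration of strength-controlled face obfuscation, the sketch below applies a sinusoidal displacement field with OpenCV's remap (file names and parameter values are hypothetical):

```python
# Illustrative warp only, not the authors' specific algorithm: `strength`
# plays the role of the paper's warping strength.
import cv2
import numpy as np

def warp_face(face, strength=5.0, wavelength=20.0):
    h, w = face.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    map_x = xs + strength * np.sin(ys / wavelength)  # horizontal shift
    map_y = ys + strength * np.cos(xs / wavelength)  # vertical shift
    return cv2.remap(face, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REFLECT)

img = cv2.imread("face.png")  # hypothetical input
cv2.imwrite("face_warped.png", warp_face(img, strength=8.0))
```

Larger strength values degrade detection and recognition accuracy more, which is the tradeoff the experiments quantify.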

99 citations

Proceedings ArticleDOI
21 Oct 2013
TL;DR: This paper proposes a morphing-based privacy protection method, focuses on its robustness, reversibility, and security properties, and demonstrates that morphed faces retain the likeness of a face while being unrecognizable, which ensures the protection of privacy.
Abstract: The widespread use of digital video surveillance systems has also increased concerns about the violation of privacy rights. Since video surveillance systems are invasive, it is a challenge to find an acceptable balance between the privacy of the public under surveillance and the functionalities of the systems. Tools for the protection of visual privacy available today lack all or some important properties such as security of the protected visual data, reversibility (the ability to undo privacy protection), simplicity, and independence from the video encoding used. To overcome these shortcomings, in this paper we propose a morphing-based privacy protection method and focus on its robustness, reversibility, and security properties. We morph faces from the standard FERET dataset and run face detection and recognition algorithms on the resulting images to demonstrate that morphed faces retain the likeness of a face while being unrecognizable, which ensures the protection of privacy. Our experiments also demonstrate the influence of morphing strength on robustness and security, and we show how to determine the right parameters for the method.
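As a simplified stand-in for the morphing idea, the sketch below cross-dissolves a face with a "key" face under a strength parameter alpha; a full morph such as the paper's also warps facial geometry, which this sketch omits, and the file names are hypothetical:

```python
# Pixel-wise cross-dissolve as a simplified morph; alpha is the morphing
# strength (0 = original face, 1 = key face).
import cv2
import numpy as np

def morph(face, key, alpha=0.5):
    key = cv2.resize(key, (face.shape[1], face.shape[0]))
    blend = (1.0 - alpha) * face.astype(np.float32) + alpha * key.astype(np.float32)
    return blend.astype(np.uint8)

protected = morph(cv2.imread("subject.png"), cv2.imread("key_face.png"), alpha=0.6)
cv2.imwrite("subject_morphed.png", protected)
```

Because the blend is linear, anyone who knows the key face and alpha can invert it, which echoes the reversibility property the paper emphasizes.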

95 citations

Proceedings ArticleDOI
04 Jun 2019
TL;DR: This paper presents the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database, and demonstrates that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods.
Abstract: It is becoming increasingly easy to automatically replace the face of one person in a video with the face of another person by using a pre-trained generative adversarial network (GAN). Recent public scandals, e.g., the faces of celebrities being swapped onto pornographic videos, call for automated ways to detect these Deepfake videos. To help develop such methods, in this paper we present the first publicly available set of Deepfake videos generated from videos of the VidTIMIT database. We used open-source software based on GANs to create the Deepfakes, and we emphasize that training and blending parameters can significantly impact the quality of the resulting videos. To demonstrate this impact, we generated videos of low and high visual quality (320 videos each) using differently tuned parameter sets. We show that state-of-the-art face recognition systems based on the VGG and Facenet neural networks are vulnerable to Deepfake videos, with 85.62% and 95.00% false acceptance rates (on the high-quality versions) respectively, which means that methods for detecting Deepfake videos are necessary. Considering several baseline approaches, we found that the best-performing method, which is based on visual quality metrics and is often used in the presentation attack detection domain, achieved an 8.97% equal error rate on high-quality Deepfakes. Our experiments demonstrate that GAN-generated Deepfake videos are challenging for both face recognition systems and existing detection methods, and further development of face swapping technology will make them even more so.
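The vulnerability numbers above are false acceptance rates (FAR). Here is a minimal sketch of that measurement, with hypothetical similarity scores and a placeholder threshold:

```python
# FAR sketch: the fraction of Deepfake probes whose similarity to the
# claimed identity exceeds the verification threshold. Scores and the
# threshold are hypothetical.
import numpy as np

def far_at_threshold(deepfake_scores, threshold):
    return (np.asarray(deepfake_scores) >= threshold).mean()

# In practice the threshold is calibrated on genuine/impostor pairs;
# 0.5 here is a placeholder.
scores = np.random.normal(0.7, 0.15, 320)  # hypothetical similarity scores
print(f"FAR under Deepfake attack: {far_at_threshold(scores, 0.5):.2%}")
```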

86 citations


Cited by
Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the authors combine the benefits of both approaches and propose the use of perceptual loss functions for training feed-forward networks for image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
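As a sketch of the paper's central ingredient, the snippet below builds a feature reconstruction (perceptual) loss on a frozen pretrained VGG-16 in PyTorch; the layer choice and the omission of ImageNet normalization are simplifying assumptions, and torchvision >= 0.13 is assumed for the weights enum:

```python
# Perceptual loss sketch: MSE between activations of a fixed VGG-16 layer
# (features[:9] ends at relu2_2, one common choice).
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:9].eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the loss network stays frozen

def perceptual_loss(output, target):
    """Compare images in feature space instead of pixel space."""
    return F.mse_loss(vgg(output), vgg(target))

# Gradients flow into `output` (e.g., a transformation network's
# prediction) while VGG's weights are untouched.
out = torch.rand(1, 3, 256, 256, requires_grad=True)
tgt = torch.rand(1, 3, 256, 256)
perceptual_loss(out, tgt).backward()
```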

6,639 citations

Posted Content
TL;DR: This work considers image transformation problems and proposes the use of perceptual loss functions for training feed-forward networks for image transformation tasks, showing results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time.
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
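For the style transfer results mentioned above, here is a companion sketch of the Gram-matrix style loss, which compares channel correlations of feature maps rather than the features themselves (the feature shapes below are hypothetical):

```python
# Gram-matrix style loss sketch: spatial layout is discarded, only
# channel-to-channel correlations of the activations are matched.
import torch

def gram_matrix(features):
    b, c, h, w = features.shape
    f = features.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(output_feats, style_feats):
    return torch.mean((gram_matrix(output_feats) - gram_matrix(style_feats)) ** 2)

# Hypothetical activations from some network layer:
loss = style_loss(torch.rand(1, 64, 128, 128), torch.rand(1, 64, 128, 128))
```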

5,668 citations

Journal ArticleDOI
TL;DR: The authors found that people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks, and that the average American adult saw on the order of one or perhaps several fake news stories in the months around the 2016 U.S. presidential election, with just over half of those who recalled seeing them believing them.
Abstract: Following the 2016 U.S. presidential election, many have expressed concern about the effects of false stories (“fake news”), circulated largely through social media. We discuss the economics of fake news and present new data on its consumption prior to the election. Drawing on web browsing data, archives of fact-checking websites, and results from a new online survey, we find: (i) social media was an important but not dominant source of election news, with 14 percent of Americans calling social media their “most important” source; (ii) of the known false news stories that appeared in the three months before the election, those favoring Trump were shared a total of 30 million times on Facebook, while those favoring Clinton were shared 8 million times; (iii) the average American adult saw on the order of one or perhaps several fake news stories in the months around the election, with just over half of those who recalled seeing them believing them; and (iv) people are much more likely to believe stories that favor their preferred candidate, especially if they have ideologically segregated social media networks.

3,959 citations

Proceedings ArticleDOI
25 Jan 2019
TL;DR: In this paper, the realism of state-of-the-art image manipulations is examined, along with how difficult they are to detect, either automatically or by humans.
Abstract: The rapid progress in synthetic image generation and manipulation has now come to a point where it raises significant concerns about its implications for society. At best, this leads to a loss of trust in digital content, but it could potentially cause further harm by spreading false information or fake news. This paper examines the realism of state-of-the-art image manipulations, and how difficult it is to detect them, either automatically or by humans. To standardize the evaluation of detection methods, we propose an automated benchmark for facial manipulation detection. In particular, the benchmark is based on DeepFakes, Face2Face, FaceSwap, and NeuralTextures as prominent representatives of facial manipulations, at random compression levels and sizes. The benchmark is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. This dataset is over an order of magnitude larger than comparable, publicly available forgery datasets. Based on this data, we performed a thorough analysis of data-driven forgery detectors. We show that the use of additional domain-specific knowledge improves forgery detection to unprecedented accuracy, even in the presence of strong compression, and clearly outperforms human observers.
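As a hedged sketch of the kind of data-driven forgery detector the benchmark evaluates, the snippet below fine-tunes an ImageNet backbone for binary real/fake classification of face crops. The paper's strongest baseline uses XceptionNet; ResNet-18 is swapped in here only because torchvision ships it, and the dataset path and folder layout are placeholders:

```python
# Binary real/fake fine-tuning sketch (one pass over the data).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # classes: real, fake

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
# Hypothetical layout: faces/real/*.png and faces/fake/*.png
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("faces", transform=tf), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:
    optimizer.zero_grad()
    criterion(model(images), labels).backward()
    optimizer.step()
```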

917 citations

Proceedings ArticleDOI
31 Oct 2019
TL;DR: This work proposes an efficient algorithm to embed a given image into the latent space of StyleGAN, which enables semantic image editing operations that can be applied to existing photographs.
Abstract: We propose an efficient algorithm to embed a given image into the latent space of StyleGAN. This embedding enables semantic image editing operations that can be applied to existing photographs. Taking the StyleGAN trained on the FFHQ dataset as an example, we show results for image morphing, style transfer, and expression transfer. Studying the results of the embedding algorithm provides valuable insights into the structure of the StyleGAN latent space. We propose a set of experiments to test what class of images can be embedded, how they are embedded, what latent space is suitable for embedding, and if the embedding is semantically meaningful.
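Here is a minimal sketch of optimization-based embedding in the spirit of the paper: gradient descent on a latent code so that a pretrained generator reproduces a target image. The generator G, its latent shape, and the single pixel-space loss are placeholders; the paper pairs a perceptual loss with the pixel term:

```python
# GAN-inversion sketch: optimize a latent code w so that G(w) matches the
# target. `G` is a hypothetical pretrained generator (e.g., StyleGAN).
import torch
import torch.nn.functional as F

def embed_image(G, target, latent_shape, steps=500, lr=0.01):
    """target: (1, 3, H, W) tensor in the generator's output range."""
    w = torch.zeros(latent_shape, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(G(w), target)  # the paper adds a perceptual term
        loss.backward()
        opt.step()
    return w.detach()
```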

851 citations