The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

doi:10.1109/CVPR.2018.00068

Open Access Proceedings Article

- pp 586-595

TLDR

In this paper, the authors introduce a new dataset of human perceptual similarity judgments, and systematically evaluate deep features across different architectures and tasks and compare them with classic metrics, finding that deep features outperform all previous metrics by large margins on their dataset.

Abstract:

While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations.
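To illustrate why the abstract calls PSNR a "simple, shallow" function, the sketch below computes it from the standard definition, 10 * log10(MAX^2 / MSE), on a toy example. The names `mse`, `psnr`, `ref`, `shifted`, and `noisy` are hypothetical and the 16-value "images" are made up for illustration; this is not code from the paper. It shows only that PSNR depends on the mean squared pixel error and nothing else, so a uniform brightness shift and an alternating noise pattern of the same magnitude receive the same score even though a viewer would likely judge them differently.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 4x4 "image" flattened to 16 pixels (assumed 8-bit intensity range).
ref = [100] * 16

# Distortion A: every pixel brightened by 10 (a smooth, global change).
shifted = [p + 10 for p in ref]

# Distortion B: pixels alternately raised and lowered by 10 (a noisy pattern).
noisy = [p + (10 if i % 2 == 0 else -10) for i, p in enumerate(ref)]

# Both distortions have the same MSE, so PSNR cannot tell them apart.
print(psnr(ref, shifted), psnr(ref, noisy))
```

A learned feature distance, by contrast, compares activations of a network rather than raw pixels, which is the kind of metric the paper evaluates against human judgments.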

Citations

Posted Content

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Ben Mildenhall, +5 more

- 19 Mar 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work describes how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrates results that outperform prior work on neural rendering and view synthesis.

Posted Content

Analyzing and Improving the Image Quality of StyleGAN

Tero Karras, +5 more

- 03 Dec 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work redesigns the generator normalization, revisits progressive growing, and regularizes the generator to encourage good conditioning in the mapping from latent codes to images, and thereby redefines the state of the art in unconditional image modeling.

Proceedings ArticleDOI

Analyzing and Improving the Image Quality of StyleGAN

Tero Karras, +5 more

TL;DR: In this paper, the authors propose to redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images.

Book ChapterDOI

Multimodal Unsupervised Image-to-Image Translation

Xun Huang, +3 more

TL;DR: In this article, the authors propose a multimodal unsupervised image-to-image (MUNIT) framework, where the image representation can be decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties.

Posted Content

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras, +2 more

- 12 Dec 2018 -

arXiv: Neural and Evolutionary Computing

TL;DR: This article proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, that leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The authors train a deep convolutional neural network for ImageNet classification, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, and report state-of-the-art performance.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Journal ArticleDOI

Image quality assessment: from error visibility to structural similarity

Zhou Wang, +3 more

- 01 Apr 2004 -

IEEE Transactions on Image Processing

TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared with both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.

Proceedings ArticleDOI

Image-to-Image Translation with Conditional Adversarial Networks

Phillip Isola, +3 more

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.

Journal ArticleDOI

A taxonomy and evaluation of dense two-frame stereo correspondence algorithms

Daniel Scharstein, +2 more

- 09 Dec 2001 -

International Journal of Computer Vision

TL;DR: This paper has designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can easily be extended to include new algorithms.

Related Papers (5)

Generative Adversarial Nets

Ian Goodfellow, +7 more

Image quality assessment: from error visibility to structural similarity

Zhou Wang, +3 more

- 01 Apr 2004 -

IEEE Transactions on Image Processing

Image-to-Image Translation with Conditional Adversarial Networks

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization