Image Transformer

Open Access, Posted Content

Niki Parmar, +6 more

- 15 Feb 2018 -

arXiv: Computer Vision and Pattern Recog...

TLDR

This paper restricts the Transformer's self-attention mechanism to local neighborhoods, which significantly increases the size of images the model can process in practice while still maintaining significantly larger receptive fields per layer than typical CNNs.

Abstract:

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art.
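The idea of restricting self-attention to a local neighborhood can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the image has already been flattened into a sequence of feature vectors in raster-scan order, uses the inputs directly as queries, keys, and values (no learned projections or multiple heads), and uses a simple sliding window rather than whatever neighborhood layout the paper defines. The names `local_causal_attention`, `score_count`, and `window` are hypothetical.

```python
import math


def local_causal_attention(x, window):
    """Scaled dot-product attention where position i only attends to
    itself and up to `window - 1` positions immediately before it.

    x: list of equal-length feature vectors (lists of floats), one per
       position in raster-scan order (an assumption of this sketch).
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, q in enumerate(x):
        # Local neighborhood: never looks ahead, never looks further
        # back than `window` positions.
        start = max(0, i - window + 1)
        neighborhood = x[start:i + 1]
        scores = [scale * sum(a * b for a, b in zip(q, k))
                  for k in neighborhood]
        # Numerically stable softmax over the neighborhood only.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the neighborhood's vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, neighborhood))
                    for j in range(d)])
    return out


def score_count(n, window):
    """Number of query-key scores computed for a sequence of length n."""
    return sum(min(i + 1, window) for i in range(n))
```

With a fixed `window`, the number of scores grows roughly linearly in the sequence length (about `n * window`), whereas full causal attention (`window = n`) grows quadratically. That difference is what makes longer sequences, and therefore larger images, practical to process.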

Citations

Posted Content

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, +11 more

- 22 Oct 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.

Posted Content

Self-Attention Generative Adversarial Networks

Han Zhang, +3 more

- 21 May 2018 -

arXiv: Machine Learning

TL;DR: The Self-Attention Generative Adversarial Network (SAGAN) uses attention-driven, long-range dependency modeling for image generation tasks and achieves state-of-the-art results.

Book Chapter

End-to-End Object Detection with Transformers

Nicolas Carion, +5 more

TL;DR: DETR proposes a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture that directly outputs the final set of predictions in parallel.

Posted Content

Deformable DETR: Deformable Transformers for End-to-End Object Detection

Xizhou Zhu, +5 more

- 08 Oct 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference, can achieve better performance than DETR (especially on small objects) with 10× fewer training epochs.

Posted Content

Rethinking Attention with Performers

Krzysztof Choromanski, +12 more

- 30 Sep 2020 -

arXiv: Learning

TL;DR: Introduces Performers, Transformer architectures that can estimate regular (softmax) full-rank-attention Transformers with provable accuracy using only linear space and time complexity, without relying on any priors such as sparsity or low-rankness.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

Posted Content

Attention Is All You Need

Ashish Vaswani, +7 more

- 12 Jun 2017 -

arXiv: Computation and Language

TL;DR: Proposes the Transformer, a new simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Posted Content

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Christian Ledig, +10 more

- 15 Sep 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Presents SRGAN, a generative adversarial network (GAN) for image super-resolution (SR) that is, to the authors' knowledge, the first framework capable of inferring photo-realistic natural images for 4x upscaling factors, together with a perceptual loss function that consists of an adversarial loss and a content loss.

Proceedings Article

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

Alec Radford, +2 more

TL;DR: Deep convolutional generative adversarial networks (DCGANs) learn a hierarchy of representations from object parts to scenes in both the generator and discriminator for unsupervised learning.

Related Papers (5)

Deep Residual Learning for Image Recognition

Attention is All you Need

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

ImageNet: A large-scale hierarchical image database

Neural Machine Translation by Jointly Learning to Align and Translate