Open Access Posted Content

An Attentive Survey of Attention Models

TL;DR: A taxonomy grouping existing attention techniques into coherent categories is proposed, and the use of attention to improve the interpretability of neural networks is described.
Abstract
The attention model has become an important concept in neural networks and has been studied across diverse application domains. This survey provides a structured and comprehensive overview of developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review salient neural architectures in which attention has been incorporated and discuss applications in which modeling attention has had a significant impact. We also describe how attention has been used to improve the interpretability of neural networks. Finally, we discuss some future research directions for attention. We hope this survey provides a succinct introduction to attention models and guides practitioners in developing approaches for their applications.


Citations
Journal Article

A Survey of Deep Learning-Based Object Detection

TL;DR: This survey provides a comprehensive, systematic overview of a variety of object detection methods, covering one-stage and two-stage detectors and listing both traditional and new applications.
Journal Article

A review on the attention mechanism of deep learning

TL;DR: An overview of the state-of-the-art attention models proposed in recent years is given, and a unified model suitable for most attention structures is defined.
Journal Article

Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions.

TL;DR: In this paper, the authors present a structured and comprehensive view of deep learning techniques, including a taxonomy that considers various types of real-world tasks such as supervised and unsupervised learning, and point out ten potential aspects of future-generation DL modeling with research directions.
Posted Content

Attention Mechanisms in Computer Vision: A Survey.

TL;DR: This article presents a comprehensive review of attention mechanisms in computer vision, categorizing them by approach: channel attention, spatial attention, temporal attention, and branch attention.
Posted Content

Transformers in Vision: A Survey

TL;DR: Transformer networks, as described in this paper, enable modeling long-range dependencies between input sequence elements and support parallel processing of sequences, in contrast to recurrent networks such as long short-term memory (LSTM).
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieving state-of-the-art performance on English-to-French translation.
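To make the mechanism concrete, below is a minimal NumPy sketch of the scaled dot-product attention at the Transformer's core, softmax(Q K^T / sqrt(d_k)) V. The single-head formulation and the toy shapes are illustrative assumptions, not the paper's full multi-head implementation.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """softmax(Q K^T / sqrt(d_k)) V, as defined in the Transformer paper."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity logits
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                               # weighted sum of value vectors

    # Toy usage: 3 queries attending over 4 key/value pairs of width 8.
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 8)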
Proceedings Article

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
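The "one additional output layer" is easy to picture: for classification it is just a linear map applied to the encoder's [CLS] representation, followed by a softmax. The sketch below is a toy under stated assumptions (a random vector stands in for the pre-trained BERT output; the hidden size of 768 and the two classes are illustrative), showing the shape of fine-tuning rather than real BERT code.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_size, num_classes = 768, 2            # illustrative sizes, not prescriptive

    # Stand-in for the pre-trained encoder: in practice this vector would be the
    # final-layer [CLS] representation BERT produces for an input sentence.
    cls_embedding = rng.normal(size=hidden_size)

    # The single additional output layer added for fine-tuning.
    W = rng.normal(scale=0.02, size=(num_classes, hidden_size))
    b = np.zeros(num_classes)

    logits = W @ cls_embedding + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over classes
    print(probs)                                 # class probabilities for the toy input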
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and the paper proposes to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
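This soft-search is additive (Bahdanau) attention: an alignment energy e_j = v^T tanh(W s_{i-1} + U h_j) is computed between the previous decoder state and each source annotation, normalized by a softmax into weights whose weighted sum of annotations forms the context vector. Below is a minimal NumPy sketch; the dimensions are made-up toy values.

    import numpy as np

    def additive_attention(s_prev, H, W, U, v):
        """Bahdanau-style soft alignment: e_j = v^T tanh(W s + U h_j)."""
        scores = np.tanh(s_prev @ W.T + H @ U.T) @ v   # (n_src,) alignment energies
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()                           # attention weights over source
        context = alpha @ H                            # expected (weighted) annotation
        return context, alpha

    rng = np.random.default_rng(0)
    d_s, d_h, d_a, n_src = 6, 5, 7, 4                  # toy dimensions
    s_prev = rng.normal(size=d_s)                      # previous decoder state
    H = rng.normal(size=(n_src, d_h))                  # source annotations h_1..h_n
    W = rng.normal(size=(d_a, d_s))
    U = rng.normal(size=(d_a, d_h))
    v = rng.normal(size=d_a)
    context, alpha = additive_attention(s_prev, H, W, U, v)
    print(alpha.round(3), context.shape)               # weights sum to 1; context is (5,)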
Proceedings Article

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
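Concretely, maximizing the conditional probability of a target sequence means maximizing log p(y|x) = sum_t log p(y_t | y_<t, c), where c is the encoder's fixed-length summary of the source. The toy sketch below computes that objective from stand-in decoder outputs; random logits replace a real decoder, whereas in the actual model each step's logits depend on the previous target tokens and c.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab, T = 10, 5                       # toy vocabulary size and target length
    target = rng.integers(vocab, size=T)   # reference target token ids

    # Stand-in decoder outputs: one row of logits per target step; in practice
    # each row would be conditioned on y_<t and the encoder context vector c.
    logits = rng.normal(size=(T, vocab))
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))  # log-softmax

    # Training maximizes this quantity (equivalently, minimizes its negation).
    log_likelihood = log_probs[np.arange(T), target].sum()
    print(log_likelihood)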