Hierarchical Attention Networks for Document Classification
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, Eduard Hovy
pp. 1480–1489
TLDR
Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.
Abstract:
We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word- and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
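To make the two-level structure described above concrete, here is a minimal PyTorch sketch: word-level attention pools each sentence's hidden states into a sentence vector, and sentence-level attention pools those into a document vector. Class and variable names are illustrative assumptions, not taken from the authors' implementation.

```python
# Minimal sketch of two-level (word -> sentence) attention, assuming
# PyTorch and bidirectional GRU encoders; names are illustrative,
# not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Attend over a sequence of hidden states and return a weighted sum."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)          # u_t = tanh(W h_t + b)
        self.context = nn.Parameter(torch.randn(hidden_dim))   # learned context vector

    def forward(self, h):                  # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))       # (batch, seq_len, hidden_dim)
        scores = u @ self.context          # (batch, seq_len)
        alpha = F.softmax(scores, dim=1)   # attention weights
        return (alpha.unsqueeze(-1) * h).sum(dim=1)  # (batch, hidden_dim)

class HierarchicalAttentionNet(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hid)
        self.sent_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hid)
        self.out = nn.Linear(2 * hid, n_classes)

    def forward(self, docs):               # docs: (batch, n_sents, n_words) word ids
        b, s, w = docs.shape
        words = self.embed(docs.view(b * s, w))           # encode each sentence
        h_w, _ = self.word_gru(words)
        sent_vecs = self.word_attn(h_w).view(b, s, -1)    # word-level attention
        h_s, _ = self.sent_gru(sent_vecs)
        doc_vec = self.sent_attn(h_s)                     # sentence-level attention
        return self.out(doc_vec)                          # document logits
```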
Citations
Proceedings ArticleDOI
OpenNMT: Open-Source Toolkit for Neural Machine Translation
TL;DR: The authors describe an open-source toolkit for neural machine translation (NMT) that prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities.
Proceedings ArticleDOI
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TL;DR: It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
Proceedings ArticleDOI
Graph Neural Networks for Social Recommendation
TL;DR: This paper provides a principled approach to jointly capture interactions and opinions in the user-item graph and proposes the framework GraphRec, which coherently models two graphs and heterogeneous strengths for social recommendations.
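As a rough illustration of "jointly capturing interactions and opinions", the following sketch embeds each neighboring item together with its opinion (e.g., a rating) and aggregates neighbors with attention. This is a simplified assumption about the mechanism, not GraphRec's released code; all names are hypothetical.

```python
# Hypothetical sketch of opinion-aware neighbor aggregation in a
# user-item graph, in the spirit of the summary above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpinionAwareAggregator(nn.Module):
    def __init__(self, dim, n_ratings):
        super().__init__()
        self.rating_emb = nn.Embedding(n_ratings, dim)   # opinion embedding
        self.fuse = nn.Linear(2 * dim, dim)              # fuse item + opinion
        self.attn = nn.Linear(2 * dim, 1)                # attention over neighbors

    def forward(self, user_vec, item_vecs, ratings):
        # user_vec: (dim,); item_vecs: (n_items, dim); ratings: (n_items,) int ids
        x = torch.cat([item_vecs, self.rating_emb(ratings)], dim=-1)
        x = torch.relu(self.fuse(x))                     # opinion-aware item repr.
        scores = self.attn(torch.cat([x, user_vec.expand_as(x)], dim=-1))
        alpha = F.softmax(scores.squeeze(-1), dim=0)     # neighbor importance
        return (alpha.unsqueeze(-1) * x).sum(dim=0)      # aggregated user repr.
```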
Proceedings ArticleDOI
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
TL;DR: BERT4Rec employs deep bidirectional self-attention to model user behavior sequences, predicting randomly masked items in the sequence by jointly conditioning on their left and right context.
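The Cloze-style objective this summary describes can be sketched as follows: randomly mask item ids in a behavior sequence, encode the sequence with a bidirectional (non-causal) Transformer encoder, and predict the masked items. Hyperparameters and names here are simplified assumptions, not BERT4Rec's released configuration.

```python
# Minimal sketch of masked-item training with a bidirectional encoder;
# all sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

def mask_sequence(items, mask_id, p=0.15):
    """Replace ~p of the item ids with [mask]; return masked seq and targets."""
    mask = torch.rand(items.shape) < p
    masked = items.clone()
    masked[mask] = mask_id
    targets = torch.where(mask, items, torch.full_like(items, -100))  # -100 = ignore
    return masked, targets

class TinyBERT4Rec(nn.Module):
    def __init__(self, n_items, dim=64, heads=4, layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(n_items + 1, dim)   # +1 slot for [mask]
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)  # no causal mask: bidirectional
        self.out = nn.Linear(dim, n_items)

    def forward(self, seq):                              # seq: (batch, seq_len)
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        return self.out(self.encoder(h))                 # logits per position

# Training uses cross-entropy on masked positions only:
# loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.flatten(0, 1), targets.flatten())
```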
Journal ArticleDOI
Deep learning for sentiment analysis: A survey
Lei Zhang, Shuai Wang, Bing Liu
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; in recent years it has also been widely applied to sentiment analysis.
References
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
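Written out as a single step, the gating that produces this constant error flow looks like the sketch below (conventional gate names i, f, g, o; not the original paper's notation).

```python
# A minimal single-step LSTM cell, spelled out to show the additive
# cell-state path ("constant error carousel"); a sketch, not the
# original formulation.
import torch

def lstm_step(x, h, c, W, U, b):
    """x: input; h: previous hidden state; c: previous cell state.
    W, U, b hold the stacked parameters for the 4 gates (i, f, g, o)."""
    gates = x @ W + h @ U + b                 # (..., 4 * hidden)
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate cell update
    c_next = f * c + i * g                    # additive path: the error carousel
    h_next = o * torch.tanh(c_next)
    return h_next, c_next
```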
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: This article proposes a graph transformer network (GTN) for handwritten character recognition, showing that gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
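The convolutional architecture underlying this work can be sketched as a small LeNet-style network; the layer sizes below are simplified assumptions for 28x28 inputs, not the published LeNet-5 configuration.

```python
# A minimal LeNet-style convolutional network in the spirit of this
# paper; sizes are illustrative assumptions.
import torch.nn as nn

lenet_like = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 28x28 -> 12x12
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # 12x12 -> 4x4
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                            # 10 digit classes
)
```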
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
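The negative-sampling objective mentioned here can be written compactly: score the true (center, context) pair against k sampled "noise" words. This sketch assumes already-looked-up embedding vectors; the names are illustrative.

```python
# Minimal sketch of the skip-gram negative-sampling loss; the noise
# distribution and names are simplified assumptions.
import torch
import torch.nn.functional as F

def neg_sampling_loss(center_vec, context_vec, neg_vecs):
    # center_vec, context_vec: (dim,); neg_vecs: (k, dim) sampled noise words
    pos = F.logsigmoid(context_vec @ center_vec)           # log sigma(v_c . v_w)
    neg = F.logsigmoid(-(neg_vecs @ center_vec)).sum()     # sum log sigma(-v_n . v_w)
    return -(pos + neg)                                    # maximize both terms
```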
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
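The proposed soft-search corresponds to what is now commonly called additive attention: score each source hidden state against the decoder state, then take a weighted sum as the context vector. Below is a minimal sketch with illustrative names, not the authors' code.

```python
# Minimal sketch of additive (Bahdanau-style) attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s_prev, enc_states):
        # s_prev: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        e = self.v(torch.tanh(self.W(s_prev).unsqueeze(1) + self.U(enc_states)))
        alpha = F.softmax(e.squeeze(-1), dim=1)            # alignment weights
        return (alpha.unsqueeze(-1) * enc_states).sum(1)   # context vector
```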