Hierarchical Attention Networks for Document Classification
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, Eduard Hovy
pp. 1480–1489
TLDR
Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.
Abstract:
We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word- and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.
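To make the two-level structure described above concrete, here is a minimal PyTorch sketch: word-level attention pools each sentence's hidden states into a sentence vector, and sentence-level attention pools those into a document vector. Class and variable names are illustrative assumptions, not taken from the authors' implementation.

```python
# Minimal sketch of two-level (word -> sentence) attention, assuming
# PyTorch and bidirectional GRU encoders; names are illustrative,
# not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPool(nn.Module):
    """Attend over a sequence of hidden states and return a weighted sum."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)          # u_t = tanh(W h_t + b)
        self.context = nn.Parameter(torch.randn(hidden_dim))   # learned context vector

    def forward(self, h):                  # h: (batch, seq_len, hidden_dim)
        u = torch.tanh(self.proj(h))       # (batch, seq_len, hidden_dim)
        scores = u @ self.context          # (batch, seq_len)
        alpha = F.softmax(scores, dim=1)   # attention weights
        return (alpha.unsqueeze(-1) * h).sum(dim=1)  # (batch, hidden_dim)

class HierarchicalAttentionNet(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_gru = nn.GRU(emb_dim, hid, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hid)
        self.sent_gru = nn.GRU(2 * hid, hid, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hid)
        self.out = nn.Linear(2 * hid, n_classes)

    def forward(self, docs):               # docs: (batch, n_sents, n_words) word ids
        b, s, w = docs.shape
        words = self.embed(docs.view(b * s, w))           # encode each sentence
        h_w, _ = self.word_gru(words)
        sent_vecs = self.word_attn(h_w).view(b, s, -1)    # word-level attention
        h_s, _ = self.sent_gru(sent_vecs)
        doc_vec = self.sent_attn(h_s)                     # sentence-level attention
        return self.out(doc_vec)                          # document logits
```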
Citations
Proceedings ArticleDOI
OpenNMT: Open-Source Toolkit for Neural Machine Translation
TL;DR: The authors describe an open-source toolkit for neural machine translation (NMT) that prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities.
Proceedings ArticleDOI
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TL;DR: It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
Proceedings ArticleDOI
Graph Neural Networks for Social Recommendation
TL;DR: This paper provides a principled approach to jointly capture interactions and opinions in the user-item graph and proposes the framework GraphRec, which coherently models two graphs and heterogeneous strengths for social recommendations.
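As a rough illustration of "jointly capturing interactions and opinions", the following sketch embeds each neighboring item together with its opinion (e.g., a rating) and aggregates neighbors with attention. This is a simplified assumption about the mechanism, not GraphRec's released code; all names are hypothetical.

```python
# Hypothetical sketch of opinion-aware neighbor aggregation in a
# user-item graph, in the spirit of the summary above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OpinionAwareAggregator(nn.Module):
    def __init__(self, dim, n_ratings):
        super().__init__()
        self.rating_emb = nn.Embedding(n_ratings, dim)   # opinion embedding
        self.fuse = nn.Linear(2 * dim, dim)              # fuse item + opinion
        self.attn = nn.Linear(2 * dim, 1)                # attention over neighbors

    def forward(self, user_vec, item_vecs, ratings):
        # user_vec: (dim,); item_vecs: (n_items, dim); ratings: (n_items,) int ids
        x = torch.cat([item_vecs, self.rating_emb(ratings)], dim=-1)
        x = torch.relu(self.fuse(x))                     # opinion-aware item repr.
        scores = self.attn(torch.cat([x, user_vec.expand_as(x)], dim=-1))
        alpha = F.softmax(scores.squeeze(-1), dim=0)     # neighbor importance
        return (alpha.unsqueeze(-1) * x).sum(dim=0)      # aggregated user repr.
```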
Proceedings ArticleDOI
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
TL;DR: BERT4Rec employs deep bidirectional self-attention to model user behavior sequences, predicting randomly masked items in the sequence by jointly conditioning on their left and right context.
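The Cloze-style objective this summary describes can be sketched as follows: randomly mask item ids in a behavior sequence, encode the sequence with a bidirectional (non-causal) Transformer encoder, and predict the masked items. Hyperparameters and names here are simplified assumptions, not BERT4Rec's released configuration.

```python
# Minimal sketch of masked-item training with a bidirectional encoder;
# all sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

def mask_sequence(items, mask_id, p=0.15):
    """Replace ~p of the item ids with [mask]; return masked seq and targets."""
    mask = torch.rand(items.shape) < p
    masked = items.clone()
    masked[mask] = mask_id
    targets = torch.where(mask, items, torch.full_like(items, -100))  # -100 = ignore
    return masked, targets

class TinyBERT4Rec(nn.Module):
    def __init__(self, n_items, dim=64, heads=4, layers=2, max_len=50):
        super().__init__()
        self.item_emb = nn.Embedding(n_items + 1, dim)   # +1 slot for [mask]
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)  # no causal mask: bidirectional
        self.out = nn.Linear(dim, n_items)

    def forward(self, seq):                              # seq: (batch, seq_len)
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.item_emb(seq) + self.pos_emb(pos)
        return self.out(self.encoder(h))                 # logits per position

# Training uses cross-entropy on masked positions only:
# loss = nn.CrossEntropyLoss(ignore_index=-100)(logits.flatten(0, 1), targets.flatten())
```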
Journal ArticleDOI
Deep learning for sentiment analysis: A survey
Lei Zhang, Shuai Wang, Bing Liu
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; in recent years it has also been widely applied to sentiment analysis.
References
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
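Written out as a single step, the gating that produces this constant error flow looks like the sketch below (conventional gate names i, f, g, o; not the original paper's notation).

```python
# A minimal single-step LSTM cell, spelled out to show the additive
# cell-state path ("constant error carousel"); a sketch, not the
# original formulation.
import torch

def lstm_step(x, h, c, W, U, b):
    """x: input; h: previous hidden state; c: previous cell state.
    W, U, b hold the stacked parameters for the 4 gates (i, f, g, o)."""
    gates = x @ W + h @ U + b                 # (..., 4 * hidden)
    i, f, g, o = gates.chunk(4, dim=-1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                         # candidate cell update
    c_next = f * c + i * g                    # additive path: the error carousel
    h_next = o * torch.tanh(c_next)
    return h_next, c_next
```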
Journal ArticleDOI
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
TL;DR: This article proposes a graph transformer network (GTN) for handwritten character recognition, showing that gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters.
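The convolutional architecture underlying this work can be sketched as a small LeNet-style network; the layer sizes below are simplified assumptions for 28x28 inputs, not the published LeNet-5 configuration.

```python
# A minimal LeNet-style convolutional network in the spirit of this
# paper; sizes are illustrative assumptions.
import torch.nn as nn

lenet_like = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 28x28 -> 12x12
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # 12x12 -> 4x4
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                            # 10 digit classes
)
```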
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
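The negative-sampling objective mentioned here can be written compactly: score the true (center, context) pair against k sampled "noise" words. This sketch assumes already-looked-up embedding vectors; the names are illustrative.

```python
# Minimal sketch of the skip-gram negative-sampling loss; the noise
# distribution and names are simplified assumptions.
import torch
import torch.nn.functional as F

def neg_sampling_loss(center_vec, context_vec, neg_vecs):
    # center_vec, context_vec: (dim,); neg_vecs: (k, dim) sampled noise words
    pos = F.logsigmoid(context_vec @ center_vec)           # log sigma(v_c . v_w)
    neg = F.logsigmoid(-(neg_vecs @ center_vec)).sum()     # sum log sigma(-v_n . v_w)
    return -(pos + neg)                                    # maximize both terms
```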
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
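The proposed soft-search corresponds to what is now commonly called additive attention: score each source hidden state against the decoder state, then take a weighted sum as the context vector. Below is a minimal sketch with illustrative names, not the authors' code.

```python
# Minimal sketch of additive (Bahdanau-style) attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, s_prev, enc_states):
        # s_prev: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        e = self.v(torch.tanh(self.W(s_prev).unsqueeze(1) + self.U(enc_states)))
        alpha = F.softmax(e.squeeze(-1), dim=1)            # alignment weights
        return (alpha.unsqueeze(-1) * enc_states).sum(1)   # context vector
```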