Proceedings ArticleDOI

Recurrent Attention Network on Memory for Aspect Sentiment Analysis

01 Sep 2017, pp. 452–461
TL;DR: A novel framework based on neural networks to identify the sentiment of opinion targets in a comment/review; it adopts a multiple-attention mechanism to capture sentiment features separated by a long distance, making it more robust against irrelevant information.
Abstract: We propose a novel framework based on neural networks to identify the sentiment of opinion targets in a comment/review. Our framework adopts a multiple-attention mechanism to capture sentiment features separated by a long distance, so that it is more robust against irrelevant information. The results of multiple attentions are non-linearly combined with a recurrent neural network, which strengthens the expressive power of our model for handling more complications. The weighted-memory mechanism not only helps us avoid the labor-intensive feature engineering work, but also provides a tailor-made memory for different opinion targets of a sentence. We examine the merit of our model on four datasets: two are from SemEval 2014, i.e. reviews of restaurants and laptops; a Twitter dataset, for testing its performance on social media data; and a Chinese news comment dataset, for testing its language sensitivity. The experimental results show that our model consistently outperforms the state-of-the-art methods on different types of data.
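To make the mechanism concrete, below is a minimal numpy sketch of the general idea the abstract describes: several attention reads ("hops") over a memory, combined non-linearly by a recurrent (GRU-like) cell. The random memory standing in for BLSTM states, the single-vector attention scorer, and all names and shapes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): several attention "hops" read
# from a memory of hidden states and their results are combined non-linearly
# by a GRU cell; the paper's position-weighted memory is omitted for brevity.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_hop(memory, query, w, b):
    # memory: (T, d) hidden states; query: (d,) current episode state.
    scores = np.array([np.tanh(w @ np.concatenate([m, query]) + b) for m in memory])
    alpha = softmax(scores)          # attention weights over the T positions
    return alpha @ memory            # weighted read, shape (d,)

def gru_cell(e_prev, x, Wz, Uz, Wr, Ur, Wh, Uh):
    z = 1 / (1 + np.exp(-(Wz @ x + Uz @ e_prev)))   # update gate
    r = 1 / (1 + np.exp(-(Wr @ x + Ur @ e_prev)))   # reset gate
    h = np.tanh(Wh @ x + Uh @ (r * e_prev))
    return (1 - z) * e_prev + z * h

def recurrent_attention(memory, n_hops, params):
    e = np.zeros(memory.shape[1])    # episode (recurrent) state
    for _ in range(n_hops):
        read = attention_hop(memory, e, params["w_att"], params["b_att"])
        e = gru_cell(e, read, *params["gru"])        # combine hops non-linearly
    return e                         # would feed a softmax sentiment classifier

rng = np.random.default_rng(0)
T, d = 10, 8
memory = rng.normal(size=(T, d))     # stands in for BLSTM states of a sentence
params = {
    "w_att": rng.normal(size=2 * d) * 0.1,
    "b_att": 0.0,
    "gru": [rng.normal(size=(d, d)) * 0.1 for _ in range(6)],
}
print(recurrent_attention(memory, n_hops=3, params=params).shape)   # (8,)
```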


Citations
Journal ArticleDOI
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; it has also been widely used in sentiment analysis in recent years.
Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

917 citations

Proceedings Article
26 Apr 2018
TL;DR: A novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge and augmenting the LSTM network with a hierarchical attention mechanism.
Abstract: Analyzing people’s opinions and sentiments towards certain aspects is an important task of natural language understanding. In this paper, we propose a novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment-related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment tasks.
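A rough sketch of the two-level attention described here (target-level followed by sentence-level) is given below. It omits the Sentic LSTM extension and the commonsense-knowledge embeddings entirely, and the dot-product scoring, toy shapes, and variable names are assumptions made only for illustration.

```python
# Hedged sketch of a two-level (target-level + sentence-level) attention,
# loosely following the abstract; it is not the Sentic LSTM implementation
# and the commonsense-knowledge part is omitted.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(states, query):
    # states: (n, d); query: (d,). Dot-product scoring is an assumption here.
    alpha = softmax(states @ query)
    return alpha @ states, alpha

rng = np.random.default_rng(1)
d = 6
sentence_states = rng.normal(size=(12, d))   # e.g. LSTM hidden states per word
target_states = rng.normal(size=(2, d))      # hidden states of the target words

# Target-level attention: summarise a multi-word target into a single vector.
target_vec, _ = attend(target_states, target_states.mean(axis=0))

# Sentence-level attention: weight sentence positions by relevance to the target.
sentence_vec, weights = attend(sentence_states, target_vec)

print(sentence_vec.shape, weights.round(2))  # (6,) and a distribution over 12 words
```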

491 citations


Cites background from "Recurrent Attention Network on Memory for Aspect Sentiment Analysis"

  • ...Rather than using a single level of attention, deep memory networks (Tang, Qin, and Liu 2016) and recurrent attention models (Chen et al. 2017) have achieved superior performance by learning a deep attention over the single-level attention, as multiple passes (or hops) over the input sequence could refine the attended words again and again to find the most important words....

Proceedings ArticleDOI
01 Jul 2018
TL;DR: A model based on convolutional neural networks and gating mechanisms that can selectively output sentiment features according to the given aspect or entity, and whose computations can be easily parallelized during training.
Abstract: Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. This architecture is much simpler than the attention layers used in existing models. Second, the computations of our model can be easily parallelized during training, because convolutional layers do not have time dependencies as LSTM layers do, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
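The gating idea can be sketched as follows. This is a hedged reconstruction of a Gated Tanh-ReLU unit from the abstract's description, not the authors' code: the conv1d helper, the parameter shapes, and the way the aspect embedding enters the gate are assumptions.

```python
# Sketch (assumptions, not the authors' code) of a Gated Tanh-ReLU unit over
# 1-D convolution outputs: a tanh path produces candidate sentiment features
# while a ReLU gate, conditioned on the aspect embedding, decides what passes.
import numpy as np

def conv1d(X, W, b):
    # X: (T, d_in); W: (k, d_in, d_out). Valid convolution over the time axis.
    k = W.shape[0]
    T = X.shape[0] - k + 1
    return np.stack([np.einsum("kd,kdo->o", X[t:t + k], W) + b for t in range(T)])

rng = np.random.default_rng(2)
T, d_in, d_out, k = 20, 8, 16, 3
X = rng.normal(size=(T, d_in))            # word embeddings of the sentence
aspect = rng.normal(size=(d_out,))        # aspect/entity embedding (illustrative)

Ws, bs = rng.normal(size=(k, d_in, d_out)) * 0.1, np.zeros(d_out)
Wa, ba = rng.normal(size=(k, d_in, d_out)) * 0.1, np.zeros(d_out)
Va = rng.normal(size=(d_out, d_out)) * 0.1

s = np.tanh(conv1d(X, Ws, bs))                        # candidate sentiment features
a = np.maximum(0.0, conv1d(X, Wa, ba) + aspect @ Va)  # aspect-conditioned ReLU gate
c = (s * a).max(axis=0)                               # max-over-time pooling
print(c.shape)                                        # (16,) -> softmax classifier
```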

417 citations

Journal ArticleDOI
TL;DR: This article aims to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.
Abstract: The increasing volume of user-generated content on the web has made sentiment analysis an important tool for the extraction of information about the human emotional state. A current research focus for sentiment analysis is the improvement of granularity at aspect level, representing two distinct aims: aspect extraction and sentiment classification of product reviews and sentiment classification of target-dependent tweets. Deep learning approaches have emerged as a prospect for achieving these aims with their ability to capture both syntactic and semantic features of text without requirements for high-level feature engineering, as is the case in earlier methods. In this article, we aim to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.

388 citations


Cites background or methods from "Recurrent Attention Network on Memory for Aspect Sentiment Analysis"

  • ...Thus, it can be inferred that CRFs can take advantage of the entire sentence sequence to estimate probability for the sentence labelling making CRF a frequent final classification layer of bidirectional RNNs (T. Chen et al., 2017; Irsoy & Cardie, 2014; Lample et al., 2016; P. Liu et al., 2015)....

  • ...Chen et al. (2017), Restaurant SemEval '16, BiLSTM + Google WE + CRF: English F1 72.44%, Spanish F1 71.70%, French F1 73.50%, Russian F1 67.08%, Dutch F1 64.29%, Turkish F1 63.76%; Liu et al. (2015), Laptop SemEval '14, English, LSTM-RNN + POS + chunk + Amazon WE: F1 75....

  • ...Chen et al. (2016) also combined LSTM and CNN together for sentiment classification but used LSTM for generating context embedding and CNN for detecting features....

  • ...Chen et al. (2017), Twitter dataset of Dong et al. (2014), English, Recurrent Attention on Memory (RAM) + attention layers: Acc 69....

  • ...Chen et al. (2017) and Tay, Tuan, et al. (2017) also focused on attention mechanisms for the LSTM to incorporate aspect information into the model. While P. Chen et al. (2017) adopted a multiple-attention mechanism, Tay, Tuan, et al. (2017) introduced a novel association layer with holographic reduced representation....

Book ChapterDOI
01 Jan 2016
TL;DR: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author; it is a difficult task due to the complexity and subtlety of language use.
Abstract: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author. This chapter summarizes the diverse landscape of tasks and applications associated with sentiment analysis. We outline key challenges stemming from the complexity and subtlety of language use, the prevalence of creative and non-standard language, and the lack of paralinguistic information, such as tone and stress markers. We describe automatic systems and datasets commonly used in sentiment analysis. We summarize several manual and automatic approaches to creating valence- and emotion-association lexicons. We also discuss preliminary approaches for sentiment composition (how smaller units of text combine to express sentiment) and approaches for detecting sentiment in figurative and metaphoric language—these are the areas where we expect to see significant work in the near future.

315 citations

References
Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
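For reference, the weighted least-squares objective this abstract refers to is commonly written as follows, where X_ij counts co-occurrences of words i and j and f is a weighting function that damps rare and very frequent pairs:

```latex
J = \sum_{i,j=1}^{|V|} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) =
\begin{cases}
(x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
```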

30,558 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...We use 300-dimension word vectors pre-trained by GloVe (Pennington et al., 2014) (whose vocabulary size is 1.9M) for our experiments on the English datasets, as previous works did (Tang et al., 2016)....

  • ...In contrast, we prefer to use the general embeddings from (Pennington et al., 2014) (http://nlp.stanford.edu/projects/glove/) for all datasets, so that the experimental results can better reveal the model’s capability and the figures are directly comparable across different papers....

  • ...Let L ∈ R^(d×|V|) be an embedding lookup table generated by an unsupervised method such as GloVe (Pennington et al., 2014) or CBOW

Proceedings Article
01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
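The (soft-)search described here is what is now usually called additive attention. In the common notation, an alignment model scores each source annotation h_j against the previous decoder state s_{i-1}, the scores are normalized with a softmax, and the context vector is their weighted sum:

```latex
e_{ij} = v_a^{\top} \tanh\bigl(W_a s_{i-1} + U_a h_j\bigr), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```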

20,027 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...Attention mechanism, which has been used successfully in many areas (Bahdanau et al., 2014; Rush et al., 2015), can be treated as a simplified version of NTM because the size of memory is unlimited and we only need to read from it....

  • ...…feeling is that the phone, after using it for three months and considering its price, is really cost-effective”. Attention mechanism, which has been successfully used in machine translation (Bahdanau et al., 2014), can enforce a model to pay more attention to the important part of a sentence....

  • ...Specifically, our framework first adopts a bidirectional LSTM (BLSTM) to produce the memory (i.e. the states of time steps generated by LSTM) from the input, as bidirectional recurrent neural networks (RNNs) were found effective for a similar purpose in machine translation (Bahdanau et al., 2014)....

Proceedings Article
16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
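One of the two architectures (CBOW) predicts a centre word from the average of its context word vectors. A toy numpy sketch follows; it uses a full softmax instead of the paper's efficient training schemes, and the vocabulary size, dimensions, and word ids are made up for illustration.

```python
# Minimal illustrative sketch of the CBOW architecture described above
# (toy sizes, full softmax, no hierarchical softmax or negative sampling).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
V, d = 50, 16                          # vocabulary size and embedding dimension
W_in = rng.normal(size=(V, d)) * 0.1   # input (context) embeddings
W_out = rng.normal(size=(d, V)) * 0.1  # output (prediction) weights

context_ids = [4, 7, 9, 12]            # word ids around the centre position
center_id = 8

h = W_in[context_ids].mean(axis=0)     # CBOW: average the context vectors
p = softmax(h @ W_out)                 # distribution over the vocabulary
loss = -np.log(p[center_id])           # cross-entropy for the true centre word
print(round(float(loss), 3))
```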

9,270 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...The embeddings for Chinese experiments are trained with a corpus of 1.4 billion tokens with CBOW....

  • ...Let L ∈ R^(d×|V|) be an embedding lookup table generated by an unsupervised method such as GloVe (Pennington et al., 2014) or CBOW (Mikolov et al., 2013), where d is the dimension of word vectors and |V| is the vocabulary size....
