Proceedings ArticleDOI

Recurrent Attention Network on Memory for Aspect Sentiment Analysis

01 Sep 2017, pp. 452–461
TL;DR: A novel framework based on neural networks to identify the sentiment of opinion targets in a comment/review; it adopts a multiple-attention mechanism to capture sentiment features separated by a long distance, making it more robust against irrelevant information.
Abstract: We propose a novel framework based on neural networks to identify the sentiment of opinion targets in a comment/review. Our framework adopts a multiple-attention mechanism to capture sentiment features separated by a long distance, so that it is more robust against irrelevant information. The results of multiple attentions are non-linearly combined with a recurrent neural network, which strengthens the expressive power of our model for handling more complications. The weighted-memory mechanism not only helps us avoid the labor-intensive feature engineering work, but also provides a tailor-made memory for different opinion targets of a sentence. We examine the merit of our model on four datasets: two are from SemEval 2014, i.e. reviews of restaurants and laptops; a Twitter dataset, for testing its performance on social media data; and a Chinese news comment dataset, for testing its language sensitivity. The experimental results show that our model consistently outperforms the state-of-the-art methods on different types of data.
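To make the mechanism concrete, below is a minimal numpy sketch of the general idea the abstract describes: several attention reads ("hops") over a memory, combined non-linearly by a recurrent (GRU-like) cell. The random memory standing in for BLSTM states, the single-vector attention scorer, and all names and shapes are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): several attention "hops" read
# from a memory of hidden states and their results are combined non-linearly
# by a GRU cell; the paper's position-weighted memory is omitted for brevity.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_hop(memory, query, w, b):
    # memory: (T, d) hidden states; query: (d,) current episode state.
    scores = np.array([np.tanh(w @ np.concatenate([m, query]) + b) for m in memory])
    alpha = softmax(scores)          # attention weights over the T positions
    return alpha @ memory            # weighted read, shape (d,)

def gru_cell(e_prev, x, Wz, Uz, Wr, Ur, Wh, Uh):
    z = 1 / (1 + np.exp(-(Wz @ x + Uz @ e_prev)))   # update gate
    r = 1 / (1 + np.exp(-(Wr @ x + Ur @ e_prev)))   # reset gate
    h = np.tanh(Wh @ x + Uh @ (r * e_prev))
    return (1 - z) * e_prev + z * h

def recurrent_attention(memory, n_hops, params):
    e = np.zeros(memory.shape[1])    # episode (recurrent) state
    for _ in range(n_hops):
        read = attention_hop(memory, e, params["w_att"], params["b_att"])
        e = gru_cell(e, read, *params["gru"])        # combine hops non-linearly
    return e                         # would feed a softmax sentiment classifier

rng = np.random.default_rng(0)
T, d = 10, 8
memory = rng.normal(size=(T, d))     # stands in for BLSTM states of a sentence
params = {
    "w_att": rng.normal(size=2 * d) * 0.1,
    "b_att": 0.0,
    "gru": [rng.normal(size=(d, d)) * 0.1 for _ in range(6)],
}
print(recurrent_attention(memory, n_hops=3, params=params).shape)   # (8,)
```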


Citations
Journal ArticleDOI
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; it has also been widely used in sentiment analysis in recent years.
Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

917 citations

Proceedings Article
26 Apr 2018
TL;DR: A novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge and augmenting the LSTM network with a hierarchical attention mechanism.
Abstract: Analyzing people’s opinions and sentiments towards certain aspects is an important task of natural language understanding. In this paper, we propose a novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment-related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment tasks.
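A rough sketch of the two-level attention described here (target-level followed by sentence-level) is given below. It omits the Sentic LSTM extension and the commonsense-knowledge embeddings entirely, and the dot-product scoring, toy shapes, and variable names are assumptions made only for illustration.

```python
# Hedged sketch of a two-level (target-level + sentence-level) attention,
# loosely following the abstract; it is not the Sentic LSTM implementation
# and the commonsense-knowledge part is omitted.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(states, query):
    # states: (n, d); query: (d,). Dot-product scoring is an assumption here.
    alpha = softmax(states @ query)
    return alpha @ states, alpha

rng = np.random.default_rng(1)
d = 6
sentence_states = rng.normal(size=(12, d))   # e.g. LSTM hidden states per word
target_states = rng.normal(size=(2, d))      # hidden states of the target words

# Target-level attention: summarise a multi-word target into a single vector.
target_vec, _ = attend(target_states, target_states.mean(axis=0))

# Sentence-level attention: weight sentence positions by relevance to the target.
sentence_vec, weights = attend(sentence_states, target_vec)

print(sentence_vec.shape, weights.round(2))  # (6,) and a distribution over 12 words
```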

491 citations


Cites background from "Recurrent Attention Network on Memory for Aspect Sentiment Analysis"

  • ...Rather than using a single level of attention, deep memory networks (Tang, Qin, and Liu 2016) and recurrent attention models (Chen et al. 2017) have achieved superior performance by learning a deep attention over the single-level attention, as multiple passes (or hops) over the input sequence could refine the attended words again and again to find the most important words....

Proceedings ArticleDOI
01 Jul 2018
TL;DR: A model based on convolutional neural networks and gating mechanisms that can selectively output sentiment features according to the given aspect or entity, and whose computations can be easily parallelized during training.
Abstract: Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. This architecture is much simpler than the attention layers used in existing models. Second, the computations of our model can be easily parallelized during training, because convolutional layers do not have time dependencies as LSTM layers do, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
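The gating idea can be sketched as follows. This is a hedged reconstruction of a Gated Tanh-ReLU unit from the abstract's description, not the authors' code: the conv1d helper, the parameter shapes, and the way the aspect embedding enters the gate are assumptions.

```python
# Sketch (assumptions, not the authors' code) of a Gated Tanh-ReLU unit over
# 1-D convolution outputs: a tanh path produces candidate sentiment features
# while a ReLU gate, conditioned on the aspect embedding, decides what passes.
import numpy as np

def conv1d(X, W, b):
    # X: (T, d_in); W: (k, d_in, d_out). Valid convolution over the time axis.
    k = W.shape[0]
    T = X.shape[0] - k + 1
    return np.stack([np.einsum("kd,kdo->o", X[t:t + k], W) + b for t in range(T)])

rng = np.random.default_rng(2)
T, d_in, d_out, k = 20, 8, 16, 3
X = rng.normal(size=(T, d_in))            # word embeddings of the sentence
aspect = rng.normal(size=(d_out,))        # aspect/entity embedding (illustrative)

Ws, bs = rng.normal(size=(k, d_in, d_out)) * 0.1, np.zeros(d_out)
Wa, ba = rng.normal(size=(k, d_in, d_out)) * 0.1, np.zeros(d_out)
Va = rng.normal(size=(d_out, d_out)) * 0.1

s = np.tanh(conv1d(X, Ws, bs))                        # candidate sentiment features
a = np.maximum(0.0, conv1d(X, Wa, ba) + aspect @ Va)  # aspect-conditioned ReLU gate
c = (s * a).max(axis=0)                               # max-over-time pooling
print(c.shape)                                        # (16,) -> softmax classifier
```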

417 citations

Journal ArticleDOI
TL;DR: This article aims to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.
Abstract: The increasing volume of user-generated content on the web has made sentiment analysis an important tool for the extraction of information about the human emotional state. A current research focus for sentiment analysis is the improvement of granularity at aspect level, representing two distinct aims: aspect extraction and sentiment classification of product reviews and sentiment classification of target-dependent tweets. Deep learning approaches have emerged as a prospect for achieving these aims with their ability to capture both syntactic and semantic features of text without requirements for high-level feature engineering, as is the case in earlier methods. In this article, we aim to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.

388 citations


Cites background or methods from "Recurrent Attention Network on Memory for Aspect Sentiment Analysis"

  • ...Thus, it can be inferred that CRFs can take advantage of the entire sentence sequence to estimate probability for the sentence labelling making CRF a frequent final classification layer of bidirectional RNNs (T. Chen et al., 2017; Irsoy & Cardie, 2014; Lample et al., 2016; P. Liu et al., 2015)....

  • ...Chen et al. (2017), Restaurant SemEval '16, BiLSTM + Google WE + CRF: English F1 72.44%, Spanish F1 71.70%, French F1 73.50%, Russian F1 67.08%, Dutch F1 64.29%, Turkish F1 63.76%; Liu et al. (2015), Laptop SemEval '14, English, LSTM-RNN + POS + chunk + Amazon WE: F1 75....

  • ...Chen et al. (2016) also combined LSTM and CNN together for sentiment classification but used LSTM for generating context embedding and CNN for detecting features....

  • ...Chen et al. (2017), Twitter dataset of Dong et al. (2014), English, Recurrent Attention on Memory (RAM) + attention layers: Acc 69....

  • ...Chen et al. (2017) and Tay, Tuan, et al. (2017) also focused on attention mechanisms for the LSTM to incorporate aspect information into the model. While P. Chen et al. (2017) adopted a multiple-attention mechanism, Tay, Tuan, et al. (2017) introduced a novel association layer with holographic reduced representation....

Book ChapterDOI
01 Jan 2016
TL;DR: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author; it is a difficult task due to the complexity and subtlety of language use.
Abstract: Sentiment analysis is the task of automatically determining from text the attitude, emotion, or some other affectual state of the author. This chapter summarizes the diverse landscape of tasks and applications associated with sentiment analysis. We outline key challenges stemming from the complexity and subtlety of language use, the prevalence of creative and non-standard language, and the lack of paralinguistic information, such as tone and stress markers. We describe automatic systems and datasets commonly used in sentiment analysis. We summarize several manual and automatic approaches to creating valence- and emotion-association lexicons. We also discuss preliminary approaches for sentiment composition (how smaller units of text combine to express sentiment) and approaches for detecting sentiment in figurative and metaphoric language—these are the areas where we expect to see significant work in the near future.

315 citations

References
Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
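For reference, the weighted least-squares objective this abstract refers to is commonly written as follows, where X_ij counts co-occurrences of words i and j and f is a weighting function that damps rare and very frequent pairs:

```latex
J = \sum_{i,j=1}^{|V|} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) =
\begin{cases}
(x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
```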

30,558 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...We use 300-dimension word vectors pre-trained by GloVe (Pennington et al., 2014) (whose vocabulary size is 1.9M) for our experiments on the English datasets, as previous works did (Tang et al., 2016)....

  • ...In contrast, we prefer to use the general embeddings from (Pennington et al., 2014) (http://nlp.stanford.edu/projects/glove/) for all datasets, so that the experimental results can better reveal the model’s capability and the figures are directly comparable across different papers....

  • ...Let L ∈ R^(d×|V|) be an embedding lookup table generated by an unsupervised method such as GloVe (Pennington et al., 2014) or CBOW

Proceedings Article
01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
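The (soft-)search described here is what is now usually called additive attention. In the common notation, an alignment model scores each source annotation h_j against the previous decoder state s_{i-1}, the scores are normalized with a softmax, and the context vector is their weighted sum:

```latex
e_{ij} = v_a^{\top} \tanh\bigl(W_a s_{i-1} + U_a h_j\bigr), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j
```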

20,027 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...Attention mechanism, which has been used successfully in many areas (Bahdanau et al., 2014; Rush et al., 2015), can be treated as a simplified version of NTM because the size of memory is unlimited and we only need to read from it....

  • ...…feeling is that the phone, after using it for three months and considering its price, is really cost-effective”. Attention mechanism, which has been successfully used in machine translation (Bahdanau et al., 2014), can enforce a model to pay more attention to the important part of a sentence....

  • ...Specifically, our framework first adopts a bidirectional LSTM (BLSTM) to produce the memory (i.e. the states of time steps generated by LSTM) from the input, as bidirectional recurrent neural networks (RNNs) were found effective for a similar purpose in machine translation (Bahdanau et al., 2014)....

Proceedings Article
16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
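One of the two architectures (CBOW) predicts a centre word from the average of its context word vectors. A toy numpy sketch follows; it uses a full softmax instead of the paper's efficient training schemes, and the vocabulary size, dimensions, and word ids are made up for illustration.

```python
# Minimal illustrative sketch of the CBOW architecture described above
# (toy sizes, full softmax, no hierarchical softmax or negative sampling).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
V, d = 50, 16                          # vocabulary size and embedding dimension
W_in = rng.normal(size=(V, d)) * 0.1   # input (context) embeddings
W_out = rng.normal(size=(d, V)) * 0.1  # output (prediction) weights

context_ids = [4, 7, 9, 12]            # word ids around the centre position
center_id = 8

h = W_in[context_ids].mean(axis=0)     # CBOW: average the context vectors
p = softmax(h @ W_out)                 # distribution over the vocabulary
loss = -np.log(p[center_id])           # cross-entropy for the true centre word
print(round(float(loss), 3))
```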

9,270 citations


"Recurrent Attention Network on Memo..." refers background or methods in this paper

  • ...The embeddings for Chinese experiments are trained with a corpus of 1.4 billion tokens with CBOW....

  • ...Let L ∈ R^(d×|V|) be an embedding lookup table generated by an unsupervised method such as GloVe (Pennington et al., 2014) or CBOW (Mikolov et al., 2013), where d is the dimension of word vectors and |V| is the vocabulary size....
