Deep Multi-Task Learning for Aspect Term Extraction with Memory Interaction

doi:10.18653/V1/D17-1310

Proceedings Article•DOI•

Xin Li¹, Wai Lam¹•Institutions (1)

The Chinese University of Hong Kong¹

01 Sep 2017, pp. 2886-2892

TL;DR: The paper proposes a novel LSTM-based deep multi-task learning framework for extracting aspect terms from user review sentences, in which memory interactions let the model handle aspect extraction and opinion extraction jointly.

Abstract: We propose a novel LSTM-based deep multi-task learning framework for aspect term extraction from user review sentences. Two LSTMs equipped with extended memories and neural memory operations are designed for jointly handling the extraction tasks of aspects and opinions via memory interactions. Sentimental sentence constraint is also added for more accurate prediction via another LSTM. Experiment results over two benchmark datasets demonstrate the effectiveness of our framework.
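Aspect and opinion extraction of this kind are usually posed as token-level sequence labeling, so each of the two jointly trained LSTMs ends up emitting one tag per token. The sketch below assumes a plain BIO tag set (an assumption; the abstract does not name the tagging scheme) and shows how such per-token tags would be turned back into aspect and opinion spans. The function name `bio_to_spans` and the example sentence and tags are hypothetical, not taken from the paper or its code.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Turn one BIO tag per token into (start, end_exclusive, text) spans.

    A stray "I" with no open span is treated as the start of a new span,
    a common lenient convention (an assumption, not taken from the paper).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    bounds = []
    start = None
    for i, tag in enumerate(tags):
        if tag not in ("B", "I", "O"):
            raise ValueError(f"unexpected tag {tag!r}")
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:  # a new B closes the previous span
                bounds.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            bounds.append((start, i))  # O closes the open span
            start = None
        # an "I" inside an open span simply extends it
    if start is not None:
        bounds.append((start, len(tags)))
    return [(s, e, " ".join(tokens[s:e])) for s, e in bounds]


# Hypothetical review sentence with hand-written tags for both tasks.
tokens = "The fish tacos were great but the service was slow".split()
aspect_tags = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
opinion_tags = ["O", "O", "O", "O", "B", "O", "O", "O", "O", "B"]
print(bio_to_spans(tokens, aspect_tags))   # [(1, 3, 'fish tacos'), (7, 8, 'service')]
print(bio_to_spans(tokens, opinion_tags))  # [(4, 5, 'great'), (9, 10, 'slow')]
```

Keeping aspects and opinions as two separate tag sequences over the same tokens mirrors the two-task setup described in the abstract; how the paper's memory interaction couples the two taggers is not shown here.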

Citations

Journal Article•DOI•

Deep learning for sentiment analysis: A survey

Lei Zhang¹, Shuai Wang², Bing Liu²•Institutions (2)

LinkedIn¹, University of Illinois at Urbana–Champaign²

01 Jul 2018-Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results; in recent years it has also been widely used in sentiment analysis.

Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

917 citations

Journal Article•DOI•

Deep Learning for Aspect-Based Sentiment Analysis: A Comparative Review

Hai Ha Do¹, P. W. C. Prasad¹, Angelika Maag¹, Abeer Alsadoon¹•Institutions (1)

Charles Sturt University¹

15 Mar 2019-Expert Systems With Applications

TL;DR: This article aims to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.

Abstract: The increasing volume of user-generated content on the web has made sentiment analysis an important tool for the extraction of information about the human emotional state. A current research focus for sentiment analysis is the improvement of granularity at aspect level, representing two distinct aims: aspect extraction and sentiment classification of product reviews and sentiment classification of target-dependent tweets. Deep learning approaches have emerged as a prospect for achieving these aims with their ability to capture both syntactic and semantic features of text without requirements for high-level feature engineering, as is the case in earlier methods. In this article, we aim to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.

388 citations

Cites background or methods from "Deep Multi-Task Learning for Aspect..."

...In recent DNN models, word embeddings are typically pre-trained but not task-specific data so that the learning word vectors can capture general syntactical and semantic information (T. Chen, Xu, He, & Wang, 2017; P. Liu et al., 2015; Poria, Cambria, et al., 2016)....
[...]
...An example comes from Xu, Liu, Wang, & Yin, (2018) who attempted to approach three ABSA tasks with CNN models but achieved lower outcomes than with the SVM approach. Yuan et al. (2017) found that a purely window-based neural network produces outcomes that are comparable to an LSTM-RNN approach, and concluded that local context rather than longterm dependencies were important for aspect extraction....
[...]
...Sentiment polarity 3 Tang, Qin, Feng, & Liu (2015) Twitter data Dong et al....
[...]
...The most extensively used classifiers in recent years include Support Vector Machine (SVM) and Conditional Random Fields (CRF) classifiers, with examples in ABSA tasks as CRF in T. Chen et al. (2017), Xu, Lin, Wang, Yin and Wang (2017), Mai & Le (2018), or SMV in Akhtar, Kumar, Ekbal and Bhattacharyya (2016), and Dong et al. (2014)....
[...]
...2 Li and Lam (2017) English Memory Interaction Network (MIN) based on LSTM with extended memory 73....

Proceedings Article•DOI•

Transformation Networks for Target-Oriented Sentiment Classification

Xin Li¹, Lidong Bing², Wai Lam³, Bei Shi³•Institutions (3)

Tsinghua University¹, Tencent², The Chinese University of Hong Kong³

03 May 2018

TL;DR: The authors proposed a new model that employs a CNN layer to extract salient features from the transformed word representations originated from a bi-directional RNN layer, which achieved state-of-the-art performance.

Abstract: Target-oriented sentiment classification aims at classifying sentiment polarities over individual opinion targets in a sentence. RNN with attention seems a good fit for the characteristics of this task, and indeed it achieves the state-of-the-art performance. After re-examining the drawbacks of attention mechanism and the obstacles that block CNN to perform well in this classification task, we propose a new model that achieves new state-of-the-art results on a few benchmarks. Instead of attention, our model employs a CNN layer to extract salient features from the transformed word representations originated from a bi-directional RNN layer. Between the two layers, we propose a component which first generates target-specific representations of words in the sentence, and then incorporates a mechanism for preserving the original contextual information from the RNN layer.
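The abstract describes a layer that makes each word's representation target-specific and then keeps the original RNN context before a CNN extracts features. The sketch below is one reading of that description, not the authors' implementation: it assumes dot-product attention from each word over the target words, a ReLU projection of the concatenation, and a simple residual addition as the context-preserving step, followed by a width-k convolution with max-over-time pooling. All names (`transform_layer`, `conv_max_pool`) and the toy vectors are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def transform_layer(context: List[Vec], target: List[Vec], W: List[Vec], b: Vec) -> List[Vec]:
    """Hypothetical target-specific transformation with a residual context path.

    For each context vector h_i:
      r_i  = sum_j softmax_j(h_i . t_j) * t_j   # target summary tailored to word i
      ht_i = relu(W [h_i ; r_i] + b)            # target-specific word representation
      out  = h_i + ht_i                         # residual path keeps the original context
    W has len(h_i) rows and len(h_i) + len(t_j) columns.
    """
    out = []
    for h in context:
        alpha = softmax([dot(h, t) for t in target])
        r = [sum(a * t[k] for a, t in zip(alpha, target)) for k in range(len(target[0]))]
        h_t = [max(0.0, dot(row, h + r) + bk) for row, bk in zip(W, b)]
        out.append([x + y for x, y in zip(h, h_t)])
    return out


def conv_max_pool(seq: List[Vec], filters: List[List[Vec]], bias: Vec) -> Vec:
    """Width-k 1-D convolution (ReLU) and max-over-time pooling, one feature per filter."""
    feats = []
    for f, bf in zip(filters, bias):
        k = len(f)
        scores = [max(0.0, sum(dot(f[j], seq[i + j]) for j in range(k)) + bf)
                  for i in range(len(seq) - k + 1)]
        feats.append(max(scores))
    return feats


context = [[1.0, 0.0], [0.0, 1.0]]  # toy bi-directional RNN outputs for two words
target = [[1.0, 0.0]]               # toy target-word vector
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # copies h_i, ignores r_i
b = [0.0, 0.0]
hidden = transform_layer(context, target, W, b)
print(hidden)  # [[2.0, 0.0], [0.0, 2.0]]
print(conv_max_pool(hidden, [[[1.0, 0.0], [0.0, 1.0]]], [0.0]))  # [4.0]
```

With an all-zero projection the layer returns its input unchanged, which is the point of the context-preserving path: the transformation can add target information without overwriting what the RNN already encoded.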

282 citations

Proceedings Article•DOI•

Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction

Hu Xu¹, Bing Liu², Lei Shu³, Philip S. Yu¹•Institutions (3)

University of Illinois at Chicago¹, University of Illinois at Urbana–Champaign², Harbin Institute of Technology³

01 Jul 2018

TL;DR: The authors propose a novel yet simple CNN model for aspect extraction that employs two types of pre-trained embeddings, general-purpose embeddings and domain-specific embeddings; it achieves surprisingly good results, outperforming sophisticated state-of-the-art methods.

Abstract: One key task of fine-grained sentiment analysis of product reviews is to extract product aspects or features that users have expressed opinions on. This paper focuses on supervised aspect extraction using deep learning. Unlike other highly sophisticated supervised deep learning models, this paper proposes a novel and yet simple CNN model employing two types of pre-trained embeddings for aspect extraction: general-purpose embeddings and domain-specific embeddings. Without using any additional supervision, this model achieves surprisingly good results, outperforming state-of-the-art sophisticated existing methods. To our knowledge, this paper is the first to report such double embeddings based CNN model for aspect extraction and achieve very good results.
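A minimal sketch of the "double embeddings" idea, assuming the two embeddings are simply concatenated per token before the convolution (our reading; the abstract does not say how they are combined), and that the CNN uses "same" padding so each token keeps one output score for sequence labeling. The vocabularies, dimensions and function names are hypothetical toy values, and out-of-vocabulary tokens falling back to zero vectors is an assumption of this sketch.

```python
from typing import Dict, List

Vec = List[float]


def double_embed(tokens: List[str], general: Dict[str, Vec], domain: Dict[str, Vec],
                 dim_general: int, dim_domain: int) -> List[Vec]:
    """Concatenate a general-purpose and a domain-specific vector for every token.

    Unknown tokens fall back to zero vectors (an assumption for this sketch).
    """
    rows = []
    for tok in tokens:
        g = general.get(tok, [0.0] * dim_general)
        d = domain.get(tok, [0.0] * dim_domain)
        rows.append(list(g) + list(d))
    return rows


def conv1d_same(seq: List[Vec], kernel: List[Vec], bias: float = 0.0) -> Vec:
    """One filter of odd width with zero 'same' padding, so every token keeps one score."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("use an odd kernel width for symmetric padding")
    pad = k // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    scores = []
    for i in range(len(seq)):
        window = padded[i:i + k]
        scores.append(bias + sum(sum(w * x for w, x in zip(kv, xv))
                                 for kv, xv in zip(kernel, window)))
    return scores


general = {"pizza": [1.0, 0.0], "crust": [0.0, 1.0]}  # toy general-purpose vectors
domain = {"crust": [1.0]}                              # toy domain-specific vectors
X = double_embed(["pizza", "crust"], general, domain, dim_general=2, dim_domain=1)
print(X)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(conv1d_same(X, [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]))  # [1.0, 2.0]
```

Concatenation lets the convolution weight general and in-domain evidence separately, since each filter sees both halves of every token vector.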

261 citations

Proceedings Article•DOI•

Target-oriented Opinion Words Extraction with Target-fused Neural Sequence Labeling

Zhifang Fan, Zhen Wu¹, Xinyu Dai¹, Shujian Huang¹, Jiajun Chen²•Institutions (2)

Nanjing University¹, University of Waterloo²

01 Jun 2019

TL;DR: This paper proposes a novel sequence labeling subtask for ABSA named TOWE (Target-oriented Opinion Words Extraction), which aims at extracting the corresponding opinion words for a given opinion target through a target-fused sequence labeling neural network model.

Abstract: Opinion target extraction and opinion words extraction are two fundamental subtasks in Aspect Based Sentiment Analysis (ABSA). Recently, many methods have made progress on these two tasks. However, few works aim at extracting opinion targets and opinion words as pairs. In this paper, we propose a novel sequence labeling subtask for ABSA named TOWE (Target-oriented Opinion Words Extraction), which aims at extracting the corresponding opinion words for a given opinion target. A target-fused sequence labeling neural network model is designed to perform this task. The opinion target information is well encoded into context by an Inward-Outward LSTM. Then left and right contexts of the opinion target and the global context are combined to find the corresponding opinion words. We build four datasets for TOWE based on several popular ABSA benchmarks from laptop and restaurant reviews. The experimental results show that our proposed model outperforms the other compared methods significantly. We believe that our work may not only be helpful for downstream sentiment analysis task, but can also be used for pair-wise opinion summarization.
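The abstract says target information is encoded by an Inward-Outward LSTM and that left and right contexts of the target are then combined. A sketch of the reading orders such passes could use, under our interpretation: the inward pass reads each side toward the target and the outward pass reads away from it. Whether the target words belong to both sides is an assumption here, as are the function name and the example sentence; no LSTM is run, the code only produces the token orders each directional pass would consume.

```python
from typing import Dict, List


def inward_outward_orders(n_tokens: int, start: int, end: int) -> Dict[str, List[int]]:
    """Token index orders for four directional passes around a target span.

    tokens[start:end] is the opinion target (end exclusive). In this sketch the
    target words belong to both the left and the right context (an assumption).
    """
    if not 0 <= start < end <= n_tokens:
        raise ValueError("invalid target span")
    left = list(range(0, end))            # sentence start ... end of target
    right = list(range(start, n_tokens))  # start of target ... sentence end
    return {
        "inward_left": left,           # left context read toward the target
        "inward_right": right[::-1],   # right context read toward the target
        "outward_left": left[::-1],    # from the target out to the sentence start
        "outward_right": right,        # from the target out to the sentence end
    }


tokens = "the service was slow".split()
orders = inward_outward_orders(len(tokens), start=1, end=2)  # target: "service"
for name, idx in orders.items():
    print(name, [tokens[i] for i in idx])
```

For the target "service", the desired output of the task would be the opinion word "slow", for example as the tag sequence O O O B over the same four tokens.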

173 citations

Cites methods from "Deep Multi-Task Learning for Aspect..."

...This co-extraction strategy can also be adopted in neural networks with multi-task learning (Wang et al., 2016, 2017; Li and Lam, 2017)....

References

Proceedings Article•

Adam: A Method for Stochastic Optimization

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
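For readers who want the update rule behind this abstract, here is a plain-Python sketch of Adam with bias-corrected first and second moment estimates. The defaults follow the values suggested in the Adam paper (step size 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8); the function name, the toy quadratic objective and the larger step size used in the example are illustrative choices, not a library API.

```python
import math
from typing import Callable, List


def adam_minimize(grad: Callable[[List[float]], List[float]], x0: List[float],
                  lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 1000) -> List[float]:
    """Plain-Python Adam with bias-corrected moment estimates."""
    x = list(x0)
    m = [0.0] * len(x)  # first moment: running mean of gradients
    v = [0.0] * len(x)  # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # correct the bias toward the zero init
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def grad_f(x: List[float]) -> List[float]:
    # gradient of f(x) = (x0 - 3)^2 + (x1 + 1)^2, minimized at (3, -1)
    return [2 * (x[0] - 3), 2 * (x[1] + 1)]


x_star = adam_minimize(grad_f, [0.0, 0.0], lr=0.1, steps=1000)
print([round(c, 3) for c in x_star])  # [3.0, -1.0]
```

Because the step is the ratio of the two moment estimates, its size is roughly bounded by the step size regardless of gradient scale, which is the "invariant to diagonal rescaling of the gradients" property mentioned above.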

111,197 citations

"Deep Multi-Task Learning for Aspect..." cites this paper for background

...Liu et al. (2014) modeled relation between aspects and opinions by constructing a bipartite heterogenous graph....

Proceedings Article•

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov¹, Ilya Sutskever¹, Kai Chen¹, Greg S. Corrado¹, Jeffrey Dean¹•Institutions (1)

Google¹

05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
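The abstract names two of the paper's extensions, negative sampling and subsampling of frequent words. The sketch below writes them out as we understand them from the paper body (not the abstract): the per-pair negative-sampling objective log σ(v'_out · v_in) + Σ log σ(-v'_neg · v_in), a noise distribution proportional to unigram counts raised to the 3/4 power, and a discard probability 1 - sqrt(t / f(w)) with t around 1e-5. It only evaluates these quantities for given vectors and counts; there is no training loop, and the function names and toy values are hypothetical.

```python
import math
from typing import Dict, List

Vec = List[float]


def log_sigmoid(x: float) -> float:
    # numerically stable log(sigmoid(x)) for both signs of x
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def dot(a: Vec, b: Vec) -> float:
    return sum(p * q for p, q in zip(a, b))


def neg_sampling_loss(v_in: Vec, v_out: Vec, v_negs: List[Vec]) -> float:
    """Negative of the negative-sampling objective for one (input, context) word pair."""
    obj = log_sigmoid(dot(v_out, v_in))
    obj += sum(log_sigmoid(-dot(v_neg, v_in)) for v_neg in v_negs)
    return -obj


def noise_distribution(counts: Dict[str, int]) -> Dict[str, float]:
    """Unigram counts raised to the 3/4 power, renormalized to sum to 1."""
    weights = {w: c ** 0.75 for w, c in counts.items()}
    z = sum(weights.values())
    return {w: x / z for w, x in weights.items()}


def discard_probability(freq: float, t: float = 1e-5) -> float:
    """Subsampling: P(discard w) = 1 - sqrt(t / f(w)), clipped at 0 for rare words."""
    return max(0.0, 1.0 - math.sqrt(t / freq))


print(round(neg_sampling_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 0.0]]), 4))  # 1.3863
print(noise_distribution({"the": 16, "tacos": 1}))
print(discard_probability(1e-3), discard_probability(1e-6))
```

The 3/4 power flattens the unigram distribution so that rare words are drawn as negatives more often than their raw frequency would suggest, while subsampling removes most occurrences of very frequent words before training.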

24,012 citations

"Deep Multi-Task Learning for Aspect..." cites this paper for methods

...For datasets in the restaurant domain, we train word embeddings of dimension 200 with word2vec (Mikolov et al., 2013) on Yelp reviews5....

Proceedings Article•

Understanding the difficulty of training deep feedforward neural networks

Xavier Glorot¹, Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

31 Mar 2010

TL;DR: The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

Abstract: Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activations functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, and explaining the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.
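The abstract mentions a new initialization scheme without stating it. The paper body gives the "normalized initialization" W ~ U[-sqrt(6 / (n_in + n_out)), +sqrt(6 / (n_in + n_out))], whose variance is 2 / (n_in + n_out), a compromise between preserving activation variance in the forward pass and gradient variance in the backward pass. A minimal pure-Python sketch follows; the function name and the seeded generator are illustrative choices rather than any library's API.

```python
import math
import random
from typing import List, Optional


def glorot_uniform(fan_in: int, fan_out: int,
                   rng: Optional[random.Random] = None) -> List[List[float]]:
    """fan_in x fan_out matrix with entries drawn from U[-limit, limit],
    limit = sqrt(6 / (fan_in + fan_out)), giving variance 2 / (fan_in + fan_out)."""
    rng = rng or random.Random()
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


W = glorot_uniform(200, 100, random.Random(0))
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
var = sum((w - mean) ** 2 for w in flat) / len(flat)
print(len(W), len(W[0]))          # 200 100
print(var, 2.0 / (200 + 100))     # empirical variance vs. the target 2 / (fan_in + fan_out)
```

The uniform bound is chosen so that its variance, limit² / 3, equals exactly 2 / (fan_in + fan_out).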

9,500 citations

Proceedings Article•DOI•

Mining and summarizing customer reviews

Minqing Hu¹, Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

22 Aug 2004

TL;DR: This research aims to mine and to summarize all the customer reviews of a product, and proposes several novel techniques to perform these tasks.

Abstract: Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

7,330 citations

Book•

Sentiment Analysis and Opinion Mining

Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

01 May 2012

TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language; it is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.

Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations