Journal Article

Using a stacked residual LSTM model for sentiment intensity prediction

17 Dec 2018 · Neurocomputing (Elsevier) · Vol. 322, pp. 93-101
TL;DR: A stacked residual LSTM model is proposed to predict the sentiment intensity of a given text; it outperforms the lexicon- and regression-based methods of previous studies, and its residual connections make the deeper network easier to optimize.
About: This article was published in Neurocomputing on 17 December 2018 and has received 117 citations to date. It focuses on the topics: Word embedding & Residual.
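
The core idea is to stack several LSTM layers with shortcut (residual) connections between them, then train the network to output a real-valued intensity score by minimizing MSE (as the citing excerpts below note). Here is a minimal PyTorch sketch of that general shape; the layer count, dimensions, and class names are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StackedResidualLSTM(nn.Module):
    """Stacked LSTM layers with identity skip connections between them,
    followed by a regression head for a sentiment-intensity score."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300, num_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # hidden_dim == embed_dim so the residual addition is well-defined
        self.layers = nn.ModuleList(
            nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
            for _ in range(num_layers)
        )
        self.head = nn.Linear(hidden_dim, 1)  # real-valued intensity

    def forward(self, token_ids):
        x = self.embed(token_ids)             # (batch, seq, hidden)
        for lstm in self.layers:
            out, _ = lstm(x)
            x = x + out                       # residual shortcut around the layer
        return self.head(x[:, -1])            # predict from the last time step

model = StackedResidualLSTM(vocab_size=20000)
loss_fn = nn.MSELoss()  # intensity prediction is trained by minimizing MSE
```
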
Citations
Journal ArticleDOI
TL;DR: This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis, presents a taxonomy of sentiment analysis, and highlights the power of deep learning architectures for solving sentiment analysis problems.
Abstract: Social media is a powerful source of communication among people to share their sentiments in the form of opinions and views about any topic or article, which results in an enormous amount of unstructured information. Business organizations need to process and study these sentiments to investigate data and to gain business insights. Hence, various machine learning and natural language processing-based approaches have been used in the past to analyze these sentiments. However, deep learning-based methods have become very popular in recent times due to their high performance. This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis. We present a taxonomy of sentiment analysis and discuss the implications of popular deep learning architectures. The key contributions of various researchers are highlighted, with the prime focus on deep learning approaches. The crucial sentiment analysis tasks are presented, and the languages on which sentiment analysis has been performed are identified. The survey also summarizes the popular datasets, their key features, the deep learning models applied to them and the accuracy obtained, and compares the various deep learning models. The primary purpose of this survey is to highlight the power of deep learning architectures for solving sentiment analysis problems.

385 citations


Cites background or methods from "Using a stacked residual LSTM model..."

  • ...The sentiment models are trained by minimizing the MSE (Jiang et al. 2014; Wang et al. 2018c)....


  • ...Sentiment analysis using BRNN is reported in Chen et al. (2017b), Baktha and Tripathy (2017), Poria et al. (2017b) and Wang et al. (2018a)....


  • ...Wang et al. (2018b) proposed RNN based capsule networks by building capsules for each sentiment category....


Journal Article
TL;DR: Empirical analysis indicates that deep learning-based architectures outperform ensemble learning methods and supervised learning methods for the task of sentiment analysis on educational data.
Abstract: Massive open online courses (MOOCs) are recent innovative approaches in distance education, which provide learning content to participants without age-, gender-, race-, or geography-related barriers. The purpose of our research is to present an efficient sentiment classification scheme with high predictive performance on MOOC reviews, by pursuing the paradigms of ensemble learning and deep learning. In this contribution, we seek to answer several research questions on sentiment analysis of educational data. First, the predictive performance of conventional supervised learning methods, ensemble learning methods, and deep learning methods has been evaluated. In addition, the efficiency of text representation schemes and word-embedding schemes has been evaluated for sentiment analysis on MOOC evaluations. For the evaluation task, we have analyzed a corpus containing 66,000 MOOC reviews using machine learning, ensemble learning, and deep learning methods. The empirical analysis indicates that deep learning-based architectures outperform ensemble learning methods and supervised learning methods for the task of sentiment analysis on educational data. For all the compared configurations, the highest predictive performance has been achieved by long short-term memory networks in conjunction with GloVe word-embedding-based representation, with a classification accuracy of 95.80%.
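
As a rough illustration of the winning configuration's shape, an LSTM classifier over pretrained GloVe vectors, here is a minimal PyTorch sketch. The hidden size, class count, and the choice to freeze the embeddings are assumptions for illustration, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class GloveLSTMClassifier(nn.Module):
    """LSTM sentiment classifier over pretrained GloVe embeddings."""
    def __init__(self, glove_weights, num_classes=2, hidden_dim=128):
        super().__init__()
        # glove_weights: (vocab, embed_dim) tensor loaded from a GloVe file
        self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.lstm = nn.LSTM(glove_weights.size(1), hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden)
        return self.fc(h_n[-1])          # logits over sentiment classes
```
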

148 citations


Cites background from "Using a stacked residual LSTM model..."

  • ...Similarly, Wang et al [69] introduced a stacked residual long short‐term memory‐based architecture to identify sentiment intensity of text documents....


Journal Article
TL;DR: The experimental results show that this method can predict the remaining life of gears and bearings well and that it has higher prediction accuracy than conventional prediction methods.
Abstract: In a mechanical transmission system, the gear is one of the most widely used transmission components, and its failure can cause serious accidents and huge economic losses. Therefore, predicting the remaining life of the gear is of great importance. To accurately predict the remaining life of the gear, a new type of long short-term memory neural network with macroscopic–microscopic attention (MMA) is proposed in this article. First, some typical time-domain and frequency-domain characteristics of vibration signals are calculated, such as the maximum value, the absolute mean value, the standard deviation, and the kurtosis. Then, the principal components of these characteristics are extracted by the isometric mapping method. The importance of the fused characteristic information is filtered via the proposed MMA mechanism, so that the input weights of the network's input data and recurrent data can reach multilevel real-time amplification. With the new long short-term memory network, the health characteristics of gear vibration signals can be predicted from the known fused features. The experimental results show that this method can predict the remaining life of gears and bearings well, and it has higher prediction accuracy than conventional prediction methods.
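
The MMA mechanism itself is specific to that paper, but the underlying pattern of re-weighting LSTM features by learned importance before prediction can be sketched with generic additive attention. The following PyTorch sketch is a stand-in under that assumption, not the paper's MMA implementation; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class AttentiveLSTM(nn.Module):
    """Generic attention over LSTM outputs: a stand-in for the idea of
    re-weighting features before prediction (not the paper's exact MMA)."""
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)    # one attention score per step
        self.out = nn.Linear(hidden_dim, 1)      # remaining-life estimate

    def forward(self, features):                 # (batch, seq, in_dim)
        h, _ = self.lstm(features)
        w = torch.softmax(self.score(h), dim=1)  # weights over time steps
        context = (w * h).sum(dim=1)             # attention-weighted summary
        return self.out(context)
```
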

129 citations


Additional excerpts

  • ...[22] proposed the residual LSTM network...


Journal Article
TL;DR: Comparative experiments on monitoring data from a gear life-cycle test show that the proposed gear remaining-life prediction method has higher prediction accuracy than conventional prediction methods.

94 citations

Posted Content
TL;DR: It is shown that capsule networks indeed have potential for text classification and that they have several advantages over convolutional neural networks; a simple routing method is also suggested that effectively reduces the computational complexity of dynamic routing.
Abstract: This paper presents an empirical exploration of the use of capsule networks for text classification. While it has been shown that capsule networks are effective for image classification, their validity in the domain of text has not been explored. In this paper, we show that capsule networks indeed have the potential for text classification and that they have several advantages over convolutional neural networks. We further suggest a simple routing method that effectively reduces the computational complexity of dynamic routing. We utilized seven benchmark datasets to demonstrate that capsule networks, along with the proposed routing method, provide comparable results.
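
For reference, here is a minimal PyTorch sketch of the standard dynamic routing-by-agreement (the baseline whose cost the paper's simplified routing method reduces), together with the capsule squash nonlinearity. Tensor shapes and the three-iteration default follow common usage and are assumptions here.

```python
import torch

def squash(s, dim=-1, eps=1e-8):
    """Capsule nonlinearity: shrink vector length into (0, 1), keep direction."""
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing-by-agreement between capsule layers.
    u_hat: (batch, in_caps, out_caps, dim) prediction vectors."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(iterations):
        c = torch.softmax(b, dim=2)                          # coupling coefficients
        v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))     # (batch, out_caps, dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
    return v
```
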

81 citations

References
Proceedings Article
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual nets won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8× deeper than VGG nets [40]) but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won 1st place in the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won 1st place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
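
The reformulation can be made concrete with a basic residual block: the stacked layers learn a residual function F(x), and the block outputs F(x) + x through an identity shortcut, so gradients flow through the identity path even in very deep stacks. A minimal PyTorch sketch, with channel counts illustrative:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet block: two 3x3 convolutions plus an identity shortcut,
    so the block learns a residual F(x) and outputs F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # shortcut: identity path carries gradients
```
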

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
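
The architecture's building pattern, runs of small 3x3 convolutions followed by pooling, stacked until the network reaches 16-19 weight layers, can be sketched as follows; the helper name and arguments are illustrative, not the paper's code.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """A VGG-style block: a run of 3x3 convolutions followed by 2x2 max-pooling.
    Stacking such blocks is how depth is pushed to 16-19 weight layers."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)
```
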

49,914 citations

Journal Article
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
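
A minimal sketch of the mechanism, in the inverted-dropout form used by most modern frameworks, which rescales activations at training time so the test-time network needs no change (the paper instead describes shrinking the weights at test time; the two are equivalent in expectation):

```python
import torch

def dropout(x, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training
    and rescale the survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x                              # test time: use the network as-is
    mask = (torch.rand_like(x) > p).float()   # keep each unit with prob 1-p
    return x * mask / (1.0 - p)
```
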

33,597 citations

Proceedings Article
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
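
The objective described here is a weighted least-squares fit of word-vector dot products to log co-occurrence counts, computed only over nonzero entries. A minimal NumPy sketch under those definitions; the array names are illustrative, and the x_max/alpha defaults follow the values commonly used with GloVe:

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective over nonzero co-occurrence
    counts X[i, j]: f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    i, j = np.nonzero(X)                              # train only on nonzero entries
    f = np.minimum((X[i, j] / x_max) ** alpha, 1.0)   # co-occurrence weighting f(x)
    pred = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j]
    return np.sum(f * (pred - np.log(X[i, j])) ** 2)
```
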

30,558 citations

Proceedings Article
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
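
The negative-sampling alternative to the hierarchical softmax scores the true (center, context) pair against k sampled negatives: it maximizes the sigmoid of the positive pair's dot product while minimizing it for the negatives. A minimal NumPy sketch of that per-pair loss; the vector names are illustrative:

```python
import numpy as np

def sgns_loss(v_center, u_pos, u_negs):
    """Skip-gram negative-sampling loss for one (center, context) pair:
    -log sigmoid(u_pos . v) - sum_k log sigmoid(-u_neg_k . v)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    pos = np.log(sigmoid(u_pos @ v_center))             # true context word
    neg = np.sum(np.log(sigmoid(-(u_negs @ v_center)))) # k sampled negatives
    return -(pos + neg)
```
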

24,012 citations