Journal ArticleDOI

A Sentiment Information Collector-Extractor Architecture Based Neural Network for Sentiment Analysis

TL;DR: A new ensemble strategy is applied to combine the results of different sub-extractors, making the SIE more universal: it outperforms any single sub-extractor and outperforms state-of-the-art methods on three datasets in different languages.
About: This article was published in Information Sciences on 2018-10-01 and is currently open access. It has received 21 citations to date. The article focuses on the topics: Sentiment analysis & Deep learning.

Summary (3 min read)

1. Introduction

  • Deep learning has made great progress recently and plays an important role in both academia and industry.
  • In the example sentences, the same key word appears in two completely different positions.
  • Besides, example sentence (iii) contains two key words, "not" and "pleasant", which are separated by another word, "been".
  • How to locate the key words remains a big challenge in sentiment analysis. Researchers have designed many efficient models in order to capture the sentiment information.
  • Thus, the effectiveness of an RNN can be reduced when it is used to capture the semantics of a whole sentence, because key components can appear anywhere in a sentence rather than only at the end.

3. Model

  • Figure 1 shows the architecture of the whole model.
  • As is illustrated in Figure 1, the model can be divided into two parts: (i) the SIC and (ii) the SIE.
  • The matrix X is then fed into the information extractor, and latent semantic information is extracted based on a model ensemble strategy.

3.1. Sentiment Information Collector (SIC)

  • The authors first describe the architecture of the SIC in their model.
  • The left-side context c_l(v_i) of word v_i is calculated using Equation (1), where e(v_i) is the word embedding of word v_i, a dense vector with |e| real-valued elements.
  • The information extractor, which is an ensemble model, is designed to extract sentiment information precisely from the sentence information matrix X. The SIE consists of three sub-extractors.
  • In their case, the authors choose ReLU [31] as the nonlinear function.
  • When all of the latent semantic vectors m_i^j have been calculated separately, each sub-extractor applies a max-pooling operation: m^j = max_{i=1}^{L} m_i^j (Equation 6), where max is an element-wise function (a minimal sketch of the SIC and one sub-extractor follows this list).
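The following is a minimal PyTorch sketch of how the SIC and a single sub-extractor could be wired together, assuming the BLSTM contexts are concatenated with the word embeddings to form the sentence information matrix X; the layer sizes follow Section 4.3, but the concatenation order and dimensions are assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SIC(nn.Module):
    """Sentiment Information Collector: a BLSTM whose forward/backward hidden
    states play the role of the left/right contexts c_l(v_i) and c_r(v_i)."""
    def __init__(self, embed_dim: int, hidden_dim: int = 200):
        super().__init__()
        self.blstm = nn.LSTM(embed_dim, hidden_dim,
                             batch_first=True, bidirectional=True)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, embed_dim), the e(v_i) vectors
        contexts, _ = self.blstm(embeddings)      # (batch, seq_len, 2*hidden_dim)
        # Sentence information matrix X: contexts concatenated with the word
        # embeddings (the concatenation order is an assumption).
        return torch.cat([contexts, embeddings], dim=-1)

class SubExtractor(nn.Module):
    """One sub-extractor: an information-extracting window (a 1-D convolution)
    followed by ReLU and element-wise max-pooling over positions, cf. Eq. (6)."""
    def __init__(self, in_dim: int, n_kernels: int = 200, window: int = 2):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, n_kernels, kernel_size=window)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch, seq_len, in_dim) -> (batch, in_dim, seq_len) for Conv1d
        m = F.relu(self.conv(X.transpose(1, 2)))  # latent vectors m_i^j
        return m.max(dim=2).values                # m^j = max_i m_i^j
```

A full SIE would run several such sub-extractors with different window sizes over the same matrix X and combine their pooled vectors, as discussed in Sections 4.4.1 and 4.4.3.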

4.1. Datasets

  • The Amazon5 and Amazon3 datasets contain 45,000 training samples and 5,000 testing samples in each class, randomly selected from the original data source.
  • The authors have crawled microblogs from Sina microblog website (http://weibo.com/) which has grown to be a major social media platform with hundreds of millions of users in China.
  • The authors removed records whose emotional tendency is not obvious, leaving 3,000,000 samples.
  • The authors regard these three datasets as a benchmark to evaluate different models and explore the influence of parameters in the following experiments.

4.2. Pre-training and Word Embedding

  • Unlike English, a Chinese sentence contains no blanks between words, so preprocessing must first separate each sentence into words. This step is called word segmentation, and in their work the authors use an open-source tool called JieBa [33] to perform it.
  • After the word segment, the whole sentence is transformed into a sequence of Chinese words.
  • Initializing word vectors with those obtained from an unsupervised neural language model is a popular method to improve performance in the absence of a large supervised training set [34, 15, 35].
  • The authors use the publicly available word2vec tool to train embeddings on reviews from Amazon and SinaMicroblog for English and Chinese respectively (a minimal preprocessing sketch follows this list).
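A minimal sketch of that preprocessing pipeline, using the jieba and gensim packages as stand-ins for the JieBa tool and the word2vec training described above; the example sentences, vector size, and other parameters are illustrative assumptions.

```python
import jieba                      # Chinese word segmentation (stand-in for JieBa [33])
from gensim.models import Word2Vec

# Chinese sentences contain no blanks between words, so segment them first.
raw_sentences = ["今天天气很好", "这部电影不太令人愉快"]
segmented = [list(jieba.cut(s)) for s in raw_sentences]

# Pre-train word vectors on the unlabeled review corpus; these vectors are
# later used to initialize the embedding layer of the supervised model.
w2v = Word2Vec(sentences=segmented, vector_size=100, window=5,
               min_count=1, workers=4)
embedding = w2v.wv["天气"]        # dense word embedding e(v_i) for one word
```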

4.3. Experiment Settings

  • The models are trained by mini-batch back-propagation with the RMSprop optimizer [36], which is usually a good choice for LSTMs.
  • The batch size chosen in the experiments is 128, and gradients are averaged over each batch.
  • Parameters of the model are randomly initialized over a uniform distribution on [-0.5, 0.5].
  • The authors set the number of kernels of convolution layers all as 200 with different window sizes and also set the number of hidden units in BLSTM as 200.
  • For regularization, the authors use dropout [37] with probability 0.5 on the last softmax layer in all models (these settings are sketched after this list).
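A sketch of those training settings in PyTorch; the model, data, and loss are placeholders, and only the hyperparameters quoted above (RMSprop, batch size 128, uniform initialization on [-0.5, 0.5], 200 units, dropout 0.5) come from the text.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def init_uniform(model: nn.Module) -> None:
    # Randomly initialize all trainable parameters over a uniform
    # distribution on [-0.5, 0.5].
    for p in model.parameters():
        nn.init.uniform_(p, -0.5, 0.5)

model = nn.Sequential(            # placeholder standing in for the SICENN model
    nn.Linear(300, 200), nn.ReLU(),
    nn.Dropout(p=0.5),            # dropout with probability 0.5 before softmax
    nn.Linear(200, 5),            # 5 classes, e.g. for the Amazon5 dataset
)
init_uniform(model)

optimizer = torch.optim.RMSprop(model.parameters())  # RMSprop optimizer [36]
loss_fn = nn.CrossEntropyLoss()   # softmax classification loss, averaged per batch

# Mini-batch back-propagation with batch size 128 (dummy data for illustration).
dataset = TensorDataset(torch.randn(1024, 300), torch.randint(0, 5, (1024,)))
for features, labels in DataLoader(dataset, batch_size=128, shuffle=True):
    optimizer.zero_grad()
    loss_fn(model(features), labels).backward()
    optimizer.step()
```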

4.4. Results and Discussions

  • In their SICENN model, the structure of the SIC is fixed, based on the BLSTM model.
  • The structure of SIE is more flexible.
  • Three critical factors that influence the effectiveness of the SIE are explored in the following experiments: the size of the information-extracting windows, the depth of the sub-extractors, and the model ensemble strategy used to combine sub-extractors.

4.4.1. Size of information-extracting windows

  • In order to extract sentiment information from the sentence information matrix more precisely, the sizes of the information-extracting windows need to be carefully chosen.
  • Reviews from Amazon contain 3 categories and SinaMicroblog contains 2 categories.
  • RCNN refers to the model that Siwei proposed in [6].
  • Word embedding e(vi) is a pre-trained vector containing the semantic information of words, while sentence vectors cl(vi) and cr(vi) are the outputs of BLSTM containing the contextual information.
  • The experimental results show that the same window size has different performance on different datasets, which indicates the necessity of using an ensemble strategy to combine the advantages of different window sizes.

4.4.2. Depth of sub-extractors

  • The depth of the sub-extractors is determined by the number of information-extracting layers, which can influence the accuracy for classification.
  • The authors have performed a series of experiments to explore how the depth of the sub-extractors influences the accuracy in the SIE.
  • A sub-extractor with more layers can search a larger representation space than one with fewer layers, but more layers also make optimization by back-propagation more difficult.
  • The experimental results show that one layer stands at the balance point.
  • The model ensemble strategy is essential for improving the performance of the information extractor and the classification accuracy.

4.4.3. Model ensemble strategy

  • The model ensemble strategy can directly impact the effectiveness of the SIE and influence the results of sentiment classification.
  • Because the parameters in a neural network are updated iteratively toward a local optimum, the initialization of these trainable parameters can influence the accuracy of sentiment classification.
  • By comparing Table 1 and Table 3, the authors can see that the SIE with the model ensemble strategy outperforms all of the single sub-extractors.
  • Besides, the SICENN model can reach better accuracy if the weights are initialized properly, based on the Table 1 results for the different datasets.
  • The authors initialize the weight variables for Amazon5 as (0, 1, 0) because the extractor whose information-extracting window size is 2 performs best among all single window sizes, as shown in Table 1 (a minimal sketch of such a weighted ensemble follows this list).
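A minimal sketch of one plausible reading of this strategy: the pooled outputs of the three sub-extractors are combined by trainable weight variables before the softmax classifier, with the weights for Amazon5 initialized to (0, 1, 0) so the window-size-2 sub-extractor dominates at the start. The exact combination rule in the paper may differ; the class and parameter names here are illustrative.

```python
import torch
import torch.nn as nn

class WeightedEnsemble(nn.Module):
    """Combine the pooled vectors m^1, m^2, m^3 of three sub-extractors with
    trainable weights, then classify with a final linear + softmax layer."""
    def __init__(self, feat_dim: int, n_classes: int,
                 init_weights=(0.0, 1.0, 0.0)):
        super().__init__()
        # Initialized to favor the best single window size (here window size 2).
        self.weights = nn.Parameter(torch.tensor(init_weights))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, m1, m2, m3):
        stacked = torch.stack([m1, m2, m3])               # (3, batch, feat_dim)
        combined = (self.weights.view(3, 1, 1) * stacked).sum(dim=0)
        return self.classifier(combined)                  # logits for softmax

# Dummy usage: three 200-dimensional pooled vectors for a batch of 4 sentences.
ensemble = WeightedEnsemble(feat_dim=200, n_classes=5)
logits = ensemble(torch.randn(4, 200), torch.randn(4, 200), torch.randn(4, 200))
```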

4.5. Comparison of Methods

  • The authors compare their method with widely used artificial neural networks for sentiment analysis, including Siwei's model [6], which has itself been compared with other state-of-the-art models.
  • Comparing the SICENN model with the RCNN model, the improvements on Amazon3 and SinaMicroblog are 0.72% and 0.73% respectively.
  • The experiment results on various datasets also demonstrate their model outperforms previous state-of-the-art approaches.
  • The authors may build more sophisticated ensemble models involving more structures, such as attention models, to extract the sentiment information in the sentence more precisely.


Citations
Journal ArticleDOI
TL;DR: This paper presents a novel model for experts to carry out Group Decision Making processes using free text and pairwise comparisons of alternatives, and introduces two ways of applying consensus measures over the Group Decision Making process.
Abstract: Social networks are the most preferred mean for the people to communicate. Therefore, it is quite usual that experts use them to carry out Group Decision Making processes. One disadvantage that recent Group Decision Making methods have is that they do not allow the experts to use free text to express themselves. On the contrary, they force them to follow a specific user–computer communication structure. This is against social network nature where experts are free to express themselves using their preferred text structure. This paper presents a novel model for experts to carry out Group Decision Making processes using free text and alternatives pairwise comparisons. The main advantage of this method is that it is designed to work using social networks. Sentiment analysis procedures are used to analyze free texts and extract the preferences that the experts provide about the alternatives. Also, our method introduces two ways of applying consensus measures over the Group Decision Making process. They can be used to determine if the experts agree among them or if there are different postures. This way, it is possible to promote the debate in those cases where consensus is low.

89 citations

Journal ArticleDOI
TL;DR: This work evaluates existing efforts for language-specific sentiment analysis against a simple yet effective baseline and suggests that simply translating the input text from a specific language to English and then using one of the best existing methods developed for English can be better than the language-specific approaches evaluated.

72 citations


Cites methods from "A Sentiment Information Collector-E..."

  • ...[36] uses a Bidirectional LSTM to build a Sentiment Information Collector and a Sentiment Information Extractor (SIE)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, an end-to-end multi-prototype fusion embedding that fuses context-specific and task-specific information was proposed to solve the problem of polysemous-unaware word embedding.

33 citations

References
Proceedings Article
21 Jun 2010
TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units that learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
Abstract: Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.

14,799 citations


"A Sentiment Information Collector-E..." refers methods in this paper

  • ...In our case, we choose ReLU [31] as the nonlinear function....

    [...]

  • ...In our case, we choose ReLU [31] as the nonlinear function. mji is a latent semantic vector, in which each semantic factor will be analyzed to determine the most useful factor for representing the text....

    [...]

01 Jan 2015

12,972 citations


"A Sentiment Information Collector-E..." refers methods in this paper

  • ...In order to overcome the weakness of LSTM, BLSTM is applied to sentiment analysis [24] by researchers and outperforms the traditional LSTM....

    [...]

Posted Content
TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

11,936 citations


"A Sentiment Information Collector-E..." refers background in this paper

  • ...Related Work Deep learning based neural network models have achieved great success in many NLP tasks in the past few years, including learning distributed word, sentence and document representation [11], parsing [19], statistical machine translation [20], sentence classification [16, 21], etc....

    [...]

Posted Content
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

11,343 citations

Journal ArticleDOI
TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

11,201 citations


"A Sentiment Information Collector-E..." refers background in this paper

  • ...With the pre-trained word embeddings [9, 10, 11], neural networks demonstrate their great performance in sentiment analysis and many other NLP tasks....

    [...]