
Journal ArticleDOI

A Sentiment Information Collector-Extractor Architecture Based Neural Network for Sentiment Analysis

01 Oct 2018, Information Sciences (Elsevier), Vol. 467, pp. 549-558

TL;DR: A new ensemble strategy is applied to combine the results of different sub-extractors, making the SIE more universal and able to outperform any single sub-extractor; the resulting architecture outperforms state-of-the-art methods on three datasets of different languages.

Abstract: Sentiment analysis, also known as opinion mining, is a key natural language processing (NLP) task that has received much attention in recent years, and deep learning based neural network models have achieved great success in it. However, the existing deep learning models cannot effectively make use of the sentiment information in the sentence for sentiment analysis. In this paper, we propose a Sentiment Information Collector-Extractor architecture based Neural Network (SICENN) for sentiment analysis, consisting of a Sentiment Information Collector (SIC) and a Sentiment Information Extractor (SIE). The SIC, based on the Bi-directional Long Short Term Memory (BLSTM) structure, collects the sentiment information in the sentence and generates an information matrix. The SIE takes the information matrix as input and extracts the sentiment information precisely via three different sub-extractors. A new ensemble strategy is applied to combine the results of the different sub-extractors, making the SIE more universal and able to outperform any single sub-extractor. Experimental results show that the proposed architecture outperforms state-of-the-art methods on three datasets of different languages.

Topics: Sentiment analysis (71%), Deep learning (53%)

Summary (3 min read)

1. Introduction

  • Deep learning has made great progress recently and plays an important role in academia and industry.
  • In the example sentences, the key word happiness appears in two completely different positions.
  • Besides, sentence (iii) contains two key words, not and pleasant, which are separated by another word, been.
  • How to locate the key words remains a big challenge in sentiment analysis. Researchers have designed many efficient models to capture the sentiment information.
  • Thus, effectiveness can be reduced when an RNN is used to capture the semantics of a whole sentence, because key components can appear anywhere in a sentence rather than only at the end.

3. Model

  • Figure 1 shows the architecture of the whole model.
  • As illustrated in Figure 1, the model can be divided into two parts: (i) the SIC and (ii) the SIE.
  • Then the matrix X is fed into the information extractor, and latent semantic information is extracted based on the model ensemble strategy.

3.1. Sentiment Information Collector (SIC)

  • The authors first describe the architecture of the SIC in their model.
  • The left-side context cl(vi) of word vi is calculated using Equation (1), where e(vi) is the word embedding of word vi, a dense vector with |e| real-valued elements.
  • The information extractor, which is an ensemble model, is designed to extract sentiment information precisely from the sentence information matrix X. The SIE consists of three sub-extractors.
  • In their case, the authors choose ReLU [31] as the nonlinear function.
  • When all of the latent semantic vectors mji have been calculated separately, each sub-extractor applies a max-pooling operation over the L positions: mj = max_{i=1..L} mji (Equation 6), where max is an element-wise function; a code sketch of the collector-extractor pipeline follows.
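The following is a minimal PyTorch sketch of how the SIC and the SIE sub-extractors described above could fit together. The window sizes (1, 2, 3), the plain averaging used to combine sub-extractors, and all layer names are illustrative assumptions, not the authors' exact implementation (their weighted ensemble is discussed in Section 4.4.3); the 200 kernels and 200 hidden units follow the settings reported in Section 4.3.

```python
# Minimal sketch of the SIC (a BLSTM) feeding the SIE (three convolutional
# sub-extractors with element-wise max-pooling, Equation 6). Window sizes
# and the averaging ensemble are assumptions for illustration.
import torch
import torch.nn as nn

class SubExtractor(nn.Module):
    """One SIE sub-extractor: convolution with a fixed window size over the
    sentence information matrix, ReLU, then element-wise max over positions."""
    def __init__(self, in_dim, num_kernels=200, window_size=2):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, num_kernels, kernel_size=window_size)

    def forward(self, x):                             # x: (batch, seq_len, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, kernels, L)
        return h.max(dim=2).values                    # m_j = max_i m_ji, element-wise

class SICENNSketch(nn.Module):
    def __init__(self, embed_dim=300, hidden=200, num_classes=5):
        super().__init__()
        # SIC: the per-time-step BLSTM outputs are stacked into the
        # sentence information matrix X.
        self.blstm = nn.LSTM(embed_dim, hidden, bidirectional=True,
                             batch_first=True)
        # SIE: three sub-extractors with different window sizes (assumed 1-3).
        self.extractors = nn.ModuleList(
            SubExtractor(2 * hidden, window_size=k) for k in (1, 2, 3))
        # Dropout 0.5 before the final softmax layer, as in Section 4.3.
        self.classifier = nn.Sequential(nn.Dropout(0.5),
                                        nn.Linear(200, num_classes))

    def forward(self, embeddings):                # (batch, seq_len, embed_dim)
        X, _ = self.blstm(embeddings)             # sentence information matrix
        ms = [ext(X) for ext in self.extractors]  # one vector per sub-extractor
        m = torch.stack(ms, dim=0).mean(dim=0)    # placeholder ensemble (see 4.4.3)
        return self.classifier(m)                 # logits; softmax applied in the loss
```

Note how Equation (6) appears as the element-wise max over the L positions of each sub-extractor's feature map.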

4.1. Datasets

  • The Amazon5 and Amazon3 datasets contain 45,000 training samples and 5,000 testing samples in each class, and the samples are randomly selected from the original data source.
  • The authors crawled microblogs from the Sina microblog website (http://weibo.com/), which has grown to be a major social media platform with hundreds of millions of users in China.
  • The authors cut off records whose emotional tendencies are not obvious, leaving 3,000,000 samples.
  • The authors regard these three datasets as a benchmark to evaluate different models and explore the influence of parameters in the following experiments.

4.2. Pre-training and Word Embedding

  • Unlike English, Chinese sentences contain no blanks between words, so preprocessing must first be done to separate each sentence into words; this step is called word segmentation, and in their work the authors use an open-source tool called JieBa [33] to perform it.
  • After the word segment, the whole sentence is transformed into a sequence of Chinese words.
  • Initializing word vectors with those obtained from an unsupervised neural language model is a popular method to improve performance in the absence of a large supervised training set [34, 15, 35].
  • The authors use the publicly available word2vec tools, trained on reviews from Amazon and on SinaMicroblog for English and Chinese respectively; a sketch of this preprocessing pipeline follows.
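As a concrete illustration, the preprocessing described above could look like the sketch below, using JieBa for word segmentation and gensim's word2vec for skip-gram pre-training. The corpus file name, window, and min_count are assumptions; the paper only specifies 300-dimensional vectors trained with the continuous skip-gram architecture [11], and a gensim 4 style API is assumed.

```python
# Hedged sketch: JieBa word segmentation followed by skip-gram word2vec
# pre-training. The corpus file name and most hyperparameters are
# illustrative assumptions.
import jieba
from gensim.models import Word2Vec

# Word segmentation: split each Chinese sentence into a list of words.
with open("sina_microblogs.txt", encoding="utf-8") as f:  # hypothetical file
    corpus = [jieba.lcut(line.strip()) for line in f]

# Pre-train 300-dimensional skip-gram (sg=1) embeddings on the raw corpus.
w2v = Word2Vec(corpus, vector_size=300, sg=1, window=5, min_count=5, workers=4)
w2v.save("sina_word2vec.model")
```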

4.3. Experiment Settings

  • The models are trained by mini-batch back propagation with the RMSprop optimizer [36], which is usually a good choice for LSTM.
  • The batch size chosen in the experiments is 128, and gradients are averaged over each batch.
  • Parameters of the model are randomly initialized from a uniform distribution over [-0.5, 0.5].
  • The authors set the number of kernels in the convolution layers to 200 for all window sizes, and also set the number of hidden units in the BLSTM to 200.
  • For regularization, the authors use dropout [37] with probability 0.5 on the last Softmax layer in all models; a training-loop sketch with these settings follows.
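A minimal training-loop sketch consistent with these settings (RMSprop, batch size 128, uniform initialization over [-0.5, 0.5]) might look as follows; the learning rate and the synthetic stand-in data are assumptions, and SICENNSketch refers to the illustrative model in Section 3.1 above.

```python
# Training-loop sketch matching the reported settings; the learning rate
# and the synthetic data are assumptions, not taken from the paper.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in inputs; real inputs would be pre-trained embeddings.
embeddings = torch.randn(1024, 50, 300)        # (samples, seq_len, embed_dim)
labels = torch.randint(0, 5, (1024,))          # 5 classes, as in Amazon5
loader = DataLoader(TensorDataset(embeddings, labels),
                    batch_size=128, shuffle=True)

model = SICENNSketch()                         # illustrative model from above
for p in model.parameters():                   # uniform init over [-0.5, 0.5]
    torch.nn.init.uniform_(p, -0.5, 0.5)

optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)  # lr assumed
criterion = torch.nn.CrossEntropyLoss()        # averages the loss per batch

model.train()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
```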

4.4. Results and Discussions

  • In their SICENN model, the structure of the SIC is fixed, based on the BLSTM model.
  • The structure of the SIE is more flexible.
  • Three critical factors that influence the effectiveness of the SIE are explored in the following experiments: the size of the information-extracting windows, the depth of the sub-extractors, and the model ensemble strategy used to combine sub-extractors.

4.4.1. Size of information-extracting windows

  • In order to extract sentiment information from the sentence information matrix more precisely, the sizes of the information-extracting windows need to be carefully chosen.
  • Reviews from Amazon contain 3 categories and SinaMicroblog contains 2 categories.
  • RCNN refers to the model that Siwei proposed in [6].
  • Word embedding e(vi) is a pre-trained vector containing the semantic information of words, while sentence vectors cl(vi) and cr(vi) are the outputs of BLSTM containing the contextual information.
  • The experimental results show that the same window size performs differently on different datasets, which indicates the necessity of using an ensemble strategy to combine the advantages of different window sizes.

4.4.2. Depth of sub-extractors

  • The depth of the sub-extractors is determined by the number of information-extracting layers, which can influence the accuracy for classification.
  • The authors have performed a series of experiments to explore how the depth of the sub-extractors influences the accuracy in the SIE.
  • A sub-extractor with more layers can search a larger representation space than one with fewer layers, but more layers also bring much difficulty to the optimizer under the backward propagation strategy.
  • The experimental results show that one layer stands at the balance point; a sketch of a deeper variant appears after this list.
  • The model ensemble strategy is essential for improving the performance of the information extractor and the accuracy of classification.
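For reference, a deeper sub-extractor in the terms of the earlier sketch would simply chain additional convolution + ReLU stages before the max-pool; this is an assumed formulation of "depth" consistent with the description above, not the authors' exact design.

```python
# Assumed multi-layer sub-extractor variant: `depth` convolution + ReLU
# stages before the element-wise max-pooling of Equation (6).
import torch
import torch.nn as nn

class DeepSubExtractor(nn.Module):
    def __init__(self, in_dim, num_kernels=200, window_size=2, depth=2):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Conv1d(dim, num_kernels, kernel_size=window_size),
                       nn.ReLU()]
            dim = num_kernels
        self.layers = nn.Sequential(*layers)

    def forward(self, x):                      # x: (batch, seq_len, in_dim)
        return self.layers(x.transpose(1, 2)).max(dim=2).values
```

Each extra stage shrinks the valid positions by (window_size - 1) and deepens the credit-assignment path, which is consistent with the observation that one layer sits at the balance point.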

4.4.3. Model ensemble strategy

  • Model ensemble strategy can directly impact the effectiveness of the SIE and influence the results of sentiment classification.
  • Because the parameters in a neural network are updated iteratively, searching for a local optimum, the initialization of the trainable parameters can influence the accuracy of sentiment classification.
  • By comparing Table 1 and Table 3, the authors find that the SIE with the model ensemble strategy outperforms all of the single sub-extractors.
  • Besides, the SICENN model can reach better accuracy if the weights are initialized properly, based on the results of Table 1 on the different datasets.
  • The authors initialize the weight variables for Amazon5 as (0, 1, 0), because the extractor whose information-extracting window size is 2 performs best among all single window sizes, as shown in Table 1; a sketch of such a weighted ensemble follows.
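One plausible reading of this strategy is a trainable weighted combination of the three sub-extractor outputs whose weights are initialized per dataset, e.g. (0, 1, 0) for Amazon5 so that the window-size-2 sub-extractor dominates initially. The sketch below is an interpretation, not the authors' exact formulation.

```python
# Hedged sketch of a weighted ensemble over three sub-extractor outputs,
# with trainable weights initialized per dataset (e.g. (0, 1, 0) for Amazon5).
import torch
import torch.nn as nn

class WeightedEnsemble(nn.Module):
    def __init__(self, init_weights=(0.0, 1.0, 0.0)):
        super().__init__()
        # Trainable combination weights; the initialization favors the
        # empirically best single window size for the target dataset.
        self.w = nn.Parameter(torch.tensor(init_weights))

    def forward(self, ms):                # ms: list of (batch, feat) tensors
        stacked = torch.stack(ms, dim=0)  # (3, batch, feat)
        return (self.w[:, None, None] * stacked).sum(dim=0)
```

Because the weights remain trainable, the ensemble can drift away from the single best sub-extractor whenever the other window sizes carry complementary sentiment cues.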

4.5. Comparison of Methods

  • The authors compare their method with widely used artificial neural networks for sentiment analysis, including Siwei's model [6], which has itself been compared with other state-of-the-art models.
  • Comparing the SICENN model with the RCNN model, the improvements on Amazon3 and SinaMicroblog are 0.72% and 0.73% respectively.
  • The experimental results on various datasets also demonstrate that their model outperforms previous state-of-the-art approaches.
  • The authors may build more sophisticated ensemble models and may involve more structures, such as attention models, to extract the sentiment information in the sentence more precisely.


A Sentiment Information Collector-Extractor Architecture Based Neural Network for Sentiment Analysis

Kai Shuang (a), Hao Guo (a), Zhixuan Zhang (a), Jonathan Loo (b)

(a) State Key Laboratory of Networking & Switching Technology, Beijing University of Posts and Telecommunications, 100876, Beijing, P.R. China
(b) School of Computing and Engineering, University of West London, W5 5RF, UK

Corresponding author
Email addresses: shuangk@bupt.edu.cn (Kai Shuang), guo_yu_hao_1993@163.com (Hao Guo), 2228290335@qq.com (Zhixuan Zhang), jonathan.loo@uwl.ac.uk (Jonathan Loo)

Preprint submitted to Journal of LaTeX Templates, August 24, 2017
Abstract
Sentiment analysis, also known as opinion mining, is a key natural language processing (NLP) task that has received much attention in recent years. Deep learning based neural network models have achieved great success in it. However, the existing deep learning models cannot effectively make use of the sentiment information in the sentence for sentiment analysis. In our model, we apply a bi-directional Long Short Term Memory structure based sentiment information collector to collect the sentiment information in the sentence, which may collect information more completely than other types of neural network. We then apply an ensemble model of sentiment information extractors to combine the results of these sub-extractors; the new ensemble strategy makes our model more universal and lets it outperform any single sub-extractor. We conduct experiments on three datasets of different languages. The experimental results show that the proposed method outperforms the state-of-the-art methods on all datasets.
Keywords: sentiment analysis, sentiment information collector, sentiment information extractor, model ensemble
1. Introduction
Deep learning has made great progress recently and plays an important role in academia and industry. In particular, standard natural language processing (NLP) approaches for entity and relationship extraction have been improved [1], and business-aware concept detection by convolutional neural networks has been proposed [2]. Based on deep neural networks, new inspiration has been brought to various NLP tasks. Recent progress in word representation provides good resources for lexical semantics [3]. Text classification is an essential component in many applications, such as sentiment analysis [4, 5], web searching and information filtering [6]. Therefore, it has attracted considerable attention in both academia and industry.
Sentiment analysis [7], also known as opinion mining [5], is a key NLP task that has received much attention in recent years. It refers to the process of computationally identifying and categorizing opinions expressed in a piece of text, in order to determine whether the writer's attitude towards a particular topic or product is positive, negative, or even neutral. However, traditional feature representation methods for sentiment analysis often ignore the contextual word order information in texts or suffer from the data sparsity problem, which heavily affects classification accuracy [8]. With pre-trained word embeddings [9, 10, 11], neural networks demonstrate great performance in sentiment analysis and many other NLP tasks.
In particular, when classifying the sentiment polarity of a long sentence, the most essential task is to locate the key words which indicate the sentiment polarity of the whole sentence. For example, consider these three sentences: (i) Happiness has stayed with me since I found out my own. (ii) I spent a whole day in the park, which is far away from my house, in happiness. (iii) To be honest, I have not been pleasant since I was informed of the terrible news. Both sentence (i) and sentence (ii) contain the key word happiness, which indicates positive emotion. However, this key word appears in two completely different positions. Besides, sentence (iii) contains two key words, not and pleasant, and they are separated by another word, been. These two words together indicate the sentiment polarity of the sentence. How to locate the key words remains a big challenge in sentiment analysis.
Researchers have designed many efficient models to capture the sentiment information. For example, the Recurrent Neural Network (RNN), which includes the Long Short Term Memory (LSTM), the Gated Recurrent Unit (GRU) and others, is one of the most popular models. The standard RNN has gradient vanishing or exploding problems. In order to overcome these issues, the LSTM was developed and achieved superior performance [12]. The model analyzes a text word by word in the order in which the words appear and stores the semantics of all the previous text in a fixed-sized hidden layer [13]. The advantage of the RNN is its ability to better capture contextual information, which can be beneficial for capturing the semantics of long texts. However, the RNN is a biased model, in which the last few words are more dominant than the earlier words [6]. Thus, its effectiveness can be reduced when it is used to capture the semantics of a whole sentence, because key components can appear anywhere in a sentence rather than only at the end. For example, in sentence (i) the key word happiness appears at the front of the sentence, while the same key word appears at the back of sentence (ii). The two key words not and pleasant appear in the middle of sentence (iii). When these three types of sentence are fed into an RNN in the order in which the words appear, sentence (ii) will have the best performance compared with the others.
Besides, the Convolutional Neural Network (CNN), which is an unbiased model, can fairly determine discriminative phrases in a text with a max-pooling layer. However, the CNN itself has the characteristic of local connection [14]. Previous studies on CNNs tend to apply the CNN to analyze the local contextual information of a sentence [15, 16]. For example, some researchers take the results of word embedding as the CNN input; each convolution window then contains information from only a few words in the sentence, which means the outputs of the convolution layer are based on local information in the sentence. Although the following max-pooling layer can help extract information, the result is still mainly based on the local information output by the convolution layer. In this way, when using a CNN to deal with long sentences, it is difficult to analyze the contextual information of the entire sentence. In order to cope with these existing problems and capture the key words that indicate sentiment polarity, we propose a sentiment information collector-extractor architecture based neural network (SICENN) for text classification. First, the bidirectional long short term memory (BLSTM) structure [17, 18] is applied as a Sentiment Information Collector (SIC) to generate a sentence information matrix which contains all the contextual information of the sentence. Second, sentiment key words are automatically extracted from the sentence information matrix and the emotional polarity is extracted by our Sentiment Information Extractor (SIE).
BLSTM has the ability to better capture contextual information. BLSTM is an unbiased model, because the output of the BLSTM is a sentence vector at each time-step, and each sentence vector emphasizes the information around it. In other words, the output of the BLSTM at each time-step is a sentence vector which contains one particular aspect of the information in the sentence and can also be regarded as a particular feature of the sentence. Our model stacks the vectors generated at each time step into a sentence information matrix, which contains all the features of the sentence, and feeds it into the SIE. The SIE aims at extracting the contextual information related to sentiment polarity from the sentence information matrix. Three sub-extractors are applied to extract the sentiment information respectively, and a model ensemble approach is used to combine and process the outputs of the three sub-extractors. Based on model ensemble theory, experimental results show that our ensemble SIE outperforms any single sub-extractor.
To summarize, our contributions are as follows:

  • Based on the characteristics of the BLSTM structure, the SIC is designed, which can completely collect the sentiment information in the sentence.
  • Based on the model ensemble strategy, the SIE is designed, which can precisely extract the sentiment information from the outputs of the SIC.
  • Experiments are set up to validate the accuracy of our SICENN model, and the results show that our model outperforms previous state-of-the-art approaches and can better capture the sentiment information in the sentence.
2. Related Work
Deep learning based neural network models have achieved great success in many NLP tasks in the past few years, including learning distributed word, sentence and document representations [11], parsing [19], statistical machine translation [20], sentence classification [16, 21], etc. Learning distributed sentence representations through neural network models can reach satisfactory results in related tasks like sentiment classification and text categorization. Among the neural network models, CNN and RNN are the two most popular, and variants of these models have recently been applied to sentiment analysis. For CNN, a multichannel CNN model [16] has been proposed to increase the accuracy of sentence classification, but each convolution window contains information from only a few words in the sentence, which means the outputs of the convolution layer are based only on local information in the sentence. For RNN, gated neural networks [22] have been proposed to capture the influence of the surrounding words when performing sentiment classification of entities. LSTM was developed [12] and achieved superior performance compared with both the traditional RNN structure and the GRU [23]. But LSTM is still a biased model, in which the last few words are more dominant than the earlier words [6]. In order to overcome this weakness of LSTM, BLSTM has been applied to sentiment analysis [24] by researchers and outperforms the traditional LSTM.

CNN and RNN models can be applied to the sentiment analysis task individually, and they can also be combined to improve classification performance. Although there are many previous models [6, 25, 26] combining CNN and RNN, they may not make the best use of the abilities of CNN and RNN to collect and extract sentiment information based on the characteristics of each. For example, the Recurrent Convolutional Neural Network (RCNN) model in [6] did not make the best use of RNN and CNN. The bi-directional recurrent structure used in the RCNN model is similar to the BLSTM structure but concatenates the word embedding vector with the sentence vector, which may make the accuracy of sentiment classification decline. Moreover, there is only a linear transformation together with the tanh activation function after the bi-directional recurrent structure, which cannot extract the sentence information effectively. The weakness of the RCNN model is explained more thoroughly in Section 4.4.1.

The idea of using neural networks in an ensemble has been proposed previously [27, 28, 29]. An ensemble of residual nets has been applied to image recognition [30]. An ensemble model can combine the results of different individual sub-models, which makes the whole model learn the characteristics of the datasets better and outperform all the sub-models. In this paper, the SICENN is proposed, making full use of CNN and BLSTM in an ensemble. Based on our ensemble strategy, accuracy on sentiment classification is further improved.
3. Model
In this section, we introduce our model in detail. Figure 1 shows the architecture of the whole model. As illustrated in Figure 1, the model can be divided into two parts: (i) the SIC and (ii) the SIE.

Citations
Journal ArticleDOI
TL;DR: This paper presents a novel model for experts to carry out Group Decision Making processes using free text and alternatives pairwise comparisons, and introduces two ways of applying consensus measures over the Group Decision Making process.
Abstract: Social networks are the most preferred means for people to communicate. Therefore, it is quite usual that experts use them to carry out Group Decision Making processes. One disadvantage of recent Group Decision Making methods is that they do not allow the experts to use free text to express themselves. On the contrary, they force them to follow a specific user-computer communication structure. This is against the nature of social networks, where experts are free to express themselves using their preferred text structure. This paper presents a novel model for experts to carry out Group Decision Making processes using free text and alternatives pairwise comparisons. The main advantage of this method is that it is designed to work using social networks. Sentiment analysis procedures are used to analyze free texts and extract the preferences that the experts provide about the alternatives. Also, our method introduces two ways of applying consensus measures over the Group Decision Making process. They can be used to determine whether the experts agree among themselves or whether there are different postures. This way, it is possible to promote debate in those cases where consensus is low.

66 citations


Journal ArticleDOI
TL;DR: This work evaluates existing efforts proposed to do language specific sentiment analysis with a simple yet effective baseline approach and suggests that simply translating the input text in a specific language to English and then using one of the existing best methods developed for English can be better than the existing language-specific approach evaluated.
Abstract: Sentiment analysis has become a key tool for several social media applications, including analysis of users' opinions about products and services, support for politics during campaigns, and even identification of market trends. Multiple existing sentiment analysis methods explore different techniques, usually relying on lexical resources or learning approaches. Despite the significant interest in this theme and the amount of research effort in the field, almost all existing methods are designed to work with only English content. Most current strategies in other languages consist of adapting existing lexical resources, without presenting proper validations and basic baseline comparisons. In this work, we take a different step into this field. We focus on evaluating existing efforts proposed to do language-specific sentiment analysis with a simple yet effective baseline approach. To do it, we evaluated sixteen methods for sentence-level sentiment analysis proposed for English, and compared them with three language-specific methods. Based on fourteen human-labeled language-specific datasets, we provide an extensive quantitative analysis of existing multilingual approaches. Our results suggest that simply translating the input text in a specific language to English and then using one of the existing best methods developed for English can be better than the existing language-specific approaches evaluated. We also rank methods according to their prediction performance and identify those that acquired the best results using machine translation across different languages. As a final contribution to the research community, we release our codes, datasets, and the iFeel 3.0 system, a Web framework and tool for multilingual sentence-level sentiment analysis. We hope our system sets up a new baseline for future sentence-level methods developed in a wide set of languages.

33 citations


Cites methods from "A Sentiment Information Collector-E..."

  • ...[36] uses a Bidirectional LSTM to build a Sentiment Information Collector and a Sentiment Information Extractor (SIE)....



Journal ArticleDOI
Abstract: Existing unsupervised word embedding methods have been proved to be effective to capture latent semantic information on various tasks of Natural Language Processing (NLP). However, existing word representation methods are incapable of tackling both the polysemous-unaware and task-unaware problems that are common phenomena in NLP tasks. In this work, we present a novel Convolution–Deconvolution Word Embedding (CDWE), an end-to-end multi-prototype fusion embedding that fuses context-specific information and task-specific information. To the best of our knowledge, we are the first to extend deconvolution (e.g. convolution transpose), which has been widely used in computer vision, to word embedding generation. We empirically demonstrate the efficiency and generalization ability of CDWE by applying it to two representative tasks in NLP: text classification and machine translation. The models of CDWE significantly outperform the baselines and achieve state-of-the-art results on both tasks. To validate the efficiency of CDWE further, we demonstrate how CDWE solves the polysemous-unaware and task-unaware problems via analyzing the Text Deconvolution Saliency, which is an existing strategy for evaluating the outputs of deconvolution.

18 citations


References

Proceedings ArticleDOI
27 Jun 2016
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

93,356 citations


Journal ArticleDOI
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.

49,735 citations


"A Sentiment Information Collector-E..." refers background or methods in this paper

  • ...LSTM was developed [12] and achieved superior performance compared with both the traditional RNN structure and the GRU [23]....


  • ...In order to overcome the issues, LSTM was developed and achieved superior performance [12]....



Journal ArticleDOI
01 Jan 1998
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

34,930 citations


Journal Article

28,684 citations


"A Sentiment Information Collector-E..." refers background in this paper

  • ...For RNN, gated neural networks [22] is proposed to capture the influence of the surrounding words when performing sentiment classification of entities....



Proceedings Article
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

23,982 citations


"A Sentiment Information Collector-E..." refers background or methods in this paper

  • ...dimensionality of 300 and were trained using the continuous skip-gram architecture [11]....


  • ...For word-embedding method, we initialize word vectors with those obtained from an unsupervised neural language model [11]....


  • ...With the pre-trained word embeddings [9, 10, 11], neural networks demonstrate their great performance in sentiment analysis and many other NLP tasks....


  • ...Related Work Deep learning based neural network models have achieved great success in many NLP tasks in the past few years, including learning distributed word, sentence and document representation [11], parsing [19], statistical machine translation [20], sentence classification [16, 21], etc....
