Author

Zhao Jianqiang

Bio: Zhao Jianqiang is an academic researcher from Xi'an Jiaotong University. The author has contributed to research on the topics of sentiment analysis and stop words. The author has an h-index of 5 and has co-authored 5 publications receiving 456 citations.

Papers
Journal Article
TL;DR: A word embeddings method obtained by unsupervised learning on large Twitter corpora is introduced; the method uses latent contextual semantic relationships and co-occurrence statistical characteristics between words in tweets to form a sentiment feature set of tweets.
Abstract: Twitter sentiment analysis technology provides methods to survey public emotion about events or products. Most current research focuses on obtaining sentiment features by analyzing lexical and syntactic features. These features are expressed explicitly through sentiment words, emoticons, exclamation marks, and so on. In this paper, we introduce a word embeddings method obtained by unsupervised learning on large Twitter corpora; this method uses latent contextual semantic relationships and co-occurrence statistical characteristics between words in tweets. These word embeddings are combined with n-grams features and word sentiment polarity score features to form a sentiment feature set of tweets. The feature set is integrated into a deep convolutional neural network for training and predicting sentiment classification labels. We experimentally compare the performance of our model with a baseline word n-grams model on five Twitter data sets; the results indicate that our model performs better on accuracy and F1-measure for Twitter sentiment classification.

342 citations

Journal Article
TL;DR: The experiments show that the accuracy and F1-measure of a Twitter sentiment classifier improve when using the pre-processing methods of expanding acronyms and replacing negation, but barely change when removing URLs, numbers, or stop words.
Abstract: Twitter sentiment analysis offers organizations the ability to monitor public feeling towards the products and events related to them in real time. The first step of sentiment analysis is the text pre-processing of Twitter data. Most existing research on Twitter sentiment analysis focuses on the extraction of new sentiment features, while the choice of pre-processing method is ignored. This paper discusses the effects of text pre-processing methods on sentiment classification performance in two types of classification tasks, and summarizes the classification performance of six pre-processing methods using two feature models and four classifiers on five Twitter datasets. The experiments show that the accuracy and F1-measure of a Twitter sentiment classifier improve when using the pre-processing methods of expanding acronyms and replacing negation, but barely change when removing URLs, numbers, or stop words. The Naive Bayes and Random Forest classifiers are more sensitive than the Logistic Regression and support vector machine classifiers when various pre-processing methods are applied.

252 citations
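
The pre-processing steps named in this abstract (expanding acronyms, replacing negation, removing URLs, numbers, and stop words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lookup tables `ACRONYMS`, `NEGATIONS`, and `STOP_WORDS`, the regular expressions, and the function name `preprocess` are all assumptions made for this example, and the paper's own resources are not given in the abstract.

```python
import re

# Hypothetical lookup tables for illustration only; the paper's actual
# acronym dictionary, negation list, and stop-word list are not given here.
ACRONYMS = {"lol": "laughing out loud", "omg": "oh my god", "gr8": "great"}
NEGATIONS = {"can't": "can not", "won't": "will not", "don't": "do not"}
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")  # standalone numbers only


def preprocess(tweet, expand_acronyms=True, replace_negation=True,
               remove_urls=True, remove_numbers=True, remove_stop_words=True):
    """Return a list of tokens after the enabled pre-processing steps."""
    text = tweet.lower()
    if remove_urls:
        text = URL_RE.sub(" ", text)
    if remove_numbers:
        text = NUMBER_RE.sub(" ", text)

    tokens = []
    for raw in text.split():
        word = raw.strip(".,!?;:")  # drop surrounding punctuation
        if replace_negation and word in NEGATIONS:
            tokens.extend(NEGATIONS[word].split())
        elif expand_acronyms and word in ACRONYMS:
            tokens.extend(ACRONYMS[word].split())
        elif word:
            tokens.append(word)

    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens
```

Each step is behind its own flag so that the effect of a single method on a downstream classifier can be measured by toggling it, which mirrors the kind of comparison the abstract describes.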

Journal Article
TL;DR: A user influence rank (UIRank) algorithm is proposed to identify the influential users through interaction information flow and interaction relationships among users in the micro-blog.
Abstract: Micro-blog services have become popular tools in social networks. Online users discuss various topics in the micro-blog, and some influential users can affect the opinions, attitudes, behaviors, or emotions of others. This paper proposes a user influence rank (UIRank) algorithm to identify the influential users through interaction information flow and interaction relationships among users in the micro-blog. The UIRank algorithm considers the contribution of a user’s tweets and the characteristics of information dissemination in micro-blog networks, and calculates the user influence score iteratively over the user follower graph. Experimental results show that the UIRank algorithm outperforms other existing related algorithms in precision, recall, and F1-measure.

51 citations
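
To illustrate what computing an influence score iteratively over a follower graph can look like, here is a minimal sketch of a generic PageRank-style power iteration. It is not the UIRank algorithm: the abstract does not specify how tweet contribution or information dissemination enter the score, so those terms are omitted. The function name `influence_scores`, the `follows` input format, and the `damping` and `iterations` parameters are assumptions made for this example.

```python
def influence_scores(follows, damping=0.85, iterations=50):
    """PageRank-style scores on a follower graph (illustrative only).

    follows maps each user to the list of users they follow; a user's
    score is passed on in equal shares to the accounts they follow.
    """
    users = set(follows)
    for followees in follows.values():
        users.update(followees)
    n = len(users)

    scores = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user receives a small baseline share each round.
        new_scores = {u: (1.0 - damping) / n for u in users}
        for u in users:
            followees = follows.get(u, [])
            if followees:
                share = damping * scores[u] / len(followees)
                for v in followees:
                    new_scores[v] += share
            else:
                # A user who follows nobody spreads their score evenly.
                share = damping * scores[u] / n
                for v in users:
                    new_scores[v] += share
        scores = new_scores
    return scores
```

For example, `influence_scores({"a": ["c"], "b": ["c"], "c": []})` ranks `c` highest, because both other users follow `c`.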

Proceedings Article
01 Dec 2015
TL;DR: The experiments show that the accuracy of sentiment classification rises after expanding acronyms and replacing negation, but hardly changes when URLs, numbers, and stop words are removed.
Abstract: Twitter sentiment analysis offers organizations the ability to monitor public feeling towards the products and events related to them in real time. Most existing research on identifying Twitter sentiment focuses on the extraction of new sentiment features and applies pre-processing before feature selection, but ignores the role of tweet pre-processing. In this paper, we discuss the effects of pre-processing on sentiment classification performance. We evaluated the effects of URLs, stop words, repeated letters, negation, acronyms, and numbers on sentiment classification performance using two feature models and four classifiers on five Twitter datasets. The experiments show that the accuracy of sentiment classification rises after expanding acronyms and replacing negation, but hardly changes when URLs, numbers, and stop words are removed. The various pre-processing methods affect classifier performance differently for each dataset.

50 citations

Proceedings Article
01 Dec 2015
TL;DR: A distributed representation of sentences is introduced that can capture co-occurrence statistics and contextual semantic relations of words in tweets and represent a tweet via a fixed-size feature vector.
Abstract: Twitter sentiment analysis offers organizations the ability to monitor public feeling towards the products and events related to them in real time. Most existing research on Twitter sentiment analysis focuses on extracting lexical and syntactic sentiment features that are expressed explicitly through words, emoticons, exclamation marks, etc., while sentiment expressed implicitly via latent contextual semantic relations and dependencies among words in tweets is ignored. In this paper, we introduce a distributed representation of sentences that can capture co-occurrence statistics and contextual semantic relations of words in tweets and represent a tweet via a fixed-size feature vector. We used the feature vector as the sentence semantic feature for the tweet. We combined the semantic feature, prior polarity score feature, and n-grams feature into a sentiment feature set of tweets, and incorporated the feature set into a Support Vector Machine (SVM) model for training and predicting sentiment classification labels. We used six Twitter datasets in our evaluation and compared the performance against an n-grams model baseline. Results show the superior performance of our method in sentiment classification accuracy.

13 citations


Cited by
Book
01 Jan 1975
TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.
Abstract: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval. This chapter has been included because I think this is one of the most interesting and active areas of research in information retrieval. There are still many problems to be solved so I hope that this particular chapter will be of some help to those who want to advance the state of knowledge in this area. All the other chapters have been updated by including some of the more recent work on the topics covered. In preparing this new edition I have benefited from discussions with Bruce Croft, The material of this book is aimed at advanced undergraduate information (or computer) science students, postgraduate library science students, and research workers in the field of IR. Some of the chapters, particularly Chapter 6, make simple use of a little advanced mathematics. However, the necessary mathematical tools can be easily mastered from numerous mathematical texts that now exist and, in any case, references have been given where the mathematics occur. I had to face the problem of balancing clarity of exposition with density of references. I was tempted to give large numbers of references but was afraid they would have destroyed the continuity of the text. I have tried to steer a middle course and not compete with the Annual Review of Information Science and Technology. Normally one is encouraged to cite only works that have been published in some readily accessible form, such as a book or periodical. Unfortunately, much of the interesting work in IR is contained in technical reports and Ph.D. theses. For example, most of the work done on the SMART system at Cornell is available only in reports. Luckily many of these are now available through the National Technical Information Service (U.S.) and University Microfilms (U.K.). I have not avoided using these sources although if the same material is accessible more readily in some other form I have given it preference. I should like to acknowledge my considerable debt to many people and institutions that have helped me. Let me say first that they are responsible for many of the ideas in this book but that only I wish to be held responsible. My greatest debt is to Karen Sparck Jones who taught me to research information retrieval as an experimental science. Nick Jardine and Robin …

822 citations

Journal Article
TL;DR: An improved word representation method is proposed that integrates the contribution of sentiment information into the traditional TF-IDF algorithm and generates weighted word vectors; the method is shown to be effective, with high accuracy on comments.
Abstract: With the rapid development of Internet technology and social networks, a large number of comment texts are generated on the Web. In the era of big data, mining the emotional tendency of comments through artificial intelligence technology is helpful for the timely understanding of network public opinion. Sentiment analysis is a part of artificial intelligence, and research on it is valuable for obtaining the sentiment trend of comments. Sentiment analysis is essentially a text classification task, and different words contribute differently to classification. Current sentiment analysis studies mostly use distributed word representations. However, a distributed word representation considers only the semantic information of a word and ignores its sentiment information. In this paper, an improved word representation method is proposed, which integrates the contribution of sentiment information into the traditional TF-IDF algorithm and generates weighted word vectors. The weighted word vectors are input into a bidirectional long short-term memory (BiLSTM) network to capture context information effectively, so that the comment vectors are better represented. The sentiment tendency of the comment is obtained by a feedforward neural network classifier. Under the same conditions, the proposed sentiment analysis method is compared with the sentiment analysis methods of RNN, CNN, LSTM, and NB. The experimental results show that the proposed sentiment analysis method has higher precision, recall, and F1 score. The method is shown to be effective, with high accuracy on comments.

338 citations

Journal Article
TL;DR: A new sentiment analysis model, SLCABG, is proposed; it is based on the sentiment lexicon and combines a Convolutional Neural Network (CNN) and an attention-based Bidirectional Gated Recurrent Unit (BiGRU).
Abstract: In recent years, with the rapid development of Internet technology, online shopping has become a mainstream way for users to purchase and consume. Sentiment analysis of a large number of user reviews on e-commerce platforms can effectively improve user satisfaction. This paper proposes a new sentiment analysis model, SLCABG, which is based on the sentiment lexicon and combines a Convolutional Neural Network (CNN) and an attention-based Bidirectional Gated Recurrent Unit (BiGRU). In terms of methods, the SLCABG model combines the advantages of the sentiment lexicon and deep learning techniques, and overcomes the shortcomings of existing sentiment analysis models for product reviews. First, the sentiment lexicon is used to enhance the sentiment features in the reviews. Then the CNN and the Gated Recurrent Unit (GRU) network are used to extract the main sentiment features and context features in the reviews, and the attention mechanism is used to weight them. Finally, the weighted sentiment features are classified. In terms of data, this paper crawls and cleans real book reviews from dangdang.com, a famous Chinese e-commerce website, for training and testing, all of which are in Chinese. The scale of the data is on the order of 100,000, so it can be widely used in the field of Chinese sentiment analysis. The experimental results show that the model can effectively improve the performance of text sentiment analysis.

242 citations