scispace - formally typeset
Author

Jin Wang

Other affiliations: Yunnan University
Bio: Jin Wang is an academic researcher from Yuan Ze University. The author has contributed to research in topics: Computer science & Sentiment analysis. The author has an h-index of 12, co-authored 24 publications receiving 893 citations. Previous affiliations of Jin Wang include Yunnan University.

Papers
Proceedings ArticleDOI
07 Aug 2016
TL;DR: A regional CNN-LSTM model consisting of two parts, a regional CNN and an LSTM, is proposed to predict the VA ratings of texts; experiments show that the proposed method outperforms lexicon-based, regression-based, and NN-based methods proposed in previous studies.
Abstract: Dimensional sentiment analysis aims to recognize continuous numerical values in multiple dimensions such as the valence-arousal (VA) space. Compared to the categorical approach that focuses on sentiment classification such as binary classification (i.e., positive and negative), the dimensional approach can provide more fine-grained sentiment analysis. This study proposes a regional CNN-LSTM model consisting of two parts: regional CNN and LSTM to predict the VA ratings of texts. Unlike a conventional CNN which considers a whole text as input, the proposed regional CNN uses an individual sentence as a region, dividing an input text into several regions such that the useful affective information in each region can be extracted and weighted according to its contribution to the VA prediction. Such regional information is sequentially integrated across regions using LSTM for VA prediction. By combining the regional CNN and LSTM, both local (regional) information within sentences and long-distance dependency across sentences can be considered in the prediction process. Experimental results show that the proposed method outperforms lexicon-based, regression-based, and NN-based methods proposed in previous studies.
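The data flow described above can be sketched as follows. This is a toy illustration only, not the paper's trained model: each sentence (region) is encoded by a CNN-like windowing-and-pooling step, then the region vectors are integrated sequentially by an LSTM-like blending step. All word vectors and constants are fabricated; real models use learned convolution filters, gates, and a final regression layer.

```python
# Toy sketch of the regional CNN-LSTM data flow (illustration only).

def encode_region(word_vecs, window=2):
    """CNN-like step: slide a window over the words, then max-pool."""
    dim = len(word_vecs[0])
    feats = []
    for i in range(len(word_vecs) - window + 1):
        # windowed average stands in for a learned convolution filter
        feats.append([sum(v[d] for v in word_vecs[i:i + window]) / window
                      for d in range(dim)])
    if not feats:  # region shorter than the window
        feats = [word_vecs[0]]
    return [max(f[d] for f in feats) for d in range(dim)]  # max pooling

def integrate_regions(region_vecs, decay=0.5):
    """LSTM-like step: blend region vectors sequentially into one state."""
    state = [0.0] * len(region_vecs[0])
    for r in region_vecs:
        state = [decay * s + (1 - decay) * x for s, x in zip(state, r)]
    return state

# Two "sentences", each a list of 3-d word vectors (made-up numbers).
text = [
    [[0.2, 0.8, 0.1], [0.9, 0.3, 0.4], [0.5, 0.5, 0.5]],
    [[0.1, 0.2, 0.9], [0.4, 0.7, 0.6]],
]
regions = [encode_region(s) for s in text]
state = integrate_regions(regions)
# A final regression layer would map `state` to a (valence, arousal) pair.
print([round(x, 4) for x in state])
```

The key design point this mirrors is that each region is encoded independently before sequential integration, so intra-sentence features and cross-sentence order are handled by separate components.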

428 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: The proposed word vector refinement model is based on adjusting the vector representations of words such that they can be closer to both semantically and sentimentally similar words and further away from sentimentally dissimilar words.
Abstract: Word embeddings that can capture semantic and syntactic information from contexts have been extensively used for various natural language processing tasks. However, existing methods for learning context-based word embeddings typically fail to capture sufficient sentiment information. This may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad), thus degrading sentiment analysis performance. Therefore, this study proposes a word vector refinement model that can be applied to any pre-trained word vectors (e.g., Word2vec and GloVe). The refinement model is based on adjusting the vector representations of words such that they can be closer to both semantically and sentimentally similar words and further away from sentimentally dissimilar words. Experimental results show that the proposed method can improve conventional word embeddings and outperform previously proposed sentiment embeddings for both binary and fine-grained classification on Stanford Sentiment Treebank (SST).
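A minimal sketch of the refinement idea follows: nudge each pre-trained word vector toward the average of its sentimentally consistent neighbours, so that "good" drifts away from "bad" even if they start close. The vectors, neighbour lists, step size, and iteration count are all fabricated for illustration; the paper's actual objective and weighting scheme differ in detail.

```python
# Sketch: iterative refinement pulls each vector toward its
# sentimentally consistent neighbours (fabricated data).

def refine(vectors, neighbours, alpha=0.1, iters=10):
    """One step per iteration: v <- (1-alpha)*v + alpha*mean(neighbours)."""
    vecs = {w: list(v) for w, v in vectors.items()}
    for _ in range(iters):
        new = {}
        for w, v in vecs.items():
            nbrs = neighbours.get(w, [])
            if not nbrs:
                new[w] = v
                continue
            mean = [sum(vecs[n][d] for n in nbrs) / len(nbrs)
                    for d in range(len(v))]
            new[w] = [(1 - alpha) * x + alpha * m for x, m in zip(v, mean)]
        vecs = new
    return vecs

# "good" starts near "bad" (a typical failure of context-based embeddings)
# but its neighbours pull it toward other positive words.
vectors = {
    "good":  [0.50, 0.50],
    "bad":   [0.52, 0.48],
    "great": [0.90, 0.10],
    "nice":  [0.85, 0.15],
    "awful": [0.10, 0.90],
}
neighbours = {"good": ["great", "nice"], "bad": ["awful"]}
refined = refine(vectors, neighbours)

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

before = dist(vectors["good"], vectors["bad"])
after = dist(refined["good"], refined["bad"])
print(before < after)  # refinement separates "good" from "bad"
```

Because the update only mixes existing vectors, the method applies to any pre-trained embedding, which is the property the abstract emphasizes.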

183 citations

Journal ArticleDOI
TL;DR: A word vector refinement model is proposed to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons, improving each word vector so that it is closer to semantically and sentimentally similar words in the lexicon.
Abstract: Word embeddings that provide continuous low-dimensional vector representations of words have been extensively used for various natural language processing tasks. However, existing context-based word embeddings such as Word2vec and GloVe typically fail to capture sufficient sentiment information, which may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad ), thus degrading sentiment analysis performance. To tackle this problem, recent studies have suggested learning sentiment embeddings to incorporate the sentiment polarity (positive and negative) information from labeled corpora. This study adopts another strategy to learn sentiment embeddings. Instead of creating a new word embedding from labeled corpora, we propose a word vector refinement model to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons. The idea of the refinement model is to improve each word vector such that it can be closer to both semantically and sentimentally similar words in the lexicon (i.e., those with similar intensity scores) and further away from sentimentally dissimilar words (i.e., those with dissimilar intensity scores). An obvious advantage of the proposed method is that it can be applied to any pretrained word embeddings. In addition, the intensity scores can provide more fine-grained (real-valued) sentiment information than binary polarity labels to guide the refinement process. Experimental results show that the proposed refinement model can improve both conventional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification on the SemEval and Stanford Sentiment Treebank datasets.
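The role of the real-valued intensity scores can be sketched as a neighbour re-ranking step: among a word's semantic candidates, keep only those whose lexicon intensity is closest to the word's own, so refinement pulls it toward words that are both semantically and sentimentally similar. The scores, candidate list, and function name `sentiment_neighbours` are invented for illustration and simplify the paper's actual ranking.

```python
# Sketch: re-rank semantic neighbour candidates by closeness of
# real-valued sentiment intensity (fabricated 1-9 valence-style scores).

def sentiment_neighbours(word, candidates, intensity, k=2):
    """Keep the k candidates whose intensity is closest to the word's."""
    ranked = sorted(candidates,
                    key=lambda c: abs(intensity[c] - intensity[word]))
    return ranked[:k]

intensity = {"good": 7.0, "great": 8.2, "nice": 7.5, "bad": 2.5, "poor": 3.0}
# candidates as a context-based embedding might rank them semantically
candidates = ["bad", "great", "poor", "nice"]
print(sentiment_neighbours("good", candidates, intensity))
# → ['nice', 'great']
```

The real-valued scores let the ranking distinguish degrees of positivity (e.g., "nice" vs. "great"), which binary polarity labels cannot, matching the abstract's argument for intensity-guided refinement.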

135 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A corpus cleanup procedure is used to remove outlier ratings and improper texts, and experiments using CVAW words to predict the VA ratings of the CVAT corpus show results comparable to those obtained using English affective resources.
Abstract: An increasing amount of research has recently focused on representing affective states as continuous numerical values on multiple dimensions, such as the valence-arousal (VA) space. Compared to the categorical approach that represents affective states as several classes (e.g., positive and negative), the dimensional approach can provide more fine-grained sentiment analysis. However, affective resources with valence-arousal ratings are still very rare, especially for the Chinese language. Therefore, this study builds 1) an affective lexicon called Chinese valence-arousal words (CVAW) containing 1,653 words, and 2) an affective corpus called Chinese valence-arousal text (CVAT) containing 2,009 sentences extracted from web texts. To improve the annotation quality, a corpus cleanup procedure is used to remove outlier ratings and improper texts. Experiments using CVAW words to predict the VA ratings of the CVAT corpus show results comparable to those obtained using English affective resources.
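Using lexicon words to predict the VA ratings of a text can be sketched as the simplest baseline: average the lexicon's (valence, arousal) ratings of the words that appear in the sentence. The tiny English-keyed lexicon below is fabricated for illustration; CVAW itself holds 1,653 annotated Chinese words, and the paper's experiments go beyond plain averaging.

```python
# Sketch of a lexicon-based VA baseline: average the (valence, arousal)
# ratings of in-lexicon words (fabricated mini-lexicon).

def predict_va(tokens, lexicon):
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # no lexicon coverage for this text
    valence = sum(h[0] for h in hits) / len(hits)
    arousal = sum(h[1] for h in hits) / len(hits)
    return (valence, arousal)

lexicon = {"happy": (7.5, 6.0), "calm": (6.5, 2.0), "angry": (2.0, 7.8)}
print(predict_va(["so", "happy", "and", "calm"], lexicon))  # → (7.0, 4.0)
```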

134 citations

Journal ArticleDOI
TL;DR: A tree-structured regional CNN-LSTM model consisting of two parts, a regional CNN and an LSTM, is proposed to predict the VA ratings of texts; experiments show that the proposed method outperforms lexicon-based, regression-based, conventional NN, and other structured NN methods proposed in previous studies.
Abstract: Dimensional sentiment analysis aims to recognize continuous numerical values in multiple dimensions such as the valence-arousal (VA) space. Compared to the categorical approach that focuses on sentiment classification such as binary classification (i.e., positive and negative), the dimensional approach can provide a more fine-grained sentiment analysis. This article proposes a tree-structured regional CNN-LSTM model consisting of two parts: regional CNN and LSTM to predict the VA ratings of texts. Unlike a conventional CNN which considers a whole text as input, the proposed regional CNN uses a part of the text as a region, dividing an input text into several regions such that the useful affective information in each region can be extracted and weighted according to its contribution to the VA prediction. Such regional information is sequentially integrated across regions using LSTM for VA prediction. By combining the regional CNN and LSTM, both local (regional) information within sentences and long-distance dependencies across sentences can be considered in the prediction process. To further improve performance, a region division strategy is proposed to discover task-relevant phrases and clauses to incorporate structured information into VA prediction. Experimental results on different corpora show that the proposed method outperforms lexicon-based, regression-based, conventional NN, and other structured NN methods proposed in previous studies.
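One plausible region-division step can be sketched as splitting a text into clause-level regions at punctuation, so each region can be encoded separately before sequential integration. Note the paper's strategy discovers task-relevant phrases and clauses; this rule-based split is only an illustration of the input/output shape of such a step.

```python
# Sketch: rule-based division of a text into clause-level regions
# (a stand-in for the paper's learned region division strategy).
import re

def divide_regions(text):
    parts = re.split(r"[,.;!?]", text)
    return [p.strip() for p in parts if p.strip()]

text = "The plot was slow at first, but the ending was thrilling."
print(divide_regions(text))
# → ['The plot was slow at first', 'but the ending was thrilling']
```

Splitting at clause boundaries matters for VA prediction because the two clauses above carry opposite valence, and encoding them as separate regions lets the sequential model weight each one's contribution.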

66 citations


Cited by
Journal ArticleDOI
TL;DR: This paper reviews significant deep learning-related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.
Abstract: Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP.

2,466 citations

Proceedings ArticleDOI
01 Aug 2017
TL;DR: Crowdsourcing on Amazon Mechanical Turk was used to label a large Twitter training dataset along with additional test sets of Twitter and SMS messages for two subtasks: A, an expression-level subtask, and B, a message-level subtask.
Abstract: This paper describes the fifth year of the Sentiment Analysis in Twitter task. SemEval-2017 Task 4 continues with a rerun of the subtasks of SemEval-2016 Task 4, which include identifying the overall sentiment of the tweet, sentiment towards a topic with classification on a two-point and on a five-point ordinal scale, and quantification of the distribution of sentiment towards a topic across a number of tweets: again on a two-point and on a five-point ordinal scale. Compared to 2016, we made two changes: (i) we introduced a new language, Arabic, for all subtasks, and (ii) we made available information from the profiles of the Twitter users who posted the target tweets. The task continues to be very popular, with a total of 48 teams participating this year.

1,107 citations

Journal ArticleDOI
TL;DR: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results, and in recent years it has also been widely applied to sentiment analysis.
Abstract: Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.

917 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper discusses the fourth year of the Sentiment Analysis in Twitter task: SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions, with the new subtasks focusing on two variants of the basic sentiment classification in Twitter task.
Abstract: This paper discusses the fourth year of the “Sentiment Analysis in Twitter” task. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. The first two subtasks are reruns from prior years and ask to predict the overall sentiment, and the sentiment towards a topic in a tweet. The three new subtasks focus on two variants of the basic “sentiment classification in Twitter” task. The first variant adopts a five-point scale, which confers an ordinal character to the classification task. The second variant focuses on the correct estimation of the prevalence of each class of interest, a task which has been called quantification in the supervised learning literature. The task continues to be very popular, attracting a total of 43 teams.
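The quantification variant mentioned above can be sketched with its simplest baseline, often called "classify and count": classify each tweet individually and report the class proportions. The keyword classifier and tweets below are fabricated for illustration; participating systems used far stronger classifiers and adjusted counting methods.

```python
# Sketch of the "classify and count" quantification baseline:
# estimate class prevalence by classifying each tweet and counting.
from collections import Counter

def classify(tweet):
    # trivial keyword rule, purely for illustration
    return "positive" if "love" in tweet or "great" in tweet else "negative"

def prevalence(tweets):
    counts = Counter(classify(t) for t in tweets)
    total = len(tweets)
    return {label: counts[label] / total
            for label in ("positive", "negative")}

tweets = ["love this phone", "great battery", "screen cracked", "love it"]
print(prevalence(tweets))  # → {'positive': 0.75, 'negative': 0.25}
```

The output is a distribution over classes rather than per-tweet labels, which is exactly what distinguishes quantification from ordinary classification.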

702 citations

Journal ArticleDOI
01 Sep 2019-Energy
TL;DR: This paper proposes a CNN-LSTM neural network that extracts spatial and temporal features to effectively predict housing energy consumption, achieving almost perfect prediction performance for electric energy consumption that was previously difficult to predict.

677 citations