Book ChapterDOI

Vertical and Sequential Sentiment Analysis of Micro-blog Topic

16 Nov 2018 - pp 353-363
TL;DR: A distantly supervised learning method based on micro-blog expressions and a sentiment lexicon is proposed, and fastText is used to train word vectors and the classification model; experiments show that the classifier reaches an accuracy of 92.2% and that sequential sentiment analysis based on this classifier can accurately reflect the emotional trend of micro-blog topics.
Abstract: Sentiment analysis of micro-blog topics aims to explore people's attitudes towards a topic or event on social networks. Most existing research analyzes micro-blog sentiment with traditional algorithms such as Naive Bayes and SVM trained on manually labelled data, without considering the timeliness of the data or the inwardness of the topics. Meanwhile, little Chinese micro-blog sentiment analysis based on a large-scale corpus has been investigated. This paper focuses on sequential sentiment analysis over a million-scale Chinese micro-blog corpus to mine the features of sequential sentiment precisely. A distantly supervised learning method based on micro-blog expressions and a sentiment lexicon is proposed, and fastText is used to train word vectors and the classification model. The timeliness of the analysis is guaranteed on the premise of ensuring the classifier's accuracy. Experiments show that the accuracy of the classifier reaches 92.2%, and that sequential sentiment analysis based on this classifier can accurately reflect the emotional trend of micro-blog topics.
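As a rough illustration of the approach the abstract describes, the sketch below weakly labels posts from emoticon/lexicon cues and trains a fastText classifier. The cue lists, file name and hyperparameters are illustrative assumptions, not the authors' actual configuration; it assumes the `fasttext` pip package and text that has already been word-segmented.

```python
# Hypothetical sketch of distant supervision + fastText, assuming the `fasttext` pip
# package and pre-segmented Chinese text (e.g. via jieba).
import fasttext

POSITIVE_CUES = {"[哈哈]", "[爱你]", "开心"}   # example emoticon/lexicon cues (assumed)
NEGATIVE_CUES = {"[怒]", "[泪]", "难过"}

def weak_label(text):
    """Assign a fastText-style label from sentiment cues, or None if no cue is found."""
    if any(cue in text for cue in POSITIVE_CUES):
        return "__label__positive"
    if any(cue in text for cue in NEGATIVE_CUES):
        return "__label__negative"
    return None   # posts without a cue are dropped from the distant-supervision corpus

posts = ["今天 考试 通过 了 开心 [哈哈]", "航班 又 延误 了 [怒]",
         "周末 和 朋友 爬山 很 开心", "电影 结局 太 难过 了 [泪]"]

with open("weibo_train.txt", "w", encoding="utf-8") as f:
    for text in posts:
        label = weak_label(text)
        if label:
            f.write(f"{label} {text}\n")

# Supervised fastText learns word vectors and the classifier jointly.
model = fasttext.train_supervised(input="weibo_train.txt", epoch=10, wordNgrams=2)
print(model.predict("今天 真 开心"))
```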
Citations
Journal ArticleDOI
TL;DR: Social media popularity and importance are on the increase as people use it for various types of social interaction, expressed across multiple social media platforms and in various media formats, like text, image, video and audio.
Abstract: Social media popularity and importance are on the increase due to people using it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, which totals 485 published studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived from the published studies. Social Opinion Mining can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management to multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. The latest developments in Social Opinion Mining beyond 2018 are also presented, together with future research directions, with the aim of leaving a wider academic and societal impact across several real-world applications.

19 citations

Journal ArticleDOI
TL;DR: A combination of Conjunction Analysis (CA) and Punctuation Mark Identification (PMI) is used to detect a negation cue and its scope, and the OL-DAWE model, which uses Data Augmentation (DA) to generate opposed tweets from the original tweets, is proposed.
Abstract: Introducing negative items into sentences can shift the polarity of emotional words and lead to misclassification. Therefore, dealing with negative items is indispensable for analysing the polarity of tweets. This paper first uses a combination of Conjunction Analysis (CA) and Punctuation Mark Identification (PMI) to detect a negation cue and its scope. In addition, we propose the OL-DAWE model, which uses Data Augmentation (DA) to generate an opposed tweet from each original tweet. The model extends the training and test data sets and learns both the original and opposed sides of a tweet in the training module. When predicting the polarity of a tweet, the OL-DAWE model considers the positive degree (negative degree) of the original tweet and the negative degree (positive degree) of its opposed tweet. We conduct experiments on two real-world data sets, proving the effectiveness of the combined technique in negation processing and showing that, in the polarity sentiment analysis of tweets, the OL-DAWE model outperforms the baseline owing to its simplicity and high efficiency.
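As a loose sketch of the "opposed tweet" augmentation idea, the snippet below flips polarity words through a tiny antonym lexicon; the lexicon and flipping rule are purely illustrative stand-ins, not the OL-DAWE authors' actual method.

```python
# Illustrative only: build an "opposed" counterpart of a tweet by swapping polarity
# words via a small antonym lexicon (a stand-in for the paper's DA step).
ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love",
            "happy": "sad", "sad": "happy"}

def opposed_tweet(tokens):
    """Return a polarity-flipped copy of a tokenized tweet."""
    return [ANTONYMS.get(tok.lower(), tok) for tok in tokens]

original = "I love this phone , the battery is good".split()
print(" ".join(opposed_tweet(original)))   # -> "I hate this phone , the battery is bad"
```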

15 citations


Cites background from "Vertical and Sequential Sentiment A..."

  • ...as [1], [2], [5]) represent each word in the text as a real-valued, continuous and low-dimensional vector, which is also known as Word Embedding....

    [...]

Posted Content
TL;DR: A thorough systematic review of Social Opinion Mining research, which is tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, was carried out.
Abstract: Social media popularity and importance are on the increase due to people using it for various types of social interaction across multiple channels. This social interaction by online users includes submission of feedback, opinions and recommendations about various individuals, entities, topics, and events. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can therefore be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, which totals 485 studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, natural language processing tasks and other aspects derived from the published studies. Such multi-source information fusion plays a fundamental role in mining people's social opinions from social media platforms. These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management to multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. Future research directions are presented, and further research and development has the potential to leave a wider academic and societal impact.

11 citations


Cites background or methods from "Vertical and Sequential Sentiment A..."

  • ...[Table from the citing review listing the number of studies per language; this paper is counted among the 53 Chinese-language studies, the largest group, followed by Spanish (11) and Indonesian (8).]...

    [...]

  • ...Word embeddings, a type of word representation which allows words with a similar meaning to have a similar representation, were used by several studies [296, 73, 377, 379, 449, 510, 492, 295, 424, 347, 237, 199, 303, 299, 412, 291, 505, 452, 490, 447, 286, 415, 285] adopting a learning-based (Machine Learning, Deep Learning and Statistical) or hybrid approach....

    [...]

  • ...[Table from the citing review cross-tabulating studies by combination of approaches (lexicon-based, machine learning, deep learning, statistical, and others), giving the number of studies and their reference numbers per combination.]...

    [...]

  • ...The Technology industry-oriented studies (23) focused on either: company perception [415, 149, 81, 305, 508], products, such as mobile/smart phones [46, 55, 49, 56, 321, 72, 549, 201, 469, 119], laptops [83], electronics [225], tablets...

    [...]

Journal ArticleDOI
TL;DR: A deep active learning model with bidirectional encoder representations from transformers (BERT) is proposed for text classification; BERT takes advantage of the self-attention mechanism to integrate contextual information, which helps accelerate the convergence of training.
Abstract: Active learning has been widely utilized to reduce the labeling cost of supervised learning. By selecting specific instances to train the model, performance is improved within a limited number of steps. However, little work has paid attention to the effectiveness of active learning for text classification. In this paper, we propose a deep active learning model with bidirectional encoder representations from transformers (BERT) for text classification. BERT takes advantage of the self-attention mechanism to integrate contextual information, which is beneficial for accelerating the convergence of training. For the active learning process, we design an instance selection strategy based on posterior-probability Margin, Intra-correlation and Inter-correlation (MII). Selected instances are characterized by a small margin, low intra-cohesion and high inter-cohesion. We conduct extensive experiments and analyses with our methods: the effect of the learner is compared, and the effects of the sampling strategy and text classification are assessed on three real datasets. The results show that our method outperforms the baselines in terms of accuracy.
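A minimal sketch of the margin component of such a selection strategy is given below; the intra-/inter-correlation terms of MII are omitted, and the function name and toy posteriors are assumptions for illustration only.

```python
# Hedged sketch: select the k unlabeled instances whose top-two class posteriors are
# closest together (small margin = high uncertainty). Only the "M" of MII is shown.
import numpy as np

def margin_select(probs, k):
    """probs: (n_samples, n_classes) posterior probabilities from the current model."""
    top2 = np.sort(probs, axis=1)[:, -2:]   # two largest probabilities per instance
    margins = top2[:, 1] - top2[:, 0]
    return np.argsort(margins)[:k]          # indices of the k most uncertain instances

probs = np.array([[0.55, 0.45], [0.95, 0.05], [0.51, 0.49]])
print(margin_select(probs, k=2))            # -> the two most uncertain instances
```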

11 citations

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, a sentiment analysis system that can extract significant features from a corpus and accurately analyse the emotional polarity of the text is proposed; current improvements to related systems focus on three aspects: data acquisition, feature extraction and the classifier algorithm.
Abstract: Sentiment analysis is an important branch of text classification, and the related systems are usually applied to perceiving user emotion and monitoring public opinion. By comparison, text classification can be applied to more fields than sentiment analysis. In terms of system architecture, as in text classification, a complete classification system mainly contains data acquisition, data pre-processing, feature extraction, a classification algorithm and result output. A Web crawler is usually used in the first step; URL links, hashtags and non-Chinese text should be removed in the second step. For feature extraction, IG, TF-IDF and Word2vec are commonly used. Then SVM, Naive Bayes, KNN or a neural network algorithm is usually used as the classifier. Furthermore, as a system that runs automatically, a sentiment analysis system should be able to extract significant features from the corpus and accurately analyse the emotional polarity of the text. At present, improvements to such systems focus on three aspects: data acquisition, feature extraction and the classifier algorithm.
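To make the pipeline above concrete, here is a hedged end-to-end sketch using scikit-learn: strip URLs, hashtags and non-Chinese characters, extract TF-IDF features and fit an SVM. The regular expressions, character-level features and toy data are illustrative assumptions; a real system would also apply word segmentation (e.g. jieba) before feature extraction.

```python
# Hedged sketch of the described pipeline: crude pre-processing, TF-IDF features, SVM.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def preprocess(text):
    text = re.sub(r"https?://\S+", "", text)       # drop URL links
    text = re.sub(r"#[^#]+#", "", text)            # drop micro-blog hashtags (#话题#)
    return "".join(ch for ch in text if "\u4e00" <= ch <= "\u9fff")  # keep Chinese chars

docs = ["今天 很 开心 http://t.cn/abc", "#吐槽# 这 部 电影 太 难看 了"]
labels = [1, 0]                                    # toy positive / negative labels

clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 2)), LinearSVC())
clf.fit([preprocess(d) for d in docs], labels)
print(clf.predict([preprocess("真的 非常 开心")]))
```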

7 citations

References
Posted Content
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

20,077 citations

Proceedings Article
16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
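The two entries above describe the continuous word-vector architectures (CBOW and skip-gram); the fragment below only shows the API shape of training such vectors with gensim's Word2Vec implementation (gensim >= 4.0 parameter names), which is a stand-in for the original tool. The toy corpus is far too small to yield meaningful vectors.

```python
# Illustration of skip-gram training via gensim's Word2Vec (not the original word2vec tool).
from gensim.models import Word2Vec

sentences = [["king", "rules", "the", "kingdom"],
             ["queen", "rules", "the", "kingdom"],
             ["man", "walks", "the", "dog"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["king"].shape)                 # a 50-dimensional continuous vector
print(model.wv.most_similar("king", topn=2))  # nearest words in the toy vector space
```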

9,270 citations

Journal ArticleDOI
TL;DR: A new approach based on the skip-gram model is proposed, in which each word is represented as a bag of character n-grams and the word vector is the sum of these representations, making it possible to train models on large corpora quickly and to compute word representations for words that did not appear in the training data.
Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it can compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
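As a small illustration of the bag-of-character-n-grams idea in this abstract, the helper below decomposes a word into boundary-marked character n-grams (3 to 6 characters); in the full model each n-gram would have its own vector and the word vector would be their sum. The helper is a sketch, not the library's implementation.

```python
# Minimal sketch of the subword decomposition step: a word becomes a bag of boundary-
# marked character n-grams plus the whole word itself.
def char_ngrams(word, n_min=3, n_max=6):
    token = f"<{word}>"                      # boundary markers, as described in the paper
    grams = [token[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(token) - n + 1)]
    return grams + [token]                   # the whole word is kept as its own feature

print(char_ngrams("where"))   # ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ..., '<where>']
```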

7,537 citations

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
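As a hedged sketch of the kind of comparison this reference describes, the snippet below trains Naive Bayes and a linear SVM on unigram presence features and compares their accuracy; the toy reviews, vectorizer settings, and the omission of maximum entropy classification are all illustrative choices, not the original movie-review experiment.

```python
# Illustrative comparison of two of the cited classifiers on toy review data using
# unigram presence features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

reviews = ["a wonderful , moving film", "dull plot and terrible acting",
           "great performances throughout", "boring and far too long"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

X = CountVectorizer(binary=True).fit_transform(reviews)   # unigram presence features
for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.score(X, labels))       # training accuracy only (toy data)
```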

6,626 citations