Book ChapterDOI

Vertical and Sequential Sentiment Analysis of Micro-blog Topic

16 Nov 2018 - pp 353-363
TL;DR: A distantly supervised learning method based on micro-blog expressions and a sentiment lexicon is proposed, and fastText is used to train word vectors and the classification model; experiments show that the classifier reaches an accuracy of 92.2% and that sequential sentiment analysis based on this classifier can accurately reflect the emotional trend of micro-blog topics.
Abstract: Sentiment analysis of micro-blog topics aims to explore people's attitudes towards a topic or event on social networks. Most existing research analyzes micro-blog sentiment with traditional algorithms such as Naive Bayes and SVM trained on manually labelled data, without considering the timeliness of the data or the inwardness of the topics. Meanwhile, little Chinese micro-blog sentiment analysis based on a large-scale corpus has been investigated. This paper focuses on sequential sentiment analysis over a million-scale Chinese micro-blog corpus to mine the features of sequential sentiment precisely. A distantly supervised learning method based on micro-blog expressions and a sentiment lexicon is proposed, and fastText is used to train word vectors and the classification model. The timeliness of the analysis is guaranteed on the premise of ensuring the classifier's accuracy. Experiments show that the accuracy of the classifier reaches 92.2%, and that sequential sentiment analysis based on this classifier can accurately reflect the emotional trend of micro-blog topics.
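As a rough illustration of the approach the abstract describes, the sketch below weakly labels posts from emoticon/lexicon cues and trains a fastText classifier. The cue lists, file name and hyperparameters are illustrative assumptions, not the authors' actual configuration; it assumes the `fasttext` pip package and text that has already been word-segmented.

```python
# Hypothetical sketch of distant supervision + fastText, assuming the `fasttext` pip
# package and pre-segmented Chinese text (e.g. via jieba).
import fasttext

POSITIVE_CUES = {"[哈哈]", "[爱你]", "开心"}   # example emoticon/lexicon cues (assumed)
NEGATIVE_CUES = {"[怒]", "[泪]", "难过"}

def weak_label(text):
    """Assign a fastText-style label from sentiment cues, or None if no cue is found."""
    if any(cue in text for cue in POSITIVE_CUES):
        return "__label__positive"
    if any(cue in text for cue in NEGATIVE_CUES):
        return "__label__negative"
    return None   # posts without a cue are dropped from the distant-supervision corpus

posts = ["今天 考试 通过 了 开心 [哈哈]", "航班 又 延误 了 [怒]",
         "周末 和 朋友 爬山 很 开心", "电影 结局 太 难过 了 [泪]"]

with open("weibo_train.txt", "w", encoding="utf-8") as f:
    for text in posts:
        label = weak_label(text)
        if label:
            f.write(f"{label} {text}\n")

# Supervised fastText learns word vectors and the classifier jointly.
model = fasttext.train_supervised(input="weibo_train.txt", epoch=10, wordNgrams=2)
print(model.predict("今天 真 开心"))
```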
Citations
Journal ArticleDOI
TL;DR: Social media popularity and importance are on the increase as people use it for various types of social interaction, expressed across multiple social media platforms and in various media formats, like text, image, video and audio.
Abstract: Social media popularity and importance are on the increase due to people using it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, which totals 485 published studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived from the published studies. Social Opinion Mining can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management to multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. The latest developments in Social Opinion Mining beyond 2018 are also presented, together with future research directions, with the aim of leaving a wider academic and societal impact across several real-world applications.

19 citations

Journal ArticleDOI
TL;DR: A combination of Conjunction Analysis (CA) and Punctuation Mark Identification (PMI) is used to detect a negation cue and its scope, and the OL-DAWE model, which uses Data Augmentation (DA) to generate opposed tweets from the original tweets, is proposed.
Abstract: Introducing negative items into sentences can shift the polarity of emotional words and lead to misclassification. Therefore, dealing with negative items is indispensable for analysing the polarity of tweets. This paper first uses a combination of Conjunction Analysis (CA) and Punctuation Mark Identification (PMI) to detect a negation cue and its scope. In addition, we propose the OL-DAWE model, which uses Data Augmentation (DA) to generate an opposed tweet from each original tweet. The model extends the training and test data sets and learns both the original and opposed sides of a tweet in the training module. When predicting the polarity of a tweet, the OL-DAWE model considers the positive degree (negative degree) of the original tweet and the negative degree (positive degree) of its opposed tweet. We conduct experiments on two real-world data sets, proving the effectiveness of the combined technique in negation processing and showing that, in the polarity sentiment analysis of tweets, the OL-DAWE model outperforms the baseline owing to its simplicity and high efficiency.
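As a loose sketch of the "opposed tweet" augmentation idea, the snippet below flips polarity words through a tiny antonym lexicon; the lexicon and flipping rule are purely illustrative stand-ins, not the OL-DAWE authors' actual method.

```python
# Illustrative only: build an "opposed" counterpart of a tweet by swapping polarity
# words via a small antonym lexicon (a stand-in for the paper's DA step).
ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love",
            "happy": "sad", "sad": "happy"}

def opposed_tweet(tokens):
    """Return a polarity-flipped copy of a tokenized tweet."""
    return [ANTONYMS.get(tok.lower(), tok) for tok in tokens]

original = "I love this phone , the battery is good".split()
print(" ".join(opposed_tweet(original)))   # -> "I hate this phone , the battery is bad"
```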

15 citations


Cites background from "Vertical and Sequential Sentiment A..."

  • ...as [1], [2], [5]) represent each word in the text as a real-valued, continuous and low-dimensional vector, which is also known as Word Embedding....

    [...]

Posted Content
TL;DR: A thorough systematic review of Social Opinion Mining research, which is tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, was carried out.
Abstract: Social media popularity and importance are on the increase due to people using it for various types of social interaction across multiple channels. This social interaction by online users includes submission of feedback, opinions and recommendations about various individuals, entities, topics, and events. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can therefore be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, which totals 485 studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, natural language processing tasks and other aspects derived from the published studies. Such multi-source information fusion plays a fundamental role in mining people's social opinions from social media platforms. These can be utilised in many application areas, ranging from marketing, advertising and sales for product/service management to multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. Future research directions are presented, and further research and development has the potential to leave a wider academic and societal impact.

11 citations


Cites background or methods from "Vertical and Sequential Sentiment A..."

  • ...[Table from the citing review listing the number of studies per language; this paper is counted among the 53 Chinese-language studies, the largest group, followed by Spanish (11) and Indonesian (8).]...

    [...]

  • ...Word embeddings, a type of word representation which allows words with a similar meaning to have a similar representation, were used by several studies [296, 73, 377, 379, 449, 510, 492, 295, 424, 347, 237, 199, 303, 299, 412, 291, 505, 452, 490, 447, 286, 415, 285] adopting a learning-based (Machine Learning, Deep Learning and Statistical) or hybrid approach....

    [...]

  • ...[Table from the citing review cross-tabulating studies by combination of approaches (lexicon-based, machine learning, deep learning, statistical, and others), giving the number of studies and their reference numbers per combination.]...

    [...]

  • ...The Technology industry-oriented studies (23) focused on either: company perception [415, 149, 81, 305, 508], products, such as mobile/smart phones [46, 55, 49, 56, 321, 72, 549, 201, 469, 119], laptops [83], electronics [225], tablets...

    [...]

Journal ArticleDOI
TL;DR: A deep active learning model with bidirectional encoder representations from transformers (BERT) is proposed for text classification; BERT takes advantage of the self-attention mechanism to integrate contextual information, which helps accelerate the convergence of training.
Abstract: Active learning has been widely utilized to reduce the labeling cost of supervised learning. By selecting specific instances to train the model, performance is improved within a limited number of steps. However, little work has paid attention to the effectiveness of active learning for text classification. In this paper, we propose a deep active learning model with bidirectional encoder representations from transformers (BERT) for text classification. BERT takes advantage of the self-attention mechanism to integrate contextual information, which is beneficial for accelerating the convergence of training. For the active learning process, we design an instance selection strategy based on posterior-probability Margin, Intra-correlation and Inter-correlation (MII). Selected instances are characterized by a small margin, low intra-cohesion and high inter-cohesion. We conduct extensive experiments and analyses with our methods: the effect of the learner is compared, and the effects of the sampling strategy and text classification are assessed on three real datasets. The results show that our method outperforms the baselines in terms of accuracy.
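A minimal sketch of the margin component of such a selection strategy is given below; the intra-/inter-correlation terms of MII are omitted, and the function name and toy posteriors are assumptions for illustration only.

```python
# Hedged sketch: select the k unlabeled instances whose top-two class posteriors are
# closest together (small margin = high uncertainty). Only the "M" of MII is shown.
import numpy as np

def margin_select(probs, k):
    """probs: (n_samples, n_classes) posterior probabilities from the current model."""
    top2 = np.sort(probs, axis=1)[:, -2:]   # two largest probabilities per instance
    margins = top2[:, 1] - top2[:, 0]
    return np.argsort(margins)[:k]          # indices of the k most uncertain instances

probs = np.array([[0.55, 0.45], [0.95, 0.05], [0.51, 0.49]])
print(margin_select(probs, k=2))            # -> the two most uncertain instances
```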

11 citations

Book ChapterDOI
01 Jan 2021
TL;DR: In this article, a sentiment analysis system that can extract significant features from a corpus and accurately analyse the emotional polarity of the text is proposed; current improvements to related systems focus on three aspects: data acquisition, feature extraction and the classifier algorithm.
Abstract: Sentiment analysis is an important branch of text classification, and the related systems are usually applied to perceiving user emotion and monitoring public opinion. By comparison, text classification can be applied to more fields than sentiment analysis. In terms of system architecture, as in text classification, a complete classification system mainly contains data acquisition, data pre-processing, feature extraction, a classification algorithm and result output. A Web crawler is usually used in the first step; URL links, hashtags and non-Chinese text should be removed in the second step. For feature extraction, IG, TF-IDF and Word2vec are commonly used. Then SVM, Naive Bayes, KNN or a neural network algorithm is usually used as the classifier. Furthermore, as a system that runs automatically, a sentiment analysis system should be able to extract significant features from the corpus and accurately analyse the emotional polarity of the text. At present, improvements to such systems focus on three aspects: data acquisition, feature extraction and the classifier algorithm.
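To make the pipeline above concrete, here is a hedged end-to-end sketch using scikit-learn: strip URLs, hashtags and non-Chinese characters, extract TF-IDF features and fit an SVM. The regular expressions, character-level features and toy data are illustrative assumptions; a real system would also apply word segmentation (e.g. jieba) before feature extraction.

```python
# Hedged sketch of the described pipeline: crude pre-processing, TF-IDF features, SVM.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def preprocess(text):
    text = re.sub(r"https?://\S+", "", text)       # drop URL links
    text = re.sub(r"#[^#]+#", "", text)            # drop micro-blog hashtags (#话题#)
    return "".join(ch for ch in text if "\u4e00" <= ch <= "\u9fff")  # keep Chinese chars

docs = ["今天 很 开心 http://t.cn/abc", "#吐槽# 这 部 电影 太 难看 了"]
labels = [1, 0]                                    # toy positive / negative labels

clf = make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 2)), LinearSVC())
clf.fit([preprocess(d) for d in docs], labels)
print(clf.predict([preprocess("真的 非常 开心")]))
```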

7 citations

References
Posted Content
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

20,077 citations

Proceedings Article
16 Jan 2013
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
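The two entries above describe the continuous word-vector architectures (CBOW and skip-gram); the fragment below only shows the API shape of training such vectors with gensim's Word2Vec implementation (gensim >= 4.0 parameter names), which is a stand-in for the original tool. The toy corpus is far too small to yield meaningful vectors.

```python
# Illustration of skip-gram training via gensim's Word2Vec (not the original word2vec tool).
from gensim.models import Word2Vec

sentences = [["king", "rules", "the", "kingdom"],
             ["queen", "rules", "the", "kingdom"],
             ["man", "walks", "the", "dog"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # sg=1: skip-gram
print(model.wv["king"].shape)                 # a 50-dimensional continuous vector
print(model.wv.most_similar("king", topn=2))  # nearest words in the toy vector space
```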

9,270 citations

Journal ArticleDOI
TL;DR: A new approach based on the skip-gram model is proposed, in which each word is represented as a bag of character n-grams and the word vector is the sum of these representations, making it possible to train models on large corpora quickly and to compute word representations for words that did not appear in the training data.
Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it can compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
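As a small illustration of the bag-of-character-n-grams idea in this abstract, the helper below decomposes a word into boundary-marked character n-grams (3 to 6 characters); in the full model each n-gram would have its own vector and the word vector would be their sum. The helper is a sketch, not the library's implementation.

```python
# Minimal sketch of the subword decomposition step: a word becomes a bag of boundary-
# marked character n-grams plus the whole word itself.
def char_ngrams(word, n_min=3, n_max=6):
    token = f"<{word}>"                      # boundary markers, as described in the paper
    grams = [token[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(token) - n + 1)]
    return grams + [token]                   # the whole word is kept as its own feature

print(char_ngrams("where"))   # ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ..., '<where>']
```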

7,537 citations

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
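As a hedged sketch of the kind of comparison this reference describes, the snippet below trains Naive Bayes and a linear SVM on unigram presence features and compares their accuracy; the toy reviews, vectorizer settings, and the omission of maximum entropy classification are all illustrative choices, not the original movie-review experiment.

```python
# Illustrative comparison of two of the cited classifiers on toy review data using
# unigram presence features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

reviews = ["a wonderful , moving film", "dull plot and terrible acting",
           "great performances throughout", "boring and far too long"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

X = CountVectorizer(binary=True).fit_transform(reviews)   # unigram presence features
for clf in (MultinomialNB(), LinearSVC()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.score(X, labels))       # training accuracy only (toy data)
```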

6,626 citations