SemEval-2016 Task 4: Sentiment Analysis in Twitter

doi:10.18653/V1/S16-1001

Home
/
Papers
/
SemEval-2016 Task 4: Sentiment Analysis in Twitter

Proceedings Article•DOI•

SemEval-2016 Task 4: Sentiment Analysis in Twitter

Preslav Nakov¹, Alan Ritter², Sara Rosenthal³, Fabrizio Sebastiani⁴, Veselin Stoyanov⁵ - Show less +1 more•Institutions (5)

Qatar Foundation¹, Ohio State University², Columbia University³, Qatar Computing Research Institute⁴, Facebook⁵

01 Jun 2016-pp 1-18

TL;DR: The SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. as mentioned in this paper discusses the fourth year of the Sentiment Analysis in Twitter Task and discusses the three new subtasks focus on two variants of the basic sentiment classification in Twitter task.

read less

Abstract: This paper discusses the fourth year of the ”Sentiment Analysis in Twitter Task”. SemEval-2016 Task 4 comprises five subtasks, three of which represent a significant departure from previous editions. The first two subtasks are reruns from prior years and ask to predict the overall sentiment, and the sentiment towards a topic in a tweet. The three new subtasks focus on two variants of the basic “sentiment classification in Twitter” task. The first variant adopts a five-point scale, which confers an ordinal character to the classification task. The second variant focuses on the correct estimation of the prevalence of each class of interest, a task which has been called quantification in the supervised learning literature. The task continues to be very popular, attracting a total of 43 teams.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

A Survey on Deep Learning: Algorithms, Techniques, and Applications

[...]

Samira Pouyanfar¹, Saad Sadiq², Yilin Yan², Haiman Tian¹, Yudong Tao², Maria Presa Reyes¹, Mei-Ling Shyu², Shu-Ching Chen¹, S. Sitharama Iyengar¹ - Show less +5 more•Institutions (2)

Florida International University¹, University of Miami²

18 Sep 2018-ACM Computing Surveys

TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by the in-depth analysis on pivoting and groundbreaking advances in deep learning applications.

...read moreread less

Abstract: The field of machine learning is witnessing its golden era as deep learning slowly becomes the leader in this domain. Deep learning uses multiple layers to represent the abstractions of data to build computational models. Some key enabler deep learning algorithms such as generative adversarial networks, convolutional neural networks, and model transfers have completely changed our perception of information processing. However, there exists an aperture of understanding behind this tremendously fast-paced domain, because it was never previously represented from a multiscope perspective. The lack of core understanding renders these powerful methods as black-box machines that inhibit development at a fundamental level. Moreover, deep learning has repeatedly been perceived as a silver bullet to all stumbling blocks in machine learning, which is far from the truth. This article presents a comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing, followed by the in-depth analysis on pivoting and groundbreaking advances in deep learning applications. It was also undertaken to review the issues faced in deep learning such as unsupervised learning, black-box models, and online learning and to illustrate how these challenges can be transformed into prolific future research avenues.

...read moreread less

824 citations

Proceedings Article•DOI•

DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis.

[...]

Christos Baziotis¹, Nikos Pelekis², Christos Doulkeridis²•Institutions (2)

National Technical University of Athens¹, University of Piraeus²

01 Aug 2017

TL;DR: Two deep-learning systems that competed at SemEval-2017 Task 4 “Sentiment Analysis in Twitter” are presented, which use Long Short-Term Memory networks augmented with two kinds of attention mechanisms, on top of word embeddings pre-trained on a big collection of Twitter messages.

...read moreread less

Abstract: In this paper we present two deep-learning systems that competed at SemEval-2017 Task 4 “Sentiment Analysis in Twitter”. We participated in all subtasks for English tweets, involving message-level and topic-based sentiment polarity classification and quantification. We use Long Short-Term Memory (LSTM) networks augmented with two kinds of attention mechanisms, on top of word embeddings pre-trained on a big collection of Twitter messages. Also, we present a text processing tool suitable for social network messages, which performs tokenization, word normalization, segmentation and spell correction. Moreover, our approach uses no hand-crafted features or sentiment lexicons. We ranked 1st (tie) in Subtask A, and achieved very competitive results in the rest of the Subtasks. Both the word embeddings and our text processing tool are available to the research community.

...read moreread less

449 citations

Journal Article•DOI•

A survey of multimodal sentiment analysis

[...]

Mohammad Soleymani¹, David Garcia², Brendan Jou³, Björn Schuller¹, Björn Schuller⁴, Björn Schuller⁵, Shih-Fu Chang³, Maja Pantic⁴, Maja Pantic⁶ - Show less +5 more•Institutions (6)

University of Geneva¹, ETH Zurich², Columbia University³, Imperial College London⁴, University of Passau⁵, University of Twente⁶

01 Sep 2017-Image and Vision Computing

TL;DR: The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.

...read moreread less

357 citations

Proceedings Article•DOI•

SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis

[...]

Erik Cambria¹, Yang Li¹, Frank Z. Xing¹, Soujanya Poria², Kenneth Kwok³ - Show less +1 more•Institutions (3)

Nanyang Technological University¹, Singapore University of Technology and Design², Agency for Science, Technology and Research³

19 Oct 2020

TL;DR: This work integrates logical reasoning within deep learning architectures to build a new version of SenticNet, a commonsense knowledge base for sentiment analysis, and applies it to the interesting problem of polarity detection from text.

...read moreread less

Abstract: Deep learning has unlocked new paths towards the emulation of the peculiarly-human capability of learning from examples. While this kind of bottom-up learning works well for tasks such as image classification or object detection, it is not as effective when it comes to natural language processing. Communication is much more than learning a sequence of letters and words: it requires a basic understanding of the world and social norms, cultural awareness, commonsense knowledge, etc.; all things that we mostly learn in a top-down manner. In this work, we integrate top-down and bottom-up learning via an ensemble of symbolic and subsymbolic AI tools, which we apply to the interesting problem of polarity detection from text. In particular, we integrate logical reasoning within deep learning architectures to build a new version of SenticNet, a commonsense knowledge base for sentiment analysis.

...read moreread less

336 citations

Hate Me, Hate Me Not: Hate Speech Detection on Facebook.

[...]

Fabio Del Vigna, Andrea Cimino, Felice Dell'Orletta, Marinella Petrocchi, Maurizio Tesconi - Show less +1 more

01 Jan 2017

TL;DR: This work proposes a variety of hate categories and designs and implements two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM).

...read moreread less

Abstract: While favouring communications and easing information sharing, Social Network Sites are also used to launch harmful campaigns against specific groups and individuals. Cyberbullism, incitement to self-harm practices, sexual predation are just some of the severe effects of massive online offensives. Moreover, attacks can be carried out against groups of victims and can degenerate in physical violence. In this work, we aim at containing and preventing the alarming diffusion of such hate campaigns. Using Facebook as a benchmark, we consider the textual content of comments appeared on a set of public Italian pages. We first propose a variety of hate categories to distinguish the kind of hate. Crawled comments are then annotated by up to five distinct human annotators, according to the defined taxonomy. Leveraging morpho-syntactical features, sentiment polarity and word embedding lexicons, we design and implement two classifiers for the Italian language, based on different learning algorithms: the first based on Support Vector Machines (SVM) and the second on a particular Recurrent Neural Network named Long Short Term Memory (LSTM). We test these two learning algorithms in order to verify their classification performances on the task of hate speech recognition. The results show the effectiveness of the two classification approaches tested over the first manually annotated Italian Hate Speech Corpus of social media text.

...read moreread less

286 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Glove: Global Vectors for Word Representation

[...]

Jeffrey Pennington¹, Richard Socher², Christopher D. Manning¹•Institutions (2)

Stanford University¹, University of Colorado Boulder²

01 Oct 2014

TL;DR: A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.

...read moreread less

Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

...read moreread less

30,558 citations

Journal Article•DOI•

The Earth Mover's Distance as a Metric for Image Retrieval

[...]

Yossi Rubner¹, Carlo Tomasi¹, Leonidas J. Guibas¹•Institutions (1)

Stanford University¹

01 Nov 2000-International Journal of Computer Vision

TL;DR: This paper investigates the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval, and compares the retrieval performance of the EMD with that of other distances.

...read moreread less

Abstract: We investigate the properties of a metric between two distributions, the Earth Mover's Distance (EMD), for content-based image retrieval. The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense, and was first proposed for certain vision problems by Peleg, Werman, and Rom. For image retrieval, we combine this idea with a representation scheme for distributions that is based on vector quantization. This combination leads to an image comparison framework that often accounts for perceptual similarity better than other previously proposed methods. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.

...read moreread less

4,593 citations

Journal Article•DOI•

Twitter mood predicts the stock market.

[...]

Johan Bollen¹, Huina Mao¹, Xiao-Jun Zeng²•Institutions (2)

Indiana University¹, University of Manchester²

01 Mar 2011-Journal of Computational Science

TL;DR: This work investigates whether measurements of collective mood states derived from large-scale Twitter feeds are correlated to the value of the Dow Jones Industrial Average (DJIA) over time and indicates that the accuracy of DJIA predictions can be significantly improved by the inclusion of specific public mood dimensions but not others.

...read moreread less

4,453 citations

Proceedings Article•

Linguistic Regularities in Continuous Space Word Representations

[...]

Tomas Mikolov¹, Wen-tau Yih², Geoffrey Zweig²•Institutions (2)

Brno University of Technology¹, Microsoft²

27 May 2013

TL;DR: The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.

...read moreread less

Abstract: Continuous space language models have recently demonstrated outstanding results across a variety of tasks. In this paper, we examine the vector-space word representations that are implicitly learned by the input-layer weights. We find that these representations are surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset. This allows vector-oriented reasoning based on the offsets between words. For example, the male/female relationship is automatically learned, and with the induced vector representations, “King Man + Woman” results in a vector very close to “Queen.” We demonstrate that the word vectors capture syntactic regularities by means of syntactic analogy questions (provided with this paper), and are able to correctly answer almost 40% of the questions. We demonstrate that the word vectors capture semantic regularities by using the vector offset method to answer SemEval-2012 Task 2 questions. Remarkably, this method outperforms the best previous systems.

...read moreread less

3,300 citations

Proceedings Article•

SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.

[...]

Stefano Baccianella¹, Andrea Esuli¹, Fabrizio Sebastiani²•Institutions (2)

Istituto di Scienza e Tecnologie dell'Informazione¹, National Research Council²

01 May 2010

TL;DR: This work discusses SENTIWORDNET 3.0, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications, and reports on the improvements concerning aspect (b) that it embodies with respect to version 1.0.

...read moreread less

Abstract: In this work we present SENTIWORDNET 30, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications SENTIWORDNET 30 is an improved version of SENTIWORDNET 10, a lexical resource publicly available for research purposes, now currently licensed to more than 300 research groups and used in a variety of research projects worldwide Both SENTIWORDNET 10 and 30 are the result of automatically annotating all WORDNET synsets according to their degrees of positivity, negativity, and neutrality SENTIWORDNET 10 and 30 differ (a) in the versions of WORDNET which they annotate (WORDNET 20 and 30, respectively), (b) in the algorithm used for automatically annotating WORDNET, which now includes (additionally to the previous semi-supervised learning step) a random-walk step for refining the scores We here discuss SENTIWORDNET 30, especially focussing on the improvements concerning aspect (b) that it embodies with respect to version 10 We also report the results of evaluating SENTIWORDNET 30 against a fragment of WORDNET 30 manually annotated for positivity, negativity, and neutrality; these results indicate accuracy improvements of about 20% with respect to SENTIWORDNET 10

...read moreread less

2,870 citations