
Entity-Based Opinion Mining from Text and Multimedia

Abstract
This paper describes the approach we take to the analysis of social media, combining opinion mining from text and multimedia (images, videos, etc.), and centred on entity and event recognition. We examine a particular use case, which is to help archivists select material for inclusion in an archive of social media for preserving community memories, moving towards structured preservation around semantic categories. The textual approach we take is rule-based and builds on a number of sub-components, taking into account issues inherent in social media such as noisy ungrammatical text, use of swear words, sarcasm etc. The analysis of multimedia content complements this work in order to help resolve ambiguity and to provide further contextual information. We provide two main innovations in this work: first, the novel combination of text and multimedia opinion mining tools; and second, the adaptation of NLP tools for opinion mining specific to the problems of social media.



Entity-based Opinion Mining from Text and
Multimedia
Diana Maynard and Jonathon Hare
1 Introduction
Social web analysis is all about the users who are actively engaged and generate
content. This content is dynamic, reflecting the societal and sentimental fluctuations
of the authors as well as the ever-changing use of language. Social networks are
pools of a wide range of articulation methods, from simple “Like” buttons to complete articles, their content representing the diversity of opinions of the public. User
activities on social networking sites are often triggered by specific events and re-
lated entities (e.g. sports events, celebrations, crises, news articles) and topics (e.g.
global warming, financial crisis, swine flu).
With the rapidly growing volume of resources on the Web, archiving this material
becomes an important challenge. The notion of community memories extends tradi-
tional Web archives with related data from a variety of sources. In order to include
this information, a semantically-aware and socially-driven preservation model is a
natural way to go: the exploitation of Web 2.0 and the wisdom of crowds can make
web archiving a more selective and meaning-based process. The analysis of social
media can help archivists select material for inclusion, while social media mining
can enrich archives, moving towards structured preservation around semantic cat-
egories. In this paper, we focus on the challenges in the development of opinion
mining tools from both textual and multimedia content.
Diana Maynard
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK e-mail: diana@dcs.shef.ac.uk
Jonathon Hare
Electronics and Computer Science, University of Southampton, Southampton, Hampshire, SO17 1BJ, UK e-mail: jsh2@ecs.soton.ac.uk

We focus on two very different domains: socially aware federated political archiving (realised by the national parliaments of Greece and Austria), and socially contextualized broadcaster web archiving (realised by two large multimedia broadcasting organizations based in Germany: Südwestrundfunk and Deutsche Welle). The aim is to help journalists and archivists answer questions such as what the opinions are on crucial social events, how they are distributed, how they have evolved, who the opinion leaders are, and what their impact and influence is.
Alongside natural language, a large number of the interactions which occur be-
tween social web participants include other media, in particular images. Determin-
ing whether a specific non-textual media item is performing as an opinion-forming
device in some interaction becomes an important challenge, more so when the tex-
tual content of some interaction is small or has no strong sentiment. Attempting to
determine a sentiment value for an image clearly presents great challenges, and this
field of research is still in its infancy. We describe here some work we have been
undertaking, firstly to attempt to provide a sentiment value from an image outside
of any specific context, and secondly to utilise the multimodal nature of the social
web to assist the sentiment analysis of either the multimedia or the text.
2 Related Work
While much work has recently focused on the analysis of social media in order to
get a feel for what people think about current topics of interest, there are, however,
still many challenges to be faced. State of the art opinion mining approaches that
focus on product reviews and so on are not necessarily suitable for our task, partly
because they typically operate within a single narrow domain, and partly because
the target of the opinion is either known in advance or at least drawn from a limited set (e.g. film titles, product names, companies, political parties, etc.).
In general, sentiment detection techniques can be roughly divided into lexicon-
based methods [1] and machine-learning methods, e.g. [2]. Lexicon-based meth-
ods rely on a sentiment lexicon, a collection of known and pre-compiled sentiment
terms. Machine learning approaches make use of syntactic and/or linguistic features,
and hybrid approaches are very common, with sentiment lexicons playing a key
role in the majority of methods. For example, [3] establish the polarity of reviews
by identifying the polarity of the adjectives that appear in them, with a reported accuracy about 10% higher than that of pure machine learning techniques. However,
such relatively successful techniques often fail when moved to new domains or text
types, because they are inflexible regarding the ambiguity of sentiment terms. The
context in which a term is used can change its meaning, particularly for adjectives in
sentiment lexicons [4]. Several evaluations have shown the usefulness of contextual
information [5], and have identified context words with a high impact on the polarity of ambiguous terms [6]. A further bottleneck is the time-consuming creation of these sentiment dictionaries, though solutions have been proposed in the form of crowdsourcing techniques (e.g. http://apps.facebook.com/sentiment-quiz).
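As a minimal illustration of the lexicon-based family of methods described above, the following sketch simply sums pre-compiled term scores over a token stream. The lexicon entries and weights here are invented for the example, not taken from any of the cited systems:

```python
# Minimal lexicon-based polarity scorer. The lexicon below is a toy
# stand-in for a real pre-compiled sentiment lexicon.
SENTIMENT_LEXICON = {
    "good": 1, "great": 2, "excellent": 2,
    "bad": -1, "awful": -2, "terrible": -2,
}

def lexicon_polarity(tokens):
    """Sum the scores of known sentiment terms; the sign gives the polarity."""
    score = sum(SENTIMENT_LEXICON.get(t.lower(), 0) for t in tokens)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", 0

print(lexicon_polarity("the film was great , really excellent".split()))
# -> ('positive', 4)
```

Such a scorer exhibits exactly the weakness noted above: it cannot adapt the polarity of a term to its context or domain.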

Almost all the work on opinion mining from Twitter has used machine learning
techniques. [7] aimed to classify arbitrary tweets on the basis of positive, negative
and neutral sentiment, constructing a simple binary classifier which used n-gram and
POS features, and trained on instances which had been annotated according to the
existence of positive and negative emoticons. Their approach has much in common
with an earlier sentiment classifier constructed by [8], which also used unigrams,
bigrams and POS tags, though the former demonstrated through analysis that the
distribution of certain POS tags varies between positive and negative posts. One of
the reasons for the relative paucity of linguistic techniques for opinion mining on
social media is most likely due to the difficulties in using NLP on low quality text
[9]; for example, the F1 of the Stanford NER drops from 90.8% to 45.88% when applied to a corpus of tweets [10].
There have been a number of recent works attempting to detect sarcasm in tweets
and other user-generated content [11, 12, 13, 14], with accuracy typically around
70-80%. These mostly train over a set of tweets with the #sarcasm and/or #irony
hashtags, but all simply try to classify whether a sentence or tweet is sarcastic or not
(and occasionally, into a set of pre-defined sarcasm types). However, none of these
approaches go beyond the initial classification step and thus cannot predict how the
sarcasm will affect the sentiment expressed. This is one of the issues that we tackle
in our work.
Extracting sentiment from images is a research area still in its infancy, with relatively little published work. The approaches that have been published often build SVM classifiers over small ground-truth datasets. Evaluations show systems
often respond only a little better than chance for trained emotions from general
images [15]. The implication is that the feature selection for such classification is
difficult. [16] used a set of colour features for classifying their small ground-truth dataset, also with SVMs, reporting an accuracy of around 87%. In our work, we
expand this colour-based approach to use other features and also use the wisdom of
the crowd for selecting a large ground-truth dataset.
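A rough sketch of this kind of colour-feature classifier, assuming scikit-learn is available; the images here are synthetic stand-ins (bright versus dark noise), not a real sentiment dataset:

```python
import numpy as np
from sklearn.svm import SVC

def colour_histogram(image, bins=8):
    """Flattened per-channel histogram of an RGB image array of shape (H, W, 3)."""
    return np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)

rng = np.random.default_rng(0)
# Synthetic stand-ins: "positive" images are bright, "negative" ones are dark.
bright = rng.integers(128, 256, size=(20, 16, 16, 3))
dark = rng.integers(0, 128, size=(20, 16, 16, 3))
X = np.array([colour_histogram(im) for im in np.concatenate([bright, dark])])
y = [1] * 20 + [0] * 20

clf = SVC(kernel="linear").fit(X, y)
test_image = rng.integers(200, 256, size=(16, 16, 3))  # a new bright image
print(clf.predict([colour_histogram(test_image)]))     # predicts the bright class
```

The work described above extends this setup with further features beyond colour and a much larger, crowd-sourced ground truth, but the training pipeline is analogous.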
Other papers have begun to hint at the multimodal nature of web-based image
sentiment. Earlier work, such as [17], is concerned with similar multimodal image
annotation, but not specifically for sentiment. They use latent semantic spaces for
correlating image features and text in a single feature space. In this paper, we describe the work we have been undertaking in using text and images together to determine sentiment for social media.
3 Opinion Mining from Text
3.1 Challenges
There are many challenges inherent in applying typical opinion mining and sentiment analysis techniques to social media. Microposts such as tweets are, in some sense, the most challenging text type for text mining tools, and in particular for opinion mining, since the genre is noisy, documents have little context and assume much
implicit knowledge, and utterances are often short. As such, conventional NLP tools
typically do not perform well when faced with tweets [18], and this degraded performance also negatively affects any subsequent processing steps.
Ambiguity is a particular problem for tweets, since we cannot easily make use
of coreference information: unlike in blog posts and comments, tweets do not typ-
ically follow a conversation thread, and appear much more in isolation from other
tweets. They also exhibit much more language variation, and make frequent use
of emoticons, abbreviations and hashtags, which can form an important part of the
meaning. Typically, they also contain extensive use of irony and sarcasm, which are
particularly difficult for a machine to detect. On the other hand, their terseness can
also be beneficial in focusing the topics more explicitly: it is very rare for a single
tweet to be related to more than one topic, which can thus aid disambiguation by
emphasising situational relatedness.
In longer posts such as blogs, comments on news articles and so on, a further
challenge is raised by the tracking of changing and conflicting interpretations in
discussion threads. We investigate first steps towards a consistent model allowing
for the pinpointing of opinion holders and targets within a thread (leveraging the extracted information on relevant entities).
We refer the reader to [18] for our work on Twitter-specific IE, which we use as
pre-processing for the opinion mining described below. It is not just tweets that are
problematic, however; sarcasm and noisy language from other social media forms
also have an impact. In the following section, we demonstrate some ways in which
we deal with this.
3.2 Opinion Mining Application
Our approach is a rule-based one similar to that used by [1], focusing on building
up a number of sub-components which all have an effect on the score and polarity
of a sentiment. In contrast, our opinion mining component finds opinions
relating to previously identified entities and events in the text. The core opinion
mining component is described in [19], so we shall only give an overview here, and
focus on some issues specific to social media which were not dealt with in that work,
such as sarcasm detection and hashtag decomposition.
The detection of the actual opinion is performed via a number of different phases: detecting positive, negative and neutral words; distinguishing factual or opinionated statements from questions or doubtful statements; identifying negation, sarcasm and irony; analysing hashtags; and detecting extra-linguistic clues such as smileys. The application involves a set of grammars which create annotations on segments of text.
The grammar rules use information from gazetteers combined with linguistic fea-
tures (POS tags etc.) and contextual information to build up a set of annotations and
features, which can be modified at any time by further rules. The set of gazetteer lists contains useful clues and context words: for example, we have developed a
gazetteer of affect/emotion words from WordNet [20]. The lists have been modified
and extended manually to improve their quality.
Once sentiment words have been matched, we find a linguistic relation between
these and an entity or event in the sentence or phrase. A Sentiment annotation is
created for that entity or event, with features denoting the polarity (positive or nega-
tive) and the polarity score. Scores are based on the initial sentiment word score, and
intensified or decreased by any modifiers such as swear words, adverbs, negation,
sarcasm etc., as explained next.
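The scoring logic can be sketched roughly as follows. All word lists, weights and the two-token modifier window here are invented placeholders, not the actual gazetteers or grammar rules of the system:

```python
# Illustrative sketch of entity-centred sentiment scoring with modifiers.
BASE = {"love": 0.5, "hate": -0.5, "boring": -0.25}  # base sentiment scores
INTENSIFIERS = {"really": 2.0, "absolutely": 3.0}    # multiply the score
NEGATORS = {"not", "never"}                          # flip the polarity
SWEARS = {"damn"}                                    # treated as intensifiers
SWEAR_BOOST = 2.0

def score_sentiment(tokens, target_index):
    """Score the sentiment word nearest to the entity at target_index."""
    best = None
    for i, tok in enumerate(tokens):
        if tok in BASE and (best is None
                            or abs(i - target_index) < abs(best - target_index)):
            best = i
    if best is None:
        return "neutral", 0.0
    score = BASE[tokens[best]]
    # Modifiers in a small window before the sentiment word adjust the score.
    for tok in tokens[max(0, best - 2):best]:
        if tok in INTENSIFIERS:
            score *= INTENSIFIERS[tok]
        if tok in SWEARS:
            score *= SWEAR_BOOST
        if tok in NEGATORS:
            score = -score
    return ("positive" if score > 0 else "negative"), score

# Entity "policy" at index 5; the nearest sentiment word "hate" is intensified.
print(score_sentiment("i really hate the new policy".split(), 5))
# -> ('negative', -1.0)
```

In the real pipeline the link between sentiment word and entity is established by grammar rules over linguistic relations rather than simple token distance.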
Swear words are particularly prolific on Twitter, especially on topics such as
popular culture, politics and religion, where people tend to have very strong views.
To deal with these, we match against a gazetteer list of swear words and phrases,
which was created manually from various lists found on the web and from manual
inspection of the data, including some words acquired by collecting tweets with
swear words as hashtags (which also often contain more swear words in the main
text of the tweet).
Much useful sentiment information is contained within hashtags, but this is prob-
lematic to identify because hashtags typically contain multiple words within a single
token, e.g. #notreally. First, if a hashtag is camelcased, we use the capitalisation information to create separate tokens. Second, if the hashtag is all lowercase or all uppercase, we try to form a token match against the Linux dictionary. Working from left
to right, we look for the longest match against a known word, and then continue
from the next offset. If a combination of matches can be found without a break, the
individual components are converted to tokens. In our example, #notreally would
be correctly identified as “not” + “really”. However, some hashtags are ambiguous: for example, “#greatstart” gets split wrongly into the two tokens “greats” + “tart”.
These problems are hard to deal with; in some cases, we could make use of contex-
tual information to assist.
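The greedy segmentation just described can be sketched as follows; the tiny word set stands in for the Linux dictionary and is chosen to reproduce both the #notreally and #greatstart examples:

```python
import re

# Tiny stand-in for the Linux dictionary used by the real system.
WORDS = {"not", "really", "great", "greats", "start", "tart"}

def decompose_hashtag(tag):
    """Greedy left-to-right longest-match segmentation of a hashtag body."""
    body = tag.lstrip("#").lower()
    tokens, i = [], 0
    while i < len(body):
        # Take the longest dictionary word starting at offset i.
        for j in range(len(body), i, -1):
            if body[i:j] in WORDS:
                tokens.append(body[i:j])
                i = j
                break
        else:
            return None  # no full segmentation without a break
    return tokens

def decompose_camelcase(tag):
    """Split a camelcased hashtag on capitalisation boundaries."""
    return [t.lower()
            for t in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+",
                                tag.lstrip("#"))]

print(decompose_hashtag("#notreally"))   # -> ['not', 'really']
print(decompose_hashtag("#greatstart"))  # -> ['greats', 'tart'] (the wrong split noted above)
```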
We conducted an experiment to measure the accuracy of hashtag decomposition,
using a corpus of 1000 tweets randomly selected from the US elections crawl that
we undertook in the project. 944 hashtags were detected in this corpus, of which
408 were identified as multiword hashtags (we included combinations of letters and
numbers as multiword, but not abbreviations). 281 were camelcased and/or combinations of letters and numbers, 27 were foreign words, and the remaining 100 had
no obvious token-distinguishing features. Evaluation on the hard-to-recognise cases
(non-camel-cased multiword hashtags) produced scores of 86.91% Precision, 90%
Recall, and an F-measure of 88.43%. Given that these hard-to-resolve combinations
form roughly a quarter of the multiword hashtags in our corpus, and that we are en-
tirely successful in decomposing the remaining hashtags, this means that the overall
accuracy for hashtag decomposition is much higher.
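As a rough check of that claim, we can combine the figures above, treating the F-measure on the hard cases as a proxy for per-hashtag accuracy and assuming the remaining multiword hashtags are all decomposed correctly:

```python
# Back-of-the-envelope estimate of overall decomposition accuracy from the
# corpus figures above: 408 multiword hashtags, of which 100 are the hard
# (non-camelcased) cases scoring F = 88.43%, and the remaining 308 are
# assumed to be decomposed correctly.
hard, easy = 100, 308
f_hard = 0.8843
overall = (easy * 1.0 + hard * f_hard) / (easy + hard)
print(f"estimated overall accuracy ~ {overall:.1%}")  # -> ~ 97.2%
```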
In addition to using the sentiment information from these hashtags, we also col-
lect new hashtags that typically indicate sarcasm, since often more than one sarcastic
hashtag is used. For this, we used the GATE gazetteer list collector to collect pairs
of hashtags where one was known to be sarcastic, and examined the second hashtag
manually. From this we were able to identify a further set of sarcasm-indicating
