How (Not) to Predict Elections

01 Oct 2011-pp 165-171

How (Not) To Predict Elections
Panagiotis T. Metaxas, Eni Mustafaraj
Department of Computer Science
Wellesley College
Wellesley, MA, USA
(pmetaxas, emustafa)@wellesley.edu
Daniel Gayo-Avello
Departamento de Informática
Universidad de Oviedo
Oviedo, Asturias, Spain
dani@uniovi.es
Abstract—Using social media for political discourse is increas-
ingly becoming common practice, especially around election time.
Arguably, one of the most interesting aspects of this trend is the
possibility of “pulsing” the public’s opinion in near real-time and,
thus, it has attracted the interest of many researchers as well as
news organizations. Recently, it has been reported that predicting electoral outcomes from social media data is feasible and, in fact, quite simple to compute. Positive results have been reported on a few occasions, but without an analysis of what principle enables them. This, however, should be surprising given the significant differences in the demographics between likely voters and users of online social networks.
This work aims to test the predictive power of social media
metrics against several Senate races of the two recent US Con-
gressional elections. We review the findings of other researchers
and we try to duplicate their findings both in terms of data
volume and sentiment analysis. Our research aim is to shed light
on why predictions of electoral (or other social events) using
social media might or might not be feasible. In this paper, we
offer two conclusions and a proposal: First, we find that electoral
predictions using the published research methods on Twitter
data are not better than chance. Second, we reveal some major
challenges that limit the predictability of election results through
data from social media. We propose a set of standards that any
theory aiming to predict elections (or other social events) using
social media should follow.
I. INTRODUCTION
In recent years, the use of social media for communication
has dramatically increased. Research has shown that 22% of
adult internet users were engaged with the political campaign
on Twitter, Facebook and Myspace in the months leading
up to the November 2010 US elections [1]. Empowered by
the APIs that many social media companies make available,
researchers are engaged in an effort to analyze and make
sense of the data collected through these social communication
channels. Theoretically, social media data, if used correctly,
can lead to predictions of events in the near future influenced
by human behavior. In fact, to describe this phenomenon,
[2] talk about “predicting the future” while [3] have coined
the term “predicting the present”. Indeed, researchers have
reported that the volume of Twitter chat over time can be
used to predict several kinds of consumer metrics such as the
likelihood of success of new movies before their release [2]
and the marketability of consumer goods [4]. These predictions
are explained by the perceived ability of Twitter chat volume
and Google Search Trends to monitor and record general social
trends as they occur.
Being able to make predictions based on publicly available
data would have numerous benefits in areas such as health (e.g.
predictions of flu epidemics [5], [6]), business (e.g., prediction
of box-office success of movies [7] and product marketability
[4]), economics (e.g., predictions on stock market trends and
housing market trends [3], [8], [9]), and politics (e.g., trends
in public opinion [10]), to name a few.
However, there have also been reports on Twitter’s ability to
predict with amazing accuracy the voting results in the recent
2009 German elections [11] and in the 2010 US Congressional
elections [12]. Given the significant differences in the demo-
graphics between likely voters and users of social networks [1], questions arise as to the underlying operating principle enabling these predictions. Could it be simply a matter of
coincidence or is there a reason why general trends are as
accurate as specific demographics? Should we expect these
methods to be accurate again in future elections? These are
the questions we seek to address with our work.
The rest of this paper is organized as follows: The next
section II reviews past research on electoral predictions using
social media data. Section III describes a number of new
experiments we conducted testing the predictability of the last
two rounds of US elections based on Twitter volume and
sentiment analysis. Section IV describes a set of standards
that any methodology of electoral predictions should follow
in order to be consistently competent against the statistical
sampling methods employed by professional pollsters. The
final section V presents our conclusions and proposes new lines
of research.
II. PREDICTING PAST ELECTIONS
In the previous section we mentioned some of the attempts
to use Twitter and Google Trends for predictions of real
world outcomes and external market events. What about the
important area of elections? One would expect that, following the previous research literature (e.g., [11], [12]), and given the high utilization of the Web and online social networks in the US [1], Twitter volume should have been able to predict consistently the outcomes of the US Congressional elections. Let us examine the instances and methods that have been used in past claims of electoral result prediction and discuss their predictive power.

A. Claims that Social Media Data predicted elections
The word “prediction” means foreseeing the outcome of
events that have not yet occurred. In this sense, the authors
are not aware of any publications or claims that, using social
media data, someone was able to propose a method that would
predict correctly and consistently the results of elections before
the elections happened. What has happened, however, is that
on several occasions, post-processing of social media data has resulted in claims that it might have been possible to make correct electoral predictions. Such claims are discussed in the
following subsection.
B. Claims that Social Media Data could have predicted elec-
tions
Probably due to the promising results achieved by many of the projects and studies discussed in Section I, there is a relatively high amount of hype surrounding the feasibility of predicting electoral results using social media. It must be noted that most of that hype is fueled by traditional media and blogs, usually bursting before and after electoral events. For example, shortly after the recent 2010 elections in the US, flamboyant statements made it to the news media headlines, from those arguing that Twitter is not a reliable predictor (e.g., [13]) to those claiming just the opposite: that Twitter (and Facebook) was remarkably accurate (e.g., [14]). Moreover, the degree of accuracy of these “predictions” was usually assessed in terms of the percentage of correctly guessed electoral races, e.g., the winners of 74% of the US House and 81% of the US Senate races were predicted [15], without further qualification. Such
qualifications are important since a few US races are won
by very tight margins, while most of them are won with
comfortable margins. These predictions were not compared
against traditional ways of prediction, such as professional
polling methods, or even trivial prediction methods based on
incumbency (the fact that those who are already in office are
far more likely to be re-elected in the US).
Compared to the media coverage, the number of scholarly
works on the feasibility of predicting popular opinion and
elections from social media is relatively small. Nevertheless,
it does tend to support a positive opinion on the predictive
power of social media as a promising line of research, while
exposing some caveats of the methods. Thus, according to
[16], the number of Facebook fans for election candidates had
a measurable influence on their respective vote shares. These
researchers assert that “social network support, on Facebook
specifically, constitutes an indicator of candidate viability of
significant importance [...] for both the general electorate and
even more so for the youngest age demographic.”
A study of a different kind was conducted by [10]. They
analyzed the way in which simple sentiment analysis methods
could be applied to tweets as a tool of automatically pulsing
public opinion. These researchers correlated the output of such
a tool with the temporal evolution of different indices such as
the index of Consumer Sentiment, the index of Presidential
Job Approval, and several pre-electoral polls for the US 2008
Presidential Race. The correlation with the first two indices
was rather high but it was not significant for the pre-electoral
polls, and they conclude that sentiment analysis on Twitter data is a promising avenue of research for replacing traditional polls although, they find, it is not quite there yet.
The work by [11] focuses directly on whether Twitter
can serve as a predictor of electoral results. In that paper,
a strong statement is made about predictability, namely that “the mere number of tweets mentioning a political party can be considered a plausible reflection of the vote share and its predictive power even comes close to traditional election polls.” In fact, they report a mean absolute error (MAE) of only 1.65%. Moreover, these researchers found that co-occurrence
of political party mentions accurately reflected close political
positions between political parties and plausible coalitions.
More recently, [12] used the tweets sent by the electoral candidates, not the general public, and reported success in “building a model that predicts whether a candidate will win or lose with accuracy of 88.0%”. While this concluding statement seems strong, a closer look at the claims reveals that they found their model to be less successful, as they admit that “applying this technique, we correctly predict 49 out of 63 (77.7%) of the races”.
C. Claims that Social Media Data did not predict the elections
The previous subsection reveals some inconsistencies with
electoral predictions in scholarly publications. While candi-
date counts of Twitter messages predicted with remarkable
accuracy electoral results in Germany in 2009 [11], a more elaborate method did not correlate well with pre-electoral polls in the US 2008 Presidential elections [10]. Could it be
that some of those results were just a matter of chance or the
side-effect of technical problems? Who is right?
The work by [17] focuses on the use of Google search
volume (not Twitter) as a predictor for the 2008 and 2010
US Congressional elections. They divided the electoral races into groups depending on the degree to which they were contested by the candidates, and found that only a few groups of races were “predicted” above chance using Google Trends, in one case achieving 81% correct results. However, they report
that those promising results were achieved by chance: while
the best group’s predictions were good in 2008 (81%), for the
same group the predictions were very poor in 2010 (34%).
Importantly, even when the predictions were better than
chance, they were not competent compared to the trivial
method of predicting through incumbency. For example, in
2008, 91.6% of the races were won by incumbents. Even
in 2010, in elections with major public discontent, 84.5% of the races were won by incumbents. Given that, historically,
the incumbent candidate gets re-elected about 9 out of 10
times, the baseline for any competent predictor should be
the incumbent re-election rate. According to such a baseline,
Google search volume proves to be a poor electoral predic-
tor. Compared to professional pollsters (e.g., The New York
Times), the predictions were far worse; and, in some groups
of races the predictions were even worse than chance!

In [18], the sentiment analysis methods of [10] and [11] are
applied to tweets obtained during the US 2008 Presidential
elections (Obama vs. McCain). [18] assigned a voting inten-
tion to every individual user in the dataset, along with the
user’s geographical location. Thus, electoral predictions were
computed for different states instead of simply the whole of
the US, and it was found that every method examined would have
largely overestimated Obama’s victory, predicting (incorrectly)
that Obama would have won even in Texas. In addition, [18]
provides some suggestions on the way in which such data
could be filtered to improve prediction accuracy. In this sense,
it points out that demographic bias in the user base of Twitter
and other social media services is an important electoral factor
and, therefore, bias in data should be corrected according to
user demographic profiles.
Recently, [19] provided a thorough response to the work of
[11] arguing that those authors relied on a number of arbitrary
choices which make their method virtually useless for future
elections. They point out that, by taking into account all of the
parties running for the elections, the method by [11] would
actually have predicted a victory for the Piratenpartei (Pirate
Party) (which received 2% of the votes but no seats in the
German parliament).
In this paper we decided to examine more closely the claims of electoral predictions described in the previous subsection. Since we had collected Twitter data from the 2010 US Congressional elections, we were in a position to examine whether the methods proposed were as successful in instances other than the ones they were developed for. Moreover, we wanted to analyze why electoral predictions using social media may (or may not) be possible. In the next section III
we describe our computational experiments and in section IV
we analyze the operating models behind electoral predictions.
III. NEW EXPERIMENTS ON TWITTER AND ELECTIONS
For our study, we used two data sets related to elections that
took place in the US during 2010. Predictions were calculated
based on Twitter chatter volume, as in [11], and then based on
sentiment analysis of tweets, in ways similar to [10]. While
we did not have comparable data to examine the methods of
[12], we discuss some of its findings in the next section.
The first data set we used belongs to the 2010 US Sen-
ate special election in Massachusetts (“MAsen10”), a highly
contested race between Martha Coakley (D) and Scott Brown
(R). The data set contains 234,697 tweets contributed by
56,165 different Twitter accounts, collected with the Twitter streaming API, configured to retrieve near real-time tweets containing the name of either of the two candidates. The
collection took place from January 13 to January 20, 2010, the
day after the elections.
The second data set contains all the tweets provided by
the Twitter “gardenhose” in the week from October 26 to
November 1, the day before the general US Congressional
elections on November 2, 2010 (“USsen10”). The gardenhose
provides a uniform sampling of the Twitter data. The daily
snapshots contained between 5.6 and 7.7 million tweets. Using
the names of candidates for five highly contested races for the
US Senate, 13,019 tweets were collected, contributed by 6,970
different Twitter accounts.
These two datasets are different. The MAsen10 is an almost
complete set of tweets, while USsen10 provides a random
sample, but because of its randomness, it should accurately
represent the volume and nature of tweets during that pre-
election week.
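As an illustration of the kind of filtering just described, the sketch below scans a dump of gardenhose tweets, assuming a hypothetical file with one JSON-encoded tweet per line, and keeps those whose text mentions a candidate name; the file name and the name list are placeholders, not the exact configuration used in the study.

```python
import json

# Hypothetical layout: one JSON-encoded tweet per line, as a gardenhose
# daily snapshot might be stored on disk.
CANDIDATE_NAMES = ["coakley", "brown", "reid", "angle", "boxer", "fiorina"]

def filter_candidate_tweets(path, names=CANDIDATE_NAMES):
    """Return the tweets whose text mentions at least one candidate name."""
    matches = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                tweet = json.loads(line)
            except ValueError:
                continue  # skip malformed lines
            text = tweet.get("text", "").lower()
            if any(name in text for name in names):
                matches.append(tweet)
    return matches

# Example: tweets = filter_candidate_tweets("gardenhose-2010-10-26.json")
```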
The first prediction method we examined is the one de-
scribed by [11], which consists of counting the number of
tweets mentioning each candidate. According to that study, the
proportion of tweets mentioning each candidate should closely
reflect the actual vote share in the election. Tweets containing
the names of both candidates were not included, focusing only
on tweets mentioning one candidate at a time.
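A minimal sketch of this volume-based method follows; the candidate name lists are illustrative placeholders, and tweets mentioning neither or both candidates are simply skipped, as described above.

```python
def volume_shares(tweets, cand1_names, cand2_names):
    """Predicted vote shares from raw mention counts, in the spirit of [11].
    tweets: iterable of tweet texts; cand*_names: lowercase name variants."""
    c1 = c2 = 0
    for text in tweets:
        t = text.lower()
        m1 = any(name in t for name in cand1_names)
        m2 = any(name in t for name in cand2_names)
        if m1 == m2:          # mentions neither or both candidates: skip
            continue
        if m1:
            c1 += 1
        else:
            c2 += 1
    total = c1 + c2
    return c1 / total, c2 / total

# Example: volume_shares(tweets, ["coakley"], ["scott brown", "brown"])
```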
The second prediction method extends the ideas from [10],
which described a way to compute a sentiment score for a
topic being discussed on Twitter. To that end, [10] relied
on the subjectivity lexicon collected by [20] and labeled
tweets containing any positive word as positive tweets, and
the ones containing any negative word as negative tweets.
Then, the sentiment score is defined to be the ratio between the
number of positive and negative tweets. It must be noted that,
according to [10], the number of polarized words in the tweet
is not important, and tweets can be simultaneously considered
as positive and negative. In addition, sentiment scores for
topics with very different volumes of tweets are not easily
comparable. Because of these issues, some changes had to be
made to [10]’s approach in order to compute predicted vote
shares. In our study, the lexicon employed is also that of [20], but each tweet is considered either positive or negative, not both.
Every tweet is labeled as positive, negative, or neutral, based
on the sum of such labeled words (positive words contribute
+1, while negative words contribute -1). A tweet might be
labeled neutral when the sum of polarized words is 0, or
when no contributing words appeared in it. Given the two-
party nature of the races, the vote share is calculated with this
formula:
$$\mathrm{vote\_share}(c_1) = \frac{pos(c_1) + neg(c_2)}{pos(c_1) + neg(c_1) + pos(c_2) + neg(c_2)} \qquad (1)$$

where $c_1$ is the candidate for whom support is being computed while $c_2$ is the opposing candidate; $pos(c)$ and $neg(c)$ are, respectively, the number of positive and negative tweets mentioning candidate $c$.
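A sketch of this labeling scheme and of Equation (1) is given below, assuming the positive and negative word lists of the subjectivity lexicon are available as Python sets; the tokenization is deliberately naive and only meant to illustrate the computation.

```python
def tweet_label(text, pos_lexicon, neg_lexicon):
    """+1 per positive word, -1 per negative word; the sign gives the label."""
    score = 0
    for word in text.lower().split():
        if word in pos_lexicon:
            score += 1
        elif word in neg_lexicon:
            score -= 1
    return "pos" if score > 0 else "neg" if score < 0 else "neut"

def sentiment_vote_share(tweets_c1, tweets_c2, pos_lexicon, neg_lexicon):
    """Vote share of candidate 1 according to Equation (1)."""
    def pos_neg(tweets):
        labels = [tweet_label(t, pos_lexicon, neg_lexicon) for t in tweets]
        return labels.count("pos"), labels.count("neg")
    pos1, neg1 = pos_neg(tweets_c1)
    pos2, neg2 = pos_neg(tweets_c2)
    return (pos1 + neg2) / (pos1 + neg1 + pos2 + neg2)
```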
A. Results of Applying the Prediction Methods
For the MAsen10 data it was possible to make a more
detailed analysis, since the data contained tweets before the
election day (6 days of data), the election day (20 hours of
data), and post-election (10 hours of data). The 47,368 tweets
that mentioned both candidates were not used.
Table I shows the number of tweets mentioning each candi-
date and the election results predicted from the volume. The
total count of tweets we collected (53.25% - 46.75% in favor of Brown) closely reflects the election outcome (Brown 51.9% - Coakley 47.1%). A correct prediction?

                       Coakley             Brown
                       #tweets     %       #tweets     %
Pre-elec. (6 days)      52,116     53.86    44,654     46.14
Elec. day (20 hrs)      21,076     49.94    21,123     50.06
Post-elec. (10 hrs)     14,381     29.74    33,979     70.26
Total                   87,573     46.75    99,756     53.25

TABLE I. The share of tweets for each candidate in the MAsen10 data set. Notice that the pre-election share didn't predict the final result (Brown won 51.9% of the votes).
                  Coakley    Brown
Pre-election       46.5%      53.5%
Election-day       44.25%     55.8%
Post-election      27.2%      72.8%
All                41.0%      59.0%

TABLE II. Predicted vote shares for the MAsen10 data set based on sentiment analysis. The pre-election prediction correctly picks Brown as the winner with a small error (1.1% against the corrected election results; also see Table III).
We refrained from declaring victory for the predictive power of Twitter when we realized that the volume share for the pre-election period actually predicted a win for Coakley, not Brown. Table I also shows how the number of tweets was affected by electoral events. Brown received a third of all his mentions in the 10 hours post-election, when everyone started talking about his win, an important win that would have repercussions for health care reform, a major issue at the time. Brown's win broke the Democrats' filibuster-proof majority in the US Senate and produced a lot of tweets.
While the simple Twitter share of pre-election tweets couldn't predict the result of the MAsen10 election, applying sentiment analysis to tweets and calculating the vote share with Equation (1) comes close to the electoral results, as shown in Table II. For a second time in our research effort we refrained from declaring victory for Twitter's power in predicting elections, and decided to take a closer look at our data.
The two prediction methods were further applied to 5 other highly contested Senate races from the USsen10 data set. The results of the 6 races are summarized in Table III. The actual results of the election don't always sum up to 100% because in a few races more than two candidates participated. So, in order to calculate the mean absolute error (MAE), the results were normalized to sum up to 100%. Using the values of
the corrected election results, MAE values were calculated
for both methods. The Twitter volume method had an error of
17.1%, while the sentiment analysis had an error of 7.6%. In
other words, both MAE values are unacceptably high. Each
method was able to correctly predict the winner in only half
of the races.
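The MAE figures above can be reproduced from the normalized two-party results in Table III; the short sketch below does so, listing the Democratic candidate's normalized actual share followed by the volume-based and sentiment-based predictions for that candidate.

```python
# (normalized actual, Twitter volume prediction, sentiment prediction)
# for the Democratic candidate in each race, taken from Table III.
races = {
    "MA": (47.6, 53.9, 46.5),
    "CO": (50.9, 26.3, 63.3),
    "NV": (53.1, 51.2, 48.4),
    "CA": (54.1, 57.9, 47.8),
    "KY": (44.3,  4.7, 43.1),
    "DE": (58.6, 32.1, 38.8),
}

def mae(column):
    """Mean absolute error of a prediction column (1 = volume, 2 = sentiment)."""
    errors = [abs(vals[column] - vals[0]) for vals in races.values()]
    return sum(errors) / len(errors)

print(round(mae(1), 1))  # volume method:    ~17.1
print(round(mae(2), 1))  # sentiment method: ~7.6
```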
B. Sentiment Analysis Accuracy
The results in Table III show that, while both prediction methods are correct only half of the time, the MAE is smaller for the sentiment analysis method. This difference was intriguing and we decided to study it closer.

                      POS    NEG    NEUT    Accuracy
opposing Brown        124     76    150     21.71%
opposing Coakley       70     67    105     27.68%
supporting Brown      216     45    254     41.94%
supporting Coakley    240     72    213     45.71%
neutral               249     82    296     47.20%
Overall                                     36.85%

TABLE IV. Confusion matrix for the evaluation of the automatic sentiment analysis, computed against a manually labeled set of tweets.

While a thorough evaluation of the accuracy of sentiment analysis on political conversation is beyond the scope of this paper, some evidence of the issues affecting simple methods based on polarity lexicons is provided from three different angles:
1) Compared against manually labeled tweets: To evaluate the accuracy of the sentiment analysis method described above, a set of tweets was manually assigned to one of the following labels: opposing Brown, opposing Coakley, supporting Brown, supporting Coakley, or neutral. This set of
tweets was chosen to reflect “one tweet, one vote”: From the
set of Twitter users that had indicated their location in the state
of Massachusetts, we chose users with a single tweet in the
corpus. This set contains 2,259 tweets. We read the tweets and
manually assigned labels to them. Our labels were compared
against those assigned by the automatic method, producing the
confusion matrix in Table IV.
The results show that the accuracy of the sentiment analysis
is only 36.85%, slightly better than a classifier randomly as-
signing the same three labels (positive, negative, and neutral).
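The per-class accuracies in Table IV can be recomputed from the confusion matrix as the fraction of each manually labeled class that received the expected automatic label (POS for supporting, NEG for opposing, NEUT for neutral); the overall 36.85% figure appears to be the unweighted mean of those per-class accuracies. A small sketch, using the counts from Table IV:

```python
# Rows: manual labels; columns: automatic labels (POS, NEG, NEUT), from Table IV.
confusion = {
    "opposing Brown":     (124,  76, 150),
    "opposing Coakley":   ( 70,  67, 105),
    "supporting Brown":   (216,  45, 254),
    "supporting Coakley": (240,  72, 213),
    "neutral":            (249,  82, 296),
}
expected_column = {"opposing": 1, "supporting": 0, "neutral": 2}

per_class = {
    label: row[expected_column[label.split()[0]]] / sum(row)
    for label, row in confusion.items()
}
overall = sum(per_class.values()) / len(per_class)

print({k: round(100 * v, 2) for k, v in per_class.items()})
print(round(100 * overall, 2))   # ~36.85, matching the table up to rounding
```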
2) Effect of misleading propaganda: A second evaluation
was performed on a particular set of tweets, namely those in-
cluded in a “Twitter bomb” targeted at Coakley [21] containing
a series of tweets spreading misleading information about her.
The corpus used in this study contained 925 tweets that were part of this Twitter bomb. According to the automatic
sentiment analysis, 369 of them were positive messages, 212
were neutral, and only 344 were negative. While all of these
tweets were part of an orchestrated smearing campaign against
Coakley, most of them were characterized as neutral or even
positive by the automatic sentiment analysis.
Therefore, we conclude that by just relying on polarity
lexicons the subtleties of propaganda and disinformation are
not only missed but even wrongly interpreted.
3) Relation to presumed political leaning: Finally, an ad-
ditional experiment was conducted to test the assumption
underlying this application of sentiment analysis, namely, that
the political preference of users can be derived from their
tweets. To derive the political preference from the tweets, for
every user, the corresponding tweets were grouped together
and their accumulated polarity score was attributed to the user.
The presumed political orientation of a user was calculated
following the approach described by [22]. This approach

State   Senate Race                    Election Result   Normalized Result   Twitter Volume   Sentiment Analysis
MA      Coakley [D] vs. Brown [R]      47.1% - 51.9%     47.6% - 52.4%       53.9% - 46.1%    46.5% - 53.5%
CO      Bennet [D] vs. Buck [R]        48.1% - 46.4%     50.9% - 49.1%       26.3% - 73.7%    63.3% - 36.7%
NV      Reid [D] vs. Angle [R]         50.3% - 44.5%     53.1% - 46.9%       51.2% - 48.8%    48.4% - 51.6%
CA      Boxer [D] vs. Fiorina [R]      52.2% - 44.2%     54.1% - 45.9%       57.9% - 42.1%    47.8% - 52.2%
KY      Conway [D] vs. Paul [R]        44.3% - 55.7%     44.3% - 55.7%       4.7% - 95.3%     43.1% - 56.9%
DE      Coons [D] vs. O'Donnell [R]    56.6% - 40.0%     58.6% - 41.4%       32.1% - 67.9%    38.8% - 61.2%

TABLE III. Summary of electoral and predicted results for 6 highly contested Senate races. Numbers in bold show races where the winner was predicted correctly by the technique. Both the Twitter volume and sentiment analysis methods were able to predict only 50% of the races correctly. In this sample, incumbents won in all the races they ran (NV, CA, CO), and 84.5% of all 2010 races.
makes use of the ADA scores, which range from 0 (most conservative) to 100 (most liberal). The ADA (Americans for Democratic Action) is a liberal political think tank that publishes scores for each member of the US Congress according to their voting record on key progressive issues. Official Twitter accounts for 210 members of the House and 68 members of the Senate were collected. Then, the Twitter followers of all these accounts were collected, and every user received the average ADA score of the Congress members they were following. The number of Twitter users following the above-mentioned 278 Congress members is roughly half a million. A little more than 14 thousand of them also appear in the MAsen10 dataset, and they are used in the following correlation analysis.
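A sketch of this attribution step is shown below; the two input dictionaries (follower sets per Congress account, ADA score per account) are hypothetical and assumed to have been collected through the Twitter API beforehand.

```python
from collections import defaultdict

def user_ada_scores(followers_of, ada_score):
    """Assign each Twitter user the average ADA score of the Congress
    accounts he or she follows.
    followers_of: Congress account -> set of follower user ids.
    ada_score:    Congress account -> ADA score (0 = most conservative,
                  100 = most liberal)."""
    scores = defaultdict(list)
    for member, followers in followers_of.items():
        for user in followers:
            scores[user].append(ada_score[member])
    return {user: sum(vals) / len(vals) for user, vals in scores.items()}
```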
For each of these 14 thousand users four different scores
are computed: their ADA score which, purportedly, would
reflect their political leaning, their opinion on Brown, their
opinion on Coakley, and their “voting orientation” for this
particular election. The voting orientation is defined as the
result of subtracting the opinion on Coakley from the opinion
on Brown. Given the range of the ADA scores and the sign
of the rest of the scores, the correlations between them should
be as follows. The correlation between the ADA score and the opinion on Brown should be negative; after all, Republicans (closer to 0 on the ADA scale) should value Brown positively, and Democrats (closer to 100) should value him negatively. The opposite should be true for Coakley and, thus, a positive correlation should be expected. With regard to the ADA score and the voting orientation, they should also be negatively correlated, for the same reasons as for the ADA score vs. the opinion on Brown.
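The correlations reported in Table V are plain Pearson coefficients between these per-user scores; a self-contained sketch of the computation, with hypothetical variable names and only the standard library, follows.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# ada, brown, coakley are per-user score lists aligned on the same ~14,000 users;
# the voting orientation is the opinion on Brown minus the opinion on Coakley.
# orientation = [b - c for b, c in zip(brown, coakley)]
# pearson_r(ada, brown)        # expected negative
# pearson_r(ada, coakley)      # expected positive
# pearson_r(ada, orientation)  # expected negative
```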
Table V shows the results of this experiment. The different
scores do correlate as expected. However, the correlations are
very weak, showing that they are essentially orthogonal to
each other.
Based on these three experiments, we claim that the ac-
curacy of lexicon-based sentiment analysis when applied to
political conversation is quite poor. When compared against
manually labeled tweets it seems to just slightly outperform
a random classifier; it fails to detect and correctly assign
the intent behind disinformation and misleading propaganda;
and, finally, it’s a far cry from being able to predict political
preference.
                                             Pearson's r
Opinion on Brown vs. Avg. ADA scores         -0.150799848
Opinion on Coakley vs. Avg. ADA scores       +0.09304417
Voting orientation vs. Avg. ADA scores       -0.178902764

TABLE V. Correlation between averaged ADA scores (which purportedly reflect users' political preference) and the opinions on the two candidates and the voting orientation. The correlations found are consistent with the initial hypotheses, but too weak to be useful.
C. Could we have done better than that?
The previous subsection reviewed how the methods proposed to predict elections would have performed in several instances using data from the 2010 US Congressional elections. These experiments were important because a wider set of test cases was needed before basing any claims about the predictability of elections on social media data.
Given the unsuccessful predictions we report, one might counter that “you would have done better if you had done a different kind of analysis”. However, recall that we did not try to invent new techniques of analysis: we simply tried to repeat the (reportedly successful) methods that others have used in the past, and we found that the results were not repeatable.
IV. HOW TO PREDICT ELECTIONS
In the past, some research efforts have treated social media
as a black box: it may give you the right answer, though you
may not know why. We believe that there is an opportunity for
intellectual contribution if research methods are accompanied
with at least a basic reasonable model on why they would
predict correctly. Next we discuss some standards that electoral
predictions should obey in order to be repeatedly successful.
A. A method of prediction should be an algorithm.
This might seem like a trivial point, but it is not always easy to follow when dealing with social media. Of course, every election might seem different and adjustments in the data collection and analysis may be necessary. Nevertheless, these adjustments should be determinable beforehand because, as Duncan Watts [23] argues in his recent book, they all seem obvious afterwards.
More specifically, we propose that a method should clearly
describe before the elections: (a) the way in which the Social
