Proceedings Article

Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment

TL;DR: It is found that the mere number of messages mentioning a party reflects the election result, and joint mentions of two parties are in line with real world political ties and coalitions.
Abstract: Twitter is a microblogging website where users read and write millions of short messages on a variety of topics every day. This study uses the context of the German federal election to investigate whether Twitter is used as a forum for political deliberation and whether online messages on Twitter validly mirror offline political sentiment. Using LIWC text analysis software, we conducted a content analysis of over 100,000 messages containing a reference to either a political party or a politician. Our results show that Twitter is indeed used extensively for political deliberation. We find that the mere number of messages mentioning a party reflects the election result. Moreover, joint mentions of two parties are in line with real-world political ties and coalitions. An analysis of the tweets’ political sentiment demonstrates close correspondence to the parties’ and politicians’ political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. We discuss the use of microblogging message content as a valid indicator of political sentiment and derive suggestions for further research.
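
The two counting ideas in the abstract (share of messages mentioning each party, and joint mentions of two parties) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the keyword list `PARTIES`, the whitespace tokenization, and the helper names are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical party keywords; a real study would use a fuller list of
# party names and politician names.
PARTIES = ["cdu", "spd", "fdp", "linke", "gruene"]


def parties_in(message):
    """Return the set of party keywords mentioned in one message."""
    tokens = set(message.lower().replace("#", " ").split())
    return {p for p in PARTIES if p in tokens}


def mention_shares(messages):
    """Share of all party mentions that goes to each party."""
    counts = Counter()
    for msg in messages:
        counts.update(parties_in(msg))
    total = sum(counts.values())
    return {p: counts[p] / total for p in PARTIES} if total else {}


def joint_mentions(messages):
    """Count how often each pair of parties appears in the same message."""
    pairs = Counter()
    for msg in messages:
        pairs.update(combinations(sorted(parties_in(msg)), 2))
    return pairs


def mean_absolute_error(predicted, actual):
    """Average absolute gap between mention share and vote share."""
    return sum(abs(predicted[p] - actual[p]) for p in actual) / len(actual)
```

Comparing `mention_shares(...)` against official vote shares with `mean_absolute_error` gives one simple way to judge how closely message volume tracks the result; any vote shares passed in would have to come from the official election data, which this page does not list.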


Citations
Book
01 May 2012
TL;DR: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining.
Abstract: Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. It is one of the most active research areas in natural language processing and is also widely studied in data mining, Web mining, and text mining. In fact, this research has spread outside of computer science to the management sciences and social sciences due to its importance to business and society as a whole. The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, blogs, micro-blogs, Twitter, and social networks. For the first time in human history, we now have a huge volume of opinionated data recorded in digital form for analysis. Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behaviors. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations. This book is a comprehensive introductory and survey text. It covers all important topics and the latest developments in the field with over 400 references. It is suitable for students, researchers and practitioners who are interested in social media analysis in general and sentiment analysis in particular. Lecturers can readily use it in class for courses on natural language processing, social media analysis, text mining, and data mining. Lecture slides are also available online.

4,515 citations

Proceedings Article
16 May 2014
TL;DR: Interestingly, using the authors' parsimonious rule-based model to assess the sentiment of tweets, it is found that VADER outperforms individual human raters, and generalizes more favorably across contexts than any of their benchmarks.
Abstract: The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.

3,299 citations

Proceedings Article
05 Jul 2011
TL;DR: It is demonstrated that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users, and surprisingly this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users.
Abstract: In this study we investigate how social media shape the networked public sphere and facilitate communication between communities with different political orientations. We examine two networks of political communication on Twitter, comprised of more than 250,000 tweets from the six weeks leading up to the 2010 U.S. congressional midterm elections. Using a combination of network clustering algorithms and manually-annotated data we demonstrate that the network of political retweets exhibits a highly segregated partisan structure, with extremely limited connectivity between left- and right-leaning users. Surprisingly this is not the case for the user-to-user mention network, which is dominated by a single politically heterogeneous cluster of users in which ideologically-opposed individuals interact at a much higher rate compared to the network of retweets. To explain the distinct topologies of the retweet and mention networks we conjecture that politically motivated individuals provoke interaction by injecting partisan content into information streams whose primary audience consists of ideologically-opposed users. We conclude with statistical evidence in support of this hypothesis.

1,379 citations


Cites background from "Predicting Elections with Twitter: ..."

  • ...Social media play an important role in shaping political discourse in the U.S. and around the world (Bennett 2003; Benkler 2006; Sunstein 2007; Farrell and Drezner 2008; Aday et al. 2010; Tumasjan et al. 2010; O’Connor et al. 2010)....

    [...]

Proceedings Article
05 Jul 2011
TL;DR: This paper evaluates the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging, and uses existing hashtags in the Twitter data for building training data.
Abstract: In this paper, we investigate the utility of linguistic features for detecting the sentiment of Twitter messages. We evaluate the usefulness of existing lexical resources as well as features that capture information about the informal and creative language used in microblogging. We take a supervised approach to the problem, but leverage existing hashtags in the Twitter data for building training data.

1,261 citations


Cites background from "Predicting Elections with Twitter: ..."

  • ...Just in the past year there have been a number of papers looking at Twitter sentiment and buzz (Jansen et al. 2009; Pak and Paroubek 2010; O’Connor et al. 2010; Tumasjan et al. 2010; Bifet and Frank 2010; Barbosa and Feng 2010; Davidov, Tsur, and Rappoport 2010)....

    [...]

References
Journal Article
TL;DR: Linguistic Inquiry and Word Count (LIWC) is a transparent text analysis program that counts words in psychologically meaningful categories; empirical results demonstrate its ability to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles, and individual differences.
Abstract: We are in the midst of a technological revolution whereby, for the first time, researchers can link daily word use to a broad array of real-world behaviors. This article reviews several computerized text analysis methods and describes how Linguistic Inquiry and Word Count (LIWC) was created and validated. LIWC is a transparent text analysis program that counts words in psychologically meaningful categories. Empirical results using LIWC demonstrate its ability to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles, and individual differences.
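
Counting words in categories, as the abstract describes, can be illustrated with a small sketch. Everything here is an assumption for illustration: the two-category dictionary is invented and is not the LIWC dictionary, the category labels and the trailing-`*` stem convention are simplifications, and the function names are hypothetical.

```python
import re

# Toy dictionary, invented for illustration; not the LIWC dictionary.
# A trailing "*" marks a stem that matches any word starting with it.
CATEGORIES = {
    "posemo": ["good", "happy", "win*"],
    "negemo": ["bad", "angry", "los*"],
}


def matches(word, entry):
    """True if a word matches a dictionary entry (exact or stem)."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text):
    """Percentage of words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in CATEGORIES}
    result = {}
    for cat, entries in CATEGORIES.items():
        hits = sum(any(matches(w, e) for e in entries) for w in words)
        result[cat] = 100.0 * hits / len(words)
    return result
```

Reporting each category as a percentage of total words keeps short and long texts comparable, which matters when the texts are brief messages such as tweets.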

4,356 citations

Journal Article
19 Feb 2009-Nature
TL;DR: A method of analysing large numbers of Google search queries to track influenza-like illness in a population and accurately estimate the current level of weekly influenza activity in each region of the United States with a reporting lag of about one day is presented.
Abstract: This paper, first published online in November 2008, draws on data from an early version of the Google Flu Trends search engine to estimate the levels of flu in a population. It introduces a computational model that converts raw search query data into a region-by-region real-time surveillance system that accurately estimates influenza activity with a lag of about one day, one to two weeks faster than the conventional reports published by the Centers for Disease Control and Prevention. This report introduces a computational model based on internet search queries for real-time surveillance of influenza-like illness (ILI), which reproduces the patterns observed in ILI data from the Centers for Disease Control and Prevention. Seasonal influenza epidemics are a major public health concern, causing tens of millions of respiratory illnesses and 250,000 to 500,000 deaths worldwide each year. In addition to seasonal influenza, a new strain of influenza virus against which no previous immunity exists and that demonstrates human-to-human transmission could result in a pandemic with millions of fatalities. Early detection of disease activity, when followed by a rapid response, can reduce the impact of both seasonal and pandemic influenza. One way to improve early detection is to monitor health-seeking behaviour in the form of queries to online search engines, which are submitted by millions of users around the world each day. Here we present a method of analysing large numbers of Google search queries to track influenza-like illness in a population. Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to use search queries to detect influenza epidemics in areas with a large population of web search users.

3,984 citations


"Predicting Elections with Twitter: ..." refers background in this paper

  • ...Be it the tracking of influenza or consumer prices based on search terms (Choi & Varian, 2009; Ginsberg et al., 2009), the inference of relationships between people based on their phone records (Eagle, Pentland, & Lazer, 2009), or the structure of TV events or the box office of movies based on the real-time comments on Twitter (Asur & Huberman, 2010; Shamma, Kennedy, & Churchill, 2010)....

    [...]


Proceedings Article
12 Aug 2007
TL;DR: It is found that people use microblogging to talk about their daily activities and to seek or share information and the user intentions associated at a community level are analyzed to show how users with similar intentions connect with each other.
Abstract: Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool, has seen a lot of growth since it launched in October 2006. In this paper, we present our observations of the microblogging phenomenon by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.

3,025 citations


"Predicting Elections with Twitter: ..." refers background in this paper

  • ...One stream of research concentrates on understanding microblogging usage and community structures (e.g., Honeycutt and Herring 2009; Huberman, Romero, and Wu 2008; Java et al. 2007)....

    [...]

Proceedings Article
21 Aug 2005
TL;DR: Differences in the behavior of liberal and conservative blogs are found, with conservative blogs linking to each other more frequently and in a denser pattern.
Abstract: In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 "A-list" blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.

2,800 citations

Proceedings Article
31 Aug 2010
TL;DR: It is shown that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors, and that sentiments extracted from Twitter can further improve the forecasting power of social media.
Abstract: In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be utilized to improve the forecasting power of social media.
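
A model "built from the rate at which tweets are created" can be pictured as a one-variable regression of an outcome on tweet rate. The sketch below assumes an ordinary least-squares line with tweets per hour as the single predictor; that model form, the helper names, and any numbers fed to it are assumptions for illustration, not details given in the abstract.

```python
def tweet_rate(tweet_count, hours):
    """Tweets per hour about a topic over an observation window."""
    return tweet_count / hours


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted outcome for a new tweet rate x."""
    return intercept + slope * x
```

Fitting on past (tweet rate, outcome) pairs and calling `predict` on a new rate yields a forecast that can then be compared against another predictor on held-out cases.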

1,909 citations


"Predicting Elections with Twitter: ..." refers background in this paper

  • ...…Ginsberg et al., 2009), the inference of relationships between people based on their phone records (Eagle, Pentland, & Lazer, 2009), or the structure of TV events or the box office of movies based on the real-time comments on Twitter (Asur & Huberman, 2010; Shamma, Kennedy, & Churchill, 2010)....

    [...]