Predicting the Political Alignment of Twitter Users
Michael D. Conover, Bruno Gonc¸alves, Jacob Ratkiewicz, Alessandro Flammini and Filippo Menczer
Center for Complex Networks and Systems Research
School of Informatics and Computing
Indiana University, Bloomington
Abstract—The widespread adoption of social media for political communication creates unprecedented opportunities to monitor the opinions of large numbers of politically active individuals in real time. However, without a way to distinguish between users of opposing political alignments, conflicting signals at the individual level may, in the aggregate, obscure partisan differences in opinion that are important to political strategy. In this article we describe several methods for predicting the political alignment of Twitter users based on the content and structure of their political communication in the run-up to the 2010 U.S. midterm elections. Using a data set of 1,000 manually annotated individuals, we find that a support vector machine (SVM) trained on hashtag metadata outperforms an SVM trained on the full text of users’ tweets, yielding predictions of political affiliations with 91% accuracy. Applying latent semantic analysis to the content of users’ tweets we identify hidden structure in the data strongly associated with political affiliation, but do not find that topic detection improves prediction performance. All of these content-based methods are outperformed by a classifier based on the segregated community structure of political information diffusion networks (95% accuracy). We conclude with a practical application of this machinery to web-based political advertising, and outline several approaches to public opinion monitoring based on the techniques developed herein.
I. INTRODUCTION
Political advertising expenditures are steadily increasing [1],
and are estimated to have reached four billion US dollars
during the 2010 U.S. congressional midterm elections [2]. The
recent ‘Citizens United’ Supreme Court ruling, which removed
restrictions on corporate spending in political campaigns, is
likely to accelerate this trend. As a result, political campaigns
are placing more emphasis on social media tools as a low-cost platform for connecting with voters and promoting engagement among users in their political base.
This trend is also fueled in part by the fact that voters
are increasingly engaging with the political process online.
According to the Pew Internet and American Life Project,
fully 73% of adult internet users went online to get news
or information about politics in 2010, with more than one
in five adults (22%) using Twitter or social networking sites
for political purposes [3].
A popular microblogging platform with almost 200 million users [4], Twitter is an outlet for up-to-the-minute status updates, allowing campaigns, candidates and citizens to respond
in real-time to news and political events. From the perspective
of political mobilization, Twitter creates opportunities for viral
marketing efforts that can be leveraged to reach audiences
whose size is disproportionately large relative to the initial
investment.
Of particular interest to political campaigns is how the scale
of the Twitter platform creates the potential to monitor political
opinions in real time. For example, imagine a campaign
interested in tracking voter opinion relating to a specific piece
of legislation. One could easily envision applying sentiment analysis tools to the set of tweets containing keywords relating to the bill. However, without the ability to distinguish between
users with different political affiliations, aggregation over
conflicting partisan signals would likely obscure the nuances
most relevant to political strategy.
Here we explore several different approaches to the problem
of discriminating between users with left- and right-leaning
political alignment using manually annotated training data
covering nearly 1,000 Twitter users actively engaged in the
discussion of U.S. politics. Considering content-based features first, we show that a support vector machine trained
on user-generated metadata achieves 91% overall accuracy
when tasked with predicting whether a user’s tweets express
a ‘left’ or ‘right’ political alignment. Using latent semantic
analysis we identify hidden sources of structural variation
in user-generated metadata that are strongly associated with
individuals’ political alignment.
Taking an interaction-based perspective on political communication, we use network clustering algorithms to extract information about the individuals with whom each user communicates, and show that these topological properties can be used to improve classification accuracy even further. Specifically, we find that the community structure of the network of
political retweets can be used to predict the political alignment
of users with 95% accuracy.
We conclude with a proof-of-concept application based on these classifications, identifying the websites most frequently tweeted by left- and right-leaning users. We show
that domain popularity among politically active Twitter users
is not strongly correlated with overall traffic to a site, a
finding that could allow campaigns to increase returns on
advertising investments by targeting lower-traffic sites that are
very popular among politically active social media users.
II. BACKGROUND
A. The Twitter Platform
Twitter is a popular social networking and microblogging
site where users can broadcast short messages called ‘tweets’
to a global audience. A key feature of this platform is that,
by default, each user’s stream of real-time posts is public.
This fact, combined with its substantial population of users,

TABLE I
HASHTAGS RELATED TO #p2, #tcot, OR BOTH. TWEETS CONTAINING ANY OF THESE HASHTAGS WERE INCLUDED IN OUR SAMPLE.

Just #p2:   #casen #dadt #dc10210 #democrats #du1 #fem2 #gotv #kysen #lgf #ofa #onenation #p2b #pledge #rebelleft #truthout #vote #vote2010 #whyimvotingdemocrat #youcut
Both:       #cspj #dem #dems #desen #gop #hcr #nvsen #obama #ocra #p2 #p21 #phnm #politics #sgp #tcot #teaparty #tlot #topprog #tpp #twisters #votedem
Just #tcot: #912 #ampat #ftrs #glennbeck #hhrs #iamthemob #ma04 #mapoli #palin #palin12 #spwbt #tsot #tweetcongress #ucot #wethepeople
TABLE II
HASHTAGS EXCLUDED FROM THE ANALYSIS DUE TO AMBIGUOUS OR OVERLY BROAD MEANING.

Excl. from #p2:   #economy #gay #glbt #us #wc #lgbt
Excl. from both:  #israel #rs
Excl. from #tcot: #news #qsn #politicalhumor
renders Twitter an extremely valuable resource for commercial
and political data mining and research applications.
One of Twitter’s defining features is that each message is limited to 140 characters. In response to these space constraints, Twitter users have developed metadata annotation schemes which, as we demonstrate, compress substantial amounts of information into a comparatively tiny space. ‘Hashtags’, the metadata feature on which we focus in this paper, are short tokens used to indicate the topic or intended audience of a tweet [5]; for example, #dadt for ‘Don’t Ask Don’t Tell’ or #jlot for ‘Jewish Libertarians on Twitter’. Originally an informal practice, hashtag use is now integrated into the core architecture of the service, allowing users to search for these terms explicitly to retrieve a list of recent tweets about a specific topic.
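As an illustration of the metadata described above, a minimal sketch (not from the paper; the regex and helper name are ours) of pulling hashtag tokens out of tweet text:

```python
import re

# Simple pattern: '#' followed by word characters. This ignores edge cases
# such as Unicode tags, and is meant only as an illustration.
HASHTAG_RE = re.compile(r"#\w+")

def extract_hashtags(text):
    """Return lowercased hashtag tokens found in a tweet."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

tags = extract_hashtags("Repeal #DADT now! #p2 #dadt")
```

Lowercasing collapses variants such as #DADT and #dadt into a single feature.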
In addition to broadcasting tweets to an audience of followers, Twitter users interact with one another primarily in
two public ways: retweets and mentions. Retweets act as
a form of endorsement, allowing individuals to rebroadcast
content generated by other users, thus raising the content’s
visibility [6]. Mentions serve a different function, as they
allow someone to address a specific user directly through
the public feed, or to refer to an individual in the third
person [7]. These two means of communication serve distinct
and complementary purposes and together act as the primary
mechanisms for explicit, public, user to user interaction on
Twitter.
The free-form nature of the platform, combined with its space limitations and resulting annotation vocabulary, has led to a multitude of uses. Some use the service as a forum
for personal updates and conversation, others as a platform
for receiving and broadcasting real-time news and still others
treat it as an outlet for social commentary and critical culture.
Of particular interest to this study is the role of Twitter as a
platform for political discourse.
B. Data Mining and Twitter
Because Twitter provides a constant stream of real-time updates from around the globe, much research
has focused on detecting noteworthy, unexpected events as
they rise to prominence in the public feed. Examples of this
work include the detection of influenza outbreaks [8], seismic
events [9], and the identification of breaking news stories [10]–
[12]. These applications are similar in many respects to
streaming data mining efforts focused on other media outlets,
such as Kleinberg and Leskovec’s ‘MemeTracker’ [13].
Its large scale and streaming nature make Twitter an ideal
platform for monitoring events in real time. However, many
of the characteristics that have led to Twitter’s widespread
adoption have also made it a prime target for spammers.
The detection of spam accounts and content is an active area
of research [14]–[16]. In related work we investigated the
purposeful spread of misinformation by politically-motivated
parties [17].
Another pertinent line of research in this area relates to
the application of sentiment analysis techniques to the Twitter
corpus. Work by Bollen et al. has shown that indicators derived
from measures of ‘mood’ states on Twitter are temporally
correlated with events such as presidential elections [18]. In
a highly relevant application, Goorha and Ungar used Twitter
data to develop sentiment analysis tools for the Dow Jones
Company to detect significant emerging trends relating to
specific products and companies [19]. Derivations of these techniques could be paired with the machinery from Section IV to accomplish the kind of real-time public opinion monitoring described in the introduction.
C. Data Mining and Political Speech
Formal political speech and activity have also been a target
for data mining applications. The seminal work of Poole and
Rosenthal applied multidimensional scaling to congressional
voting records to quantify the ideological leanings of members
of the first 99 United States Congresses [20]. Similar work by
Thomas et al. used transcripts of floor debates in the House
of Representatives to predict whether a speech segment was
provided in support of or opposition to a specific proposal [21].
Related efforts have been undertaken for more informal, web-based political speech, such as that found on blogs and blog comments [22], [23]. While these studies report reasonable performance, the Twitter stream provides several advantages compared to blog data: Twitter provides a centralized data source, updated in real time, with new sources automatically integrated into the corpus. Moreover, Twitter represents a broad range of individual voices, with tens of thousands of active contributors involved in the political discourse.
III. DATA AND METHODS
A. Political Tweets
The Twitter ‘gardenhose’ streaming API (dev.twitter.com/pages/streaming_api) provides a sample of about 10% of the entire Twitter feed in a machine-readable JSON format. Each

tweet entry is composed of several fields, including a unique
identifier, the text of the tweet, the time it was produced, the
username of the account that produced the tweet, and in the
case of retweets or mentions, the account names of the other
users associated with the tweet.
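A hedged sketch of parsing one such entry; the JSON field names used here (`id`, `text`, `created_at`, `user.screen_name`) follow the classic Twitter schema and are an assumption, not taken from the paper:

```python
import json

# One illustrative gardenhose-style entry (toy data, not a real tweet).
raw = ('{"id": 1, "text": "Get out the vote! #p2",'
       ' "created_at": "Mon Sep 20 2010",'
       ' "user": {"screen_name": "example_user"}}')

tweet = json.loads(raw)
# Keep the fields the analysis needs: identifier, author, and text.
record = (tweet["id"], tweet["user"]["screen_name"], tweet["text"])
```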
This analysis focuses on six weeks of gardenhose data collected as part of a related study on political polarization [24]. The data cover approximately 355 million tweets produced during the period between September 14th and November 1st, 2010, the run-up to the November 4th U.S. congressional midterm elections.
Among all tweets, we consider as political communication any tweet that contained at least one politically relevant hashtag. Political hashtags were identified by performing a simple tag co-occurrence discovery procedure. We began by seeding our sample with two widely used left- and right-leaning political hashtags, #p2 (“Progressives 2.0”) and #tcot (“Top Conservatives on Twitter”). For each of these, we identified the set of hashtags with which it co-occurred in at least one tweet, and ranked the results using the Jaccard coefficient. For the set of tweets S containing a seed hashtag, and the set of tweets T containing another hashtag, the Jaccard coefficient between S and T is given by

    σ(S, T) = |S ∩ T| / |S ∪ T|    (1)

Thus, when the tweets in which a hashtag and seed both occur make up a large portion of the tweets in which either occurs, the two are similar. Using a similarity threshold of 0.005 we identified 66 unique hashtags, eleven of which were excluded due to overly broad or ambiguous meaning (see Tables I and II). The set of all tweets containing any one of these hashtags, 252 thousand in total, is used in all of the following analyses.
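The co-occurrence seeding step can be sketched as follows; the helper name and toy data are ours, and each hashtag's tweet set is modelled as a set of tweet ids:

```python
# Rank candidate hashtags by their Jaccard coefficient (Eq. 1) with a seed tag.
def jaccard(s, t):
    """Jaccard coefficient of two tweet-id sets."""
    return len(s & t) / len(s | t)

# Toy data: tweets (by id) containing each hashtag.
tweets_by_tag = {
    "#p2":   {1, 2, 3, 4},
    "#dadt": {3, 4, 5},
    "#food": {9},
}

seed = tweets_by_tag["#p2"]
ranked = sorted(
    ((tag, jaccard(seed, ids))
     for tag, ids in tweets_by_tag.items() if tag != "#p2"),
    key=lambda pair: pair[1], reverse=True,
)
```

In the full pipeline, tags scoring above the 0.005 threshold against either seed would be kept as political hashtags.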
It’s important to note that politically-motivated individuals
often annotate content with hashtags whose primary audience
would not likely choose to see such information ahead of
time, a phenomenon known as content injection. As a result,
hashtags in this study are frequently associated with users from
both sides of the political spectrum, and therefore this seeding
algorithm does not create a trivial classification scenario [24].
B. Communication Networks
From the set of political tweets we also construct two
networks: one based on mention edges and one based on
retweet edges. In the mention network, nodes representing
users A and B are connected by a weighted, undirected edge if
either user mentioned the other during the analysis period. The
weight of each edge corresponds to the number of mentions
between the two users. The retweet network is constructed in
the same manner: an edge between A and B means that A
retweeted B or viceversa, with the weight representing the
number of retweets between the two. In total, the mention
network consists of 10,142 non-singleton nodes, with 7,175
nodes in its largest connected component (and 119 in the
next-largest). The retweet network is larger, consisting of
23,766 non-singleton nodes, with 18,470 nodes in its largest
connected component (and 102 nodes in the next-largest).
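The weighted, undirected edge-counting described above can be sketched in a few lines; representing the network as a Counter keyed by unordered user pairs is our choice for illustration, not the paper's:

```python
from collections import Counter

def build_retweet_network(retweet_pairs):
    """Weight each undirected user pair by the number of retweets between them."""
    weights = Counter()
    for a, b in retweet_pairs:            # (retweeter, retweeted) user pairs
        weights[frozenset((a, b))] += 1   # undirected: A->B and B->A coincide
    return weights

net = build_retweet_network([("A", "B"), ("B", "A"), ("A", "C")])
```

The mention network is built identically, with mention pairs in place of retweet pairs.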
TABLE III
CONTINGENCY TABLE OF INTER-ANNOTATOR AGREEMENT ON MANUAL CLASSIFICATIONS.

            Left   Ambiguous   Right
Left         303          51      23
Ambiguous     19          32      24
Right         22          59     423
TABLE IV
FINAL CLASS ASSIGNMENTS BASED ON RESOLUTION PROCEDURES
DESCRIBED IN TEXT.
Left Ambiguous Right
373 77 506
C. Labeled Data
Let us now describe the creation of the labeled data used in this study for training and testing our classifiers. We randomly selected a set of 1,000 users who were present in
the largest connected components of both the mention and
retweet networks. All users were individually classified by two
annotators working independently of one another.
Each annotator assigned users to one of three categories:
‘left’, ‘right’, or ‘ambiguous’, based on the content of his or
her tweets produced during the six week study period. The
groups primarily associated with a ‘left’ political alignment
were democrats and progressives; those primarily associated
with a ‘right’ political alignment were republicans, conservatives, libertarians, and the Tea Party. Users coded as ‘ambiguous’ may have been taking part in a political dialogue, but
it was difficult to make a clear determination about political
alignment from the content of their tweets.
Using this coding scheme each of the annotators labeled
1,000 random users. Forty-four accounts producing primarily non-English or spam tweets were considered irrelevant and
excluded from this analysis. Table III shows the classifications
of each annotator and their agreement.
Inter-annotator agreement is quite high for the ‘left’ and
‘right’ categories, but quite marginal for the ‘ambiguous’
category. This means that there were several users for whom
one annotator had the domain knowledge required to infer a
political alignment while the other did not. To address this
issue we assigned a label to a user when either annotator
detected information suggesting a political alignment in the
content of a user’s tweets. This mechanism was used to resolve
ambiguity in 16% of users. Among the 956 relevant users in
the sample there were 45 for whom the annotators explicitly
disagreed about political alignment (‘left’ vs. ‘right’). These
individuals were included in the ‘ambiguous’ category.
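The resolution procedure above amounts to a small decision rule; the function below is our reconstruction of the logic described in the text, not the authors' code:

```python
def resolve(label_a, label_b):
    """Resolve two annotators' labels ('left', 'right', or 'ambiguous')."""
    if label_a == label_b:
        return label_a
    partisan = {label_a, label_b} - {"ambiguous"}
    if len(partisan) == 1:       # one annotator saw a clear alignment
        return partisan.pop()
    return "ambiguous"           # explicit 'left' vs. 'right' disagreement

r1 = resolve("left", "ambiguous")   # partisan label overrides 'ambiguous'
r2 = resolve("left", "right")       # explicit disagreement stays ambiguous
```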
After this resolution procedure, 373 users were labeled by
the human annotators as expressing a ‘left’ political alignment,
506 users were labeled as ‘right’, and 77 were placed in the
‘ambiguous’ category, for a total of 956 users (Table IV).
Ambiguous classifications are a typical result of scarce data
at the individual level, but for completeness we report worst-
case bounds on accuracy for the scenario in which all of these
users are classified incorrectly.

IV. CLASSIFICATION
One of the central goals of this paper is to establish effective
features for discriminating politically left- and right-leaning
individuals. To this end we examine several features from
two broad categories: user-level features based on content
and network-level features based on the relationships between
users. Each feature set is represented in terms of a feature-user matrix M, where M_ij encodes the value for feature i with respect to user j.
For content-based classifications we use linear support
vector machines (SVMs) to discriminate between users in
the ‘left’ and ‘right’ classes. In the simple case of binary
classification, an SVM works by embedding data in a high-
dimensional space and attempting to find the hyperplane that
best separates the two classes [25]. Support vector machines
are widely used for document classification because they
are well-suited to classification tasks based on sparse, high-
dimensional data, such as those commonly associated with text
corpora [26].
To quantify performance for different feature sets we report
the confusion matrix for each classifier, as well as accuracy
scores based on 10-fold cross-validation. For a confusion
matrix containing true left (tl), true right (tr), false left (fl) and false right (fr), the accuracy of a classifier is defined by:

    accuracy = (tl + tr) / (tl + tr + fl + fr)    (2)

where tl is the number of left-leaning users who are correctly classified, and so on.
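Equation 2 translates directly to code; the 2×2 matrix layout assumed here ([[tl, fr], [fl, tr]]) is our convention for this sketch:

```python
def accuracy(conf):
    """Accuracy (Eq. 2) from a 2x2 confusion matrix [[tl, fr], [fl, tr]]."""
    (tl, fr), (fl, tr) = conf
    return (tl + tr) / (tl + tr + fl + fr)

# Hashtag-feature confusion matrix from Table V; the cross-validated
# accuracy reported in the text may differ slightly from this figure.
acc = accuracy([[331, 42], [41, 465]])
```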
A. Content Analysis
1) Full-Text: To establish a performance baseline, we train a support vector machine on a feature-user matrix corresponding to the TFIDF-weighted terms (unigrams) contained in
each user’s tweets [27]. In addition to common stopwords we
remove hashtags, mentions, and URLs from the set of terms
produced by all users, a step we take to facilitate comparison
with other feature sets. Additionally, we exclude terms that
occur only once in the entire corpus because they carry no
generalizable information and increase memory usage. After
these preprocessing steps, the resulting corpus contains 13,080
features, each representing a single term.
To make it clear how we compute vectors for each user and his associated tweets, let us define TFIDF in detail. The TFIDF score for term i with respect to user j is defined in terms of two components, term frequency (TF) and inverse document frequency (IDF). TF measures the relative importance of term i in the set of tweets produced by user j, and is defined as:

    TF_ij = n_ij / Σ_k n_kj    (3)

where n_ij is the number of times term i occurs in all tweets produced by user j, and Σ_k n_kj is the total number of terms in all tweets produced by user j. IDF discounts terms with high overall prominence across all users, and is defined as:

    IDF_i = log(|U| / |U_i|)    (4)
TABLE V
SUMMARY OF CONFUSION MATRICES AND ACCURACY SCORES FOR VARIOUS CLASSIFICATION FEATURES, WITH THE SECTIONS IN WHICH THEY ARE DISCUSSED.

Features          Conf. matrix        Accuracy   Section
Full-Text         [266 107; 75 431]   79.2%      § IV-A1
Hashtags          [331  42; 41 465]   90.8%      § IV-A2
Clusters          [367   6; 38 468]   94.9%      § IV-B
Clusters + Tags   [366   7; 38 468]   94.9%      § IV-B
where U is the set of all users, and U_i is the subset of users who produced term i. A term produced by every user has no discriminative power and its IDF is zero. The product TF_ij · IDF_i measures the extent to which term i occurs frequently in user j’s tweets without occurring in the tweets of too many other users.
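Equations 3 and 4 can be combined into a short TFIDF routine; the function name and toy corpus below are illustrative only:

```python
import math
from collections import Counter

def tfidf(users_terms):
    """TFIDF (Eqs. 3-4): map each user to {term: TF_ij * IDF_i}."""
    n_users = len(users_terms)
    # document frequency: number of users who produced each term
    df = Counter(t for terms in users_terms.values() for t in set(terms))
    scores = {}
    for user, terms in users_terms.items():
        counts, total = Counter(terms), len(terms)
        scores[user] = {t: (c / total) * math.log(n_users / df[t])
                        for t, c in counts.items()}
    return scores

scores = tfidf({"u1": ["vote", "vote", "tax"], "u2": ["vote", "news"]})
# 'vote' is produced by every user, so its IDF (and hence TFIDF) is zero
```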
The classification accuracy for this representation of the
data is 79%, and its confusion matrix is shown in Table V.
The lower accuracy bound for this approach, assuming that all
ambiguous users are incorrectly classified, is 72.6%.
2) Hashtags: Hashtags emerged organically within the
Twitter user community as a way of annotating topics and
threads of discussion. Since these tokens are intended to mark
the content of discussion, we might expect that they contain
substantial information about a user’s political leaning.
In this experiment we populate the feature-user matrix with values corresponding to the relative frequency with which user j used hashtag i. This value is equivalent to the TF measure
from Equation 3, but described in terms of hashtags rather
than unigrams. We note that weighting by IDF did not improve
performance. Eliminating hashtags used by only one user we
are left with 4,701 features. For this classification task we
report an accuracy of 90.8%; see Table V for the confusion
matrix. The lower bound on this approach, assuming that all
ambiguous users were misclassified, is 83.5%.
As evidenced by its higher accuracy score, a classifier that
uses hashtag metadata outperforms one trained on the unigram
baseline data. Analogous findings are observed in biomedical
document classification, where classifiers trained on abstracts
outperform those trained on the articles’ full text [28]. The
reasoning underlying this improvement is that abstracts are
necessarily brief and information rich. In the same way,
Twitter users must condense substantial semantic content into
hashtags, reducing noise and simplifying the classification
task.
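A minimal sketch of this hashtag feature representation (helper name and toy data are ours); in the full pipeline, tags used by only one user would first be dropped:

```python
from collections import Counter

def hashtag_features(users_tags, vocab):
    """Each user as a vector of relative hashtag frequencies (Eq. 3 TF)."""
    rows = {}
    for user, tags in users_tags.items():
        counts, total = Counter(tags), len(tags)
        rows[user] = [counts[t] / total for t in vocab]
    return rows

vocab = ["#p2", "#tcot"]
rows = hashtag_features({"u1": ["#p2", "#p2", "#tcot"], "u2": ["#tcot"]}, vocab)
```

These rows form the feature-user matrix on which the SVM is trained.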
3) Latent Semantic Analysis of Hashtags: Latent semantic
analysis (LSA) is a technique used in text mining to discover a
set of topics present in the documents of a corpus. Based on the
singular value decomposition, LSA is argued to address issues
of polysemy, synonymy, and lexical noise common in text

TABLE VI
MOST EXTREME HASHTAG COEFFICIENTS FOR SECOND LEFT SINGULAR
VECTOR. THIS LINEAR COMBINATION OF HASHTAGS APPEARS TO
CAPTURE VARIANCE ASSOCIATED WITH POLITICAL ALIGNMENT.
Hashtag Coeff. Hashtag Coeff.
#tcot 0.380 #p2 -0.914
#sgp 0.030 #dadt -0.071
#ocra 0.020 #p21 -0.042
#hhrs 0.013 #votedem -0.039
#twisters 0.012 #lgbt -0.038
#tlot 0.011 #p2b -0.032
#whyimvotingdemocrat 0.009 #topprog -0.027
#rs 0.005 #onenation -0.025
#ftrs 0.004 #dems -0.023
#ma04 0.004 #gop -0.021
#tpp 0.003 #hcr -0.017
corpora [29]. Given a feature-document matrix, the singular value decomposition, UΣV^t, produces a factorization in terms of two sets of orthogonal basis vectors, described by U and V^t. The left singular vectors, U, provide a vector basis for terms in the factorized representation, and the right singular vectors, V, provide a basis for the original documents, with the singular values of matrix Σ acting as scaling factors that identify the variance associated with each dimension. In practice, LSA is said to uncover hidden topics present in a corpus, a claim supported by the analytical work of Papadimitriou et al. [30].
We apply this technique to the hashtag-user matrix in an attempt to identify latent factors corresponding to political alignment. The coefficients of the linear combination of hashtags most strongly associated with the second left singular vector, shown in Table VI, suggest that one is present in the data. Hashtags with extreme coefficients for this dimension include #dadt for ‘Don’t Ask Don’t Tell’, #p2 for ‘Progressives 2.0’, #tcot for ‘Top Conservatives on Twitter’, and #ocra for ‘Organized Conservative Resistance Alliance’. The hashtag #whyimvotingdemocrat originally became a trending topic among left-leaning users, but was subsequently hijacked by right-leaning users to express sarcastic reasons they might vote for a Democratic candidate. A consequence of these coefficients is that users who use many left-leaning hashtags will have negative magnitude with respect to this dimension, and users who use many right-leaning hashtags will have positive magnitude in this dimension. Figure 1 shows clear separation between left- and right-leaning users in terms of the first and second right singular vectors.
A support vector machine trained on features describing
users in terms of the first two right singular hashtag vectors
does not improve accuracy compared to hashtag TF scores
alone. Expanding the feature space to the first three LSA
dimensions improves performance by an insignificant amount
(about 0.1%), and the addition of subsequent features only
degrades performance.
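The LSA step can be sketched with NumPy (an assumption; the paper does not name its tooling). Rows of the toy matrix are hashtags, columns are users, and projecting users onto the leading right singular vectors gives low-dimensional coordinates of the kind plotted in Figure 1:

```python
import numpy as np  # assumption: NumPy, not named in the paper

# Toy hashtag-user matrix M (rows = hashtags, columns = users).
M = np.array([
    [3.0, 2.0, 0.0],   # '#p2'-like tag, used by the first two users
    [0.0, 0.0, 4.0],   # '#tcot'-like tag, used only by the third
])

# SVD: columns of U are latent hashtag directions; rows of Vt are the
# right singular vectors that give each user's latent coordinates.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
user_coords = Vt[:2].T   # each user in the first two latent dimensions
```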
B. Network Analysis
The previous two feature sets are based on the content of
each user’s tweets. We might also choose to ignore this content
entirely, focusing instead on the relationships between users.
Fig. 1. Users plotted in the latent semantic space of the first and second
right singular vectors. Colors correspond to class labels.
Many social networks exhibit homophilic properties; that is, users prefer to connect to those more like themselves, and as a consequence structural information can be leveraged to infer properties about nodes that tend to associate with one another [31], [32]. In the following, we focus on the largest
connected component of the retweet network, as previous work
suggests that it may tend to segregate ideologically-opposed
users [24].
The cluster structure of the retweet network was established
by applying a community detection algorithm using the label
propagation method of Raghavan et al. [33]. Starting with an
initial arbitrary label (cluster membership), this greedy method
works by iteratively assigning to each node the label that is
shared by most of its neighbors. Ties are broken randomly
when they occur. Owing to this stochasticity, the label propagation method can return different cluster assignments for the
same graph, even with the same initial conditions. Empirical
analysis highlighted further instability resulting from random
starting conditions: the algorithm easily converges to local
optima.
To address this issue we used initial label assignments based
on the clusters produced by Newman’s leading eigenvector
modularity maximization method for two clusters [34], rather
than assigning labels at random. To verify that consistent
clusters are produced across different runs of the algorithm
for the same starting conditions, we repeated the analysis one
hundred times and compared the label assignments produced
at every run.
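A minimal, deterministic-by-seeding sketch of the label propagation step (our reconstruction, not the authors' code):

```python
import random
from collections import Counter

def label_propagation(neighbors, init_labels, rounds=10, seed=0):
    """Each node repeatedly adopts the label most common among its
    neighbors, with ties broken at random. The paper seeds labels from a
    modularity-based two-way split; here initial labels are passed in."""
    rng = random.Random(seed)
    labels = dict(init_labels)
    for _ in range(rounds):
        for node in neighbors:
            tally = Counter(labels[n] for n in neighbors[node])
            top = max(tally.values())
            labels[node] = rng.choice([l for l, c in tally.items() if c == top])
    return labels

# two loosely connected triangles
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
labels = label_propagation(graph, {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"})
```

With this seeding the two triangles keep their initial two-cluster split, mirroring the stability the authors obtain from deterministic initial assignments.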
The similarity of two label assignments C and C′ over a graph with n nodes can be computed by the Adjusted Rand Index (ARI) [35] as follows. Arbitrarily number the two clusters of C as c_1, c_2, and likewise number the clusters of C′
