Predicting Personality from Twitter

doi:10.1109/PASSAT/SOCIALCOM.2011.33

Jennifer Golbeck∗, Cristina Robles∗, Michon Edmondson∗, and Karen Turner∗

University of Maryland;{jgolbeck,crobles,michonk8,kturner}@umd.edu

Abstract—Social media is a place where users present them-

selves to the world, revealing personal details and insights into

their lives. We are beginning to understand how some of this

information can be utilized to improve the users’ experiences with

interfaces and with one another. In this paper, we are interested

in the personality of users. Personality has been shown to be

relevant to many types of interactions; it has been shown to be

useful in predicting job satisfaction, professional and romantic

relationship success, and even preference for different interfaces.

Until now, to accurately gauge users’ personalities, they needed to

take a personality test. This made it impractical to use personality

analysis in many social media domains. In this paper, we present

a method by which a user’s personality can be accurately

predicted through the publicly available information on their

Twitter proﬁle. We will describe the type of data collected, our

methods of analysis, and the machine learning techniques that

allow us to successfully predict personality. We then discuss the

implications this has for social media design, interface design,

and broader domains.

Index Terms—personality, social media

I. INTRODUCTION

Social networking on the web has grown dramatically over

the last decade. In January 2005, a survey of social networking

websites estimated that among all sites on the web there

were roughly 115 million members [14]. Just over ﬁve years

later, Twitter alone has exceeded 200 million members. In the

process of creating social networking proﬁles, users reveal a

lot about themselves both in what they share and how they

say it. Through self-description, status updates, photos, and

interests, much of a user’s personality comes out through their

proﬁle.

For decades, psychology researchers have worked to un-

derstand personality in a systematic way. After extensive

work to develop and validate a widely accepted personality

model, researchers have shown connections between general

personality traits and many types of behavior. Relationships

have been discovered between personality and psychological

disorders [42], job performance [4] and satisfaction [24], and

even romantic success [46].

This paper attempts to bridge the gap between social media

and personality research by using the information people

reveal in their online proﬁles. Our core research question asks

whether social media proﬁles can predict personality traits. If

so, then there is an opportunity to integrate the many results

on the implications of personality factors and behavior into the

users’ online experiences and to use social media proﬁles as

a source of information to better understand individuals. For

example, the friend suggestion system could be tailored to a

user based on whether they are more introverted or extraverted.

Previous work has shown that the information in users’

Facebook proﬁles is reﬂective of their actual personalities, not

an “idealized” version of themselves [3]. We expect Twitter

to have similar characteristics, and that plus a broad user base

of 200 million people makes it an ideal platform for study.

We administered the Big Five Personality Inventory to 279

subjects through a Twitter application. In the process, we

gathered their 2000 most recent public Twitter posts (tweets).

This was aggregated, quantiﬁed, and passed through a text

analysis tool to obtain a feature set. Using these statistics, we

were able to develop a model that can predict personality on

each of the five personality factors to within 11% to 18% of the actual values.

The ability to predict personality has implications in many

areas. Existing research has shown connections between per-

sonality traits and success in both professional and personal re-

lationships. Social media tools that seek to support these rela-

tionships could beneﬁt from personality insights. Additionally,

previous work on personality and interfaces showed that users

are more receptive to and have greater trust in interfaces and

information that is presented from the perspective of their own

personality features (i.e. introverts prefer messages presented

from an introvert’s perspective). If a user’s personality can be

predicted from their social media proﬁle, online marketing and

applications can use this to personalize their message and its

presentation.

We begin by presenting background on the Big Five Per-

sonality index and related work on personality and social

media. We then present our experimental setup and methods

for analyzing and quantifying Twitter proﬁle information. To

understand the relationship between personality and social

media proﬁles, we present results on correlations between each

proﬁle feature and personality factor. Based on this, we de-

scribe the machine learning techniques used for classiﬁcation

and show how we achieve large and signiﬁcant improvements

over baseline classiﬁcation on each personality factor. We

conclude with a discussion of the implications that this work

has for social media websites and for organizations that may

utilize social media to better understand the people with whom

they interact.

II. BACKGROUND AND RELATED WORK

A. The Big Five Personality Inventory

The “Big Five” model of personality dimensions has

emerged as one of the most well-researched and well-regarded

measures of personality structure in recent years. The model's five domains of personality, Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, were conceived

2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing


Fig. 1: A person has scores for each of the ﬁve personality

factors. Together, the ﬁve factors represent an individual’s

personality.

by Tupes and Christal [47] as the fundamental traits that

emerged from analyses of previous personality tests [29].

McCrae & Costa [28] and John [21] continued ﬁve-factor

model research and consistently found generality across age,

gender, and cultural lines [29]. Additional research has proved

that different tests, languages, and methods of analysis do

not alter the model's validity [29], [10], [21], [27]. Such extensive research has led many psychologists to accept the Big Five as the current definitive model of personality [43], [34]. It should be noted that the model's dependence on trait

terms indicates that the Big Five traits are based on a lexical

approach to personality measurement [43], [9], [10], [16]. The

Big Five traits are characterized by the following:

• Openness to Experience: curious, intelligent, imaginative.

High scorers tend to be artistic and sophisticated in taste

and appreciate diverse views, ideas, and experiences.

• Conscientiousness: responsible, organized, persevering.

Conscientious individuals are extremely reliable and tend

to be high achievers, hard workers, and planners.

• Extroversion: outgoing, amicable, assertive. Friendly and

energetic, extroverts draw inspiration from social situa-

tions.

• Agreeableness: cooperative, helpful, nurturing. People

who score high in agreeableness are peace-keepers who

are generally optimistic and trusting of others.

• Neuroticism: anxious, insecure, sensitive. Neurotics are

moody, tense, and easily tipped into experiencing negative

emotions.

B. Applications of the Big Five

Much work has been done with personality as it relates to

our lives and the choices we make. Many links have been identified between personality and relationships with others. For example, personality type is linked to whom users choose to friend on Facebook: work in [45] found that extraversion, agreeableness, and openness all

correlated with friendship selection. Personality features have

also been tied to many aspects of romantic relationships,

including partner choice, level of attachment and success

[8], [46]. In terms of interpersonal conﬂict, studies have

associated Big Five traits with coping responses, vengefulness,

and rumination [32], [5]. Social relationships aside, personality

also relates to preferences. Rentfrow and Gosling [39] is one

of many studies that found that personality is a factor that

relates to the music an individual prefers to listen to. Jost et

al. [23] also found that the personality type of an individual

was able to predict whether they would be more likely to

vote for McCain or Obama in 2008. Research has also found

personality differences between self-professed “dog people”

and “cat people” [37], [17]. Within the context of marketing

and advertising, Big Five personality traits have been shown to

accurately predict a consumer's preference for national brands

or independent brands [48]. Studies like this show a promising

future for the integration of personality analysis and consumer

proﬁling.

Many studies have demonstrated the usefulness of person-

ality proﬁles within the professional context. Hodgkinson and

Ford [20] found that personality traits affect job performance

and satisfaction, and Barrick and Mount [4] correlated speciﬁc

traits with occupational choices and proﬁciency. Big Five

dimensions have proved valid predictors for team performance

[31], counterproductive behaviors [41], and entrepreneurial

status [49], among many other factors. Work in [6] also revealed relationships between personality and behavior among managers,

and Barrick and Mount found recurring personality proﬁles

among both high-autonomy and low-autonomy positions in

the workforce [5].

In the space of Human-Computer Interaction, one of the

pioneering studies on the connection between personality and

interface preference was presented in [30]. Users listened to

audio readings of ﬁve book reviews which were written from

the perspective of introverts vs. extroverts. Subjects were able

to identify the personality differences between the reviews and

showed an attraction to those which were closest to their own

personality type. When the personality type matched, subjects

were even more likely to buy the book being reviewed.

This work was extended into ideas of Graphical User

Interface design in [25]. Different GUIs were developed to

represent introverted vs. extroverted personality types. As in

[30], subjects could identify the personality differences and

preferred the interface that matched their own personality type.

C. Personality Research and Social Media

To the best of our knowledge, our work is among the ﬁrst to

look at the relationship between proﬁle information provided

in social networks and personality traits. However, there have


Fig. 2: Average scores on each personality trait shown with

standard deviation bars.

been a few previous studies on how personality relates to social

networking more generally.

It has been shown in [40] that extroversion and consci-

entiousness positively correlate with the perceived ease of

use of social media websites. Extroversion was also shown

to have a positive correlation with perceived usefulness of

such sites. Not surprisingly, extroversion was also shown to

correlate with the size of a user’s social network in several

studies [2], [44], [45]. There have also been mixed results for

other personality traits. Work in [45] showed that individuals

with high agreeableness scores were selected more often as

friends and that people tended to choose friends with similar

agreeableness, extroversion, and openness scores. This was

not repeated in [44], but that study did report a correlation between openness and number of friends.

III. DATA COLLECTION

We created a Twitter application with two functions. First, it administered a 45-question version of the Big Five Personality Inventory [22] to users. Second, for each subject who took the test, it collected the most recent 2,000 tweets from the user (or all tweets if they had fewer than 2,000).

We had ﬁfty subjects who were recruited through posts on

Twitter, Facebook, and relevant mailing lists. Twitter does not

collect or release demographic information about its users and,

since we would have no general baseline for comparison, we

did not collect it for our subjects.

Average scores on the personality test are shown in ﬁgure

2 and in table I.

For each user, we began by collecting a simple set of

statistics about their accounts and their tweets. These included

the following:

• Number of followers (people following the user)

• Number of following (people the user follows)

• Density of the social network

• Number of “@mentions” - An @mention is when a user

mentions the name of another user by adding an @ to

the front of the username, as is convention on Twitter

• Number of replies - Using the Twitter API, we could see

how many of the user’s tweets were direct replies to other

users’ tweets.

• Number of hashtags - Hashtags (e.g. #cscw2012) are a

way of tagging a tweet to be part of a given topic or

event. They are also used in “games” where users come

up with tweets to go with a tag (e.g. #ﬁrstdraftmovielines

is used with altered ﬁrst movie lines created by users).

• Number of links

• Words per tweet

For the number of @mentions, replies, hashtags, and links,

we used the raw numbers and the average per tweet.
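The raw and per-tweet counts described above can be illustrated with a minimal Python sketch. The regular expressions and the function name `tweet_statistics` are assumptions made for illustration, not the application's actual code. Reply counts are omitted because they depend on Twitter API metadata rather than on the tweet text.

```python
import re

# Hypothetical patterns for illustration; real tweets may need stricter rules.
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
LINK_RE = re.compile(r"https?://\S+")


def tweet_statistics(tweets):
    """Return raw counts and per-tweet averages for a list of tweet strings."""
    n = len(tweets)
    raw = {
        "mentions": sum(len(MENTION_RE.findall(t)) for t in tweets),
        "hashtags": sum(len(HASHTAG_RE.findall(t)) for t in tweets),
        "links": sum(len(LINK_RE.findall(t)) for t in tweets),
        "words": sum(len(t.split()) for t in tweets),
    }
    stats = dict(raw)
    # Add the average per tweet alongside each raw count.
    for name, count in raw.items():
        stats[name + "_per_tweet"] = count / n if n else 0.0
    return stats


example = tweet_statistics(["@alice hello #cscw2012", "see http://example.com now"])
```

Here `example["words_per_tweet"]` corresponds to the "words per tweet" feature, and the other `_per_tweet` entries to the per-tweet averages of @mentions, hashtags, and links.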

Our primary analysis was a basic processing of the text of

the tweets. This was done by merging the collected tweets for

a given user into a single “document” and analyzing that.

Previous research has shown that linguistic features can be

used to predict personality traits [26], [36]. Data collected in

[36] was used in both studies. They had three separate sources

of text, ranging from an average of 1,770 words to over 5,000

words per person.

There is potential to apply these linguistic analysis methods

to help predict personality by analyzing a person’s tweets.

However, the text samples used in earlier studies are much larger than any single Twitter post, since each tweet is limited to 140 characters. Aggregating many tweets from a user gives more information, but as a stream of disjointed statements rather than the coherent narrative used in previous personality studies. Thus, it was not clear at the outset whether tweets would be as connected to personality, or as useful a source of data for this type of analysis, as the text used in other work.

There was an average of 1914 words per user; the distribution is shown in figure 3. The number of words ranged from 50 to 5724. These came from an average of 142.2 tweets per user, with a maximum of 350 tweets for one user and a minimum of 4 for another.

Following the methods used in [26], [36] as well as other

studies of social media behavior, such as [13], we utilized

two main tools to analyze the content of users’ tweets. The first is the Linguistic Inquiry and Word Count (LIWC) tool

[35]. LIWC produces statistics on 81 different features of

text in ﬁve categories. These include Standard Counts (word

count, words longer than six letters, number of prepositions,

etc.), Psychological Processes (emotional, cognitive, sensory,

and social processes), Relativity (words about time, the past,

the future), Personal Concerns (such as occupation, ﬁnancial

issues, health), and Other dimensions (counts of various types

of punctuation, swear words). We excluded the Standard

Counts and Other Dimension features to eliminate what is

likely to be noise on the type of text we have. The exceptions

are that we included word count, words per sentence, and

swear word counts since these reﬂect verbosity and tone of


Fig. 3: Number of words per user.

the user. For the other three categories, the values are given

as the percentage of words in the input that match words in a

given category. For example, it counts the number of “social”

words such as “talk”, “us”, and “friend”, or “anxiety” words

like “nervous”, “afraid”, and “tense”. Correlations between

these features and personality traits (e.g. anxiety words and

neuroticism scores) would not be surprising. This produced

79 text features.
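The percentage-of-words scoring described above can be illustrated with a small sketch. The two-category lexicon below uses only the example words quoted in the text and is a toy stand-in, not the LIWC dictionary. The tokenizer and the name `category_percentages` are likewise assumptions for illustration.

```python
import re

# Toy lexicon built from the example words in the text; NOT the LIWC dictionary.
TOY_CATEGORIES = {
    "social": {"talk", "us", "friend"},
    "anxiety": {"nervous", "afraid", "tense"},
}


def category_percentages(document, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of words in `document` that match it."""
    # Simple lowercase tokenizer (an assumption; real tools handle stems, etc.).
    words = re.findall(r"[a-z']+", document.lower())
    total = len(words)
    result = {}
    for name, vocabulary in categories.items():
        hits = sum(1 for w in words if w in vocabulary)
        result[name] = 100.0 * hits / total if total else 0.0
    return result


# Tweets for one user are merged into a single "document" before scoring.
tweets = ["Talk to us my friend", "I am nervous"]
scores = category_percentages(" ".join(tweets))
```

In this toy input, three of the eight words are "social" words and one is an "anxiety" word, so the scores are percentages of the merged document rather than raw counts.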

In addition, we ran the text against the MRC Psycholinguistic

Database, a list of over 150,000 words with linguistic and

psycholinguistic features of each word. These include: Kucera-

Francis written frequency, number of categories, and num-

ber of samples; Brown verbal frequency; Familiarity rating;

Meaningfulness via Colorado norms and via Paivio Norms;

Concreteness; age of acquisition; Thorndike-Lorge written

frequency; and the number of letters, phonemes, and syllables.

We computed the average non-zero score for each feature over

all the words from each user.
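One way to read the "average non-zero score" step is sketched below. It assumes that a score of zero in the database means no rating is available for that word, and that the average is taken over word occurrences rather than distinct words. The lexicon values are invented for illustration and are not taken from the MRC database.

```python
def average_nonzero_scores(words, lexicon, features):
    """Average each feature over the user's words, skipping zero (missing) scores.

    `lexicon` maps word -> {feature name: score}; words absent from the
    lexicon are ignored.
    """
    averages = {}
    for feature in features:
        scores = [lexicon[w].get(feature, 0) for w in words if w in lexicon]
        nonzero = [s for s in scores if s != 0]
        averages[feature] = sum(nonzero) / len(nonzero) if nonzero else 0.0
    return averages


# Hypothetical entries; the numbers are placeholders, not real MRC ratings.
toy_lexicon = {
    "dog": {"familiarity": 600, "concreteness": 0},
    "idea": {"familiarity": 500, "concreteness": 300},
}
user_words = ["dog", "idea", "dog", "xyz"]
features = average_nonzero_scores(user_words, toy_lexicon, ["familiarity", "concreteness"])
```

Skipping zeros keeps words with no rating on a given feature from pulling that feature's average toward zero.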

In addition, we performed a word by word sentiment anal-

ysis of each user’s tweets. Using the General Inquirer dataset

[1], which provides a hand annotated dictionary that assigns

words sentiment values on a -1 to +1 scale, we computed a

score for each user that was the average sentiment score for

all words used in their list of tweets.
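The word-by-word sentiment score can be sketched as follows. The dictionary is a hypothetical stand-in for the General Inquirer annotations. Averaging only over words that appear in the dictionary is an assumption, since the text does not say how unlisted words are handled.

```python
def average_sentiment(words, sentiment):
    """Mean sentiment of the words that have an entry in `sentiment` (-1 to +1)."""
    scored = [sentiment[w] for w in words if w in sentiment]
    return sum(scored) / len(scored) if scored else 0.0


# Hypothetical sentiment values for illustration only.
toy_sentiment = {"good": 1.0, "bad": -1.0, "fine": 0.5}
user_score = average_sentiment("good bad fine good unknown".split(), toy_sentiment)
```

If unlisted words were instead counted as neutral (score 0), the denominator would be the total word count and the scores would shrink toward zero for users who use many unlisted words.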

IV. PERSONALITY AND TWITTER BEHAVIOR CORRELATIONS

We began by running a Pearson correlation analysis between

subjects’ personality scores and each of the features obtained

from analyzing their tweets and public account data. These are

shown in table II. There are a number of significant correlations here; however, none of them is strong enough to directly predict any personality trait. Correlations that were statistically significant at p<0.05 are bolded.
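For reference, the Pearson coefficient between one feature and one personality score can be computed as in the sketch below. This is a plain-Python illustration with hypothetical function names, not the analysis code used in the study. The accompanying t statistic is the standard one for testing a correlation against zero, compared against a t distribution with n-2 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero with n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data: a feature value and a trait score for four hypothetical users.
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])
t = t_statistic(r, 4)
```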

Many of the correlations make intuitive sense. For example,

conscientiousness is negatively correlated with words about

death (e.g. “bury”, “cofﬁn”, “kill”) and with negative emotions

and sadness, suggesting conscientious people tend to talk less

about unhappy subjects. At the same time, the trait is positively

Fig. 4: Features used for predicting personality.

TABLE I: Average scores on each personality factor on a

normalized 0-1 scale

          Agree.  Consc.  Extra.  Neuro.  Open.
Average   0.697   0.617   0.586   0.428   0.755
Stdev     0.162   0.176   0.190   0.224   0.147

correlated with the use of “you”, indicating the same people

tend to talk about or to others. Agreeable people also tend to

use “you” a lot, but are less likely to talk about achievements

and money.

However, there are not such intuitive explanations for other

correlations. For example, the number of parentheses used is

negatively correlated with both extraversion and openness. It

is unclear why this is the case, or if these are perhaps falsely

signiﬁcant data points. However, since our focus in this paper

is on predicting personality rather than on focusing on any

particular correlation, we do not assign much weight to any

of these connections. A space of future work would be to

probe more deeply into these correlations over a larger data

set.

V. PREDICTING PERSONALITY

To predict the score of a given personality feature, we

performed a regression analysis in Weka [18]. We used two

regression algorithms: Gaussian Process and ZeroR, each with

a 10-fold cross-validation with 10 iterations. The two algorithms had similar performance over the personality features. Results

are shown in table III.
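To make the evaluation setup concrete, the sketch below re-implements the ZeroR baseline in plain Python under repeated k-fold cross-validation. For a numeric target, ZeroR predicts the mean of the training values. This is an illustration and not the Weka code used in the study. The function name and the fold assignment are assumptions, "10 iterations" is read here as 10 repetitions with different shuffles, and mean absolute error is assumed as the error measure.

```python
import random


def zero_r_cv_mae(targets, folds=10, repeats=10, seed=0):
    """Mean absolute error of a mean-predicting baseline under repeated k-fold CV."""
    rng = random.Random(seed)
    indices = list(range(len(targets)))
    errors = []
    for _ in range(repeats):
        rng.shuffle(indices)
        for k in range(folds):
            test = indices[k::folds]  # every folds-th item forms one fold
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            if not test or not train:
                continue
            # ZeroR ignores all features and predicts the training mean.
            prediction = sum(targets[i] for i in train) / len(train)
            errors.extend(abs(targets[i] - prediction) for i in test)
    return sum(errors) / len(errors)


# Hypothetical normalized trait scores on a 0-1 scale.
toy_scores = [0.2, 0.4, 0.5, 0.6, 0.8] * 4
baseline_mae = zero_r_cv_mae(toy_scores)
```

On a normalized 0-1 scale, an error of 0.11 from such a procedure would correspond to predictions within 11% of the actual values. A feature-based model is useful only to the extent that it beats this baseline.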

We found that openness was the easiest to predict and

neuroticism was the most difﬁcult, consistent with the results


TABLE II: Pearson correlation values between feature scores and personality scores. Signiﬁcant correlations are shown in bold

for p<0.05. Only features that correlate signiﬁcantly with at least one personality trait are shown.

Language Feature      Examples                     Extro.  Agree.  Consc.  Neuro.   Open.
“You”                 (you, your, thou)             0.068   0.364   0.252  -0.212  -0.020
Articles              (a, an, the)                 -0.039  -0.139  -0.071  -0.154   0.396
Auxiliary Verbs       (am, will, have)              0.033   0.042  -0.284   0.017   0.045
Future Tense          (will, gonna)                 0.227  -0.100  -0.286   0.118   0.142
Negations             (no, not, never)             -0.020   0.048  -0.374   0.081   0.040
Quantifiers           (few, many, much)            -0.002  -0.057  -0.089  -0.051   0.238
Social Processes      (mate, talk, they, child)     0.262   0.156   0.168  -0.141   0.084
Family                (daughter, husband, aunt)     0.338   0.020  -0.126   0.096   0.215
Humans                (adult, baby, boy)            0.204  -0.011   0.055  -0.113   0.251
Negative Emotions     (hurt, ugly, nasty)           0.054  -0.111  -0.268   0.120   0.010
Sadness               (crying, grief, sad)          0.154  -0.203  -0.253   0.230  -0.111
Cognitive Mechanisms  (cause, know, ought)         -0.008  -0.089  -0.244   0.025   0.140
Causation             (because, effect, hence)      0.224  -0.258  -0.155  -0.004   0.264
Discrepancy           (should, would, could)        0.227  -0.055  -0.292   0.187   0.103
Certainty             (always, never)               0.112  -0.117  -0.069  -0.074   0.347
Perceptual Processes
Hearing               (listen, hearing)             0.042  -0.041   0.014   0.335  -0.084
Feeling               (feels, touch)                0.097  -0.127  -0.236   0.244   0.005
Biological Processes  (eat, blood, pain)           -0.066   0.206   0.005   0.057  -0.239
Body                  (cheek, hands, spit)          0.031   0.083  -0.079   0.122  -0.299
Health                (clinic, flu, pill)          -0.277   0.164   0.059  -0.012  -0.004
Ingestion             (dish, eat, pizza)           -0.105   0.247   0.013  -0.058  -0.202
Work                  (job, majors, xerox)          0.231  -0.096   0.330  -0.125   0.426
Achievement           (earn, hero, win)            -0.005  -0.240  -0.198  -0.070   0.008
Money                 (audit, cash, owe)           -0.063  -0.259   0.099  -0.074   0.222
Religion              (altar, church, mosque)      -0.152  -0.151  -0.025   0.383  -0.073
Death                 (bury, coffin, kill)         -0.001   0.064  -0.332  -0.054   0.120
Fillers               (blah, imean, youknow)        0.099  -0.186  -0.272   0.080   0.120
Punctuation
Commas                                              0.148   0.080   -0.24   0.155   0.170
Colons                                             -0.216  -0.153   0.322  -0.015  -0.142
Question Marks                                      0.263  -0.050   0.024   0.153  -0.114
Exclamation Marks                                  -0.021  -0.025   0.260   0.317  -0.295
Parentheses                                        -0.254  -0.048  -0.084   0.133  -0.302
Non-LIWC Features
GI Sentiment                                        0.177  -0.130  -0.084  -0.197   0.268
Number of Hashtags                                  0.066  -0.044  -0.030  -0.217  -0.268
Words per tweet                                     0.285  -0.065  -0.144   0.031   0.200
Links per tweet                                    -0.061  -0.081   0.256  -0.054   0.064
