Quantifying Mental Health Signals in Twitter
Glen Coppersmith, Mark Dredze, Craig Harman
Human Language Technology Center of Excellence
Johns Hopkins University
Baltimore, MD, USA
Abstract
The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-yet-undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), depression, bipolar disorder, and seasonal affective disorder (SAD). We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.
1 Introduction
While mental health issues pose a significant health burden on the general public, mental health research lacks the quantifiable data available to many physical health disciplines. This is partly due to the complexity of the underlying causes of mental illness and partly due to longstanding societal stigma making the subject all but taboo. Lack of data has hampered mental health research in terms of developing reliable diagnoses and effective treatment for many disorders. Moreover, population-level analysis via traditional methods is time consuming, expensive, and often comes with a significant delay.
In contrast, social media is plentiful and has enabled diverse research on a wide range of topics, including political science (Boydstun et al., 2013), social science (Al Zamal et al., 2012), and health at an individual and population level (Paul and Dredze, 2011; Dredze, 2012; Aramaki et al., 2011; Hawn, 2009). Of the numerous health topics for which social media has been considered, mental health may actually be the most appropriate. A major component of mental health research requires the study of behavior, which may be manifest in how an individual acts, how they communicate, what activities they engage in, and how they interact with the world around them, including friends and family. Additionally, capturing population-level behavioral trends from Web data has previously provided revolutionary capabilities to health researchers (Ayers et al., 2014). Thus, social media seems like a perfect fit for studying mental health trends in both individuals and the overall population. Such topics have already been the focus of several studies (Coppersmith et al., 2014; De Choudhury et al., 2014; De Choudhury et al., 2013d; De Choudhury et al., 2013b; De Choudhury et al., 2013c; Ayers et al., 2013).
What can we expect to learn about mental health by studying social media? How does a service like Twitter inform our knowledge in this area? Numerous studies indicate that language use, social expression, and interaction are telling indicators of mental health. The well-known Linguistic Inquiry Word Count (LIWC), a validated tool for the psychometric analysis of language data (Pennebaker et al., 2007), has been repeatedly used to study language associated with all types of disorders (Resnik et al., 2013; Alvarez-Conrad et al., 2001; Tausczik and Pennebaker, 2010). Furthermore, social media is by nature social, which means that social patterns, a critical part of mental health and illness, may be readily observable in raw Twitter data. Thus, Twitter and other social media provide a unique quantifiable perspective on human behavior that may otherwise go unobserved, suggesting their value as a powerful tool for mental health researchers.
The main vehicle for studying mental health in social media has been the use of surveys, e.g., a depression battery (De Choudhury, 2013) or a personality test (Schwartz et al., 2013), to determine characteristics of a user, coupled with analysis of their corresponding social media data. Work in this area has mostly focused on depression (De Choudhury et al., 2013d; De Choudhury et al., 2013b; De Choudhury et al., 2013c), and the number of users is limited to those who can complete the appropriate survey. For example, De Choudhury et al. (2013d) solicited Twitter users to take the CES-D and to share their public Twitter profile, analyzing linguistic and behavioral patterns. While this type of study has produced high quality data, it is limited in size (by survey respondents) and scope (to diagnoses which have a battery amenable to administration over the internet).
In this paper we examine a range of mental health disorders using automatically derived samples from large amounts of Twitter data. Rather than rely on surveys, we automatically identify self-expressions of mental illness diagnoses and leverage these messages to construct a labeled dataset for analysis. Using this dataset, we make the following contributions:
- We demonstrate the effectiveness of our automatically derived data by showing that statistical classifiers can differentiate users with four different mental health disorders: depression, bipolar disorder, post-traumatic stress disorder, and seasonal affective disorder.
- We conduct a LIWC analysis of each disorder to measure deviations of each illness group from a control group, replicating previous findings for depression and providing new findings for bipolar disorder, PTSD, and SAD.
- We conduct an open-vocabulary analysis that captures language use relevant to mental health beyond what is captured with LIWC.
Our results open the door to a range of large-scale analyses of mental health issues using Twitter.
2 Related Work
For a good retrospective and prospective summary of the role of social media in mental health research, we refer the reader to De Choudhury (2013). De Choudhury identifies ways in which NLP has been, and can be, used on social media data to produce what the relevant mental health literature would predict, both at an individual level and a population level. She proceeds to identify ways in which these types of analyses can be used in the near and far term to influence mental health research and interventions alike.
Differences in language use have been observed in the personal writing of students who score highly on depression scales (Rude et al., 2004), forum posts for depression (Ramirez-Esparza et al., 2008), self narratives for PTSD (He et al., 2012; D'Andrea et al., 2011; Alvarez-Conrad et al., 2001), and chat rooms for bipolar disorder (Kramer et al., 2004). Specifically in social media, differences have previously been observed between depressed and control groups (as assessed by internet-administered batteries) via LIWC: depressed users more frequently use first-person pronouns (Chung and Pennebaker, 2007) and more frequently use negative emotion and anger words on Twitter, but show no differences in positive emotion word usage (Park et al., 2012). Similarly, an increase in negative emotion words and first-person pronouns, and a decrease in third-person pronouns (via LIWC), has been observed, as well as many manifestations of literature findings in the pattern of life of depressed users (e.g., social engagement, demographics) (De Choudhury et al., 2013d). Differences in language use in social media via LIWC have also been observed between PTSD and control groups (Coppersmith et al., 2014).
For population-level analysis, surveys such as the Behavioral Risk Factor Surveillance System (BRFSS) are conducted via telephone (Centers for Disease Control and Prevention (CDC), 2010). Some of these surveys cover relatively few participants (often in the thousands), have significant cost, and have long delays between data collection and dissemination of the findings. However, De Choudhury et al. (2013c) present a promising population-level analysis of depression that highlights the role of NLP and social media.
3 Data
All data we obtain is public, posted between 2008 and 2013, and made available from Twitter via their application programming interface (API). Specifically, this does not include any data that has been marked as 'private' by the author or any direct messages.

Genuine statements of diagnosis:
- In loving memory my mom, she was only 42, I was 17 & taken away from me. I was diagnosed with having P.T.S.D LINK
- So today I started therapy, she diagnosed me with anorexia, depression, anxiety disorder, post traumatic stress disorder and wants me to
- @USER The VA diagnosed me with PTSD, so I can't go in that direction anymore
- I wanted to share some things that have been helping me heal lately. I was diagnosed with severe complex PTSD and... LINK

Disingenuous statements of diagnosis:
- "I think I'm I'm diagnosed with SAD. Sexually active disorder" -anonymous
- LOL omg my bro the "psychologist" just diagnosed me with seasonal ADHD AHAHAHAAAAAAAAAAA IM DYING.
- The winter blues: Yesterday I was diagnosed with seasonal affective disorder. Now, this sounds a lot more dramat... LINK

Table 1: Examples found via regular expression keyword search for diagnosis tweets.
Diagnosed Group We seek users who publicly state that they have been diagnosed with various mental illnesses. Users may make such a statement to seek support from others in their social network, to fight the taboo of mental illness, or perhaps as an explanation of some of their behavior. Tweets were obtained by applying regular expressions, e.g., "I was diagnosed with X", to a large multi-year health-related collection. We searched for four conditions: depression, bipolar disorder, post-traumatic stress disorder (PTSD), and seasonal affective disorder (SAD). The matched diagnosis tweets were manually labeled as to whether the tweet contained a genuine statement of a mental health diagnosis. Table 1 shows examples of both genuine statements of diagnosis and disingenuous statements (often jokes or quotes).
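The paper does not publish its exact regular expressions, so as a rough illustration of this kind of keyword search, the sketch below uses hypothetical patterns; CONDITIONS and DIAGNOSIS_TEMPLATE are our inventions and deliberately simpler than what the authors likely used.

```python
import re

# Hypothetical diagnosis patterns; the paper's exact regular
# expressions are not published, so these are illustrative only.
CONDITIONS = {
    "depression": r"depression",
    "bipolar": r"bipolar( disorder)?",
    "ptsd": r"(ptsd|post[- ]?traumatic stress( disorder)?)",
    "sad": r"(seasonal affective disorder|\bsad\b)",
}

DIAGNOSIS_TEMPLATE = r"i (was|am|have been|got) diagnosed with [^.!?]*?{condition}"

def match_diagnosis(tweet_text):
    """Return the conditions whose self-diagnosis pattern the tweet matches."""
    text = tweet_text.lower()
    hits = []
    for name, cond in CONDITIONS.items():
        if re.search(DIAGNOSIS_TEMPLATE.format(condition=cond), text):
            hits.append(name)
    return hits

print(match_diagnosis("The VA diagnosed me with PTSD"))             # [] -- this narrow template misses that phrasing
print(match_diagnosis("I was diagnosed with severe complex PTSD"))  # ['ptsd']
```

Note that matched tweets are only candidates; as described above, a manual labeling pass still separates genuine from disingenuous statements.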
Next, we retrieved the most recent tweets (up to 3200) for each user with a genuine diagnosis tweet. We then filtered the users to remove those with fewer than 25 tweets and those whose tweets were not at least 75% in English (measured using the Compact Language Detector[1]). These filtering steps left us with the users that were considered positive examples. Table 2 indicates the number of users and tweets found for each of the mental health categories examined. We manually examined and annotated only half the diagnosis statements for depression, indicating that there are likely 800-900 depression users available via these automatic methods from our collection, compared to the 117 obtained via the methods of De Choudhury et al. (2013d). Additionally, we emphasize the low cost and effort of our automated approach as compared to their crowdsourced survey methods. The difference in collection methods also suggests that the two have a reasonable chance of being complementary. This is especially significant when considering disorders with lower incidence rates than depression (arguably the highest), where respondents to crowdsourced surveys or self-stated diagnoses alike are rare.

[1] https://code.google.com/p/cld2/
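A minimal sketch of the two user-level filters described above, under stated assumptions: detect_language stands in for the Compact Language Detector, and tweets are assumed to be dicts with a "text" field.

```python
def is_english(text, detect_language):
    """detect_language is a stand-in for the Compact Language Detector (CLD2)."""
    return detect_language(text) == "en"

def keep_user(tweets, detect_language, min_tweets=25, min_english=0.75):
    """Apply the paper's two filters: at least 25 tweets,
    and at least 75% of them identified as English."""
    if len(tweets) < min_tweets:
        return False
    n_english = sum(1 for t in tweets if is_english(t["text"], detect_language))
    return n_english / len(tweets) >= min_english
```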
This method is similar in spirit to that of De Choudhury et al. (2013c), who inferred a tweet-level classifier for depression from user-level labels (specifically, tweets from the past three months from users scoring highly on the CES-D for the positive class, and conversely for the negative).
Control Group To build models for analysis and to validate the data, we also need a sample of the general population to use as an approximation of community controls. We followed a similar process: we randomly selected 10k usernames from a list of Twitter users who posted to a separate random historical collection within a selected two-week window, downloaded the 3200 most recent tweets from these users, and applied our two filters: at least 25 tweets and at least 75% English. This yields a control group of 5728 random users, whose 13.7 million tweets were used as negative examples.
Caveats Our method for finding users with mental health diagnoses has significant caveats:
1) The method may only capture a subpopulation of each disorder (i.e., those who are speaking publicly about what is usually a very private matter), which may not truly represent all aspects of the population as a whole.
2) The method in no way verifies whether a diagnosis is genuine (i.e., people are not always truthful in self-reports). However, given the stigma often associated with mental illness, it seems unlikely users would tweet that they were diagnosed with a condition they do not have.
3) The control group is likely contaminated by the presence of users who are diagnosed with the various conditions investigated. We make no attempt to remove these users, and if we assume that the prevalence of each disorder in the general population is similar in our control group, we likely have hundreds of such diagnosed users contaminating our control training data.
4) Twitter users are not an entirely representative sample of the population as a whole.
Despite these caveats, we find that this method yielded promising results, as discussed in the next sections.

Condition     Match   Users   Tweets
Bipolar       6k      394     992k
Depression    5k      441     1.0m
PTSD          477     244     573k
SAD           389     159     421k
Control       10k     5728    13.7m

Table 2: Number of users matching the diagnosis regular expression, users labeled with genuine diagnoses, and tweets retrieved from diagnosed users for each mental health condition.
Comorbidity Since some of these disorders have high comorbidity, some users appear in more than one class (e.g., those who state a diagnosis of both PTSD and depression): bipolar and depression have 19 users in common (4.8% of the bipolar users, 4.3% of the depression users), PTSD and depression share 10 (4.0% of PTSD, 2.2% of depression), and bipolar and PTSD share 9 (2.2% of bipolar, 3.6% of PTSD). Two users state diagnoses of bipolar, PTSD, and depression (less than 1% of each set). No users stated diagnoses of both SAD and any other condition investigated.
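These overlap figures reduce to simple set intersections over the per-condition user lists; a minimal sketch, where the groups mapping (condition name to set of user IDs) is an assumed representation:

```python
from itertools import combinations

def comorbidity_report(groups):
    """groups: dict mapping condition name -> set of user IDs."""
    for (a, users_a), (b, users_b) in combinations(groups.items(), 2):
        shared = users_a & users_b
        if shared:
            print(f"{a} & {b}: {len(shared)} users "
                  f"({100 * len(shared) / len(users_a):.1f}% of {a}, "
                  f"{100 * len(shared) / len(users_b):.1f}% of {b})")
```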
4 Methods
We quantify various aspects of each user's language use and pattern of life via automated methods, extracting features for subsequent machine learning. We use these features to (1) replicate previous findings, (2) build classifiers to separate diagnosed from control users, and (3) introspect on those classifiers. Introspection here shows us which quantified signals in the content the classifiers base their decisions on, and thus lets us gain intuition about what signals relevant to mental health are present in the content.
4.1 Linguistic Inquiry Word Count (LIWC)
LIWC provides clinicians with a tool for gathering quantitative data regarding the state of a patient from the patient's writing (Pennebaker et al., 2007). Previous work has found signal in the 'positive affect' and 'negative affect' categories of LIWC when applied to social media (including Twitter), so we examine their correlations separately, as well as in the context of other LIWC categories (De Choudhury et al., 2013a). In all, we examine some of the LIWC categories directly (Swear, Anger, PosEmo, NegEmo, Anx) and combine pronoun classes by linguistic form: the I and We classes are combined to form Pro1, You becomes Pro2, and SheHe and They become Pro3. Each of these classes provides one feature used by subsequent machine learning and our other analyses.
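A sketch of this per-user feature extraction under stated assumptions: the LIWC dictionaries are proprietary, so liwc_lexicon (a mapping from category name to word set) and tokenize are stand-ins rather than real APIs.

```python
CATEGORIES = ["Swear", "Anger", "PosEmo", "NegEmo", "Anx"]
PRONOUN_MERGE = {"Pro1": ["I", "We"], "Pro2": ["You"], "Pro3": ["SheHe", "They"]}

def liwc_features(user_tweets, liwc_lexicon, tokenize):
    """One feature per category: the proportion of the user's tweets
    containing at least one word from that category.
    liwc_lexicon: category name -> set of words (assumed format)."""
    merged = {c: liwc_lexicon[c] for c in CATEGORIES}
    for new_cat, parts in PRONOUN_MERGE.items():
        merged[new_cat] = set().union(*(liwc_lexicon[p] for p in parts))
    feats = {}
    for cat, words in merged.items():
        hits = sum(1 for t in user_tweets if words & set(tokenize(t)))
        feats[cat] = hits / len(user_tweets)
    return feats
```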
4.2 Language Models (LMs)
Language models are commonly used to estimate how likely a given sequence of words is. Generally, an n-gram language model refers to a model that examines strings of up to n words long. This is less than ideal for applications in social media: spelling errors, shortenings, space removal, and other aspects of social media data (especially Twitter) confound many traditional word-based approaches. Thus, we employ two LMs: first, a traditional 1-gram LM (ULM) that examines the probability of each whole word; second, a character 5-gram LM (CLM) that examines sequences of up to 5 characters.
LMs model the likelihood of sequences from training data. In our case, we build one of each model from the positive class (tweets from one class of diagnosed users, e.g., PTSD), yielding ULM+ and CLM+. We also build one of each model from the negative class (control users), yielding ULM- and CLM-. We score each tweet by computing these probabilities and classifying it according to which model has the higher probability (e.g., for a given tweet, is ULM+ > ULM-?).
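As a concrete illustration of the CLM side, here is a minimal character n-gram model; the paper does not specify its estimation details in this excerpt, so the add-one smoothing below is an assumption, not the authors' method.

```python
import math
from collections import Counter

class CharNgramLM:
    """Character n-gram language model with add-one smoothing
    (an assumed smoothing scheme; the paper does not state one)."""
    def __init__(self, texts, n=5):
        self.n = n
        self.ngrams = Counter()
        self.contexts = Counter()
        for text in texts:
            padded = "^" * (n - 1) + text  # pad so early characters get full contexts
            for i in range(len(padded) - n + 1):
                self.ngrams[padded[i:i + n]] += 1
                self.contexts[padded[i:i + n - 1]] += 1
        self.vocab = len(set("".join(texts))) + 1  # rough alphabet size for smoothing

    def log_prob(self, text):
        padded = "^" * (self.n - 1) + text
        lp = 0.0
        for i in range(len(padded) - self.n + 1):
            num = self.ngrams[padded[i:i + self.n]] + 1
            den = self.contexts[padded[i:i + self.n - 1]] + self.vocab
            lp += math.log(num / den)
        return lp

# Classify a tweet by whichever model assigns higher probability, e.g.:
#   clm_pos = CharNgramLM(diagnosed_tweets)   # CLM+
#   clm_neg = CharNgramLM(control_tweets)     # CLM-
#   label = "diagnosed" if clm_pos.log_prob(tweet) > clm_neg.log_prob(tweet) else "control"
```

Comparing log probabilities, rather than raw probabilities, avoids numeric underflow on longer tweets while preserving the same decision rule.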
4.3 Pattern of Life Analytics
For brevity, we only briefly discuss the pattern of life analytics, since they do not depend on significant NLP. They examine how correlates found to be significant in the mental health literature may manifest and be measured in social media data. These are all imperfect proxies for the findings from the literature, but our experiments will demonstrate that they do collectively provide information relevant to mental health.
[Figure 1: Box-and-whiskers plot of the proportion of each user's tweets (y-axis) matching various LIWC categories. Each bar represents one LIWC category for one condition: PTSD in purple, depression in blue, SAD in orange, bipolar in red, and control in gray. Anxiety occurs an order of magnitude less often than the others, so its proportion is on the right y-axis (and thus not comparable to the others). Statistically significant deviations from control users are denoted by asterisks.]

For each of the following analytics we extract one feature to use in subsequent machine learning. Social engagement has been correlated with positive mental health outcomes (Greetham et al., 2011; Berkman et al., 2000; Organization, 2001; De Choudhury et al., 2013d), but is difficult to measure directly, so we examine various ways in which it may be manifest in a user's tweet stream. Tweet rate measures how often a Twitter user posts (a measure of overall engagement with this social media platform), and Proportion of tweets with @mentions measures how often a user posts 'in conversation' (for lack of a better term) with other users. Number of @mentions is a measure of how often the user in question engages other users, while Number of self @mentions is a measure of how often the user responds to mentions of themselves (since users rarely include their own username in a tweet). To estimate the size of a user's social network, we calculate Number of unique users @mentioned and Number of users @mentioned at least 3 times.
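A sketch of how these engagement features might be computed; the tweet representation (dicts with a "text" field) and the days_active normalizer for tweet rate are assumptions.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def engagement_features(tweets, username, days_active):
    """Pattern-of-life engagement features described above, one value each."""
    mentions_per_tweet = [MENTION.findall(t["text"]) for t in tweets]
    all_mentions = [m for ms in mentions_per_tweet for m in ms]
    mention_counts = Counter(m.lower() for m in all_mentions)
    return {
        "tweet_rate": len(tweets) / days_active,
        "prop_with_mentions": sum(1 for ms in mentions_per_tweet if ms) / len(tweets),
        "n_mentions": len(all_mentions),
        "n_self_mentions": mention_counts[username.lower()],
        "n_unique_mentioned": len(mention_counts),
        "n_mentioned_3plus": sum(1 for c in mention_counts.values() if c >= 3),
    }
```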
For each of the following analytics, we calculate the proportion of a user's tweets in which the analytic finds evidence. Insomnia and sleep disturbance are often symptoms of mental health disorders (Weissman et al., 1996; De Choudhury et al., 2013d), so we calculate the proportion of tweets that a user makes between midnight and 4am according to their local timezone. Exercise has also been correlated with positive mental health outcomes (Penedo and Dahn, 2005; Callaghan, 2004), so we examine tweets mentioning one of a small set of exercise-related terms. We also use an English sentiment analysis lexicon from Mitchell et al. (2013) to score individual tweets according to the presence and valence of sentiment words. We apply no thresholds, so any tweet with a sentiment score above 0 is considered positive, below 0 is considered negative, and those with a score of 0 are considered to have no sentiment. Thus we use the proportions of Insomnia, Exercise, Positive Sentiment, and Negative Sentiment tweets as features in subsequent machine learning and analysis.
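A sketch of these per-tweet indicators, under stated assumptions: the sentiment lexicon is treated as a word-to-valence mapping (the actual format of the Mitchell et al. (2013) lexicon may differ), local-time conversion is assumed to happen upstream, and the exercise term list is an illustrative stand-in for the paper's unspecified set.

```python
EXERCISE_TERMS = {"gym", "run", "running", "workout", "exercise"}  # illustrative subset only

def tweet_flags(tweet_text, local_hour, lexicon, tokenize):
    """Per-tweet indicators; the per-user features are the proportions of
    tweets for which each indicator fires."""
    tokens = tokenize(tweet_text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)  # no threshold applied
    return {
        "insomnia": 0 <= local_hour < 4,  # posted between midnight and 4am local time
        "exercise": any(tok in EXERCISE_TERMS for tok in tokens),
        "pos_sentiment": score > 0,
        "neg_sentiment": score < 0,
    }
```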
5 Results
We present three types of experiments to evaluate the quality and character of these data, and to demonstrate some quantifiable mental health signals in Twitter. First, we validate our method for obtaining data by replicating previous findings using LIWC. Next, we build classifiers to distinguish each group from the control group, demonstrating that there is useful signal in the language of each group, and compare these classifiers. Finally, we analyze the correlations between our analytics and classifiers to uncover relationships between them and derive insight into quantifiable and relevant mental health signals in Twitter.
Validation First, we provide some validation for our novel method for gathering samples. We demonstrate that language use, as measured by LIWC, is statistically significantly different between control and diagnosed users. Figure 1 shows the proportion of tweets from each user that score positively on various LIWC categories (i.e., have at least one word from that category). Box-and-whiskers plots (Tukey, 1977)[2] summarize a distribution of observations and ease comparison between them.

[2] For a modern implementation see Wickham (2009).
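This excerpt does not name the significance test behind the asterisks in Figure 1. As one plausible reconstruction (explicitly not the authors' stated method), a nonparametric comparison of per-user proportions between a diagnosed group and the control group could look like this; scipy is assumed available.

```python
from scipy.stats import mannwhitneyu

def liwc_significance(diagnosed_props, control_props, alpha=0.05):
    """diagnosed_props / control_props: per-user proportions of tweets
    matching one LIWC category. Mann-Whitney U is our assumption; the
    excerpt does not specify which test the authors used."""
    stat, p = mannwhitneyu(diagnosed_props, control_props, alternative="two-sided")
    return p, p < alpha
```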

References
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Tausczik, Y. R., and Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer.