Quantifying Mental Health Signals in Twitter
Glen Coppersmith, Mark Dredze, Craig Harman
Human Language Technology Center of Excellence
Johns Hopkins University
Baltimore, MD, USA
Abstract
The ubiquity of social media provides a rich opportunity to enhance the data available to mental health clinicians and researchers, enabling a better-informed and better-equipped mental health field. We present analysis of mental health phenomena in publicly available Twitter data, demonstrating how rigorous application of simple natural language processing methods can yield insight into specific disorders as well as mental health writ large, along with evidence that as-yet-undiscovered linguistic signals relevant to mental health exist in social media. We present a novel method for gathering data for a range of mental illnesses quickly and cheaply, then focus on analysis of four in particular: post-traumatic stress disorder (PTSD), depression, bipolar disorder, and seasonal affective disorder (SAD). We intend for these proof-of-concept results to inform the necessary ethical discussion regarding the balance between the utility of such data and the privacy of mental health related information.
1 Introduction
While mental health issues pose a significant health burden on the general public, mental health research lacks the quantifiable data available to many physical health disciplines. This is partly due to the complexity of the underlying causes of mental illness and partly due to longstanding societal stigma making the subject all but taboo. Lack of data has hampered mental health research in terms of developing reliable diagnoses and effective treatment for many disorders. Moreover, population-level analysis via traditional methods is time consuming, expensive, and often comes with a significant delay.
In contrast, social media is plentiful and has enabled diverse research on a wide range of topics, including political science (Boydstun et al., 2013), social science (Al Zamal et al., 2012), and health at an individual and population level (Paul and Dredze, 2011; Dredze, 2012; Aramaki et al., 2011; Hawn, 2009). Of the numerous health topics for which social media has been considered, mental health may actually be the most appropriate. A major component of mental health research requires the study of behavior, which may be manifest in how an individual acts, how they communicate, what activities they engage in, and how they interact with the world around them, including friends and family. Additionally, capturing population-level behavioral trends from Web data has previously provided revolutionary capabilities to health researchers (Ayers et al., 2014). Thus, social media seems like a perfect fit for studying mental health trends in both individuals and the overall population. Such topics have already been the focus of several studies (Coppersmith et al., 2014; De Choudhury et al., 2014; De Choudhury et al., 2013d; De Choudhury et al., 2013b; De Choudhury et al., 2013c; Ayers et al., 2013).
What can we expect to learn about mental health by studying social media? How does a service like Twitter inform our knowledge in this area? Numerous studies indicate that language use, social expression, and interaction are telling indicators of mental health. The well-known Linguistic Inquiry Word Count (LIWC), a validated tool for the psychometric analysis of language data (Pennebaker et al., 2007), has been repeatedly used to study language associated with all types of disorders (Resnik et al., 2013; Alvarez-Conrad et al., 2001; Tausczik and Pennebaker, 2010). Furthermore, social media is by nature social, which means that social patterns, a critical part of mental health and illness, may be readily observable in raw Twitter data. Thus, Twitter and other social media provide a unique quantifiable perspective on human behavior that may otherwise go unobserved, suggesting their value as a powerful tool for mental health researchers.
The main vehicle for studying mental health in social media has been the use of surveys, e.g., a depression battery (De Choudhury, 2013) or a personality test (Schwartz et al., 2013), to determine characteristics of a user, coupled with analysis of their corresponding social media data. Work in this area has mostly focused on depression (De Choudhury et al., 2013d; De Choudhury et al., 2013b; De Choudhury et al., 2013c), and the number of users is limited to those who can complete the appropriate survey. For example, De Choudhury et al. (2013d) solicited Twitter users to take the CES-D and to share their public Twitter profile, analyzing linguistic and behavioral patterns. While this type of study has produced high quality data, it is limited in size (by survey respondents) and scope (to diagnoses which have a battery amenable to administration over the internet).
In this paper we examine a range of mental health disorders using automatically derived samples from large amounts of Twitter data. Rather than rely on surveys, we automatically identify self-expressions of mental illness diagnoses and leverage these messages to construct a labeled dataset for analysis. Using this dataset, we make the following contributions:
- We demonstrate the effectiveness of our automatically derived data by showing that statistical classifiers can differentiate users with four different mental health disorders: depression, bipolar disorder, post-traumatic stress disorder, and seasonal affective disorder.
- We conduct a LIWC analysis of each disorder to measure deviations of each illness group from a control group, replicating previous findings for depression and providing new findings for bipolar disorder, PTSD, and SAD.
- We conduct an open-vocabulary analysis that captures language use relevant to mental health beyond what is captured with LIWC.
Our results open the door to a range of large-scale analyses of mental health issues using Twitter.
2 Related Work
For a good retrospective and prospective summary of the role of social media in mental health research, we refer the reader to De Choudhury (2013). De Choudhury identifies ways in which NLP has been, and can be, used on social media data to produce what the relevant mental health literature would predict, both at an individual level and a population level. She proceeds to identify ways in which these types of analyses can be used in the near and far term to influence mental health research and interventions alike.
Differences in language use have been observed in the personal writing of students who score highly on depression scales (Rude et al., 2004), forum posts for depression (Ramirez-Esparza et al., 2008), self narratives for PTSD (He et al., 2012; D'Andrea et al., 2011; Alvarez-Conrad et al., 2001), and chat rooms for bipolar disorder (Kramer et al., 2004). Specifically in social media, differences have previously been observed between depressed and control groups (as assessed by internet-administered batteries) via LIWC: depressed users more frequently use first-person pronouns (Chung and Pennebaker, 2007) and more frequently use negative emotion and anger words on Twitter, but show no differences in positive emotion word usage (Park et al., 2012). Similarly, an increase in negative emotion words and first-person pronouns, and a decrease in third-person pronouns (via LIWC), has been observed, as well as many manifestations of literature findings in the pattern of life of depressed users (e.g., social engagement, demographics) (De Choudhury et al., 2013d). Differences in language use in social media via LIWC have also been observed between PTSD and control groups (Coppersmith et al., 2014).
For population-level analysis, surveys such as the Behavioral Risk Factor Surveillance System (BRFSS) are conducted via telephone (Centers for Disease Control and Prevention (CDC), 2010). Some of these surveys cover relatively few participants (often in the thousands), have significant cost, and have long delays between data collection and dissemination of the findings. However, De Choudhury et al. (2013c) present a promising population-level analysis of depression that highlights the role of NLP and social media.
3 Data
All data we obtain is public, posted between 2008 and 2013, and made available from Twitter via their application programming interface (API). Specifically, this does not include any data that has been marked as 'private' by the author or any direct messages.

Genuine statements of diagnosis:
- In loving memory my mom, she was only 42, I was 17 & taken away from me. I was diagnosed with having P.T.S.D LINK
- So today I started therapy, she diagnosed me with anorexia, depression, anxiety disorder, post traumatic stress disorder and wants me to
- @USER The VA diagnosed me with PTSD, so I can't go in that direction anymore
- I wanted to share some things that have been helping me heal lately. I was diagnosed with severe complex PTSD and... LINK

Disingenuous statements of diagnosis:
- "I think I'm I'm diagnosed with SAD. Sexually active disorder" -anonymous
- LOL omg my bro the "psychologist" just diagnosed me with seasonal ADHD AHAHAHAAAAAAAAAAA IM DYING.
- The winter blues: Yesterday I was diagnosed with seasonal affective disorder. Now, this sounds a lot more dramat... LINK

Table 1: Examples found via regular expression keyword search for diagnosis tweets.
Diagnosed Group We seek users who publicly state that they have been diagnosed with various mental illnesses. Users may make such a statement to seek support from others in their social network, to fight the taboo of mental illness, or perhaps as an explanation of some of their behavior. Tweets were obtained by applying regular expressions, e.g., "I was diagnosed with X", to a large multi-year health-related collection. We searched for four conditions: depression, bipolar disorder, post-traumatic stress disorder (PTSD), and seasonal affective disorder (SAD). The matched diagnosis tweets were manually labeled as to whether the tweet contained a genuine statement of a mental health diagnosis. Table 1 shows examples of both genuine statements of diagnosis and disingenuous statements (often jokes or quotes).
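The paper does not publish its exact regular expressions, so as a rough illustration of this kind of keyword search, the sketch below uses hypothetical patterns; CONDITIONS and DIAGNOSIS_TEMPLATE are our inventions and deliberately simpler than what the authors likely used.

```python
import re

# Hypothetical diagnosis patterns; the paper's exact regular
# expressions are not published, so these are illustrative only.
CONDITIONS = {
    "depression": r"depression",
    "bipolar": r"bipolar( disorder)?",
    "ptsd": r"(ptsd|post[- ]?traumatic stress( disorder)?)",
    "sad": r"(seasonal affective disorder|\bsad\b)",
}

DIAGNOSIS_TEMPLATE = r"i (was|am|have been|got) diagnosed with [^.!?]*?{condition}"

def match_diagnosis(tweet_text):
    """Return the conditions whose self-diagnosis pattern the tweet matches."""
    text = tweet_text.lower()
    hits = []
    for name, cond in CONDITIONS.items():
        if re.search(DIAGNOSIS_TEMPLATE.format(condition=cond), text):
            hits.append(name)
    return hits

print(match_diagnosis("The VA diagnosed me with PTSD"))             # [] -- this narrow template misses that phrasing
print(match_diagnosis("I was diagnosed with severe complex PTSD"))  # ['ptsd']
```

Note that matched tweets are only candidates; as described above, a manual labeling pass still separates genuine from disingenuous statements.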
Next, we retrieved the most recent tweets (up to 3200) for each user with a genuine diagnosis tweet. We then filtered the users to remove those with fewer than 25 tweets and those whose tweets were not at least 75% in English (measured using the Compact Language Detector[1]). These filtering steps left us with the users that were considered positive examples. Table 2 indicates the number of users and tweets found for each of the mental health categories examined. We manually examined and annotated only half the diagnosis statements for depression, indicating that there are likely 800-900 depression users available via these automatic methods from our collection, compared to the 117 obtained via the methods of De Choudhury et al. (2013d). Additionally, we emphasize the low cost and effort of our automated approach as compared to their crowdsourced survey methods. The difference in collection methods also suggests that the two have a reasonable chance of being complementary. This is especially significant when considering disorders with lower incidence rates than depression (arguably the highest), where respondents to crowdsourced surveys or self-stated diagnoses alike are rare.

[1] https://code.google.com/p/cld2/
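A minimal sketch of the two user-level filters described above, under stated assumptions: detect_language stands in for the Compact Language Detector, and tweets are assumed to be dicts with a "text" field.

```python
def is_english(text, detect_language):
    """detect_language is a stand-in for the Compact Language Detector (CLD2)."""
    return detect_language(text) == "en"

def keep_user(tweets, detect_language, min_tweets=25, min_english=0.75):
    """Apply the paper's two filters: at least 25 tweets,
    and at least 75% of them identified as English."""
    if len(tweets) < min_tweets:
        return False
    n_english = sum(1 for t in tweets if is_english(t["text"], detect_language))
    return n_english / len(tweets) >= min_english
```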
This method is similar in spirit to that of De Choudhury et al. (2013c), who inferred a tweet-level classifier for depression from user-level labels (specifically, tweets from the past three months from users scoring highly on the CES-D for the positive class, and conversely for the negative).
Control Group To build models for analysis and to validate the data, we also need a sample of the general population to use as an approximation of community controls. We followed a similar process: we randomly selected 10k usernames from a list of Twitter users who posted to a separate random historical collection within a selected two-week window, downloaded the 3200 most recent tweets from these users, and applied our two filters: at least 25 tweets and at least 75% English. This yields a control group of 5728 random users, whose 13.7 million tweets were used as negative examples.
Caveats Our method for finding users with mental health diagnoses has significant caveats:
1) The method may only capture a subpopulation of each disorder (i.e., those who are speaking publicly about what is usually a very private matter), which may not truly represent all aspects of the population as a whole.
2) The method in no way verifies whether a diagnosis is genuine (i.e., people are not always truthful in self-reports). However, given the stigma often associated with mental illness, it seems unlikely users would tweet that they were diagnosed with a condition they do not have.
3) The control group is likely contaminated by the presence of users who are diagnosed with the various conditions investigated. We make no attempt to remove these users, and if we assume that the prevalence of each disorder in the general population is similar in our control group, we likely have hundreds of such diagnosed users contaminating our control training data.
4) Twitter users are not an entirely representative sample of the population as a whole.
Despite these caveats, we find that this method yielded promising results, as discussed in the next sections.

Condition     Match   Users   Tweets
Bipolar       6k      394     992k
Depression    5k      441     1.0m
PTSD          477     244     573k
SAD           389     159     421k
Control       10k     5728    13.7m

Table 2: Number of users matching the diagnosis regular expression, users labeled with genuine diagnoses, and tweets retrieved from diagnosed users for each mental health condition.
Comorbidity Since some of these disorders have high comorbidity, some users appear in more than one class (e.g., those who state a diagnosis of both PTSD and depression): bipolar and depression have 19 users in common (4.8% of the bipolar users, 4.3% of the depression users), PTSD and depression share 10 (4.0% of PTSD, 2.2% of depression), and bipolar and PTSD share 9 (2.2% of bipolar, 3.6% of PTSD). Two users state diagnoses of bipolar, PTSD, and depression (less than 1% of each set). No users stated diagnoses of both SAD and any other condition investigated.
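These overlap figures reduce to simple set intersections over the per-condition user lists; a minimal sketch, where the groups mapping (condition name to set of user IDs) is an assumed representation:

```python
from itertools import combinations

def comorbidity_report(groups):
    """groups: dict mapping condition name -> set of user IDs."""
    for (a, users_a), (b, users_b) in combinations(groups.items(), 2):
        shared = users_a & users_b
        if shared:
            print(f"{a} & {b}: {len(shared)} users "
                  f"({100 * len(shared) / len(users_a):.1f}% of {a}, "
                  f"{100 * len(shared) / len(users_b):.1f}% of {b})")
```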
4 Methods
We quantify various aspects of each user's language use and pattern of life via automated methods, extracting features for subsequent machine learning. We use these features to (1) replicate previous findings, (2) build classifiers to separate diagnosed from control users, and (3) introspect on those classifiers. Introspection here shows us which quantified signals in the content the classifiers base their decisions on, and thus lets us gain intuition about what signals relevant to mental health are present in the content.
4.1 Linguistic Inquiry Word Count (LIWC)
LIWC provides clinicians with a tool for gathering quantitative data regarding the state of a patient from the patient's writing (Pennebaker et al., 2007). Previous work has found signal in the 'positive affect' and 'negative affect' categories of LIWC when applied to social media (including Twitter), so we examine their correlations separately, as well as in the context of other LIWC categories (De Choudhury et al., 2013a). In all, we examine some of the LIWC categories directly (Swear, Anger, PosEmo, NegEmo, Anx) and combine pronoun classes by linguistic form: the I and We classes are combined to form Pro1, You becomes Pro2, and SheHe and They become Pro3. Each of these classes provides one feature used by subsequent machine learning and our other analyses.
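A sketch of this per-user feature extraction under stated assumptions: the LIWC dictionaries are proprietary, so liwc_lexicon (a mapping from category name to word set) and tokenize are stand-ins rather than real APIs.

```python
CATEGORIES = ["Swear", "Anger", "PosEmo", "NegEmo", "Anx"]
PRONOUN_MERGE = {"Pro1": ["I", "We"], "Pro2": ["You"], "Pro3": ["SheHe", "They"]}

def liwc_features(user_tweets, liwc_lexicon, tokenize):
    """One feature per category: the proportion of the user's tweets
    containing at least one word from that category.
    liwc_lexicon: category name -> set of words (assumed format)."""
    merged = {c: liwc_lexicon[c] for c in CATEGORIES}
    for new_cat, parts in PRONOUN_MERGE.items():
        merged[new_cat] = set().union(*(liwc_lexicon[p] for p in parts))
    feats = {}
    for cat, words in merged.items():
        hits = sum(1 for t in user_tweets if words & set(tokenize(t)))
        feats[cat] = hits / len(user_tweets)
    return feats
```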
4.2 Language Models (LMs)
Language models are commonly used to estimate how likely a given sequence of words is. Generally, an n-gram language model refers to a model that examines strings of up to n words long. This is less than ideal for applications in social media: spelling errors, shortenings, space removal, and other aspects of social media data (especially Twitter) confound many traditional word-based approaches. Thus, we employ two LMs: first, a traditional 1-gram LM (ULM) that examines the probability of each whole word; second, a character 5-gram LM (CLM) that examines sequences of up to 5 characters.
LMs model the likelihood of sequences from training data. In our case, we build one of each model from the positive class (tweets from one class of diagnosed users, e.g., PTSD), yielding ULM+ and CLM+. We also build one of each model from the negative class (control users), yielding ULM- and CLM-. We score each tweet by computing these probabilities and classifying it according to which model has the higher probability (e.g., for a given tweet, is ULM+ > ULM-?).
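As a concrete illustration of the CLM side, here is a minimal character n-gram model; the paper does not specify its estimation details in this excerpt, so the add-one smoothing below is an assumption, not the authors' method.

```python
import math
from collections import Counter

class CharNgramLM:
    """Character n-gram language model with add-one smoothing
    (an assumed smoothing scheme; the paper does not state one)."""
    def __init__(self, texts, n=5):
        self.n = n
        self.ngrams = Counter()
        self.contexts = Counter()
        for text in texts:
            padded = "^" * (n - 1) + text  # pad so early characters get full contexts
            for i in range(len(padded) - n + 1):
                self.ngrams[padded[i:i + n]] += 1
                self.contexts[padded[i:i + n - 1]] += 1
        self.vocab = len(set("".join(texts))) + 1  # rough alphabet size for smoothing

    def log_prob(self, text):
        padded = "^" * (self.n - 1) + text
        lp = 0.0
        for i in range(len(padded) - self.n + 1):
            num = self.ngrams[padded[i:i + self.n]] + 1
            den = self.contexts[padded[i:i + self.n - 1]] + self.vocab
            lp += math.log(num / den)
        return lp

# Classify a tweet by whichever model assigns higher probability, e.g.:
#   clm_pos = CharNgramLM(diagnosed_tweets)   # CLM+
#   clm_neg = CharNgramLM(control_tweets)     # CLM-
#   label = "diagnosed" if clm_pos.log_prob(tweet) > clm_neg.log_prob(tweet) else "control"
```

Comparing log probabilities, rather than raw probabilities, avoids numeric underflow on longer tweets while preserving the same decision rule.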
4.3 Pattern of Life Analytics
For brevity, we only briefly discuss the pattern of life analytics, since they do not depend on significant NLP. They examine how correlates found to be significant in the mental health literature may manifest and be measured in social media data. These are all imperfect proxies for the findings from the literature, but our experiments will demonstrate that they do collectively provide information relevant to mental health.
[Figure 1: Box-and-whiskers plot of the proportion of each user's tweets (y-axis) matching various LIWC categories. Each bar represents one LIWC category for one condition: PTSD in purple, depression in blue, SAD in orange, bipolar in red, and control in gray. Anxiety occurs an order of magnitude less often than the others, so its proportion is on the right y-axis (and thus not comparable to the others). Statistically significant deviations from control users are denoted by asterisks.]

For each of the following analytics we extract one feature to use in subsequent machine learning. Social engagement has been correlated with positive mental health outcomes (Greetham et al., 2011; Berkman et al., 2000; Organization, 2001; De Choudhury et al., 2013d), but is difficult to measure directly, so we examine various ways in which it may be manifest in a user's tweet stream. Tweet rate measures how often a Twitter user posts (a measure of overall engagement with this social media platform), and Proportion of tweets with @mentions measures how often a user posts 'in conversation' (for lack of a better term) with other users. Number of @mentions is a measure of how often the user in question engages other users, while Number of self @mentions is a measure of how often the user responds to mentions of themselves (since users rarely include their own username in a tweet). To estimate the size of a user's social network, we calculate Number of unique users @mentioned and Number of users @mentioned at least 3 times.
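A sketch of how these engagement features might be computed; the tweet representation (dicts with a "text" field) and the days_active normalizer for tweet rate are assumptions.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def engagement_features(tweets, username, days_active):
    """Pattern-of-life engagement features described above, one value each."""
    mentions_per_tweet = [MENTION.findall(t["text"]) for t in tweets]
    all_mentions = [m for ms in mentions_per_tweet for m in ms]
    mention_counts = Counter(m.lower() for m in all_mentions)
    return {
        "tweet_rate": len(tweets) / days_active,
        "prop_with_mentions": sum(1 for ms in mentions_per_tweet if ms) / len(tweets),
        "n_mentions": len(all_mentions),
        "n_self_mentions": mention_counts[username.lower()],
        "n_unique_mentioned": len(mention_counts),
        "n_mentioned_3plus": sum(1 for c in mention_counts.values() if c >= 3),
    }
```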
For each of the following analytics, we calculate the proportion of a user's tweets in which the analytic finds evidence. Insomnia and sleep disturbance are often symptoms of mental health disorders (Weissman et al., 1996; De Choudhury et al., 2013d), so we calculate the proportion of tweets that a user makes between midnight and 4am according to their local timezone. Exercise has also been correlated with positive mental health outcomes (Penedo and Dahn, 2005; Callaghan, 2004), so we examine tweets mentioning one of a small set of exercise-related terms. We also use an English sentiment analysis lexicon from Mitchell et al. (2013) to score individual tweets according to the presence and valence of sentiment words. We apply no thresholds, so any tweet with a sentiment score above 0 is considered positive, below 0 is considered negative, and those with a score of 0 are considered to have no sentiment. Thus we use the proportions of Insomnia, Exercise, Positive Sentiment, and Negative Sentiment tweets as features in subsequent machine learning and analysis.
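A sketch of these per-tweet indicators, under stated assumptions: the sentiment lexicon is treated as a word-to-valence mapping (the actual format of the Mitchell et al. (2013) lexicon may differ), local-time conversion is assumed to happen upstream, and the exercise term list is an illustrative stand-in for the paper's unspecified set.

```python
EXERCISE_TERMS = {"gym", "run", "running", "workout", "exercise"}  # illustrative subset only

def tweet_flags(tweet_text, local_hour, lexicon, tokenize):
    """Per-tweet indicators; the per-user features are the proportions of
    tweets for which each indicator fires."""
    tokens = tokenize(tweet_text.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)  # no threshold applied
    return {
        "insomnia": 0 <= local_hour < 4,  # posted between midnight and 4am local time
        "exercise": any(tok in EXERCISE_TERMS for tok in tokens),
        "pos_sentiment": score > 0,
        "neg_sentiment": score < 0,
    }
```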
5 Results
We present three types of experiments to evaluate the quality and character of these data, and to demonstrate some quantifiable mental health signals in Twitter. First, we validate our method for obtaining data by replicating previous findings using LIWC. Next, we build classifiers to distinguish each group from the control group, demonstrating that there is useful signal in the language of each group, and compare these classifiers. Finally, we analyze the correlations between our analytics and classifiers to uncover relationships between them and derive insight into quantifiable and relevant mental health signals in Twitter.
Validation First, we provide some validation for our novel method for gathering samples. We demonstrate that language use, as measured by LIWC, is statistically significantly different between control and diagnosed users. Figure 1 shows the proportion of tweets from each user that score positively on various LIWC categories (i.e., have at least one word from that category). Box-and-whiskers plots (Tukey, 1977)[2] summarize a distribution of observations and ease comparison between them.

[2] For a modern implementation see Wickham (2009).
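This excerpt does not name the significance test behind the asterisks in Figure 1. As one plausible reconstruction (explicitly not the authors' stated method), a nonparametric comparison of per-user proportions between a diagnosed group and the control group could look like this; scipy is assumed available.

```python
from scipy.stats import mannwhitneyu

def liwc_significance(diagnosed_props, control_props, alpha=0.05):
    """diagnosed_props / control_props: per-user proportions of tweets
    matching one LIWC category. Mann-Whitney U is our assumption; the
    excerpt does not specify which test the authors used."""
    stat, p = mannwhitneyu(diagnosed_props, control_props, alternative="two-sided")
    return p, p < alpha
```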

References
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
Tausczik, Y. R., and Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer.