
Detection of suicide-related posts in Twitter data streams

M. Johnson Vioulès¹, Bilel Moulahi, Jérôme Azé², Sandra Bringay²

AXA¹, Centre national de la recherche scientifique²

01 Jan 2018-IBM Journal of Research and Development-Vol. 62, Iss: 1

TL;DR: A new approach uses the social media platform Twitter to quantify suicide warning signs for individuals and to detect posts containing suicide-related content; applying the martingale framework highlights changes in online behavior and shows promise for detecting behavioral changes in at-risk individuals.


Figures (8)

Table 4 Change Point Detection Results – Max. The bold-font numbers show the best results in terms of true change points detected, number of false alarms and delay by tuning the number of dimensions, the parameter λ, and the testing size.

Table 5 Change Point Detection Results – Frank. Columns: Dimensions, λ, Testing size, Change Points Detected, False Alarms, Delay.

Figure 2 Stream of martingale values for tweets from user Max - The values are computed using only the feature SPA text score. The point represents the true abrupt emotion change point (#946: “Done with my life”). The martingale values may begin to increase prior to the change point, but with a sudden change in behavior, the values may only increase after the change point.

Figure 4 Martingale values distribution over the series of tweets for user Frank. The values are computed using only the feature SPA text score (one dimension). The point represents the true abrupt emotion change point (#930).

Figure 3 Martingale values distribution over the series of tweets for user Max. The values are computed using four dimensions (SPA text score, friends, volume, and retweets). The point represents the true abrupt emotion change point (#946).

Table 1 Description of the behavioral features. (For the Text Score calculation and Distress Classifier mentioned at the bottom of the table, see the “Feature Extraction for Text Scoring” sections of this paper.)

Table 2 Best Performing Classifiers - Four Distress Classes. The bold-font numbers highlight the best performing models in terms of each evaluation measure using the given behavioral features.
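Tables 4 and 5 score detection runs by the number of true change points detected, the number of false alarms, and the delay. The captions do not define these quantities precisely, so the following Python sketch assumes one plausible definition: an alarm counts as a hit if it fires at the true change point or within a hypothetical window of `max_delay` tweets after it, every other alarm is a false alarm, and the delay is the distance from the change point to the first hit. The names `score_alarms` and `DetectionScore` are illustrative only.

```python
from typing import Iterable, NamedTuple, Optional


class DetectionScore(NamedTuple):
    detected: bool        # was the true change point caught within the window?
    false_alarms: int     # alarms outside the acceptance window
    delay: Optional[int]  # tweets between the change point and the first hit


def score_alarms(alarms: Iterable[int], change_point: int, max_delay: int) -> DetectionScore:
    """Score alarm positions (tweet indices) against one known change point.

    Assumption: an alarm at index a is a hit when
    change_point <= a <= change_point + max_delay; all others are false alarms.
    """
    ordered = sorted(alarms)
    hits = [a for a in ordered if change_point <= a <= change_point + max_delay]
    false_alarms = len(ordered) - len(hits)
    delay = hits[0] - change_point if hits else None
    return DetectionScore(bool(hits), false_alarms, delay)
```

For example, alarms at tweets 30, 104, and 110 against a change point at tweet 100 with `max_delay=20` give one detected change, one false alarm, and a delay of 4.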


HAL Id: lirmm-01633317

https://hal-lirmm.ccsd.cnrs.fr/lirmm-01633317

Submitted on 22 Nov 2017


Detection of Suicide-Related Posts in Twitter Data Streams

Mia Johnson Vioulès, Bilel Moulahi, Jérôme Azé, Sandra Bringay

To cite this version:

Mia Johnson Vioulès, Bilel Moulahi, Jérôme Azé, Sandra Bringay. Detection of Suicide-Related Posts in Twitter Data Streams. IBM Journal of Research and Development, IBM Corporation, 2018, 62 (1), pp. 7:1-7:12. ⟨10.1147/JRD.2017.2768678⟩. ⟨lirmm-01633317⟩


Abstract

Suicidal ideation detection in online social networks is an emerging research area with major challenges. Recent research has shown that the publicly available information spread across social media platforms holds valuable indicators for effectively detecting individuals with suicidal intentions. The key challenge of suicide prevention is understanding and detecting the complex risk factors and warning signs that may precipitate the event. In this paper, we present a new approach that uses the social media platform Twitter to quantify suicide warning signs for individuals and to detect posts containing suicide-related content. The main originality of this approach is the automatic identification of sudden changes in a user's online behavior. To detect such changes, we combine natural language processing techniques to aggregate behavioral and textual features and pass these features through a martingale framework, which is widely used for change detection in data streams. Experiments show that our text-scoring approach effectively captures warning signs in text compared to traditional machine learning classifiers. Additionally, the application of the martingale framework highlights changes in online behavior and shows promise for detecting behavioral changes in at-risk individuals.

Introduction

According to the World Health Organization (WHO), an estimated 800,000 people die by suicide each year worldwide, with at least as many suicide attempts [1]. The grief felt in the aftermath of such an event is compounded by the fact that a suicide may be preventable. This reality has motivated WHO member states to commit to reducing the suicide rate by 10% by 2020 [2].

In an effort to educate the public, the American Foundation for Suicide Prevention (AFSP) [3] has identified characteristics or conditions that may increase an individual's risk. The three major risk factors are: 1) health factors (e.g., mental health, chronic pain), 2) environmental factors (e.g., harassment, stressful life events), and 3) historical factors (e.g., previous suicide attempts, family history). Additionally, the time period preceding a suicide can hold clues to an individual's struggle. The AFSP categorizes these warning signs as follows: 1) talk (e.g., mentioning being a burden or having no reason to live), 2) behavior (e.g., withdrawing from activities, sleeping too much or too little), and 3) mood (e.g., depression, rage).

Identifying these risk factors is the first step in suicide prevention. However, the social stigma surrounding mental illness means that at-risk individuals may avoid professional assistance [4]. In fact, they may be more willing to turn to less formal resources for support [5]. Recently, online social networks have become one such informal resource. Research has shown that at-risk individuals are turning to contemporary technologies (forums, micro-blogs) to express their deepest struggles without having to face someone directly [6, 7]. As a result, suicide risk factors and warning signs have appeared in a new arena. There are even instances of suicide victims writing their final thoughts on Twitter, Facebook, and other online communities [8, 9].

We believe that this large amount of data on people's feelings and behaviors can be used successfully for the early detection of behavioral changes in at-risk individuals and may even help prevent deaths. Social computing research has focused on this topic in recent years [6, 9, 10]. However, few initiatives have addressed the real-time detection of suicidal ideation on Twitter. Previously proposed detection methods rely heavily on manually annotated speech, which can limit their effectiveness, due in part to the varying forms that suicide warning signs take in at-risk individuals [6, 11, 12]. Many of these methods also analyze the messages published by individuals at a specific time, independently of the whole context, which may be represented by the sequence of publications over time.

In this article, we address the challenge of real-time analysis of Twitter posts and the detection of suicide-related behavior. To process the stream of an individual's online content, we implement a martingale framework, which is widely used for change detection in data stream settings. The input to this framework is a series of behavioral features computed from each individual Twitter post (tweet). These features are compared to previously seen behavior in order to detect a sudden change in emotion that may indicate an elevated risk of suicide.

The main contributions of this article are twofold. First, using research from the field of psychology, we design and develop behavioral features to quantify an individual's level of risk according to their online behavior on Twitter (speech, diurnal activities, size of social network, etc.). In particular, we create a feature for text analysis called the Suicide Prevention Assistant (SPA) text score. Second, we monitor the stream of an individual Twitter user's behavioral features using an innovative application of a martingale framework to detect sudden behavioral changes.

Literature review

The definition and identification of risk factors and warning signs lie at the core of suicide prevention efforts. In this paper, we have chosen to reference the risk factors defined by the American Psychiatric Association (APA) [13] and the warning signs identified by the American Association of Suicidology (AAS) [14]. These resources represent a level of consensus among mental health professionals and also provide a rich discussion of the differences between suicide risk factors and warning signs. For further reading, we direct the reader to [14].

As highlighted by [14], warning signs signify an increased imminent risk of suicide (i.e., within minutes, hours, or days). According to the APA, suicide warning signs may include talking about dying, significant recent loss (death, divorce, separation, broken relationship), change in personality, fear of losing control, a suicide plan, suicidal thoughts, or no hope for the future. As discussed below, recent research has shown the emergence of such signs on social networking sites.

Most of the research at the intersection of behavioral health disorders and social media has focused on depression detection in online communities, specifically Major Depressive Episodes (MDE). However, the risk factors for suicide defined by the APA [13] go far beyond depression alone. It is important to remember that depression does not necessarily imply suicidal ideation. Rather, suicide should be thought of as a potential end symptom of depression.

While mental health issues such as depression, suicidal ideation, and self-mutilation are defined medically as separate illnesses with overlapping symptoms, the approaches proposed to detect them online can be quite similar. The approaches vary in the data they treat (e.g., Facebook posts, tweets, Reddit forum threads) and in the specific event they attempt to predict. In [7], Moreno et al. first demonstrated that social networking sites could be a potential avenue for identifying students suffering from depression. The prevalence rates found for depression disclosed on Facebook corresponded to those of previous works in which such information was self-reported. On a larger scale, Jashinsky et al. [15] showed a correlation between Twitter-derived and actual United States per-state suicide data. Together, these works established the presence of depression disclosure in online communities and opened up a new avenue for mental health research.

De Choudhury et al. [6] explored the potential of social media to detect and predict major depressive episodes in Twitter users. Using crowdsourcing techniques, the authors built a cohort of Twitter users scoring high for depression on the CES-D (Center for Epidemiologic Studies Depression Scale) and another cohort of users scoring low. Studying these two classes, they found that what is known from traditional literature on depressive behavior also translates to social media. For example, users with a high CES-D score posted more frequently late at night, interacted less with their online friends, and used first-person pronouns more often. Additionally, their online linguistic patterns matched previous findings regarding the language use of depressed individuals [16].

More recently, De Choudhury et al. [10] have shown that linguistic features are important predictors in identifying individuals transitioning from mental health discourse on social media to suicidal ideation. Based on a small subset of Reddit posts, the authors identified a number of markers characterizing these shifts, including social engagement and manifestations of hopelessness, anxiety, and impulsiveness.

Coppersmith et al. [17] examined the data published by Twitter users prior to a suicide attempt and provided an empirical analysis of the language and emotions expressed around the attempt. One interesting result of this study is the increase in the percentage of tweets expressing sadness in the weeks prior to a suicide attempt, followed by a noticeable increase in anger and sadness in the week following the attempt. In the same line of research, O'Dea et al. [18] confirmed that Twitter is used by individuals to express suicidality and demonstrated that it is possible to distinguish the level of concern among suicide-related tweets, using both human coders and an automatic machine classifier. These insights have also been investigated by Braithwaite et al. [19], who demonstrated that machine learning algorithms are effective in differentiating people who are at risk of suicide from those who are not. For a more detailed review of the use of social media platforms as a tool for suicide prevention, the reader may refer to the recent systematic survey by Robinson et al. [20].

These works have shown that individuals disclose their depression and other struggles to online communities, which indicates that social media networks can be used as a new arena for studying mental health. Despite this solid foundation, the current literature is missing potential key factors in the effort to detect depression and predict suicide. Currently, few works analyze the evolution of an individual's online behavior. Rather, the analysis is static and may consider one post or tweet at a time while ignoring the whole context. Additionally, an individual's online speech is often compared to that of other individuals and not to their own linguistic style. This is a disadvantage because two individuals suffering from depression of the same severity may express themselves very differently online.

A general framework for detecting suicide-related posts in social networks

In this section, we present the proposed framework for the analysis and real-time detection of suicide-related posts on Twitter. First, we introduce the real-time detection problem. Then, we define our online proxy measurements (behavioral features) for suicide warning signs. Finally, we describe the approach we implement for detecting behavioral change points.

Problem statement

Sudden behavioral change is one of the most important suicide warning signs. As reported by the AFSP, a person's suicide risk is greater if a behavior is new or has increased, especially if it is related to a painful event, loss, or change. Considering this in conjunction with social media, where users constantly publish messages and deliberately express their feelings, we address suicide warning sign detection as a real-time data stream mining problem. Given a series of observations over time (tweets, messages, blog posts), the task is to detect an abrupt change in a user's behavior that may be considered a suicide warning sign. In the field of data stream mining, this can be specifically seen as a change point detection problem [21, 22]. However, unlike retrospective detection settings [23, 24], which focus on batch processing, here we are interested in the setting where the data arrive as a stream in real time.
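To make the contrast with retrospective settings concrete, here is a minimal Python sketch of a batch detector for a single shift in the mean. It is a generic least-squares split chosen for illustration, not the method of [23, 24], and the name `best_split` is hypothetical. It needs the complete series before it can answer, because it tries every split index and keeps the one that minimizes the squared error within the two resulting segments; a streaming detector cannot wait for the whole series in this way.

```python
from typing import Sequence


def best_split(series: Sequence[float]) -> int:
    """Retrospective (batch) detection of a single mean shift.

    Returns the index k at which the second segment starts, chosen to minimize
    the within-segment squared error of series[:k] and series[k:].
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    def sse(segment: Sequence[float]) -> float:
        # Sum of squared deviations from the segment mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    # Every candidate split is evaluated, so the whole series must be available.
    return min(range(1, n), key=lambda k: sse(series[:k]) + sse(series[k:]))
```

For instance, `best_split([1.0, 1.0, 5.0])` returns 2, the index at which the shifted segment begins.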

To tackle this challenge, we chose an approach employing a martingale framework for change point detection [25]. This algorithm has been successfully applied to change detection in unlabeled data streams, to video-shot change detection [26], and, more recently, to the detection of news events in social networks [27]. To the best of our knowledge, this is the first attempt to apply the martingale framework to a multi-dimensional data stream generated by Twitter users.
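The following Python sketch illustrates how a martingale-based detector can process such a stream one observation at a time. It follows the randomized power martingale formulation usually attributed to Ho and Wechsler, which we assume corresponds to [25]. The strangeness measure (Euclidean distance to the mean of the points seen since the last reset), the default values of ε and λ, and the class name `PowerMartingaleDetector` are illustrative assumptions, not details taken from this paper. For each new feature vector, the detector computes a randomized p-value from the strangeness ranks, multiplies the martingale by ε·p_n^(ε−1), and raises an alarm when the martingale exceeds the threshold λ.

```python
import math
import random
from typing import List, Optional, Sequence


class PowerMartingaleDetector:
    """Randomized power martingale change detector (sketch after Ho and Wechsler).

    Assumptions: strangeness is the Euclidean distance of a point to the mean of
    all points seen since the last reset; epsilon lies in (0, 1); an alarm fires
    when the martingale exceeds lam.
    """

    def __init__(self, lam: float = 20.0, epsilon: float = 0.92,
                 seed: Optional[int] = None) -> None:
        if not 0.0 < epsilon < 1.0:
            raise ValueError("epsilon must lie in (0, 1)")
        self.lam = lam
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.reset()

    def reset(self) -> None:
        """Forget past observations and restart the martingale at 1."""
        self.points: List[List[float]] = []
        self.martingale = 1.0

    def _strangeness(self) -> List[float]:
        # Distance of every stored point to the current center, recomputed each step.
        n = len(self.points)
        dim = len(self.points[0])
        center = [sum(p[d] for p in self.points) / n for d in range(dim)]
        return [math.dist(p, center) for p in self.points]

    def update(self, x: Sequence[float]) -> bool:
        """Consume one feature vector; return True if a change point is flagged."""
        self.points.append(list(x))
        alphas = self._strangeness()
        a_n = alphas[-1]
        theta = 1.0 - self.rng.random()  # uniform on (0, 1], keeps p_n > 0
        greater = sum(1 for a in alphas if a > a_n)
        equal = sum(1 for a in alphas if a == a_n)
        p_n = (greater + theta * equal) / len(alphas)
        # Power martingale update: M_n = epsilon * p_n**(epsilon - 1) * M_{n-1}
        self.martingale *= self.epsilon * p_n ** (self.epsilon - 1.0)
        if self.martingale > self.lam:
            self.reset()  # start monitoring afresh after an alarm
            return True
        return False
```

In use, one feature vector per tweet would be passed to `update`, e.g. `alarms = [i for i, v in enumerate(vectors) if detector.update(v)]`, where `vectors` is a hypothetical list of per-tweet features. Because the martingale starts at 1 and is nonnegative, Doob's maximal inequality bounds the probability of ever exceeding λ on an unchanged (exchangeable) stream by 1/λ, so a larger λ lowers false alarms at the cost of longer delays. Storing every point keeps the sketch short but makes each update linear in the number of points seen since the last reset.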

In the following, we introduce and describe the proxy measurements for suicide warning signs that we use to assess a user's level of suicide risk. As previously mentioned, these warning signs are the input to the martingale framework.

Suicide warning signs in online behavior

To identify online behaviors that may reflect the mental state of a Twitter user, we established two groups of behavioral features: user-centric and post-centric features [11, 28]. User-centric features characterize the behavior of the user in the Twitter community, while post-centric features are extracted from the properties of a tweet. These features have been shown to successfully aid in determining the mental health of a user [6]. Table 1 shows a detailed description of the features we selected.
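A per-tweet observation has to be turned into the numeric vector that the martingale framework consumes. The following Python sketch mirrors the four dimensions named in the caption of Figure 3 (SPA text score, friends, volume, and retweets), but the field names, the daily window for volume, and the log scaling of counts are hypothetical choices, not the definitions given in Table 1.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class TweetObservation:
    """One tweet plus a snapshot of its author's account (hypothetical fields)."""
    spa_text_score: float  # post-centric: text score assigned to the tweet
    is_retweet: bool       # post-centric: whether the post is a retweet
    friends_count: int     # user-centric: accounts the user follows
    daily_volume: int      # user-centric: tweets posted that day (assumed window)


def feature_vector(obs: TweetObservation) -> List[float]:
    """Turn one observation into the numeric vector handed to the change detector.

    Counts are log-scaled with log1p (an assumption) so that large raw counts do
    not dominate distance-based strangeness measures.
    """
    return [
        obs.spa_text_score,
        math.log1p(obs.friends_count),
        math.log1p(obs.daily_volume),
        1.0 if obs.is_retweet else 0.0,
    ]
```

Scaling matters here because a distance-based detector treats all dimensions alike; without it, a friends count in the hundreds would swamp a text score near zero.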

The American Association of Suicidology (AAS) identifies withdrawing from friends, family, or society as one of the warning signs of suicide. With the user-centric behavioral features, we aim to capture changes in a Twitter user's engagement with other users. The friends and followers features can quantify an individual's interaction with their online community, such as a sudden decrease in communication. Conversely, they can also reflect an expansion of an individual's online community. This is relevant, as at-risk individuals have also been shown to increase their time online developing personal relationships [29]. It is important to note that we have chosen the terms friends and followers to represent the unidirectional relationships that are inherent on Twitter. We acknowledge that these terms may not apply to certain user accounts, such as those of celebrities and news outlets.
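As a hedged illustration of how a friends or followers count can reveal both withdrawal and expansion, the sketch below computes the relative change between consecutive snapshots of such a count. The function name `relative_changes` and the choice to floor the denominator at 1 are assumptions made for illustration.

```python
from typing import List, Sequence


def relative_changes(counts: Sequence[int]) -> List[float]:
    """Relative change between consecutive snapshots of a count such as friends.

    Negative values point to shrinking engagement (possible withdrawal), positive
    values to an expanding online community. The denominator is floored at 1 so a
    previous count of zero does not cause a division by zero.
    """
    return [(curr - prev) / max(prev, 1) for prev, curr in zip(counts, counts[1:])]
```

For example, `relative_changes([100, 100, 80, 120])` returns `[0.0, -0.2, 0.5]`: no change, a 20% drop, then a 50% increase.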


Frequently Asked Questions (13)

Q1. What contributions have the authors mentioned in the paper "Detection of suicide-related posts in Twitter data streams"?

In this paper, the authors present a new approach that uses the social media platform Twitter to quantify suicide-warning signs for individuals and to detect posts containing suicide-related content. Experiments show that their text-scoring approach effectively captures warning signs in text compared to traditional machine learning classifiers.

Q2. What are the future works mentioned in the paper "Detection of suicide-related posts in Twitter data streams"?

For future research, the authors plan to further explore the impact of martingale parameters on the change detection effectiveness. However, overall, the authors believe their initial work presents an innovative approach to detecting suicide-related content in a text stream setting.

Q3. Why did the authors use the Twitter streaming API?

Due to the absence of publicly available datasets for the evaluation of suicide detection in social media, the authors used the Twitter streaming API to collect tweets.

Q4. What are the two groups of behavioral features that the authors used to identify the mental state of a Twitter user?

To identify online behaviors that may reflect the mental state of a Twitter user, the authors established two groups of behavioral features: user-centric and post-centric features [11, 28].

Q5. What is the first approach to classifying a post?

The first approach is a natural language processing (NLP) method that combines features generated from the text based on an ensemble of lexicons.

Q6. What is the cost of annotating the text?

Although machine learning is commonly used to classify text, the supervised algorithms require annotated datasets, which may be costly in terms of time and potential annotator error.

Q7. What was the main challenge when implementing the martingale framework?

To detect changes in emotional well-being, the authors considered a Twitter user’s activity as a stream of observations and applied a martingale framework to detect change points within that stream.

Q8. How many tweets were used to evaluate distress?

To evaluate the lexicon-based NLP approach, the authors used the cross-sectional set of 500 tweets and looked at the average, maximum, and minimum scores for each distress class.

Q9. What is the simplest way to determine whether there is an abrupt change in the user behavior?

Q10. What is the effect of the -values on the incoming data points?

Q11. What are the features that can quantify an individual's interaction with their online community?

The friends and followers features can quantify an individual's interaction with their online community, such as a sudden decrease in communication.

Q12. What is the correlation between the two spikes?

Upon further investigation, the authors found that these two spikes are linked to negative SPA scores (positive emotion) corresponding to birthday wishes that Frank received from other users.

Q13. What does the martingale framework need to be adjusted to interpret?

These peaks highlight that their martingale framework will need to be adjusted to interpret negative SPA text scores as a positive behavioral change that does not require an alarm to sound.
