Understanding Participant Behavior Trajectories in Online
Health Support Groups Using Automatic Extraction
Methods
Miaomiao Wen
Language Technologies Institute
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
mwen@cs.cmu.edu
Carolyn Penstein Rosé
Language Technologies Institute
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
cprose@cs.cmu.edu
ABSTRACT
This paper presents an automatic analysis method that enables
efficient examination of participant behavior trajectories in
online communities. This method offers the opportunity to
examine behavior over time at a level of granularity that has
previously only been possible in small scale case study
analyses, and thus complements both existing qualitative and
quantitative methodologies. We provide an empirical validation
of its performance. We then illustrate how this method offers
insights into behavior patterns that enable avoiding faulty
oversimplified assumptions about participation, such as that it
follows a consistent trend over time. In particular, we use
this method to investigate the connection between user behavior
and distressful cancer events and demonstrate how this tool
could assist in understanding participation trajectories in
online medical support communities better so we are better able
to design environments that meet the needs of participants.
Categories and Subject Descriptors
H.5.3 [Group and Organization Interfaces]: Computer
supported cooperative work.
Keywords
Online support groups, Cancer trajectory, Disease event,
Natural language analysis
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
GROUP’12, October 27–31, 2012, Sanibel Island, Florida, USA.
Copyright 2012 ACM 978-1-4503-1486-2/12/10 ...$15.00.
1. INTRODUCTION
The contribution of this paper is a new automatic analysis
method that enables efficient examination of participant
behavior trajectories in online communities. We demonstrate
how it offers the opportunity to examine behavior over time
at a level of granularity that has previously only been
possible in small scale case study analyses. Using this tool we
are able to offer new insights into the experiences of users in
one of the largest online cancer support communities on the
Internet. The insights offered by such a tool complement
both existing qualitative and quantitative methodologies for
studying community behavior patterns.
Online support groups provide a rich and valuable source
of data related to chronic illness and the inner workings of
social support. A growing number of people who suffer from
chronic or life threatening diseases obtain valuable resources
from online support groups, which are available anytime in
the privacy of one’s home [19]. These affordances of online
support groups are particularly attractive in the case of
stigmatizing illnesses such as AIDS, alcoholism, breast and
prostate cancer, which are the topics of many popular online
medical support communities [5]. In order to design such
environments to maximize benefit to users, it is necessary to
understand how the experiences of users of such
environments unfold over time.
In this paper we seek to overcome some of the method-
ological limitations of current approaches to studying online
support groups. Quantitative approaches to studying be-
havior in online communities abstract away from the details
of individual users in order to reduce behavior to a small
number of variables that may be related to one another
statistically. Such a reduction is needed in order to understand
the causal mechanisms at work. However, in order to do so
in a valid way, it is important to avoid making assumptions
that do not hold in practice.
Growing out of a tradition of analysis of threaded discussion
forums that consist of a list of threads, each of which
roughly corresponds to a topic of discussion, quantitative
approaches to modeling participation in online communities
typically model that participation in terms of frequency of
types of contributions over time, with the idea of identifying
increasing or decreasing trends and the reasons for these
trends using linear modeling techniques. Our work challenges
the underlying assumptions behind such approaches by
demonstrating more of a periodicity in participation, centered
on important cancer events. This is consistent with
other work investigating the importance of key events in a
patient’s cancer history and their effect on behavior [20, 21].
One role for qualitative analyses of user behavior trajectories
in mixed methods approaches is to offer insights that
challenge overly simplistic assumptions about participation.
However, even such detailed explorations are limited if they
can only be conducted on a very small set of users. For
example, a case study analysis of the complete posting history
of one participant in an online cancer support forum
has been published in prior work [26]. That analysis suggests
that frequency of participation was clearly correlated
with stress-inducing events. But as a case study, the
generalizability of the results is limited. Mapping
how the themes of posts change as patients move from
diagnosis, through treatment, and towards recovery or death
at a grander scale may provide valuable insights to inform
ways to tailor future psychosocial and educational
interventions in online medical support communities according to
a patient’s cancer event trajectory [25]. However, manually
extracting such a cancer trajectory is laborious
and time consuming. The goal of our work is to automate
such an analysis so that some of the benefits can be ob-
tained without the time and effort. Using this tool, we work
to contribute deeper insights towards understanding a pa-
tient’s psychosocial reactions to important cancer events as
they unfold within that patient’s disease progression [26].
Motivated by earlier qualitative work [26], we use
automatically extracted illness trajectories and present an
analysis of data from an active online community that provides
support for a statistical connection between the pattern of
participation in online discussion and stress-inducing events.
We demonstrate that when the users are undergoing these
stressful events, they post more than twice as often as in
non-event months. The topic of their posts also varies
according to the events. We also find that almost half of the
long-term users began their participation in the online
support community when they were facing some kind of stressful
disease event, such as chemotherapy.
In addition to the methodological contribution and analysis,
our work makes a technical contribution as well. Many
practical applications in Natural Language Processing either
require or would greatly benefit from the use of temporal
information. For instance, question-answering and
summarization systems demand accurate processing of temporal
information in order to be useful for answering “when”
questions and creating coherent summaries by temporally
ordering information. Our technical approach involves
development of effective temporal expression extraction in an
informal writing genre that poses significantly different
challenges than genres that have more typically been the focus
of work on temporal expression extraction in the past.
In the remainder of this paper, we first review related
work that provides the foundation for our investigation. We
then describe the online community that provides the context
for our work, as well as the data we extracted from it for
our analysis. We then present how we automatically generate
and visualize disease trajectories from a user’s complete
posting history. After validation of our visualization tool, we
provide an analysis that illustrates the connection between
posting behavior and the generated disease trajectory. The
paper concludes with discussion and future work.
2. RELATED WORK
For decades, researchers have attempted to examine the
well-being of women with breast cancer as it varies over time.
Most of these studies are qualitative analyses and do not
examine whether disease phase, such as progression of
disease or disease recurrence, influences well-being [11]. On the
other side of the spectrum, in quantitative analyses,
individuals are frequently grouped together for analysis in ways
that gloss over individual differences between patients,
including the specific issues they are dealing with at different
times. Richer insights could be gained through analyses that
consider the impact of important cancer events within a
patient’s trajectory.
Qualitative research studies exploring individuals’ well-being
across phases of disease report that distress peaks
occur after diagnosis, during chemotherapy, at the conclusion
of adjuvant therapy, 6 months - 1 year after mastectomy,
when recurrence is diagnosed, and when the disease
is declared terminal [10, 20, 21]. Researchers have
constructed these cancer trajectories retrospectively from
self-report questionnaires and interview data. Using such a
methodology, researchers have identified distinct trajectories
of mental and physical functioning over 4 years according
to 363 patients’ breast cancer experiences [23]. Disease
trajectories for chronic illness have been defined more
generally as the course of the illness over time as identified by
eight disease phases: the period before the illness begins; the
diagnostic period; crisis or life-threatening situation; acute
illness, in which illness or complications require
hospitalization; a stable phase, where illness is controlled; an
unstable phase, in which illness is not controlled by a regimen;
a progressive or deterioration phase; and dying [4]. This
trajectory may inform the design of research about the
experience of chronic illnesses such as breast cancer. However,
research using questionnaires and interviews does not tell us
what we would see if we explored similar questions from the
standpoint of what behavior looks like over time within these
phases. Conversely, the insights that are possible to glean
from interviews or questionnaires are not readily accessible
in the raw data traces of online communities that would
allow us to go beyond self report and observe how
participants respond to their cancer events in real time. The goal
of pushing beyond what is possible with existing well
established methodologies therefore presents technical challenges.
The boundaries between the phases identified here are not
trivial to automatically extract from the posts.
As online support groups become more and more popular,
there are more studies of online support groups [2].
Most qualitative evaluation research to date of online
support groups has analyzed postings from a sample of users
without relating their message content to the medical
background of the patients or their disease trajectory [12, 22]. In
one notable exception, researchers suggest that the pattern
of online discussion group messages was clearly correlated
with the stress-inducing events [26]. But as a case study,
the generalizability of the results is limited.
In our work, we want to do a similar analysis quantitatively.
We draw from prior work that offers the ability to
automatically identify themes in discussion behavior in online
groups. In particular, Wang and colleagues [24] have derived
20 topics from the forum posts using a technique referred to
as Latent Dirichlet Allocation (LDA), which we describe
later. This prior work reveals something of the distribution
of topics discussed by cancer patients, but leaves open
interesting questions about the topics patients talk about during
specific periods related to important cancer events.
In order to construct cancer trajectories, we must asso-
ciate events with points in time by extracting mentions of
time points in posts. While much computational work has
been done on temporal expression extraction, our own re-
search differs from this previous work in several respects.
Previous work has mainly focused on identifying the separate
timepoints of each event in news text, where multiple
disparate events may be described [6, 13, 17]. Newswire text
is the primary genre in that work, and that genre is known
to include a lot of explicit temporal expressions, which are
very different from our online forum corpus. Besides specific
use of temporal expressions like “MM/DD/YYYY” or “Oct.
22nd, 2001”, our system resolves generic time expressions,
especially indexical expressions like “tomorrow”, “next Tues-
day”, “two weeks after my diagnosis”, etc., which designate
times that are dependent on the time of the post or some
referential time point. What is more, we also resolve self-
contained time expressions that are special to each user like
“at the age of 57”, “Today is my third breast cancer anniver-
sary”, and “I am six months out of Chemotherapy”. As the
exact date of each post is known, these expressions could be
utilized to infer the illness event times. The language style
in online forums is highly informal. Our automatic analy-
sis approach also considers forum specific jargon, slang and
nicknames [16].
3. CONTEXT OF RESEARCH AND DATA SET
The data for our investigation was extracted from a large
online cancer support community operated by a nonprofit
organization dedicated to providing the most reliable,
complete, and up-to-date information about breast cancer. This
organization also provides a variety of communication
platforms, including discussion boards and chat rooms for
patients, family members and caregivers so that all of these
stakeholder communities are able to exchange support with
each other. In particular, the discussion board platform is
one of the most popular and active online breast cancer
support groups on the Internet. It contains more than 90,000
registered members and 66 forums organized by disease stage
(e.g., Stage IV and Metastatic Breast Cancer), treatment
(e.g., Chemotherapy - Before, During and After),
demographic group (e.g., Women 40-60ish with Breast Cancer)
or entertainment (e.g., Humor and Games). In the forums,
members can ask questions, share their stories, and read
posts of others about how to deal with their disease. This
discussion board platform is a rich environment for studying
the dynamics of online support groups. We collected all
of the public posts, users, and their profiles on the
discussion board platform from October 2001 to
January 2011. 31,307 users had at least one post.
4. SYSTEM DESCRIPTION
We aim to automatically generate and visualize cancer
event trajectories of users. Figure 1 shows the configuration
of our tool “Breast Cancer Trajectory”.
Area 1. User ID
Area 2. Cancer event trajectory
Area 3. Events buttons
Area 4. Monthly post frequency
Use of the tool begins by inputting a forum user’s ID in
area 1 in Figure 1. The whole trajectory (area 2) begins
with the month of the first post of this user and ends with
the month of the last post. By pressing the event buttons
in area 3, the corresponding event tag will appear in area
2, unless the date of this event is not retrievable for this
user. The cancer event tags, “Diag” (Diagnosis), “Chemo”
(Chemotherapy), “Rads” (Radiation therapy), “Mast”
(Mastectomy), “Lump” (Lumpectomy), “Recon” (Reconstruction),
“Recur” (Recurrence) and “Mets” (Metastasis) are located
at the month of that event on the trajectory. When a user
presses the “PostNum” button in area 3, then the bars in
area 4 show the monthly posting frequency of the user. The
height of the blue bar corresponds to the number of posts
the user contributed to existing threads each month. The
height of the pink bar corresponds to the number of thread
starter posts the user initiated each month.
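The two bar series in area 4 and the month range of area 2 can be illustrated with a minimal sketch. It assumes each post is available as a (date, is_thread_starter) pair; this representation and the function names are hypothetical and are not taken from the tool itself.

```python
from collections import Counter
from datetime import date


def monthly_post_counts(posts):
    """Count replies and thread starters per (year, month).

    `posts` is a list of (post_date, is_thread_starter) pairs; this input
    format is an assumption made for illustration.
    """
    replies, starters = Counter(), Counter()
    for post_date, is_thread_starter in posts:
        key = (post_date.year, post_date.month)
        if is_thread_starter:
            starters[key] += 1   # would feed the pink bars
        else:
            replies[key] += 1    # would feed the blue bars
    return replies, starters


def trajectory_months(posts):
    """List every (year, month) from the first post's month to the last post's month."""
    dates = sorted(d for d, _ in posts)
    year, month = dates[0].year, dates[0].month
    end = (dates[-1].year, dates[-1].month)
    months = []
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


example_posts = [
    (date(2008, 11, 3), True),
    (date(2008, 11, 20), False),
    (date(2009, 1, 5), False),
]
replies, starters = monthly_post_counts(example_posts)
months = trajectory_months(example_posts)
```

Months without posts still appear in the month list, so a plotted trajectory would show gaps in participation rather than skipping them.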
4.1 Automatic Cancer Trajectory Generation
Extracting cancer trajectories from a highly informal on-
line forum is a non-trivial problem. Figure 2 shows the flow
chart of event date extraction, which is the most challenging
part of the process. A typical two-year frequent breast can-
cer forum user has 500-1000 posts, which contain 2000-5000
sentences. To reduce the search space, we first extract the
sentences that may contain the temporal information of the
disease events from the posts and then extract a date from
these date sentences instead of directly from the complete
raw posts. In Sections 4.1.1 and 4.1.2, we present how we define
and extract these “date sentences”. The topics and contents of
the messages in this forum are highly diverse. To reduce
noise, we train a machine learning model to decide if the
date in the date sentence is the date of the target event.
Finally, we choose the most likely event date based on some
intuitive rules of thumb.
[Figure 2 flowchart; box labels: Posts, Date sentences, Stretchy pattern feature vectors, Noise?, Dates from the date sentences, Event date]
Figure 2: Event extraction flowchart.
4.1.1 Cancer Event Keywords
The cancer event is usually signaled by a set of keywords.
In our experiment, we manually design an event keyword
set for each cancer event. The keyword set includes the
name of the event, abbreviations, aliases and other related
words. For example, the Chemotherapy keyword set contains
the common medical terms, such as AC (Adriamycin
and Cytoxan). The Reconstruction keyword set contains
the common surgery type names, such as DIEP (deep inferior
epigastric perforator flap breast reconstruction). By
creating an analogous keyword set, our event date extraction
method could be easily adapted to other datasets.
Figure 1: Automatically-generated breast cancer trajectory of an example user.
4.1.2 Date Sentences Extraction
We define a “date sentence” as a sentence that contains
at least one time expression and at least one cancer event
keyword. A user frequently shares the date of her disease
event in close proximity to a mention of that event by posting
these “date sentences”. For example, if we want to detect
when a user was diagnosed with breast cancer, we will extract
sentences like “I was <diagnosis keyword> on <temporal
expression>” from her posts. Here the keyword set, <diagnosis
keyword>, is {diagnosed, dx, dx., dx’d}, where “dx”, “dx.”
and “dx’d” are the abbreviations of “diagnosed” that are
often used by the breast cancer forum users.
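The definition of a date sentence can be sketched as a simple filter. Only the diagnosis keyword set comes from the text; the regular expressions are an illustrative and deliberately incomplete assumption, not the system's actual temporal grammar, and the month name "may" is left out because it is ambiguous with the verb.

```python
import re

# Diagnosis keyword set as listed in the text.
DIAGNOSIS_KEYWORDS = {"diagnosed", "dx", "dx.", "dx'd"}

# Hypothetical temporal patterns, for illustration only.
MONTH_NAME = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)
TIME_PATTERNS = [
    re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),                # 09/08/2008
    re.compile(r"\b" + MONTH_NAME + r"\b", re.I),                    # Sep, September
    re.compile(r"\b(?:an?|one|two|three|\d+)\s+(?:day|week|month|year)s?\s+ago\b", re.I),
    re.compile(r"\blast\s+(?:week|month|year)\b", re.I),
    re.compile(r"\b(?:at the age of|when i was)\s+\d{1,3}\b", re.I),
]


def has_time_expression(sentence):
    """True if any of the illustrative temporal patterns matches."""
    return any(p.search(sentence) for p in TIME_PATTERNS)


def has_event_keyword(sentence, keywords):
    """True if any token, with or without a trailing period, is an event keyword."""
    text = sentence.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[a-z']+\.?", text)
    return any(t in keywords or t.rstrip(".") in keywords for t in tokens)


def is_date_sentence(sentence, keywords):
    """A date sentence has at least one event keyword and one time expression."""
    return has_event_keyword(sentence, keywords) and has_time_expression(sentence)
```

Sentence splitting is left to the caller, since abbreviations such as "dx." would confuse a naive splitter.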
In our technical approach, we recognize three types of time
expressions: specific expressions, generic time expressions
and self-contained time expressions. (In the examples below,
the items in parentheses are optional.) Specific expressions
can be resolved easily. Generic time expressions usually
specify the length of the interval between the time of the post
and the time of the event. The date of the event can
be calculated by subtracting the duration from the date of
the post. For example, given the date sentence “I had my first
radiation three months ago.”, her first radiation is three
months before the post time. The most complicated cases
are self-contained time expressions. Some of them can be
resolved as generic time expressions. For example, “today
is my first breast cancer anniversary” means that she was
diagnosed one year ago. The other cases cannot be resolved
without further information about the user. For example,
many users use their age as a time reference for events,
as in “I was diagnosed at the age of 57”. We obtain
the age of users from their personal profile to handle these
cases. Here are some examples of each type:
Specific expressions
I was diagnosed on Sep(tember|(.)) ((the) 8(th) (,))
(of) ((20)08|’08).
I was diagnosed in (20)08|’08 Sep (tember) (8(th)).
I was diagnosed on (0)9(/|−)((0)8)(/|−)(20)08.
Generic time expressions
< 2008 10 20 > I was diagnosed in September.
< 2008 09 15 > I was diagnosed a week ago.
< 2008 09 15 > I was diagnosed last week.
< 2011 09 08 > Now I am three years from my
diagnosis.
< 2011 09 08 > I have been a three years survivor
since my first diagnosis.
Self-contained time expressions
I was diagnosed when I was 51 (( years|yr|yrs) old).
I was diagnosed at the age of 51.
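How generic and age-based expressions might be resolved against the post date can be sketched as follows. This is a simplified illustration under stated assumptions: it handles only "<n> <unit>(s) ago", "last <unit>" and age phrases, resolves events to month granularity, and assumes a birth year can be derived from the profile age. The function names are hypothetical.

```python
import re
from datetime import date, timedelta

# Small number-word table; an assumption, not an exhaustive list.
NUMBER_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3,
                "four": 4, "five": 5, "six": 6}


def shift_months(d, months_back):
    """Return the (year, month) that lies `months_back` months before date d."""
    index = d.year * 12 + (d.month - 1) - months_back
    return index // 12, index % 12 + 1


def resolve_generic(expression, post_date):
    """Resolve '<n> <unit>(s) ago' or 'last <unit>' to a (year, month) pair, else None."""
    text = expression.lower()
    m = re.search(r"\b(\d+|[a-z]+)\s+(day|week|month|year)s?\s+ago\b", text)
    if m:
        raw, unit = m.groups()
        n = int(raw) if raw.isdigit() else NUMBER_WORDS.get(raw)
        if n is None:
            return None
    else:
        m = re.search(r"\blast\s+(week|month|year)\b", text)
        if not m:
            return None
        n, unit = 1, m.group(1)
    if unit in ("day", "week"):
        event = post_date - timedelta(days=n * (7 if unit == "week" else 1))
        return event.year, event.month
    return shift_months(post_date, n * (12 if unit == "year" else 1))


def resolve_age(expression, birth_year):
    """Resolve 'at the age of N' / 'when I was N' to a calendar year, else None."""
    m = re.search(r"\b(?:at the age of|when i was)\s+(\d{1,3})\b", expression.lower())
    return birth_year + int(m.group(1)) if m else None
```

Expressions outside these few forms return None here, which mirrors the case in the text where a date is simply not retrievable.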
4.1.3 Noise Reduction
Because the temporal expression in a date sentence may not be
the modifier of the target event, the second step of our cancer
trajectory generation is to build a noise reduction machine
learning model. The features are designed to capture both
the characteristics of noisy date sentences and the style of
“true” date sentences (i.e., date sentences that tell the date
of the target event that has occurred to the user herself).
There are mainly four types of noisy date sentences. First, in
online support groups, users not only tell stories about
themselves, they also share other patients’ stories (see example
sentence (1) below). This kind of sentence usually includes
the acquaintance’s name or personal pronouns. Second, they also
discuss or share breast cancer related news or results
of published studies (example (2)). This kind of sentence
usually includes keywords like “study” or “research”. Third, when
talking about their own illness stories, users might be
concerned about an event without having actually experienced it
themselves, as in examples (3) and (4). Neither of these sentences’
authors had metastasis at the time of the post. Fourth, when a
sentence includes multiple disease keywords, as in example (5),
we have to decide which event the date expression modifies.
In example (5), the date expression “Aug 2005” modifies a
Reconstruction event but not a Mastectomy event. So
before we extract the date from the date sentences, we must
first judge whether the topic of the sentence is actually the user
herself, and whether she has already had this event or is
just concerned.
(1) A friend of mine started chemotherapy this week.
(2) In a recent retrospective study by md anderson reported
Dec 2008 at San Antonio, women who had her2+ cancer
with negative nodes and tumors less than 1 cm had a 5 year
recurrence of 23% and distant recurrence (mets) about 15%.
(3) I already freaked out and thought this was bc mets back
in march when they told me I had thyroid nodules.
(4) I was diagnosed with stage II bc without metastasis in
Aug..
(5) I had my mastectomy and later had reconstruction in
Aug 2005.
(6) After my mastectomy and removal of 14 nodes on April
11th, my surgeon mentioned that if i got a cut or scrape on
the affected side, i should go to the er.
(7) My mets to my bones and lymph nodes were found in
Feb.
Users often tell the date of their cancer events with some
detailed event-related information or description. This
information is more highly individualized than what the N-gram
features typical of text extraction approaches can
capture. For example, in example (6), besides the disease
event keyword “mastectomy” and the date expression “April
11th”, the user also tells how many lymph nodes were
removed, which is an important feature of the mastectomy
surgery. In example (7), besides the illness event keyword
“mets” (an abbreviation of “metastasis”) and the temporal
expression “Feb”, the user also tells the metastasis sites.
To capture these detailed but important features, we adopt
a recently introduced method called “stretchy pattern”
features instead of the commonly-used N-gram features. These
stretchy pattern features can be extracted from text using a
tool called LightSIDE [14].
The intuition behind this decision is that to better capture
the wide variety of flexible and informal language found
in social media, we need linguistic features that have strong
expressive power and can be modeled with reasonably small
amounts of training data. To this end, prior work has
proposed the notion of a “stretchy pattern” to model stylistic
variation in sociolects [9]. A stretchy pattern is defined
as a sequence of word categories, some of which may
be Gaps that are able to cover some number of symbols
of any type. It is the Gap categories that make the
patterns flexible. We designate every word instance by its
word category label. A Gap is a special category.
Compared to N-gram patterns, stretchy patterns allow longer
linguistic patterns to be captured, and to do so in a
flexible way. Using the appropriate word categories, stretchy
patterns are applied here to classify whether the temporal
expression in the sentence is describing the user’s
target event. For sentences like example (6), numbers are
replaced by a word category <Number>. For sentences
like example (7), a word category <BodyPart>, which
includes “liver”, “lung” and “brain”, etc. can appear between
<event keyword> and <temporal expression>. But if k ∈
<event1 keyword> ≠ <event2 keyword>, then k should not
appear between <event2 keyword> and <temporal expression>.
When this constraint holds, the temporal expression is more
likely to be the date of event2 than of event1, as in example (5),
where no mastectomy keyword separates “reconstruction” from
“Aug 2005”.
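The idea of word categories, Gaps and the intervening-keyword check can be illustrated with a deliberately simplified sketch. It is not LightSIDE's stretchy pattern extractor: it maps a whole sentence to one category sequence, collapses runs of uncategorized words into a single GAP, and then checks whether a keyword of a different event sits between the target event keyword and the temporal expression. The category contents and label names below are hypothetical.

```python
import re

# Hypothetical word categories; the paper reports using 16 manually collected ones.
CATEGORIES = {
    "<I>": {"i", "my", "me"},
    "<BodyPart>": {"liver", "lung", "brain", "bones", "nodes"},
    "<Event:Mets>": {"mets", "metastasis"},
    "<Event:Mast>": {"mastectomy"},
    "<Event:Recon>": {"reconstruction"},
    "<Time>": {"jan", "feb", "aug", "april"},
}


def categorize(token):
    """Map a token to its category label, or GAP if it has none."""
    if token.isdigit():
        return "<Number>"
    for label, words in CATEGORIES.items():
        if token in words:
            return label
    return "GAP"


def to_pattern(sentence):
    """Category sequence for a sentence, with runs of GAPs collapsed into one."""
    tokens = re.findall(r"[a-z]+|\d+", sentence.lower())
    pattern = []
    for label in map(categorize, tokens):
        if label == "GAP" and pattern and pattern[-1] == "GAP":
            continue
        pattern.append(label)
    return pattern


def other_event_intervenes(pattern, target):
    """True if another event keyword lies between `target` and the next <Time>."""
    if target not in pattern:
        return False
    start = pattern.index(target)
    if "<Time>" not in pattern[start:]:
        return False
    end = pattern.index("<Time>", start)
    return any(l.startswith("<Event:") and l != target for l in pattern[start + 1:end])


ex5 = to_pattern("I had my mastectomy and later had reconstruction in Aug 2005.")
ex7 = to_pattern("My mets to my bones and lymph nodes were found in Feb.")
```

For example (5), the check fires when the target is the mastectomy event and stays silent for the reconstruction event, which matches the reading that “Aug 2005” dates the reconstruction.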
4.1.4 Rules of Thumb for Resolving Temporal Ambiguities
If there is more than one temporal expression in a sentence,
then we intuitively choose the time expression that
is the nearest to the event keyword. When more than one
date is extracted for an event of a user, for example,
$\text{date}_i$ is extracted from $N_i$ sentences, then we
choose the $\text{date}_i$ with the largest $N_i$. The
assumption is that the more frequently the user associates the
event with a date, the more probable it is that the date is
the event date.
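Both rules of thumb are short enough to sketch directly. The sketch assumes that positions are token indices within a sentence and that candidate dates are hashable values such as (year, month) pairs; both choices, and the function names, are illustrative.

```python
from collections import Counter


def nearest_time_expression(keyword_pos, time_positions):
    """Pick the time expression whose token position is closest to the event keyword.

    Ties go to the earliest listed position, an arbitrary choice made here.
    """
    return min(time_positions, key=lambda pos: abs(pos - keyword_pos))


def most_supported_date(candidate_dates):
    """Pick the date extracted from the largest number of sentences (largest N_i).

    `candidate_dates` holds one entry per date sentence that survived noise
    reduction; ties go to the date seen first.
    """
    counts = Counter(candidate_dates)
    return counts.most_common(1)[0][0]


chosen_position = nearest_time_expression(3, [0, 5, 10])
chosen_date = most_supported_date([(2008, 9), (2008, 9), (2008, 10)])
```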
4.2 LDA Topic Modeling
To observe how topics of a user’s posts can vary with her
progression of disease events, a statistical topic modeling
approach is used to identify topical themes in each message.
In prior work [24], cancer-related dictionaries have
been constructed using Latent Dirichlet Allocation (LDA).
LDA is a statistical generative model that can be used to
discover latent topics in documents as well as the words
associated with each topic [3]. Wang and colleagues [24] first
trained an LDA model using 30,000 breast cancer messages
randomly selected from the entire dataset. Then 20 latent
topics were derived from this document collection. For each
topic, a topic dictionary consisted of 500 words that were
determined to strongly correlate with that topic. Table
1 shows sample vocabulary for each LDA topic dictionary.
A complete list is provided in the online appendix (footnote 1).
With these 20 cancer-related topic dictionaries, the topic of
each post is represented as a 20-dimension topic distribution
vector. Each dimension of the vector is the fraction of words
in a message that match its corresponding dictionary.
For example, in the following post,
Girls please pray for me. I am so sick.
Susan
There are 10 words in this post. 3 words, “girls”, “please”
and “am”, belong to the “Forum Communication” topic
vocabulary, so the 4th dimension is 0.3. 4 words, “girls”, “I”,
“am” and “sick”, belong to the “Emotional reaction” topic
vocabulary, so the 15th dimension is 0.4. Similarly, 3 words,
“girls”, “please” and “pray”, belong to the “Spiritual” topic
vocabulary, so the 17th dimension is 0.3.
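The dictionary-matching step can be sketched as follows, with each dimension computed as the fraction of the post's words found in that topic's dictionary. The three dictionaries below are hypothetical excerpts containing only the words named in the example; the real dictionaries hold 500 words each and there are 20 of them.

```python
import re

# Hypothetical excerpts of three topic dictionaries, limited to the words
# that the worked example mentions.
TOPIC_DICTIONARIES = {
    "Forum Communication": {"girls", "please", "am"},
    "Emotional reaction": {"girls", "i", "am", "sick"},
    "Spiritual": {"girls", "please", "pray"},
}


def topic_vector(post, dictionaries):
    """Fraction of the post's words that appear in each topic dictionary."""
    words = re.findall(r"[a-z']+", post.lower())
    if not words:
        return {topic: 0.0 for topic in dictionaries}
    return {
        topic: sum(word in vocab for word in words) / len(words)
        for topic, vocab in dictionaries.items()
    }


post = "Girls please pray for me. I am so sick.\nSusan"
vector = topic_vector(post, TOPIC_DICTIONARIES)
```

On the example post, the sketch finds 3, 4 and 3 matching words out of 10 for the three topics, and a word such as “girls” counts toward every dictionary that contains it.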
5. TOOL VALIDATION
Before describing how to use our tool to uncover new
knowledge about posting behavior and cancer histories, we
first validate the accuracy of our cancer trajectory extraction
system in this section.
5.1 Noise Reduction
We randomly chose 100 users and manually labeled all
the date sentences in their posts as the training data. We use
Bayesian logistic regression as our machine learning model.
The stretchy pattern features are extracted using LightSIDE
[14]. In our experiment, we used 16 manually-collected word
categories when extracting stretchy pattern features. For
example, <I> is the first person category, <prep> is the
preposition category, and the <doctor> category contains the
words that are used to refer to doctors. We used Weka [27], a
machine learning toolkit, to build the regression models. We
also experimented with an SVM classifier and found logistic
regression to do slightly better. The 10-fold cross validation
results are shown in Table 2. The results indicate that by
using the stretchy pattern features, we could reliably remove
noisy date sentences.
The top 10 stretchy pattern features for Metastasis events
1 http://www.cs.cmu.edu/ yichiaw/Data/CSCW2012/CSCW2012-FeatureSet.htm
