Journal ArticleDOI

Breast Cancer Symptom Clusters Derived From Social Media and Research Study Data Using Improved $K$-Medoid Clustering

TL;DR: The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal related symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment, and pain-related symptoms.
Abstract: Most cancer patients, including patients with breast cancer, experience multiple symptoms simultaneously while receiving active treatment. Some symptoms tend to occur together and may be related, such as hot flashes and night sweats. Co-occurring symptoms may have a multiplicative effect on patients’ functioning, mental health, and quality of life. Symptom clusters in the context of oncology were originally described as groups of three or more related symptoms. Some authors have suggested symptom clusters may have practical applications, such as the formulation of more effective therapeutic interventions that address the combined effects of symptoms rather than treating each symptom separately. Most studies that have sought to identify clusters in breast cancer survivors have relied on traditional research studies. Social media, such as online health-related forums, contain a bevy of user-generated content in the form of threads and posts, and could be used as a data source to identify and characterize symptom clusters among cancer patients. This paper seeks to determine patterns of symptom clusters in breast cancer survivors derived from both social media and research study data using improved $K$-medoid clustering. A total of 50,426 publicly available messages were collected from Medhelp.com and 653 questionnaires were collected as part of a research study. The network of symptoms built from social media was sparse compared with that of the research study data, making the social media data easier to partition. The proposed revised $K$-medoid clustering helps to improve the clustering performance by reassigning some of the symptoms with a negative average silhouette width (ASW) to other clusters after initial $K$-medoid clustering. This retains an overall nondecreasing ASW and avoids becoming trapped in local optima. The overall ASW, individual ASW, and improved interpretation of the final clustering solution suggest improvement.
The clustering results suggest that some symptom clusters are consistent across social media data and clinical data, such as gastrointestinal related symptoms, menopausal symptoms, mood-change symptoms, cognitive impairment, and pain-related symptoms. We recommend an integrative approach taking advantage of both data sources. Social media data could provide context for the interpretation of clustering results derived from research study data, while research study data could compensate for the risk of lower precision and recall found using social media data.
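The reassignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact rule the authors use, so everything below is an assumption made for illustration: the function names `silhouette_widths` and `reassign_negative` are hypothetical, the input is a precomputed dissimilarity matrix, and an item with a negative silhouette width is moved greedily to another cluster only when the move strictly raises the overall ASW (which keeps the overall ASW nondecreasing).

```python
from statistics import mean


def silhouette_widths(dist, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for every item.

    dist is a symmetric matrix of pairwise dissimilarities; labels[i] is the
    cluster of item i. Assumes at least two clusters are present.
    """
    n = len(labels)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) is 0 by convention
            widths.append(0.0)
            continue
        a = mean(own)  # mean dissimilarity to the item's own cluster
        b = min(       # mean dissimilarity to the nearest other cluster
            mean(dist[i][j] for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


def reassign_negative(dist, labels):
    """Hypothetical greedy pass: move negative-silhouette items to another
    cluster whenever the move strictly increases the overall ASW."""
    labels = list(labels)
    best = mean(silhouette_widths(dist, labels))
    improved = True
    while improved:
        improved = False
        widths = silhouette_widths(dist, labels)
        for i, s in enumerate(widths):
            if s >= 0:
                continue
            for c in sorted(set(labels) - {labels[i]}):
                trial = labels[:i] + [c] + labels[i + 1:]
                score = mean(silhouette_widths(dist, trial))
                if score > best:  # accept only strict improvements
                    labels, best, improved = trial, score, True
                    break
            if improved:
                break  # recompute silhouette widths after every accepted move
    return labels, best


# Toy data: six points on a line, with the third point placed in the wrong cluster.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
new_labels, asw = reassign_negative(dist, [0, 0, 1, 1, 1, 1])
print(new_labels, round(asw, 3))
```

Because each accepted move strictly increases the overall ASW and the number of possible labelings is finite, the loop terminates. In practice the initial labels would come from a $K$-medoid run rather than being supplied by hand.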
Citations
Journal ArticleDOI
TL;DR: The results indicate that machine learning (ML) can be effectively applied to user-generated content (UGC) to facilitate the description and inference of personal health; future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and model interpretability.

55 citations

Journal ArticleDOI
TL;DR: A new fast exemplar-based clustering approach is proposed for datasets with an arbitrary shape and number of clusters; its fast exemplar finding (FEF) stage is analyzed theoretically from the perspective of the generalization performance of clustering, and the power of the approach is demonstrated on several benchmark datasets.
Abstract: As a fundamental step in various data analyses, exemplar-based clustering aims at clustering data by identifying representative samples as exemplars of the obtained groups. In this paper, a new fast exemplar-based clustering approach is proposed for a dataset with an arbitrary shape and number of clusters. The proposed approach begins with the reduced set of a dataset, which is a condensation of the dataset obtained by one of two well-developed kernel density estimators, the reduced set density estimator or the fast reduced set density estimator, and then enters into its two advantageous stages: 1) fast exemplar finding (FEF) and 2) fast cluster assignment. The idea of the proposed approach has its basis in three assumptions: 1) exemplars should come from high-density samples; 2) exemplars should be either the components of the reduced set or their neighbors with high similarities; and 3) clusters can be diffused by surrounding both exemplars and their labeled reduced set. We theoretically analyze the proposed FEF from the perspective of the generalization performance of clustering and demonstrate the power of the proposed approach on several benchmark datasets.

20 citations


Cites background or methods from "Breast Cancer Symptom Clusters Deri..."

  • ...9, DBSCAN, OPTICS, KNNCLUST, and FRSEC outperform AP and K-medoid in most cases, because density-based clustering has prominent advantages over partition-based clustering approaches in discovering clusters of arbitrary shapes....


  • ...They are AP [2], K-medoid [1], ordering points to identify the clustering structure (OPTICS) [35], KNN-kernel density-based clustering (KNNCLUST) [34], and DBSCAN [32]....


  • ...We use NC to denote the number of clusters obtained self-adaptively by DBSCAN, OPTICS, AP, KNNCLUST, and FRSEC or the preset value before running K-medoid....


  • ...9 in which the results about standard deviations only appear in both FRSEC and K-medoid, due to randomness in the execution of them....


  • ...Besides, in order to get good clustering results for datasets with spherically shaped clusters, K-medoid stores a relatively small set of estimated cluster centers at each step and then improves the obtained solutions by the strategy which begins with a large number of clusters and then manages to prune them....


Journal ArticleDOI
TL;DR: A data-driven analysis framework for bidding behavior is proposed, including a data standardization method that addresses the particularities of the bidding data and provides a fundamental dataset for further market analyses.
Abstract: Myriad studies have been conducted on bidding behaviors following a worldwide restructuring of the electric power market. The common theme in such studies involves idealized and theoretical economic assumptions. However, practical bidding behavior could deviate from that based on theoretical assumptions, which would undoubtedly limit the effectiveness and practicality of the prevalent market-based studies. To analyze the actual bidding behavior in power markets, this paper proposes a data-driven analysis framework for bidding behavior, including a data standardization method that addresses the particularities of the bidding data and provides a fundamental dataset for further market analyses. Then, an adaptive clustering method for bidding behavior is developed that applies the $K$-medoids method and the Wasserstein distance measurement to extract the generators’ bidding patterns from a massive dataset. An empirical analysis of the bidding behavior is conducted on actual data from the Australian energy market. The typical bidding patterns are extracted, and the bidding behaviors are further analyzed.

20 citations


Cites methods from "Breast Cancer Symptom Clusters Deri..."

  • ...To preserve the features of the bids, the K-medoids method [36] is utilized, which can retain the clustering centers in a stepwise format....


01 Sep 2019
TL;DR: This article presents a comprehensive review of research applying deep learning in health informatics with a focus on the last five years in the fields of medical imaging, electronic health records, genomics, sensing, and online communication health, as well as challenges and promising directions for future research.
Abstract: Machine learning and deep learning have opened up a whole new era of research. As more data and better computational power become available, they have been implemented in various fields. The demand for artificial intelligence in the field of health informatics is also increasing, and we can expect to see the potential benefits of artificial intelligence applications in healthcare. Deep learning can help clinicians diagnose disease, identify cancer sites, identify drug effects for each patient, understand the relationship between genotypes and phenotypes, explore new phenotypes, and predict infectious disease outbreaks with high accuracy. In contrast to traditional models, its approach does not require domain-specific data preprocessing, and it is expected that it will ultimately change human life considerably in the future. Despite its notable advantages, there are some challenges for practical use concerning data (high dimensionality, heterogeneity, time dependency, sparsity, irregularity, lack of labels) and models (reliability, interpretability, feasibility, security, scalability). This article presents a comprehensive review of research applying deep learning in health informatics with a focus on the last five years in the fields of medical imaging, electronic health records, genomics, sensing, and online communication health, as well as challenges and promising directions for future research. We highlight research on currently popular approaches and identify several challenges in building deep learning models.

19 citations


Cites background from "Breast Cancer Symptom Clusters Deri..."

  • ...Based on online data that patients or their parents wrote about symptoms, there were studies that helped individuals, including pain, fatigue, sleep, weight changes, emotions, feelings, drugs, and nutrition [67–69, 172, 191, 204, 278, 289, 302]....


Proceedings ArticleDOI
01 Mar 2017
TL;DR: A survey of recent research that uses online and offline data with data mining techniques for cancer detection or classification.
Abstract: In the present era of technology, medicine has become one of the favorite topics of researchers, and cancer is one of its main concerns. It is a topic of concern because no definitive treatment for this disease has been found to date. Patients with this disease can be saved only if it is found at an early stage (stage I or stage II). If it is detected at a later stage (stage III or stage IV), the chance of survival is much lower. Machine learning and data mining techniques can help the medical field tackle this problem. Cancer has various symptoms, such as tumors, abnormal bleeding, and excessive weight loss. Not all tumors are cancerous: tumors are of two basic types, benign and malignant. To provide appropriate treatment to patients, symptoms must be studied properly, and an automatic prediction system is required that classifies a tumor as benign or malignant. In today's internet world, a large amount of data is generated on social media and healthcare websites. From this data, symptoms can be extracted using data mining techniques, which are then useful for cancer detection or classification. This paper surveys recent research that makes use of online and offline data for cancer classification.

18 citations


Cites methods from "Breast Cancer Symptom Clusters Deri..."

  • ...In paper [1] author chases to resolve impression of syndrome clusters in breast cancer scraps evolved from both social media and research study data using improved K-medoid clustering....


References
Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.

14,144 citations


"Breast Cancer Symptom Clusters Deri..." refers background in this paper

  • ...More details related to this paper were reported in [35]....


Journal ArticleDOI
14 Mar 2014-Science
TL;DR: Large errors in flu prediction were largely avoidable, which offers lessons for the use of big data.
Abstract: In February 2013, Google Flu Trends (GFT) made headlines but not for a reason that Google executives or the creators of the flu tracking system would have hoped. Nature reported that GFT was predicting more than double the proportion of doctor visits for influenza-like illness (ILI) than the Centers for Disease Control and Prevention (CDC), which bases its estimates on surveillance reports from laboratories across the United States (1, 2). This happened despite the fact that GFT was built to predict CDC reports. Given that GFT is often held up as an exemplary use of big data (3, 4), what lessons can we draw from this error?

2,062 citations


"Breast Cancer Symptom Clusters Deri..." refers background in this paper

  • ...Fan et al. [22] utilized principal component analysis to identify three symptom clusters in clinical surveys of patients with metastatic cancer....


01 Jan 1987

1,481 citations


"Breast Cancer Symptom Clusters Deri..." refers background in this paper

  • ...cluster as the centroid of the cluster for the next iteration partitioning while K-means takes the mean of the members within a cluster as the virtual centroid of the cluster for the next iteration partitioning [37]....
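
The distinction drawn in this excerpt, a representative that is an actual cluster member versus a virtual mean, can be shown with a small sketch. The helper names `medoid` and `centroid` and the toy points are assumptions for illustration only, and Euclidean distance is used simply because it is convenient here.

```python
import math


def medoid(points):
    """Return the member with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def centroid(points):
    """Return the coordinate-wise mean, which need not be a member of the cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Toy cluster with one outlying member.
cluster = [(0, 0), (1, 1), (2, 2), (3, 3), (14, 14)]
print(medoid(cluster))    # an actual data point from the cluster
print(centroid(cluster))  # a virtual point pulled toward the outlier
```

Because the medoid must be one of the observed items, it only requires pairwise dissimilarities, which is why medoid-based methods can work directly from a dissimilarity matrix.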


Journal ArticleDOI
TL;DR: If you want a glimpse of what health care could look like a few years from now, consider "Hello Health," the Brooklyn-based primary care practice that is fast becoming an emblem of modern medicine.
Abstract: If you want a glimpse of what health care could look like a few years from now, consider “Hello Health,” the Brooklyn-based primary care practice that is fast becoming an emblem of modern medicine....

777 citations


"Breast Cancer Symptom Clusters Deri..." refers background in this paper

  • ...Symptoms may be related in various ways, such as sharing a common etiology, interacting, and influencing one another, or sharing a common variance [19]....


Journal Article
TL;DR: This study provides initial insights into the effect of a symptom cluster on patients' functional status; healthcare professionals need to be aware of the presence of symptom clusters and their possible synergistic adverse effect on patients' future morbidity.
Abstract: PURPOSE/OBJECTIVES To determine the effect of the symptom cluster of pain, fatigue, and sleep insufficiency on functional status during three cycles of chemotherapy. DESIGN Prospective, longitudinal. SETTING 23 outpatient offices and clinics. SAMPLE 93 patients with cancer. The typical participant was female (72%), married/partnered (65%), white (87%), and middle-aged (55.4 years), with an average of 14.8 years of education. METHODS The Quality of Life-Cancer (QOL-CA) version instrument and the Karnofsky Performance Scale (KPS) were completed by 93 outpatients receiving chemotherapy at baseline (Time 1) and at the end of the third cycle (Time 2). Three items (pain, tires easily, sleeps enough to meet needs) from the QOL-CA questionnaire were used to measure the symptom cluster. MAIN RESEARCH VARIABLES Symptom cluster, outcome, functional status, chemotherapy. FINDINGS A hierarchical multiple regression model explained 48.4% of the variance in functional status. The KPS at Time 1 explained 30.8% of the variance in KPS at Time 2 (p < 0.001). After KPS at Time 1 was partialled out from KPS at Time 2, the four independent variables entered in the next step were considered predictors of the change in functional status between Time 1 and Time 2. Age explained 11.8% of the change (p = 0.001), pain explained 10.7% of the change (p = 0.002), and fatigue explained 7.3% of the change (p = 0.011). Sleep insufficiency statistically was not significant, only explaining 1% of the change (p = 0.344). CONCLUSION This study provides beginning insights into the effect of a symptom cluster on patients' functional status. IMPLICATIONS FOR NURSING PRACTICE Healthcare professionals need to be aware of the presence of symptom clusters and their possible synergistic adverse effect on patients' future morbidity.

724 citations


"Breast Cancer Symptom Clusters Deri..." refers background in this paper

  • ...Sarker and Gonzalez [5] introduced a model for ADR detection by selecting natural language processing-based features extracted from social media texts....
