Proceedings Article

Knowledge discovery from user health posts

TL;DR: This work collects real-time health posts from reputed websites, performs data mining to determine the various possible associations from these posts, and performs knowledge discovery from user posts, whereby useful 'patterns' about groups such as disease to disease, disease to drug and drug to symptom are discovered.
Abstract: Online health communities offer a wealth of medical information that could be available to all. However, as is the case with other data-intensive applications, this information contains hidden patterns which, if explored, analyzed and understood, could be very useful for administrators, medical practitioners and patients alike. There are websites where patients describe their experiences with, or the side effects of, the drugs they use. In this work we collect such real-time health posts from reputed websites and perform data mining to determine the various possible associations from these posts. We also perform knowledge discovery from user posts, whereby useful 'patterns' about groups such as disease to disease, disease to drug and drug to symptom are discovered.
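The abstract does not say which mining algorithm is used, so the following is only a minimal sketch of how pairwise associations (for example drug to symptom) might be scored with support and confidence. The toy posts, the `pair_rules` helper and both thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each post is reduced to the set of
# disease, drug and symptom terms it mentions.
posts = [
    {"diabetes", "metformin", "nausea"},
    {"diabetes", "metformin"},
    {"diabetes", "hypertension"},
    {"hypertension", "lisinopril", "cough"},
    {"metformin", "nausea"},
]

def pair_rules(posts, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for every pairwise rule lhs -> rhs."""
    n = len(posts)
    # Number of posts mentioning each single term.
    item_counts = Counter(item for post in posts for item in post)
    # Number of posts mentioning both terms of each unordered pair.
    pair_counts = Counter(
        pair for post in posts for pair in combinations(sorted(post), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of posts containing both terms
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, support, confidence in pair_rules(posts):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On real posts, the term sets would first have to be extracted from free text, a step this sketch leaves out.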
Citations
Book Chapter
01 Jan 2021
TL;DR: A detailed review regarding the role and efficiency of popular machine learning algorithms such as Bayes, SVM, ANN, kNN, random forests in determining psychological tension is presented in this article.
Abstract: Psychological tension is a growing concern worldwide and has gradually victimized individuals across differing age groups, gender and nationality globally. Despite the omnipresence of technology, especially in the field of healthcare, psychological tension continues to be a powerful and widespread disorder, implications of which manifest in the form of varied physical and mental ailments. This paper presents a detailed review regarding the role and efficiency of popularly used machine learning algorithms such as Bayes, SVM, ANN, kNN, random forests in determining psychological tension. Systematic analysis of the physiological features, their thresholds and the scenario in question leads to successful classification of tension as low, medium or high. Knowledge-based systems that could effectively diagnose psychological tension with scientific quantification techniques shall be immensely useful in studying human affect and also successfully mitigate tension/strain by promoting its early automated/semi-automated detection, thus largely contributing to mankind.

2 citations

Book Chapter
01 Jan 2021
TL;DR: This work retrieved real-time Twitter data pertaining to three currently popular hashtags in the Indian context and carried out extensive experimental analysis of the prevailing sentiment of a stratum of the population.
Abstract: Twitter analytics is a classic research area, especially given the widespread presence of Big Data in various online media such as social network sites, online shopping portals, e-commerce, forums, chats, recommendation systems, and online services. Ascertaining the sentiment behind the various types of tweets by different persons can provide great insight into various aspects, including behavioral patterns. Besides highlighting the newest trends in the field, we retrieved real-time Twitter data pertaining to three currently popular hashtags in the Indian context and carried out extensive experimental analysis of the prevailing sentiment of a stratum of the population. The inclusion of current challenges, future trends and applications of sentiment analysis from Twitter data makes this novel work very useful for fellow researchers.

2 citations

Book Chapter
01 Jan 2021
TL;DR: In this paper, the performance of four popular machine learning classification algorithms (Naive Bayes, decision trees, logistic regression, and random forest) on two popular benchmarked datasets (wine quality dataset and glass identification dataset) is compared.
Abstract: Supervised algorithms depend on the given data for categorizing. In the present work, we used both parametric and nonparametric types of classifiers. We compare the performance of four popular machine learning classification algorithms (Naive Bayes, decision trees, logistic regression, and random forest) on two popular benchmark datasets (the wine quality dataset and the glass identification dataset). To get a broad view of the performance of these algorithms, we incorporated both binary and multi-class classification, which also solved the problem of imbalance in the dataset. The performance of the algorithms was measured using accuracy, recall, precision, and F1-score. It was observed that nonparametric algorithms such as the random forest classifier and the decision tree classifier bested parametric algorithms such as logistic regression and naive Bayes. Moreover, as the datasets were imbalanced, we determined which algorithm performs better under which circumstances. In particular, random forest achieved the best performance in terms of all considered metrics, with accuracy of 82 and 83% in wine datasets and 79% in glass identification dataset.
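As a minimal sketch of the four metrics named above, the snippet below computes accuracy, precision, recall and F1-score for a binary task from two label lists. The `binary_metrics` helper and the toy labels are hypothetical. They only illustrate the standard definitions and do not reproduce the paper's experiments or datasets.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced toy labels: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced data, accuracy alone can hide weak performance on the minority class, which is one reason to report precision, recall and F1 alongside it.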

1 citation

Book Chapter
01 Jan 2022
TL;DR: In this paper, the design, development, functionalities, and upcoming trends in the investigation of Big Data Analytics are discussed, along with advantages in relation to infrastructural, organizational, operational, managerial, and strategic areas, and an articulation of the latest trending areas.
Abstract: An alarming surge in the amount of diverse data in various domains has contributed to ever-growing research in Big Data Analytics globally. Despite the enormous boom in effective application of Big Data Analytics, health care has not entirely grasped the possible benefits. This paper studies the design, development, functionalities, and upcoming trends in the investigation of Big Data Analytics. In this paper, five potentials of Big Data Analytics are showcased, along with advantages in relation to infrastructural, organizational, operational, managerial, and strategic areas, and an articulation of the latest trending areas. The paper will be of great benefit to fellow researchers, offering not just the fundamental facets of Big Data Analytics in the healthcare domain but also a summary of research gaps, latest trends, and developments, thereby opening new avenues for future research.
References
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. * Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Bratko, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

Journal Article
TL;DR: It is seen that factors such as chest pain being asymptomatic and the presence of exercise-induced angina indicate the likely existence of heart disease for both men and women, and resting ECG status is a key distinct factor for heart disease prediction.
Abstract: This paper investigates the sick and healthy factors which contribute to heart disease for males and females. Association rule mining, a computational intelligence approach, is used to identify these factors, and the UCI Cleveland dataset, a biological database, is considered along with three rule generation algorithms: Apriori, Predictive Apriori and Tertius. Analyzing the information available on sick and healthy individuals and taking confidence as an indicator, females are seen to have less chance of coronary heart disease than males. Also, the attributes indicating healthy and sick conditions were identified. It is seen that factors such as chest pain being asymptomatic and the presence of exercise-induced angina indicate the likely existence of heart disease for both men and women. However, resting ECG being either normal or hyper and slope being flat are potential high risk factors for women only. For men, on the other hand, only a single rule expressing resting ECG being hyper was shown to be a significant factor. This means, for women, resting ECG status is a key distinct factor for heart disease prediction. Comparing the healthy status of men and women, slope being up, number of coloured vessels being zero, and oldpeak being less than or equal to 0.56 indicate a healthy status for both genders.

329 citations

Book
01 Jan 2001
TL;DR: In this paper, the authors present a set of rules over attribute taxonomies, which are then mined using a combination of data partitioning and pruning, with the objective of finding the optimal set of attributes for each attribute.
Abstract: 1. Introduction.- 2. Search Space Partition-Based Rule Mining.- 2.1 Problem Statement.- 2.1.1 Canonical Attribute Sequences (cas).- 2.1.2 Database.- 2.1.3 Support.- 2.1.4 Association Rule.- 2.1.5 Problem Statement.- 2.2 Search Space.- 2.3 Splitting Procedure.- 2.4 Enumerating ?-Frequent Attribute Sets (cass).- 2.5 Sequential Enumeration Procedure.- 2.6 Parallel Enumeration Procedure.- 2.6.1 Initial Load Balancing.- 2.6.2 Computing the Starting Sets.- 2.6.3 Enumeration Procedure.- 2.6.4 Dynamic Load Balancing.- 2.7 Generating the Association Rules.- 2.7.1 Sequential Generation.- 2.7.2 Parallel Generation.- 3. Apriori and Other Algorithms.- 3.1 Early Algorithms.- 3.1.1 AIS.- 3.1.2 SETM.- 3.2 The Apriori Algorithms.- 3.2.1 Apriori.- 3.2.2 AprioriTid.- 3.3 Direct Hashing and Pruning.- 3.3.1 Filtering Candidates.- 3.3.2 Database Trimming.- 3.3.3 The DHP Algorithm.- 3.4 Dynamic Set Counting.- 4. Mining for Rules over Attribute Taxonomies.- 4.1 Association Rules over Taxonomies.- 4.2 Problem Statement and Algorithms.- 4.3 Pruning Uninteresting Rules.- 4.3.1 Measure of Interest.- 4.3.2 Rule Pruning Algorithm.- 4.3.3 Attribute Presence-Based Pruning.- 5. Constraint-Based Rule Mining.- 5.1 Boolean Constraints.- 5.1.1 Syntax.- 5.1.2 Semantics.- 5.1.3 Propagation of Boolean Constraints.- 5.2 Prime Implicants.- 5.3 Problem Statement and Algorithms.- 6. Data Partition-Based Rule Mining.- 6.1 Data Partitioning.- 6.1.1 Building a Probabilistic Model.- 6.1.2 Bounding Large Deviations for One cas (Chernoff bounds).- 6.1.3 Bounding Large Deviations for Sets of cass.- 6.2 cas Enumeration with Partitioned Data.- 6.2.1 Data Partitioning.- 6.2.2 Local ?-Frequent cas Generation.- 6.2.3 Global ?-Frequent cas Generation.- 7. Mining for Rules with Categorical and Metric Attributes.- 7.1 Interval Systems and Quantitative Rules.- 7.2 k-Partial Completeness.- 7.3 Pruning Uninteresting Rules.- 7.3.1 Measure of Interest.- 7.3.2 Attribute Presence-Based Pruning.- 7.4 Enumeration Algorithms.- 8. Optimizing Rules with Quantitative Attributes.- 8.1 Solving 1-1-Type Rule Optimization Problems.- 8.1.1 Problem Statement.- 8.1.2 MC\S Problem.- 8.1.3 MS\C Problem.- 8.1.4 MG Problem.- 8.2 Solving d-1-Type Rule Optimization Problems.- 8.3 Solving 1-q-Type Rule Optimization Problems.- 8.3.1 Problem Statement.- 8.3.2 MS\C Problem.- 8.3.3 MG Problem.- 8.4 Solving d-q-Type Rule Optimization Problems.- 8.4.1 Problem Statement.- 8.4.2 Basic Enumeration.- 8.4.3 Enumeration with Pruning.- 8.4.4 Pruning the Instantiation Set.- 9. Beyond Support-Confidence Framework.- 9.1 A Criticism of the Support-Confidence Framework.- 9.2 Conviction.- 9.3 Pruning Conviction-Based Rules.- 9.3.1 Analyzing Conviction.- 9.3.2 Transitivity-Based Pruning.- 9.3.3 Improvement-Based Pruning.- 9.4 One-Step Association Rule Mining.- 9.4.1 Building a Procedure for One-Step Mining.- 9.4.2 Building a Procedure for Improvement-Based Pruning.- 9.5 Correlated Attribute-Set Mining.- 9.5.1 Collective Strength.- 9.5.2 Correlated Attribute-Set Enumeration.- 9.6 Refining Conviction: Association Rule Intensity.- 9.6.1 Measure Construction.- 9.6.2 Properties.- 9.6.3 Relating ?-int(s ? u) to conv(s ? u).- 9.6.4 Mining with the Intensity Measure.- 9.6.5 ?-Intensity Versus Intensity as Defined in [G96].- 10. Search Space Partition-Based Sequential Pattern Mining.- 10.1 Problem Statement.- 10.1.1 Sequences of cass.- 10.1.2 Database.- 10.1.3 Support.- 10.1.4 Problem Statement.- 10.2 Search Space.- 10.3 Splitting the Search Space.- 10.4 Splitting Procedure.- 10.5 Sequence Enumeration.- 10.5.1 Extending the Support Set Notion.- 10.5.2 Join Operations.- 10.5.3 Sequential Enumeration Procedure.- 10.5.4 Parallel Enumeration Procedure.- Appendix 1. Chernoff Bounds.- Appendix 2. Partitioning in Figure 10.5: Beyond 3rd Power.- Appendix 3. Partitioning in Figure 10.6: Beyond 3rd Power.- References.
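For orientation, the measures named in this outline (support, confidence, conviction) are usually defined as below. This is a sketch in common textbook notation, assuming a transaction database $D$ and a rule $A \Rightarrow B$ over disjoint attribute sets. The book's own notation (for example its rules written with s and u) may differ.

```latex
\begin{align}
  \operatorname{supp}(A) &= \frac{\lvert \{\, t \in D : A \subseteq t \,\} \rvert}{\lvert D \rvert} \\
  \operatorname{conf}(A \Rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)} \\
  \operatorname{conv}(A \Rightarrow B) &= \frac{1 - \operatorname{supp}(B)}{1 - \operatorname{conf}(A \Rightarrow B)}
\end{align}
```

Conviction equals 1 when $A$ and $B$ are independent and grows without bound as confidence approaches 1. This lets it separate rules that confidence alone would rank alike.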

174 citations

Journal Article
TL;DR: This study presents a systematic literature survey regarding the computational techniques, models and algorithms for mining opinion components from unstructured reviews.

118 citations