Book Chapter

Techniques, Applications, and Issues in Mining Large-Scale Text Databases

01 Jan 2021, pp. 385-396
TL;DR: The main objective is to review text mining techniques, application areas, and existing issues.
Abstract: Discovering knowledge in large-scale text or semi-structured data is very difficult. Text mining extracts useful information from such large text corpora to meet a user's current information need. Various organizations use this process to improve quality, meet business needs, and understand user behavior. Unstructured and semi-structured text can come from sources such as medical, financial, market, and scientific documents, among others. Text mining applies quantitative approaches to analyze massive amounts of textual data and tries to solve the information overload problem. The main objective of this chapter is to review text mining techniques, application areas, and existing issues.
Citations
Book Chapter
01 Jan 2021
TL;DR: In this chapter, N-gram models for predicting the next word from user input are discussed and evaluated using Good-Turing estimation, the perplexity measure, and the type-to-token ratio.
Abstract: Predicting the next word, letter, or phrase while the user is typing is a valuable tool for improving user experience. Users frequently communicate, write reviews, and express their opinions on such platforms, often while on the move. It has become necessary to give users an application that reduces typing effort and spelling errors when they have limited time. Text data keeps growing because of the extensive use of all kinds of social media platforms, so implementing a text prediction application is difficult given the volume of text that must be processed for language modeling. The primary objective of this research is to process a large text corpus and implement a probabilistic model, such as an N-gram model, to predict the next word when the user provides input. In this exploratory research, N-gram models are discussed and evaluated using Good-Turing estimation, the perplexity measure, and the type-to-token ratio.
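The abstract names an N-gram model for next-word prediction, Good-Turing estimation, and evaluation by perplexity and type-to-token ratio. The following is a minimal Python sketch of those ideas under stated assumptions, not the chapter's implementation: it uses a bigram (N = 2) model, a naive lowercase whitespace tokenizer, and the basic Good-Turing formula c* = (c + 1) N_{c+1} / N_c for adjusted counts. For perplexity it plainly swaps in add-one (Laplace) smoothing instead of Good-Turing, to keep every probability non-zero and the code short. All function names (tokenize, train_bigrams, predict_next, good_turing_counts, perplexity, type_token_ratio) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real corpora need more care).
    return text.lower().split()

def type_token_ratio(tokens):
    # Number of distinct word types divided by the total number of tokens.
    return len(set(tokens)) / len(tokens)

def train_bigrams(tokens):
    # Map each word to a Counter of the words observed immediately after it.
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model

def predict_next(model, word, k=3):
    # Up to k most frequent followers of `word`; .get avoids inserting unseen words.
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

def good_turing_counts(model):
    # Basic Good-Turing: c* = (c + 1) * N_{c+1} / N_c, where N_c is the number
    # of distinct bigrams seen exactly c times.
    freq_of_freq = Counter(c for followers in model.values() for c in followers.values())
    adjusted = {}
    for c, n_c in freq_of_freq.items():
        n_next = freq_of_freq.get(c + 1, 0)
        # Assumption: keep the raw count when N_{c+1} is zero, since the
        # basic formula would give zero there.
        adjusted[c] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted

def perplexity(model, tokens, vocab_size):
    # Bigram perplexity with add-one (Laplace) smoothing, used instead of Good-Turing here.
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to score bigrams")
    log_prob = 0.0
    for prev, nxt in pairs:
        followers = model.get(prev, Counter())
        p = (followers[nxt] + 1) / (sum(followers.values()) + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))

corpus = "the cat sat on the mat the cat ate the fish"
tokens = tokenize(corpus)
model = train_bigrams(tokens)
vocab_size = len(set(tokens))

print(predict_next(model, "the"))   # most frequent followers of "the"
print(type_token_ratio(tokens))     # 7 types / 11 tokens
print(good_turing_counts(model))    # adjusted count for each raw bigram count
print(perplexity(model, tokenize("the cat sat"), vocab_size))
```

On this toy corpus "cat" is suggested first after "the" because it follows "the" twice. A practical system would typically train on a far larger corpus, use longer N-grams with back-off, and smooth the frequency-of-frequency table before turning Good-Turing counts into probabilities.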

14 citations

Book Chapter
TL;DR: The goal of this chapter is to review the literature on artificial intelligence and machine learning algorithms for assessing a person's mental health using patient health records; the chapter also explains the use of artificial intelligence in curing and monitoring patients with mental illness through telemedicine.
Abstract: Artificial intelligence is a major part of the healthcare industry, with applications in oncology, cardiology, dermatology, and many other fields. Mental healthcare is another area AI is continually being used to improve, by integrating machine learning to evaluate data generated by mobile and IoT devices. AI aids in the diagnosis and tailoring of therapy for mentally ill individuals at various stages. Artificial intelligence and machine learning methods use electronic health records, mood rating scales, brain images, and mobile-device monitoring data for the prediction, classification, and grouping of mental health issues, mainly psychiatric illness, suicide attempts, schizophrenia, and depression. The goal of this chapter is to review the literature on artificial intelligence and machine learning algorithms for assessing a person's mental health using patient health records. In addition, the chapter explains the use of artificial intelligence in curing and monitoring patients with mental illness through telemedicine.

6 citations

Journal Article
TL;DR: Specific applications related to the extraction and classification of social media data using novel SA techniques are presented and quantified, with an emphasis on those used for the identification of mental health degradation during the COVID-19 pandemic.
Abstract: For decades, researchers have explored the possibility that machines can equal human linguistic capabilities. Recently, advances in natural language processing (NLP), together with a substantial increase in naturally occurring linguistic data on social media platforms, have allowed more advanced methodologies such as sentiment analysis (SA) to gain substantial momentum in contemporary applications. This document compiles what the authors consider to be some of the most important concepts related to SA, as well as the techniques and processes needed at the various stages of its implementation. Furthermore, specific applications related to the extraction and classification of social media data using novel SA techniques are presented and quantified, with an emphasis on those used to identify mental health degradation during the COVID-19 pandemic. Finally, the authors present several conclusions highlighting the most prominent benefits and drawbacks of the methods discussed, followed by a brief discussion of possible future applications of certain methods of interest.

6 citations

Journal Article
TL;DR: In this article, the authors start from the premise that high levels of social intelligence are required for effective engagement and use statistical analysis to examine the association between employee engagement and social intelligence.
Abstract: Recognizing that high levels of social intelligence are required for effective engagement, the authors set out to find the association between employee engagement and social intelligence. Specifically, the goal of this study was to determine the explanatory value of social intelligence constructs for employee engagement in a sample of employees through statistical analysis. The final sample comprised 150 male and 50 female professionals selected from the FMCG sector. A questionnaire was used to gather socio-demographic information; the Utrecht engagement scale and the Tromso social intelligence scale, in the Indian cultural context, were used to obtain professional and job information. The findings revealed that employees with high social intelligence scores performed well on engagement measures, with social skills being the most significant predictor of engagement. The findings have substantial practical significance for, among other things, the development of training and intervention activities aimed at improving employees' performance on the job.

4 citations

References
Book
28 May 1999
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Abstract: Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

9,295 citations

Journal Article
TL;DR: This paper aims to present a close-up view of Big Data, including Big Data applications, opportunities, and challenges, as well as the state-of-the-art techniques and technologies currently adopted to deal with Big Data problems.

2,516 citations

Journal Article
TL;DR: Selecting a small subset of the thousands of genes in microarray data is important for accurate classification of phenotypes.
Abstract: Selecting a small subset of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their ...

2,005 citations

Journal Article
TL;DR: A general framework for hierarchical, agglomerative clustering algorithms is discussed in this article, which opens up the prospect of much improvement on current, widely-used clustering methods.
Abstract: It has often been asserted that since hierarchical clustering algorithms require pairwise interobject proximities, the complexity of these clustering procedures is at least O(N^2). Recent work has disproved this by incorporating efficient nearest-neighbour searching algorithms into the clustering algorithms. A general framework for hierarchical, agglomerative clustering algorithms is discussed here, which opens up the prospect of much improvement on current, widely used algorithms. This 'progress report' details new algorithmic approaches in this area and reviews recent results.
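The abstract's central idea is that agglomerative clustering can be driven by nearest-neighbour searches rather than by scanning a full proximity matrix at every step. One well-known scheme of this kind is the nearest-neighbour chain: follow a chain of nearest neighbours until two clusters are each other's nearest neighbour (reciprocal nearest neighbours), then merge them. The Python sketch below is illustrative only and is not taken from the paper; it assumes complete linkage on Euclidean points and uses brute-force neighbour search, so it shows the merge logic but does not reach the improved complexity the abstract refers to, which depends on an efficient nearest-neighbour search structure. The names complete_linkage and nn_chain_cluster are hypothetical.

```python
import math

def complete_linkage(points, a, b):
    # Complete linkage: largest Euclidean distance between any member of a and any member of b.
    return max(math.dist(points[i], points[j]) for i in a for j in b)

def nn_chain_cluster(points, linkage=complete_linkage):
    """Agglomerative clustering via a nearest-neighbour chain (sketch).

    Assumes a reducible linkage such as complete linkage. Returns merges as
    (cluster_a, cluster_b, distance) in the order found, not sorted by distance.
    """
    active = {frozenset([i]) for i in range(len(points))}
    chain, merges = [], []
    while len(active) > 1:
        if not chain:
            # Start a new chain from the cluster containing the smallest point index.
            chain.append(min(active, key=min))
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Brute-force nearest active neighbour of the chain's tip.
        b = min((c for c in active if c != a), key=lambda c: linkage(points, a, c))
        d = linkage(points, a, b)
        # On ties prefer the previous chain element; this keeps the chain from cycling.
        if prev is not None and linkage(points, a, prev) <= d:
            b, d = prev, linkage(points, a, prev)
        if b == prev:
            # a and b are reciprocal nearest neighbours: merge them.
            chain.pop()
            chain.pop()
            active -= {a, b}
            active.add(a | b)
            merges.append((a, b, d))
        else:
            chain.append(b)
    return merges

pts = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0)]
for a, b, d in sorted(nn_chain_cluster(pts), key=lambda m: m[2]):
    print(sorted(a | b), round(d, 3))
```

Because complete linkage is reducible, merges found this way should, ties aside, give the same hierarchy as the greedy "merge the closest pair" procedure, but not necessarily in increasing-distance order, which is why the usage lines sort them.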

988 citations

Journal Article
TL;DR: The major challenge of biomedical text mining over the next 5-10 years is to make these systems useful to biomedical researchers, which will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

782 citations