Book Chapter DOI

Information Extraction and Sentiment Analysis to Gain Insight into the COVID-19 Crisis

About: The article was published on 2022-01-01 and has received 8 citations to date. It focuses on the topics of sentiment analysis and information extraction.
Citations
Book Chapter DOI
TL;DR: The goal of this chapter is to review the literature on artificial intelligence and machine learning algorithms for assessing a person's mental health using patient health records, and to explain the use of artificial intelligence in treating and monitoring patients with mental illness through telemedicine.
Abstract: Artificial intelligence is a major part of the healthcare industry, with applications in oncology, cardiology, dermatology, and many other fields. Mental healthcare is another area where AI is steadily making inroads, integrating machine learning to evaluate data generated by mobile and IoT devices. AI aids in the diagnosis of mentally ill individuals and in tailoring their therapy at various stages. Artificial intelligence and machine learning methods utilize electronic health records, mood rating scales, brain images, and mobile-device monitoring data for the prediction, classification, and grouping of mental health issues, mainly psychiatric illness, suicide attempts, schizophrenia, and depression. The goal of this chapter is to review the literature on artificial intelligence and machine learning algorithms for assessing a person's mental health using patient health records. In addition, the chapter explains the use of artificial intelligence in treating and monitoring patients with mental illness through telemedicine.

6 citations

Journal Article DOI
TL;DR: In this article, an incremental topic model with word embedding (ITMWE) is proposed that processes large text data in an incremental environment and extracts latent topics that best describe the document collections.
Abstract: The usage of various software applications has grown tremendously with the onset of Industry 4.0, giving rise to the accumulation of all forms of data. Scientific, biological, and social media text collections demand efficient machine learning methods for data interpretability, which organizations need for decision-making of all sorts. Topic models can be applied in text mining of biomedical articles, scientific articles, Twitter data, and blog posts. This paper analyzes and compares the performance of Latent Dirichlet Allocation (LDA), Dynamic Topic Model (DTM), and Embedded Topic Model (ETM) techniques. An incremental topic model with word embedding (ITMWE) is proposed that processes large text data in an incremental environment and extracts latent topics that best describe the document collections. Experiments in both offline and online settings on large real-world document collections such as CORD-19, NIPS papers, and Tweet datasets show that, while LDA and DTM are good models for discovering word-level topics, ITMWE discovers better document-level topic groups more efficiently in a dynamic environment, which is crucial in text mining applications.
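To make the incremental setting concrete, here is a minimal sketch using plain gensim LDA (not the authors' ITMWE): train on an initial batch, then fold in a later batch with update() instead of retraining from scratch. The toy documents are placeholders.

```python
# Minimal sketch of incremental topic modelling with plain gensim LDA;
# ITMWE itself adds word embeddings, which are not reproduced here.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["topic", "models", "find", "latent", "themes"],
    ["embeddings", "capture", "semantic", "similarity"],
    ["tweets", "arrive", "as", "a", "stream"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=5)

# A later batch is folded in without retraining; unseen words are dropped
# by doc2bow, since the vocabulary was fixed at training time.
new_batch = [["stream", "of", "tweets", "and", "topic", "drift"]]
lda.update([dictionary.doc2bow(d) for d in new_batch])
print(lda.print_topics())
```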

2 citations

Book Chapter DOI
01 Jan 2022
TL;DR: In this article, a preprocessing framework is proposed that combines the best preprocessing techniques for both Hindi and English text classification; experiments on Tweets, Movie Reviews, and Product Reviews datasets reveal that selecting optimal pairings of preprocessing tasks, rather than enabling or removing them wholesale, can significantly improve classification accuracy.
Abstract: Good-quality text data gives a better interpretation of results and yields efficient learning models. Real-world text data comes with many irregularities and errors that require good preprocessing strategies to obtain a quality text corpus. Preprocessing before text classification involves various steps of transformation and manipulation of the documents. The proposed preprocessing framework combines the best preprocessing techniques for both Hindi and English. This study examines the significance of preprocessing methods for text classification from different perspectives, including classification accuracy, text language, and feature selection. Experiments on Tweets, Movie Reviews, and Product Reviews datasets reveal that, depending on the domain and language under consideration, selecting optimal pairings of preprocessing tasks, rather than enabling or removing them wholesale, can significantly improve classification accuracy. According to our findings, some preprocessing steps are useful for Hindi-language texts but not for English-language texts.
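A hedged sketch of the toggleable-pipeline idea: each preprocessing task is a switch, so optimal pairings can be grid-searched per language and domain. The step choices, stopword list, and regex are illustrative assumptions, not the chapter's exact framework.

```python
# Illustrative toggleable preprocessing pipeline; the concrete steps and the
# tiny stopword list are placeholders, not the chapter's exact framework.
import re

EN_STOPWORDS = {"the", "is", "a", "of", "and"}  # tiny illustrative list

def preprocess(text, lowercase=True, strip_punct=True, drop_stopwords=True):
    if lowercase:
        text = text.lower()
    if strip_punct:
        # \w is Unicode-aware in Python 3, so Devanagari letters survive
        text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    if drop_stopwords:
        tokens = [t for t in tokens if t not in EN_STOPWORDS]
    return tokens

# Pairings of toggles can then be compared by downstream classifier accuracy:
print(preprocess("The movie is great!", drop_stopwords=False))
```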

1 citation

Journal Article DOI
TL;DR: In this paper, a framework is proposed that extracts coherent aspects from reviews and applies extractive summarization to generate summaries, providing insight into reviews of tourist attractions through aspect-based sentiment analysis.
Abstract: Reading massive amounts of user-generated text and pulling out the relevant aspects and opinions is a complicated process. Summaries, on the other hand, help busy readers get the gist of the information quickly. Text summarization produces a shorter version of the original text that preserves its informational value and main idea. Humans have a hard time summarizing long texts by hand. Summarization approaches can be grouped under the broader techniques of extractive and abstractive summarization. The paper discusses the need for generating aspect-based summaries and for sentiment analysis. A framework is proposed that extracts coherent aspects from the reviews and applies extractive summarization to generate summaries. In addition, aspect-based sentiment analysis provides insights into the reviews of tourist attractions. The results are evaluated using crowdsourcing, Fairsumm, and the centroid method. The crowdsourcing method gives the best result on aspect-based summaries.
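As an illustration of the centroid baseline named in the abstract, the sketch below scores review sentences by cosine similarity to the TF-IDF centroid and keeps the top-k. The sentences and k are toy placeholders, and this is not the authors' full aspect-based pipeline.

```python
# Sketch of centroid-style extractive summarization (one of the evaluation
# baselines named in the abstract); data and top_k are toy placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The fort offers a stunning view of the city.",
    "Tickets were cheap but the queue was very long.",
    "Guides explain the history of the monument well.",
]

tfidf = TfidfVectorizer().fit_transform(sentences)
centroid = np.asarray(tfidf.mean(axis=0))            # centroid of all sentences
scores = cosine_similarity(tfidf, centroid).ravel()  # closeness to the centroid
top_k = 2
summary = [sentences[i] for i in np.argsort(scores)[::-1][:top_k]]
print(" ".join(summary))
```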

1 citation

Book Chapter DOI
01 Jan 2022
TL;DR: In this article, a real-time framework is presented for assessing depression and suicidal tendencies among people during COVID-19; the framework offers an alternative way to reduce suicidal tendencies during the pandemic through retweeting and other real-time interventions.
Abstract: The assessment of depression and suicidal tendencies among people due to COVID-19 has been little explored. The paper presents a real-time framework for the assessment of depression during the COVID-19 pandemic. This approach offers an alternative means of reducing suicidal tendencies during the pandemic through retweeting and other real-time interventions. Hence, the main objective of the present work is to develop a real-time framework to analyse sentiment and depression in people due to COVID-19. The experimental investigation is carried out on tweets streamed in real time from Twitter, adopting lexicon-based and machine learning (ML) approaches. Linear regression, K-nearest neighbour (KNN), and Naive Bayes models are trained and tested on 1,000 tweets to ascertain the accuracy of the sentiment distribution. Comparatively, the decision tree (98.75%) and Naive Bayes (80.33%) models show better accuracy; the data are visualised with word clouds to draw inferences from the sentiments.
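A minimal sketch of the lexicon-plus-ML pattern the abstract describes: label tweets with a lexicon polarity score, then train a Naive Bayes classifier on them. TextBlob stands in for the paper's unnamed lexicon, and the tweets are invented examples, not the study's data.

```python
# Hedged sketch: lexicon labels (TextBlob) feeding a Naive Bayes classifier.
# The lexicon choice and tweets are assumptions, not the paper's exact setup.
from textblob import TextBlob
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

tweets = [
    "This lockdown is terrible and depressing",
    "So happy and grateful for the vaccine, great news",
    "Awful week, everything feels hopeless",
    "Wonderful support from friends, feeling better today",
]

# Lexicon step: polarity in [-1, 1] turned into a coarse label
labels = ["negative" if TextBlob(t).sentiment.polarity < 0 else "positive"
          for t in tweets]

# ML step: bag-of-words Naive Bayes trained on the lexicon-derived labels
X = CountVectorizer().fit_transform(tweets)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(X))
```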
References
Journal Article DOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
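For readers who want to try the model, here is a minimal sketch of fitting LDA with the variational inference the abstract describes, via scikit-learn's implementation; the four toy documents are placeholders.

```python
# Minimal LDA fit with variational inference (scikit-learn's implementation);
# the corpus is a toy placeholder.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "genes dna sequencing biology",
    "stocks market trading finance",
    "dna genome biology research",
    "finance banks market economy",
]

X = CountVectorizer().fit_transform(docs)      # per-document word counts
lda = LatentDirichletAllocation(n_components=2, learning_method="batch",
                                random_state=0).fit(X)
theta = lda.transform(X)   # per-document topic mixture: the "explicit
print(theta.round(2))      # representation of a document" from the abstract
```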

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

22 May 2010
TL;DR: This work describes a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory independent fashion, and implements several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation in a way that makes them completely independent of the training corpus size.
Abstract: Large corpora are ubiquitous in today's world, and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). We identify a gap in existing VSM implementations: their scalability and ease of use. We describe a Natural Language Processing software framework based on the idea of document streaming, i.e. processing corpora document after document, in a memory-independent fashion. In this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods, and their application by interested practitioners, are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within the existing digital library DML-CZ.
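The document-streaming idea can be sketched in a few lines of gensim: the corpus is a generator, so only one document is held in memory at a time. The file name and the one-document-per-line convention are illustrative assumptions.

```python
# Sketch of memory-independent document streaming with gensim; "corpus.txt"
# (one document per line) is an assumed placeholder file.
from gensim.corpora import Dictionary
from gensim.models import TfidfModel

def stream_docs(path="corpus.txt"):
    with open(path, encoding="utf-8") as f:
        for line in f:                  # one document per line
            yield line.lower().split()  # tokenize lazily; never load all docs

dictionary = Dictionary(stream_docs())  # first streaming pass builds the vocab
bow_stream = (dictionary.doc2bow(doc) for doc in stream_docs())
tfidf = TfidfModel(bow_stream)          # second pass fits the model
```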

3,965 citations

Proceedings Article
16 May 2014
TL;DR: Interestingly, using the authors' parsimonious rule-based model to assess the sentiment of tweets, it is found that VADER outperforms individual human raters, and generalizes more favorably across contexts than any of their benchmarks.
Abstract: The inherent nature of social media content poses serious challenges to practical applications of sentiment analysis. We present VADER, a simple rule-based model for general sentiment analysis, and compare its effectiveness to eleven typical state-of-practice benchmarks including LIWC, ANEW, the General Inquirer, SentiWordNet, and machine learning oriented techniques relying on Naive Bayes, Maximum Entropy, and Support Vector Machine (SVM) algorithms. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Interestingly, using our parsimonious rule-based model to assess the sentiment of tweets, we find that VADER outperforms individual human raters (F1 Classification Accuracy = 0.96 and 0.84, respectively), and generalizes more favorably across contexts than any of our benchmarks.
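Usage is a few lines with the published vaderSentiment package; the example sentence is invented, and the ±0.05 compound-score cutoffs follow the thresholds conventionally recommended by the VADER authors.

```python
# Minimal VADER usage sketch; the sentence is an invented example and the
# +/-0.05 compound cutoffs are the conventionally recommended thresholds.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores("The service was great, but SO slow!!!")

# 'compound' aggregates the lexicon-and-rules score into [-1, 1]
label = ("positive" if scores["compound"] >= 0.05
         else "negative" if scores["compound"] <= -0.05
         else "neutral")
print(scores, label)
```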

3,299 citations

Proceedings Article DOI
25 Jun 2006
TL;DR: A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections, and dynamic topic models provide a qualitative window into the contents of a large document collection.
Abstract: A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into the contents of a large document collection. The models are demonstrated by analyzing the OCR'ed archives of the journal Science from 1880 through 2000.
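A hedged sketch of a dynamic topic model over time slices, using gensim's LdaSeqModel as a stand-in DTM implementation; the documents and slice sizes are toy placeholders, not the paper's Science archive.

```python
# Toy dynamic topic model with gensim's LdaSeqModel; real runs need far
# more documents per time slice than this placeholder corpus.
from gensim.corpora import Dictionary
from gensim.models import LdaSeqModel

docs = [
    ["atom", "force", "energy"], ["cell", "theory", "energy"],  # era 1
    ["quantum", "energy", "field"], ["gene", "cell", "dna"],    # era 2
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# time_slice gives the number of documents in each consecutive era
dtm = LdaSeqModel(corpus=corpus, id2word=dictionary,
                  time_slice=[2, 2], num_topics=2)
print(dtm.print_topics(time=0))  # each topic's word distribution in era 1
```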

2,410 citations