
Showing papers on "Latent Dirichlet allocation published in 2021"


Journal ArticleDOI
TL;DR: The evolution of research on AI in business over time is presented, highlighting seminal works and leading publication venues, and a research agenda is proposed to guide future AI research in business, addressing the identified trends and challenges.

113 citations


Journal ArticleDOI
TL;DR: A novel method for aspect-based sentiment analysis, Sentence Segment LDA (SS-LDA), is proposed as a novel adaptation of the LDA algorithm for product aspect extraction; experimental results reveal that SS-LDA is quite competitive in extracting product aspects.
Abstract: With the widespread use of social networks, blogs, forums and e-commerce web sites, the volume of user-generated textual data is growing exponentially. User opinions in product reviews and other textual data are crucial for manufacturers, retailers and providers of products and services. Therefore, sentiment analysis and opinion mining have become important research areas. In user review mining, topic modeling based approaches and Latent Dirichlet Allocation (LDA) are significant methods used for extracting product aspects in aspect-based sentiment analysis. However, LDA cannot be directly applied to user reviews and other short texts because of the data sparsity problem and the lack of co-occurrence patterns. Several studies have been published on the adaptation of LDA for short texts. In this study, a novel method for aspect-based sentiment analysis, Sentence Segment LDA (SS-LDA), is proposed. SS-LDA is a novel adaptation of the LDA algorithm for product aspect extraction. The experimental results reveal that SS-LDA is quite competitive in extracting product aspects.
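The short-text workaround underlying this line of work can be sketched as follows: segment each review into sentences and treat each sentence as its own document before running standard LDA. This is a toy illustration with invented reviews, not the paper's actual SS-LDA algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "The battery life is great. The screen scratches easily.",
    "Battery lasts all day. Camera quality is poor in low light.",
    "Screen is bright and sharp. The camera struggles at night.",
]

# Segment reviews into sentences; each sentence becomes one "document",
# which eases the sparsity/co-occurrence problem for short texts.
sentences = [s.strip() for r in reviews for s in r.split(".") if s.strip()]

counts = CountVectorizer(stop_words="english").fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# Each sentence now gets its own topic mixture (rows sum to 1).
doc_topics = lda.transform(counts)
print(doc_topics.shape)
```

Each topic can then be read as a candidate product aspect (battery, screen, camera) by inspecting its top-weighted words.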

73 citations


Journal ArticleDOI
TL;DR: In this paper, a real-time monitoring framework is proposed for traffic accident detection and condition analysis using ontology and latent Dirichlet allocation (OLDA) and bidirectional long short-term memory (Bi-LSTM).

72 citations


Journal ArticleDOI
TL;DR: 10 topics were identified in which the main security issues are malware, cybersecurity attacks, data storing vulnerabilities, the use of testing software in IoT, and possible leaks due to the lack of user experience.

61 citations


Journal ArticleDOI
TL;DR: A new measure of innovation is developed using the text of analyst reports of S&P 500 firms to give a useful description of innovation by firms with and without patenting and R&...
Abstract: We develop a new measure of innovation using the text of analyst reports of S&P 500 firms. Our text-based measure gives a useful description of innovation by firms with and without patenting and R&...

59 citations


Journal ArticleDOI
TL;DR: This paper surveys the background and advancement of topic modeling techniques: the authors introduce the preliminaries of topic modeling and review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives.
Abstract: We are not able to deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. In addition, research on topic modeling in distributed environments and topic visualization approaches is explored. We also cover the implementation and evaluation techniques for topic models in brief. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.

58 citations


Posted ContentDOI
TL;DR: This narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods, and compares the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users.
Abstract: Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
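The closed-vocabulary side of this comparison boils down to counting dictionary hits. A toy sketch in the spirit of LIWC-style word counting; the category lists here are invented for illustration, not LIWC's actual dictionaries:

```python
from collections import Counter

# Hypothetical closed-vocabulary categories (made up, not LIWC's).
CATEGORIES = {
    "positive": {"happy", "great", "love"},
    "negative": {"sad", "angry", "hate"},
}

def category_rates(text: str) -> dict:
    """Fraction of tokens falling in each category's word list."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {
        cat: sum(counts[w] for w in words) / total
        for cat, words in CATEGORIES.items()
    }

print(category_rates("i love my great happy dog"))
```

Open-vocabulary methods such as LDA instead learn the word groupings from the data itself, which is why the review finds them better at handling ambiguous word senses.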

48 citations


Journal ArticleDOI
TL;DR: The proposed unsupervised framework provides an effective and efficient data mining solution to facilitating deep and comprehensive understanding on drivers’ behavioral characteristics, which will benefit the development of AVs and ADASs.

46 citations


Journal ArticleDOI
TL;DR: It is found that research on the theory of blockchain, blockchain trading systems, blockchain system structure, and intelligent financial systems based on blockchain should be prioritized to realize the full benefits of this technology.

38 citations


Journal ArticleDOI
TL;DR: The experimental results show that SKP-LDA reveals better semantic analysis ability and emotional topic clustering effect and can be applied to the field of micro-blog to improve the accuracy of network public opinion analysis effectively.
Abstract: The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in the field of text mining. In this paper, a Sentiment Word Co-occurrence and Knowledge Pair Feature Extraction based LDA Short Text Clustering Algorithm (SKP-LDA) is proposed. A definition of a word bag based on sentiment word co-occurrence is given. The co-occurrence of emotional words takes full account of different short texts. Then, the short texts of a microblog are endowed with emotional polarity. Furthermore, the knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering. Thus, semantic information can be found more accurately. Then, the hidden n topics and the Top30 special words set of each topic are extracted from the knowledge pair set. Finally, via LDA topic model primary clustering, a Top30 topic special words set is obtained, which is clustered by K-means secondary clustering. The clustering center is optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA reveals better semantic analysis ability and emotional topic clustering effect. It can be applied to the field of micro-blog to improve the accuracy of network public opinion analysis effectively.

36 citations


Journal ArticleDOI
TL;DR: Wang et al. explored public attention on social media during the outbreak of COVID-19; by combining data mining and text analysis, they presented the trend of public attention levels at different stages.

Journal ArticleDOI
TL;DR: This study aimed to investigate the emerging trends in the e-learning field by implementing a topic modeling analysis based on latent Dirichlet allocation on 41,925 peer-reviewed journal articles published between 2000 and 2019, which revealed 16 topics reflecting emerging trends and developments in the field.
Abstract: E-learning studies are becoming very important today as they provide alternatives and support to all types of teaching and learning programs. The effect of the COVID-19 pandemic on educational systems has further increased the significance of e-learning. Accordingly, gaining a full understanding of the general topics and trends in e-learning studies is critical for a deeper comprehension of the field. There are many studies that provide such a picture of the e-learning field, but the limitation is that they do not examine the field as a whole. This study aimed to investigate the emerging trends in the e-learning field by implementing a topic modeling analysis based on latent Dirichlet allocation (LDA) on 41,925 peer-reviewed journal articles published between 2000 and 2019. The analysis revealed 16 topics reflecting emerging trends and developments in the e-learning field. Among these, the topics “MOOC,” “learning assessment,” and “e-learning systems” were found to be key topics in the field, with a consistently high volume. In addition, the topics of “learning algorithms,” “learning factors,” and “adaptive learning” were observed to have the highest overall acceleration, with the first two identified as having a higher acceleration in recent years. Going by these results, it is concluded that the next decade of e-learning studies will focus on learning factors and algorithms, which will possibly create a baseline for more individualized and adaptive mobile platforms. In other words, after a certain maturity level is reached by better understanding the learning process through these identified learning factors and algorithms, the next generation of e-learning systems will be built on individualized and adaptive learning environments. These insights could be useful for e-learning communities to improve their research efforts and their applications in the field accordingly.

Journal ArticleDOI
TL;DR: This paper aims to experiment with BERTopic using different Pre-Trained Arabic Language Models as embeddings, and compare its results against LDA and NMF techniques, using Normalized Pointwise Mutual Information (NPMI) measure to evaluate the results of topic modeling techniques.


Journal ArticleDOI
TL;DR: The authors empirically evaluated and compared seven state-of-the-art meta-heuristics and three alternative surrogate metrics to solve the problem of identifying duplicate bug reports with LDA; the results indicated that meta-heuristics are mostly comparable to one another and that the choice of surrogate metric impacts the quality of the generated topics and the tuning overhead.
Abstract: Context: Latent Dirichlet Allocation (LDA) has been successfully used in the literature to extract topics from software documents and support developers in various software engineering tasks. While LDA has been mostly used with default settings, previous studies showed that default hyperparameter values generate sub-optimal topics from software documents. Objective: Recent studies applied meta-heuristic search (mostly evolutionary algorithms) to configure LDA in an unsupervised and automated fashion. However, previous work advocated for different meta-heuristics and surrogate metrics to optimize. The objective of this paper is to shed light on the influence of these two factors when tuning LDA for SE tasks. Method: We empirically evaluated and compared seven state-of-the-art meta-heuristics and three alternative surrogate metrics (i.e., fitness functions) to solve the problem of identifying duplicate bug reports with LDA. The benchmark consists of ten real-world and open-source projects from the Bench4BL dataset. Results: Our results indicate that (1) meta-heuristics are mostly comparable to one another (except for random search and CMA-ES), and (2) the choice of the surrogate metric impacts the quality of the generated topics and the tuning overhead. Furthermore, calibrating LDA helps identify twice as many duplicates as untuned LDA when inspecting the top five past similar reports. Conclusion: No meta-heuristic and/or fitness function outperforms all the others, as advocated in prior studies. However, we can make recommendations for some combinations of meta-heuristics and fitness functions over others for practical use. Future work should focus on improving the surrogate metrics used to calibrate/tune LDA in an unsupervised fashion.
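The unsupervised tuning loop this study compares can be sketched in its simplest form: random search over LDA hyperparameters, scored by held-out perplexity. The paper evaluates far richer meta-heuristics and surrogate metrics on real bug reports; the corpus and parameter ranges below are invented for illustration.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "null pointer exception when saving the report",
    "app crashes with null pointer on save",
    "login page shows a blank screen",
    "blank screen after login on mobile",
    "report export fails with timeout error",
    "timeout error while exporting large reports",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)
X_train, X_val = X[:4], X[4:]

random.seed(0)
best_score, best_params = float("inf"), None
for _ in range(5):
    # Sample a candidate LDA configuration (toy ranges).
    params = {
        "n_components": random.randint(2, 5),
        "doc_topic_prior": random.uniform(0.01, 1.0),
        "topic_word_prior": random.uniform(0.01, 1.0),
    }
    lda = LatentDirichletAllocation(random_state=0, **params).fit(X_train)
    score = lda.perplexity(X_val)  # surrogate metric: lower is better
    if score < best_score:
        best_score, best_params = score, params

print(best_params)
```

Swapping random search for an evolutionary algorithm, or perplexity for topic coherence, changes only the sampling step and the scoring line, which is exactly the two-factor design space the paper explores.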

Journal ArticleDOI
TL;DR: A framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches, which can be integrated into a platform with a GUI for further automation.
Abstract: The success factor of sentiment analysis lies in identifying the most frequent and relevant opinions among users relating to a particular topic. In this paper, we develop a framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. We choose TF-IDF and K-means for sentiment classification over affinity and hierarchical clustering. Latent Dirichlet Allocation and a pipeline of Doc2Vec and K-means are used to capture themes; we then perform multi-level polarity index classification and its time series analysis. In our study, we draw insights from 243,746 tweets for Kerala’s 2018 natural disasters in India. The key findings of the study are the classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We observe different sets of emotions and influencers, among others. Through this case example of the Kerala floods, we show how the government and other organizations could track positive/negative sentiments with respect to time and location, gain a better understanding of the topics of discussion trending among the public, and collaborate with crucial Twitter users/influencers to spread information and figure out the gaps in the implementation of schemes in terms of design and execution. This research’s uniqueness is the streamlined and efficient combination of algorithms and techniques embedded in the framework used in achieving the above output, which can be integrated into a platform with a GUI for further automation.
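The TF-IDF + K-means step described above can be sketched with a handful of toy tweets (invented for illustration, not the Kerala dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = [
    "flood water rising in the city",
    "rescue teams are helping flood victims",
    "donate to the relief fund today",
    "relief supplies and donations needed",
]

# Weight terms by TF-IDF, then group tweets into two clusters.
X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

In the full framework, each resulting cluster would then be labeled with a polarity via the lexicon-based step and tracked over time.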

Journal ArticleDOI
01 Nov 2021
TL;DR: This work proposes an enhancement to the user-based approaches, which are extensively used in the recommender system literature, that combines Wikipedia data and browsing history into the recommendation algorithm and generates topics by using the Latent Dirichlet Allocation models on the Wikipedia data.
Abstract: Personalizing user experience in recommender systems is possible when there is sufficient information about the user. But when new users join the system, the unavailability of information about these users, referred to as cold-start, inhibits the functionality of a recommender system. We propose an enhancement to the user-based approaches, which are extensively used in the recommender system literature. Our approach combines Wikipedia data and browsing history into the recommendation algorithm. Specifically, we generate topics by using the Latent Dirichlet Allocation (LDA) models on the Wikipedia data, and then use the topics on user browsing history to extract user preferences. Our evaluation employs five approaches and tests their performance in terms of prediction and classification accuracy. We conduct experiments in two domains (movies and restaurants), to gather user ratings and their browsing history for evaluation. Results from both experiments favor our proposed enhancement.

Journal ArticleDOI
TL;DR: This paper found that neural network-based machine learning methods, in particular pre-trained versions, offer the most accurate predictions, while topic models such as Latent Dirichlet Allocation offer deeper diagnostics.

Journal ArticleDOI
TL;DR: This study aims to advance the existing research on geoparks by incorporating machine learning models in the analysis of online reviews to provide valuable suggestions for managers in increasing their understanding of the psychological cognition of tourists and evaluating the status of geoparks.

Journal ArticleDOI
TL;DR: In this paper, the authors propose an automated keyword filtering method to identify product attributes from online customer reviews based on latent Dirichlet allocation; it improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove noise keywords that are not related to the product.
Abstract: Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This article proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter. [DOI: 10.1115/1.4048960]

Journal ArticleDOI
TL;DR: The proposed topic model can be used to infer the destinations of unlinked trips, analyze travel patterns, and cluster passengers; it is tested on Guangzhou Metro smart card data, for which the ground truth is available.
Abstract: Inferring trip destination in smart card data with only tap-in control is an important application. Most existing methods estimate trip destinations based on the continuity of trip chains, while the destinations of isolated/unlinked trips cannot be properly handled. We address this problem with a probabilistic topic model. A three-dimensional latent Dirichlet allocation model is developed to extract latent topics of departure time, origin, and destination among the population; each passenger’s travel behavior is characterized by a latent topic distribution defined on a three-dimensional simplex. Given the origin station and departure time, the most likely destination can be obtained by statistical inference. Furthermore, we propose to represent stations by their rank of visiting frequency, which transforms divergent spatial patterns into similar behavioral regularities. The proposed destination estimation framework is tested on Guangzhou Metro smart card data, in which the ground truth is available. Compared with benchmark models, the topic model not only shows increased accuracy but also captures essential latent patterns in passengers’ travel behavior. The proposed topic model can be used to infer the destinations of unlinked trips, analyze travel patterns, and cluster passengers.

Book ChapterDOI
08 Feb 2021
TL;DR: The team introduced an approach to combine topical distributions from Latent Dirichlet Allocation (LDA) with contextualized representations from XLNet, and compared the method with existing baselines to show that XLNet + Topic Distributions outperforms other approaches by attaining an F1-score of 0.967.
Abstract: With the ease of access to information, and its rapid dissemination over the internet (both velocity and volume), it has become challenging to filter out truthful information from fake ones. The research community is now faced with the task of automatic detection of fake news, which carries real-world socio-political impact. One such research contribution came in the form of the Constraint@AAAI 2021 Shared Task on COVID-19 Fake News Detection in English. In this paper, we shed light on a novel method we proposed as a part of this shared task. Our team introduced an approach to combine topical distributions from Latent Dirichlet Allocation (LDA) with contextualized representations from XLNet. We also compared our method with existing baselines to show that XLNet + Topic Distributions outperforms other approaches by attaining an F1-score of 0.967.
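The fusion idea itself is simple feature concatenation: append a document's LDA topic distribution to its contextual embedding before classification. In this sketch the transformer features are a random placeholder standing in for XLNet output, and the topic vector is invented; neither is the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

topic_dist = np.array([0.7, 0.2, 0.1])  # e.g. from a fitted LDA model
contextual = rng.standard_normal(8)     # stand-in for an XLNet embedding

# Concatenated feature vector fed to the downstream classifier.
features = np.concatenate([topic_dist, contextual])
print(features.shape)
```

The topic part carries corpus-level thematic signal that a per-document transformer embedding lacks, which is the complementarity the paper exploits.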

Journal ArticleDOI
TL;DR: This work presents hybrid topic modeling techniques by integrating traditional topic models with visualization procedures to aid in the visualization of topic clouds and health tendencies in the document collection and believes proposed visual topic models viz., Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), and Visual Probabilistic Latent Schematic Indexing (VPLSI).
Abstract: Social media is a great source to search health-related topics for envisages solutions towards healthcare. Topic models originated from Natural Language Processing that is receiving much attention in healthcare areas because of interpretability and its decision making, which motivated us to develop visual topic models. Topic models are used for the extraction of health topics for analyzing discriminative and coherent latent features of tweet documents in healthcare applications. Discovering the number of topics in topic models is an important issue. Sometimes, users enable an incorrect number of topics in traditional topic models, which leads to poor results in health data clustering. In such cases, proper visualizations are essential to extract information for identifying cluster trends. To aid in the visualization of topic clouds and health tendencies in the document collection, we present hybrid topic modeling techniques by integrating traditional topic models with visualization procedures. We believe proposed visual topic models viz., Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Schematic Indexing (VPLSI) are promising methods for extracting tendency of health topics from various sources in healthcare data clustering. Standard and benchmark social health datasets are used in an experimental study to demonstrate the efficiency of proposed models concerning clustering accuracy (CA), Normalized Mutual Information (NMI), precision (P), recall (R), F-Score (F) measures and computational complexities. VNMF visual model performs significantly at an increased rate of 32.4% under cosine based metric in the display of visual clusters and an increased rate of 35–40% in performance measures compared to other visual methods on different number of health topics.

Proceedings ArticleDOI
25 Mar 2021
TL;DR: In this article, Latent Semantic Analysis (LSA) using truncated SVD is used, together with a TF-IDF keyword extractor and a BERT sentence encoder, to build extractive summaries that score higher than summaries produced with Latent Dirichlet Allocation (LDA) topic modelling.
Abstract: Document summarization is a natural language processing task that deals with long textual data to produce concise and fluent summaries containing all of the document's relevant information. The branch of NLP that deals with it is the automatic text summarizer, which converts a long textual document into short, fluent summaries. There are generally two ways of summarizing text with an automatic text summarizer: extractive text summarization and abstractive text summarization. This paper demonstrates an experiment with an extractive text summarizer. Topic modelling, on the other hand, is an NLP task that extracts the relevant topics from a textual document. One such method is Latent Semantic Analysis (LSA) using truncated SVD, which extracts all the relevant topics from the text. The experiment demonstrated here summarizes a long textual document using LSA topic modelling along with a TF-IDF keyword extractor for each sentence in the document, and also uses a BERT encoder model to encode the sentences in order to retrieve positional embeddings of the topic word vectors. The algorithm proposed in this paper is able to achieve a score greater than that of text summarization using Latent Dirichlet Allocation (LDA) topic modelling.
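The LSA core of such an extractive summarizer can be sketched with TF-IDF plus TruncatedSVD: project sentences into a latent topic space and keep the ones with the strongest latent representation. The sentences and scoring rule below are invented for illustration; the paper's full pipeline additionally uses TF-IDF keywords and BERT embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

sentences = [
    "Topic modeling uncovers latent themes in text collections.",
    "LSA applies truncated SVD to a term-document matrix.",
    "My cat enjoys sleeping in the sun.",
    "Truncated SVD keeps only the strongest singular vectors.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)

# Project sentences into a 2-dimensional latent topic space (LSA).
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)

# Score each sentence by the magnitude of its latent representation
# and keep the top two, in original order, as the extractive summary.
scores = np.linalg.norm(Z, axis=1)
top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:2])
summary = [sentences[i] for i in top]
print(summary)
```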

Journal ArticleDOI
TL;DR: Combining natural language processing and machine learning to classify research topics with a traditional literature review to investigate article details greatly improved the objectivity and scientific rigor of the study and laid a solid foundation for further research.

Journal ArticleDOI
TL;DR: A novel approach NAACL (Neural Attentive model for cross-domain Aspect-level sentiment CLassification), which leverages the benefits of the supervised deep neural network as well as the unsupervised probabilistic generative model to strengthen the representation learning is proposed.
Abstract: This work takes the lead to study aspect-level sentiment classification in the domain adaptation scenario. Given a document from any domain, the model needs to figure out the sentiments with respect to fine-grained aspects in the document. Two main challenges exist in this problem. One is to build robust document modeling across domains; the other is to mine the domain-specific aspects and make use of the sentiment lexicon. In this paper, we propose a novel approach, Neural Attentive model for cross-domain Aspect-level sentiment CLassification (NAACL), which leverages the benefits of the supervised deep neural network as well as the unsupervised probabilistic generative model to strengthen representation learning. NAACL jointly learns two tasks: (i) a domain classifier, working on documents in both the source and target domains to recognize the domain information of input texts and transfer knowledge from the source domain to the target domain. In particular, a weakly supervised Latent Dirichlet Allocation model (wsLDA) is proposed to learn the domain-specific aspect and sentiment lexicon representations, which are then used to calculate the aspect/lexicon-aware document representations via a multi-view attention mechanism; (ii) an aspect-level sentiment classifier, sharing the document modeling with the domain classifier. It makes use of the domain classification results and the aspect/sentiment-aware document representations to classify the aspect-level sentiment of the document in the domain adaptation scenario. NAACL is evaluated on both English and Chinese datasets with out-of-domain as well as in-domain setups. Quantitatively, the experiments demonstrate that NAACL has robust superiority over the compared methods in terms of classification accuracy and F1 score.
The qualitative evaluation also shows that the proposed model is capable of reasonably paying attention to those words that are important to judge the sentiment polarity of the input text given an aspect.

Journal ArticleDOI
01 Feb 2021 - Cities
TL;DR: This study proposed a framework for social media big data mining and data analytics using Twitter and demonstrated the functionalities of the framework on a case study using Natural Language Processing and Machine Learning techniques to mine, clean, process, and validate the data.

Journal ArticleDOI
Fangyuan Zhao, Xuebin Ren, Shusen Yang, Han Qing, Peng Zhao, Xinyu Yang
TL;DR: The authors propose a centralized privacy-preserving LDA algorithm (HDP-LDA) to prevent data inference from the intermediate statistics in collapsed Gibbs sampling (CGS) training.
Abstract: Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for hidden semantic discovery of text data and serves as a fundamental tool for text analysis in various applications. However, the LDA model as well as the training process of LDA may expose the text information in the training data, thus bringing significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the main-stream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis on the inherent differential privacy guarantee of CGS based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics in the CGS training. Also, we propose a locally private LDA training algorithm ( LP-LDA ) on crowdsourced data to provide local differential privacy for individual data contributors. Furthermore, we extend LP-LDA to an online version as OLP-LDA to achieve LDA training on locally private mini-batches in a streaming setting. Extensive analysis and experiment results validate both the effectiveness and efficiency of our proposed privacy-preserving LDA training algorithms.

Journal ArticleDOI
TL;DR: In this article, a topic document sentence (TDS) model is proposed based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques to discover sentiment polarity not only at the document level but also at the word level.
Abstract: Customer reviews on the Internet reflect users’ sentiments about the product, service, and social events. As sentiments can be divided into positive, negative, and neutral forms, sentiment analysis processes identify the polarity of information in the source materials toward an entity. Most studies have focused on document-level sentiment classification. In this study, we apply an unsupervised machine learning approach to discover sentiment polarity not only at the document level but also at the word level. The proposed topic document sentence (TDS) model is based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques. The IMDB dataset, comprising user reviews, was used for data analysis. First, we applied the LDA model to discover topics from the reviews; then, the TDS model was implemented to identify the polarity of the sentiment from topic to document, and from document to word levels. The LDAvis tool was used for data visualization. The experimental results show that the analysis not only obtained good topic partitioning results, but also achieved high sentiment analysis accuracy in document- and word-level sentiment classifications.

Journal ArticleDOI
TL;DR: The research findings revealed the appearance of conflicting topics throughout the two Coronavirus pandemic periods, and the expectations and interests of all individuals regarding the various topics were well represented.
Abstract: The incessant Coronavirus pandemic has had a detrimental impact on nations across the globe. The essence of this research is to demystify social media's sentiments regarding Coronavirus. The paper specifically focuses on Twitter and extracts the most discussed topics during and after the first wave of the Coronavirus pandemic. The extraction was based on a dataset of English tweets pertinent to COVID-19. The research study focuses on two main periods, the first starting from March 01, 2020 to April 30, 2020 and the second starting from September 01, 2020 to October 31, 2020. Latent Dirichlet Allocation (LDA) was adopted for topic extraction, whereas a lexicon-based approach was adopted for sentiment analysis. Regarding implementation, the paper utilized the Spark platform with Python to enhance the speed and efficiency of analyzing and processing large-scale social data. The research findings revealed the appearance of conflicting topics throughout the two Coronavirus pandemic periods. Besides, the expectations and interests of all individuals regarding the various topics were well represented.