
Showing papers on "Latent Dirichlet allocation published in 2021"


Journal ArticleDOI
TL;DR: The evolution of research on AI in business over time is presented, highlighting seminal works and leading publication venues, and a research agenda is proposed to guide future AI research in business, addressing the identified trends and challenges.

113 citations


Journal ArticleDOI
TL;DR: A novel method for aspect-based sentiment analysis, Sentence Segment LDA (SS-LDA), is proposed as a novel adaptation of the LDA algorithm for product aspect extraction; experimental results reveal that SS-LDA is quite competitive in extracting product aspects.
Abstract: With the widespread use of social networks, blogs, forums and e-commerce web sites, the volume of user-generated textual data is growing exponentially. User opinions in product reviews and other textual data are crucial for manufacturers, retailers and providers of products and services. Therefore, sentiment analysis and opinion mining have become important research areas. In user review mining, topic modeling based approaches and Latent Dirichlet Allocation (LDA) are significant methods used for extracting product aspects in aspect-based sentiment analysis. However, LDA cannot be directly applied to user reviews and other short texts because of the data sparsity problem and the lack of co-occurrence patterns. Several studies have been published on the adaptation of LDA for short texts. In this study, a novel method for aspect-based sentiment analysis, Sentence Segment LDA (SS-LDA), is proposed. SS-LDA is a novel adaptation of the LDA algorithm for product aspect extraction. The experimental results reveal that SS-LDA is quite competitive in extracting product aspects.
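The short-text workaround underlying this line of work can be sketched as follows: segment each review into sentences and treat each sentence as its own document before running standard LDA. This is a toy illustration with invented reviews, not the paper's actual SS-LDA algorithm.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "The battery life is great. The screen scratches easily.",
    "Battery lasts all day. Camera quality is poor in low light.",
    "Screen is bright and sharp. The camera struggles at night.",
]

# Segment reviews into sentences; each sentence becomes one "document",
# which eases the sparsity/co-occurrence problem for short texts.
sentences = [s.strip() for r in reviews for s in r.split(".") if s.strip()]

counts = CountVectorizer(stop_words="english").fit_transform(sentences)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# Each sentence now gets its own topic mixture (rows sum to 1).
doc_topics = lda.transform(counts)
print(doc_topics.shape)
```

Each topic can then be read as a candidate product aspect (battery, screen, camera) by inspecting its top-weighted words.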

73 citations


Journal ArticleDOI
TL;DR: In this paper, a real-time monitoring framework is proposed for traffic accident detection and condition analysis using ontology and latent Dirichlet allocation (OLDA) and bidirectional long short-term memory (Bi-LSTM).

72 citations


Journal ArticleDOI
TL;DR: 10 topics were identified in which the main security issues are malware, cybersecurity attacks, data storing vulnerabilities, the use of testing software in IoT, and possible leaks due to the lack of user experience.

61 citations


Journal ArticleDOI
TL;DR: A new measure of innovation is developed using the text of analyst reports of S&P 500 firms to give a useful description of innovation by firms with and without patenting and R&...
Abstract: We develop a new measure of innovation using the text of analyst reports of S&P 500 firms. Our text-based measure gives a useful description of innovation by firms with and without patenting and R&...

59 citations


Journal ArticleDOI
TL;DR: This paper surveys the background and advancement of topic modeling techniques: the authors introduce the preliminaries of topic modeling and review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives.
Abstract: We are not able to deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is sorely needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. In addition, research on topic modeling in distributed environments and topic visualization approaches is explored. We also cover the implementation and evaluation techniques for topic models in brief. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.

58 citations


Posted ContentDOI
TL;DR: This narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods, and compares the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users.
Abstract: Technology now makes it possible to understand efficiently and at large scale how people use language to reveal their everyday thoughts, behaviors, and emotions. Written text has been analyzed through both theory-based, closed-vocabulary methods from the social sciences as well as data-driven, open-vocabulary methods from computer science, but these approaches have not been comprehensively compared. To provide guidance on best practices for automatically analyzing written text, this narrative review and quantitative synthesis compares five predominant closed- and open-vocabulary methods: Linguistic Inquiry and Word Count (LIWC), the General Inquirer, DICTION, Latent Dirichlet Allocation, and Differential Language Analysis. We compare the linguistic features associated with gender, age, and personality across the five methods using an existing dataset of Facebook status updates and self-reported survey data from 65,896 users. Results are fairly consistent across methods. The closed-vocabulary approaches efficiently summarize concepts and are helpful for understanding how people think, with LIWC2015 yielding the strongest, most parsimonious results. Open-vocabulary approaches reveal more specific and concrete patterns across a broad range of content domains, better address ambiguous word senses, and are less prone to misinterpretation, suggesting that they are well-suited for capturing the nuances of everyday psychological processes. We detail several errors that can occur in closed-vocabulary analyses, the impact of sample size, number of words per user and number of topics included in open-vocabulary analyses, and implications of different analytical decisions. We conclude with recommendations for researchers, advocating for a complementary approach that combines closed- and open-vocabulary methods. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
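The closed-vocabulary side of this comparison boils down to counting dictionary hits. A toy sketch in the spirit of LIWC-style word counting; the category lists here are invented for illustration, not LIWC's actual dictionaries:

```python
from collections import Counter

# Hypothetical closed-vocabulary categories (made up, not LIWC's).
CATEGORIES = {
    "positive": {"happy", "great", "love"},
    "negative": {"sad", "angry", "hate"},
}

def category_rates(text: str) -> dict:
    """Fraction of tokens falling in each category's word list."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {
        cat: sum(counts[w] for w in words) / total
        for cat, words in CATEGORIES.items()
    }

print(category_rates("i love my great happy dog"))
```

Open-vocabulary methods such as LDA instead learn the word groupings from the data itself, which is why the review finds them better at handling ambiguous word senses.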

48 citations


Journal ArticleDOI
TL;DR: The proposed unsupervised framework provides an effective and efficient data mining solution to facilitating deep and comprehensive understanding on drivers’ behavioral characteristics, which will benefit the development of AVs and ADASs.

46 citations


Journal ArticleDOI
TL;DR: It is found that research on the theory of blockchain, blockchain trading systems, blockchain system structure, and intelligent financial systems based on blockchain should be prioritized to realize the full benefits of this technology.

38 citations


Journal ArticleDOI
TL;DR: The experimental results show that SKP-LDA reveals better semantic analysis ability and emotional topic clustering effect and can be applied to the field of micro-blog to improve the accuracy of network public opinion analysis effectively.
Abstract: The Latent Dirichlet Allocation (LDA) topic model is a popular research topic in the field of text mining. In this paper, a Sentiment Word Co-occurrence and Knowledge Pair Feature Extraction based LDA Short Text Clustering Algorithm (SKP-LDA) is proposed. A definition of a word bag based on sentiment word co-occurrence is given. The co-occurrence of emotional words takes full account of different short texts. Then, the short texts of a microblog are endowed with emotional polarity. Furthermore, the knowledge pairs of topic special words and topic relation words are extracted and inserted into the LDA model for clustering. Thus, semantic information can be found more accurately. Then, the hidden n topics and the Top30 special words set of each topic are extracted from the knowledge pair set. Finally, via LDA topic model primary clustering, a Top30 topic special words set is obtained, which is clustered by K-means secondary clustering. The clustering center is optimized iteratively. Compared with JST, LSM, LTM and ELDA, SKP-LDA performs better in terms of Accuracy, Precision, Recall and F-measure. The experimental results show that SKP-LDA reveals better semantic analysis ability and emotional topic clustering effect. It can be applied to the field of micro-blog to improve the accuracy of network public opinion analysis effectively.

36 citations


Journal ArticleDOI
TL;DR: Wang et al. explored public attention on social media during the outbreak of COVID-19; by combining data mining and text analysis, they presented the trend of public attention levels at different stages.

Journal ArticleDOI
TL;DR: This study aimed to investigate the emerging trends in the e-learning field by implementing a topic modeling analysis based on latent Dirichlet allocation on 41,925 peer-reviewed journal articles published between 2000 and 2019, which revealed 16 topics reflecting emerging trends and developments in the field.
Abstract: E-learning studies are becoming very important today as they provide alternatives and support to all types of teaching and learning programs. The effect of the COVID-19 pandemic on educational systems has further increased the significance of e-learning. Accordingly, gaining a full understanding of the general topics and trends in e-learning studies is critical for a deeper comprehension of the field. There are many studies that provide such a picture of the e-learning field, but the limitation is that they do not examine the field as a whole. This study aimed to investigate the emerging trends in the e-learning field by implementing a topic modeling analysis based on latent Dirichlet allocation (LDA) on 41,925 peer-reviewed journal articles published between 2000 and 2019. The analysis revealed 16 topics reflecting emerging trends and developments in the e-learning field. Among these, the topics “MOOC,” “learning assessment,” and “e-learning systems” were found to be key topics in the field, with a consistently high volume. In addition, the topics of “learning algorithms,” “learning factors,” and “adaptive learning” were observed to have the highest overall acceleration, with the first two identified as having a higher acceleration in recent years. Going by these results, it is concluded that the next decade of e-learning studies will focus on learning factors and algorithms, which will possibly create a baseline for more individualized and adaptive mobile platforms. In other words, after a certain maturity level is reached by better understanding the learning process through these identified learning factors and algorithms, the next generation of e-learning systems will be built on individualized and adaptive learning environments. These insights could be useful for e-learning communities to improve their research efforts and their applications in the field accordingly.

Journal ArticleDOI
TL;DR: This paper aims to experiment with BERTopic using different Pre-Trained Arabic Language Models as embeddings, and compare its results against LDA and NMF techniques, using Normalized Pointwise Mutual Information (NPMI) measure to evaluate the results of topic modeling techniques.


Journal ArticleDOI
TL;DR: The authors empirically evaluated and compared seven state-of-the-art meta-heuristics and three alternative surrogate metrics to solve the problem of identifying duplicate bug reports with LDA; the results indicated that meta-heuristics are mostly comparable to one another and that the choice of surrogate metric impacts the quality of the generated topics and the tuning overhead.
Abstract: Context: Latent Dirichlet Allocation (LDA) has been successfully used in the literature to extract topics from software documents and support developers in various software engineering tasks. While LDA has been mostly used with default settings, previous studies showed that default hyperparameter values generate sub-optimal topics from software documents. Objective: Recent studies applied meta-heuristic search (mostly evolutionary algorithms) to configure LDA in an unsupervised and automated fashion. However, previous work advocated for different meta-heuristics and surrogate metrics to optimize. The objective of this paper is to shed light on the influence of these two factors when tuning LDA for SE tasks. Method: We empirically evaluated and compared seven state-of-the-art meta-heuristics and three alternative surrogate metrics (i.e., fitness functions) to solve the problem of identifying duplicate bug reports with LDA. The benchmark consists of ten real-world and open-source projects from the Bench4BL dataset. Results: Our results indicate that (1) meta-heuristics are mostly comparable to one another (except for random search and CMA-ES), and (2) the choice of the surrogate metric impacts the quality of the generated topics and the tuning overhead. Furthermore, calibrating LDA helps identify twice as many duplicates as untuned LDA when inspecting the top five past similar reports. Conclusion: No meta-heuristic and/or fitness function outperforms all the others, as advocated in prior studies. However, we can make recommendations for some combinations of meta-heuristics and fitness functions over others for practical use. Future work should focus on improving the surrogate metrics used to calibrate/tune LDA in an unsupervised fashion.
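The unsupervised tuning loop this study compares can be sketched in its simplest form: random search over LDA hyperparameters, scored by held-out perplexity. The paper evaluates far richer meta-heuristics and surrogate metrics on real bug reports; the corpus and parameter ranges below are invented for illustration.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "null pointer exception when saving the report",
    "app crashes with null pointer on save",
    "login page shows a blank screen",
    "blank screen after login on mobile",
    "report export fails with timeout error",
    "timeout error while exporting large reports",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)
X_train, X_val = X[:4], X[4:]

random.seed(0)
best_score, best_params = float("inf"), None
for _ in range(5):
    # Sample a candidate LDA configuration (toy ranges).
    params = {
        "n_components": random.randint(2, 5),
        "doc_topic_prior": random.uniform(0.01, 1.0),
        "topic_word_prior": random.uniform(0.01, 1.0),
    }
    lda = LatentDirichletAllocation(random_state=0, **params).fit(X_train)
    score = lda.perplexity(X_val)  # surrogate metric: lower is better
    if score < best_score:
        best_score, best_params = score, params

print(best_params)
```

Swapping random search for an evolutionary algorithm, or perplexity for topic coherence, changes only the sampling step and the scoring line, which is exactly the two-factor design space the paper explores.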

Journal ArticleDOI
TL;DR: A framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches, which can be integrated into a platform with a GUI for further automation.
Abstract: The success factor of sentiment analysis lies in identifying the most frequent and relevant opinions among users relating to a particular topic. In this paper, we develop a framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. We choose TF-IDF and K-means for sentiment classification over affinity and hierarchical clustering. Latent Dirichlet Allocation and a pipeline of Doc2Vec and K-means are used to capture themes; we then perform multi-level polarity index classification and its time series analysis. In our study, we draw insights from 243,746 tweets for Kerala’s 2018 natural disasters in India. The key findings of the study are the classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We observe different sets of emotions and influencers, among others. Through this case example of the Kerala floods, we show how the government and other organizations could track positive/negative sentiments with respect to time and location, gain a better understanding of the topics of discussion trending among the public, and collaborate with crucial Twitter users/influencers to spread information and figure out the gaps in the implementation of schemes in terms of design and execution. This research’s uniqueness is the streamlined and efficient combination of algorithms and techniques embedded in the framework used in achieving the above output, which can be integrated into a platform with a GUI for further automation.
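The TF-IDF + K-means step described above can be sketched with a handful of toy tweets (invented for illustration, not the Kerala dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tweets = [
    "flood water rising in the city",
    "rescue teams are helping flood victims",
    "donate to the relief fund today",
    "relief supplies and donations needed",
]

# Weight terms by TF-IDF, then group tweets into two clusters.
X = TfidfVectorizer(stop_words="english").fit_transform(tweets)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

In the full framework, each resulting cluster would then be labeled with a polarity via the lexicon-based step and tracked over time.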

Journal ArticleDOI
01 Nov 2021
TL;DR: This work proposes an enhancement to the user-based approaches, which are extensively used in the recommender system literature, that combines Wikipedia data and browsing history into the recommendation algorithm and generates topics by using the Latent Dirichlet Allocation models on the Wikipedia data.
Abstract: Personalizing user experience in recommender systems is possible when there is sufficient information about the user. But when new users join the system, the unavailability of information about these users, referred to as cold-start, inhibits the functionality of a recommender system. We propose an enhancement to the user-based approaches, which are extensively used in the recommender system literature. Our approach combines Wikipedia data and browsing history into the recommendation algorithm. Specifically, we generate topics by using the Latent Dirichlet Allocation (LDA) models on the Wikipedia data, and then use the topics on user browsing history to extract user preferences. Our evaluation employs five approaches and tests their performance in terms of prediction and classification accuracy. We conduct experiments in two domains (movies and restaurants), to gather user ratings and their browsing history for evaluation. Results from both experiments favor our proposed enhancement.

Journal ArticleDOI
TL;DR: This paper found that neural network-based machine learning methods, in particular pre-trained versions, offer the most accurate predictions, while topic models such as Latent Dirichlet Allocation offer deeper diagnostics.

Journal ArticleDOI
TL;DR: This study aims to advance the existing research on geoparks by incorporating machine learning models in the analysis of online reviews to provide valuable suggestions for managers in increasing their understanding of the psychological cognition of tourists and evaluating the status of geoparks.

Journal ArticleDOI
TL;DR: In this paper, the authors propose an automated keyword filtering method to identify product attributes from online customer reviews based on latent Dirichlet allocation; it improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove noise keywords that are not related to the product.
Abstract: Identifying product attributes from the perspective of a customer is essential to measure the satisfaction, importance, and Kano category of each product attribute for product design. This article proposes automated keyword filtering to identify product attributes from online customer reviews based on latent Dirichlet allocation. The preprocessing for latent Dirichlet allocation is important because it affects the results of topic modeling; however, previous research performed latent Dirichlet allocation either without removing noise keywords or by manually eliminating them. The proposed method improves the preprocessing for latent Dirichlet allocation by conducting automated filtering to remove the noise keywords that are not related to the product. A case study of Android smartphones is performed to validate the proposed method. The performance of the latent Dirichlet allocation by the proposed method is compared to that of a previous method, and according to the latent Dirichlet allocation results, the former exhibits a higher performance than the latter. [DOI: 10.1115/1.4048960]

Journal ArticleDOI
TL;DR: The proposed topic model can be used to infer the destinations of unlinked trips, analyze travel patterns, and cluster passengers; it is tested on Guangzhou Metro smart card data, for which the ground truth is available.
Abstract: Inferring trip destination in smart card data with only tap-in control is an important application. Most existing methods estimate trip destinations based on the continuity of trip chains, while the destinations of isolated/unlinked trips cannot be properly handled. We address this problem with a probabilistic topic model. A three-dimensional latent Dirichlet allocation model is developed to extract latent topics of departure time, origin, and destination among the population; each passenger’s travel behavior is characterized by a latent topic distribution defined on a three-dimensional simplex. Given the origin station and departure time, the most likely destination can be obtained by statistical inference. Furthermore, we propose to represent stations by their rank of visiting frequency, which transforms divergent spatial patterns into similar behavioral regularities. The proposed destination estimation framework is tested on Guangzhou Metro smart card data, in which the ground truth is available. Compared with benchmark models, the topic model not only shows increased accuracy but also captures essential latent patterns in passengers’ travel behavior. The proposed topic model can be used to infer the destinations of unlinked trips, analyze travel patterns, and cluster passengers.

Book ChapterDOI
08 Feb 2021
TL;DR: The team introduced an approach to combine topical distributions from Latent Dirichlet Allocation (LDA) with contextualized representations from XLNet, and compared the method with existing baselines to show that XLNet + Topic Distributions outperforms other approaches by attaining an F1-score of 0.967.
Abstract: With the ease of access to information, and its rapid dissemination over the internet (both velocity and volume), it has become challenging to filter out truthful information from fake ones. The research community is now faced with the task of automatic detection of fake news, which carries real-world socio-political impact. One such research contribution came in the form of the Constraint@AAAI 2021 Shared Task on COVID-19 Fake News Detection in English. In this paper, we shed light on a novel method we proposed as a part of this shared task. Our team introduced an approach to combine topical distributions from Latent Dirichlet Allocation (LDA) with contextualized representations from XLNet. We also compared our method with existing baselines to show that XLNet + Topic Distributions outperforms other approaches by attaining an F1-score of 0.967.
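The fusion idea itself is simple feature concatenation: append a document's LDA topic distribution to its contextual embedding before classification. In this sketch the transformer features are a random placeholder standing in for XLNet output, and the topic vector is invented; neither is the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

topic_dist = np.array([0.7, 0.2, 0.1])  # e.g. from a fitted LDA model
contextual = rng.standard_normal(8)     # stand-in for an XLNet embedding

# Concatenated feature vector fed to the downstream classifier.
features = np.concatenate([topic_dist, contextual])
print(features.shape)
```

The topic part carries corpus-level thematic signal that a per-document transformer embedding lacks, which is the complementarity the paper exploits.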

Journal ArticleDOI
TL;DR: This work presents hybrid topic modeling techniques by integrating traditional topic models with visualization procedures to aid in the visualization of topic clouds and health tendencies in the document collection and believes proposed visual topic models viz., Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), and Visual Probabilistic Latent Schematic Indexing (VPLSI).
Abstract: Social media is a great source to search health-related topics for envisages solutions towards healthcare. Topic models originated from Natural Language Processing that is receiving much attention in healthcare areas because of interpretability and its decision making, which motivated us to develop visual topic models. Topic models are used for the extraction of health topics for analyzing discriminative and coherent latent features of tweet documents in healthcare applications. Discovering the number of topics in topic models is an important issue. Sometimes, users enable an incorrect number of topics in traditional topic models, which leads to poor results in health data clustering. In such cases, proper visualizations are essential to extract information for identifying cluster trends. To aid in the visualization of topic clouds and health tendencies in the document collection, we present hybrid topic modeling techniques by integrating traditional topic models with visualization procedures. We believe proposed visual topic models viz., Visual Non-Negative Matrix Factorization (VNMF), Visual Latent Dirichlet Allocation (VLDA), Visual intJNon-negative Matrix Factorization (VintJNMF), and Visual Probabilistic Latent Schematic Indexing (VPLSI) are promising methods for extracting tendency of health topics from various sources in healthcare data clustering. Standard and benchmark social health datasets are used in an experimental study to demonstrate the efficiency of proposed models concerning clustering accuracy (CA), Normalized Mutual Information (NMI), precision (P), recall (R), F-Score (F) measures and computational complexities. VNMF visual model performs significantly at an increased rate of 32.4% under cosine based metric in the display of visual clusters and an increased rate of 35–40% in performance measures compared to other visual methods on different number of health topics.

Proceedings ArticleDOI
25 Mar 2021
TL;DR: In this article, Latent Semantic Analysis (LSA) using truncated SVD is used, together with a TF-IDF keyword extractor and a BERT sentence encoder, to build extractive summaries that score higher than summaries produced with Latent Dirichlet Allocation (LDA) topic modelling.
Abstract: Document summarization is a natural language processing task that deals with long textual data to produce concise and fluent summaries containing all of the document's relevant information. The branch of NLP that deals with it is the automatic text summarizer, which converts a long textual document into short, fluent summaries. There are generally two ways of summarizing text with an automatic text summarizer: extractive text summarization and abstractive text summarization. This paper demonstrates an experiment with an extractive text summarizer. Topic modelling, on the other hand, is an NLP task that extracts the relevant topics from a textual document. One such method is Latent Semantic Analysis (LSA) using truncated SVD, which extracts all the relevant topics from the text. The experiment demonstrated here summarizes a long textual document using LSA topic modelling along with a TF-IDF keyword extractor for each sentence in the document, and also uses a BERT encoder model to encode the sentences in order to retrieve positional embeddings of the topic word vectors. The algorithm proposed in this paper is able to achieve a score greater than that of text summarization using Latent Dirichlet Allocation (LDA) topic modelling.
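The LSA core of such an extractive summarizer can be sketched with TF-IDF plus TruncatedSVD: project sentences into a latent topic space and keep the ones with the strongest latent representation. The sentences and scoring rule below are invented for illustration; the paper's full pipeline additionally uses TF-IDF keywords and BERT embeddings.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

sentences = [
    "Topic modeling uncovers latent themes in text collections.",
    "LSA applies truncated SVD to a term-document matrix.",
    "My cat enjoys sleeping in the sun.",
    "Truncated SVD keeps only the strongest singular vectors.",
]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)

# Project sentences into a 2-dimensional latent topic space (LSA).
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)

# Score each sentence by the magnitude of its latent representation
# and keep the top two, in original order, as the extractive summary.
scores = np.linalg.norm(Z, axis=1)
top = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:2])
summary = [sentences[i] for i in top]
print(summary)
```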

Journal ArticleDOI
TL;DR: Combining natural language processing and machine learning to classify research topics with a traditional literature review to investigate article details greatly improved the objectivity and scientific rigor of the study and laid a solid foundation for further research.

Journal ArticleDOI
TL;DR: A novel approach NAACL (Neural Attentive model for cross-domain Aspect-level sentiment CLassification), which leverages the benefits of the supervised deep neural network as well as the unsupervised probabilistic generative model to strengthen the representation learning is proposed.
Abstract: This work takes the lead to study aspect-level sentiment classification in the domain adaptation scenario. Given a document from any domain, the model needs to figure out the sentiments with respect to fine-grained aspects in the document. Two main challenges exist in this problem. One is to build robust document modeling across domains; the other is to mine the domain-specific aspects and make use of the sentiment lexicon. In this paper, we propose a novel approach, Neural Attentive model for cross-domain Aspect-level sentiment CLassification (NAACL), which leverages the benefits of the supervised deep neural network as well as the unsupervised probabilistic generative model to strengthen representation learning. NAACL jointly learns two tasks: (i) a domain classifier, working on documents in both the source and target domains to recognize the domain information of input texts and transfer knowledge from the source domain to the target domain. In particular, a weakly supervised Latent Dirichlet Allocation model (wsLDA) is proposed to learn the domain-specific aspect and sentiment lexicon representations, which are then used to calculate the aspect/lexicon-aware document representations via a multi-view attention mechanism; (ii) an aspect-level sentiment classifier, sharing the document modeling with the domain classifier. It makes use of the domain classification results and the aspect/sentiment-aware document representations to classify the aspect-level sentiment of the document in the domain adaptation scenario. NAACL is evaluated on both English and Chinese datasets with out-of-domain as well as in-domain setups. Quantitatively, the experiments demonstrate that NAACL has robust superiority over the compared methods in terms of classification accuracy and F1 score.
The qualitative evaluation also shows that the proposed model is capable of reasonably paying attention to those words that are important to judge the sentiment polarity of the input text given an aspect.

Journal ArticleDOI
01 Feb 2021 - Cities
TL;DR: This study proposed a framework for social media big data mining and data analytics using Twitter and demonstrated the functionalities of the framework on a case study using Natural Language Processing and Machine Learning techniques to mine, clean, process, and validate the data.

Journal ArticleDOI
Fangyuan Zhao, Xuebin Ren, Shusen Yang, Han Qing, Peng Zhao, Xinyu Yang
TL;DR: The authors propose a centralized privacy-preserving LDA algorithm (HDP-LDA) to prevent data inference from the intermediate statistics in collapsed Gibbs sampling (CGS) training.
Abstract: Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for hidden semantic discovery of text data and serves as a fundamental tool for text analysis in various applications. However, the LDA model as well as the training process of LDA may expose the text information in the training data, thus bringing significant privacy concerns. To address the privacy issue in LDA, we systematically investigate the privacy protection of the main-stream LDA training algorithm based on Collapsed Gibbs Sampling (CGS) and propose several differentially private LDA algorithms for typical training scenarios. In particular, we present the first theoretical analysis on the inherent differential privacy guarantee of CGS based LDA training and further propose a centralized privacy-preserving algorithm (HDP-LDA) that can prevent data inference from the intermediate statistics in the CGS training. Also, we propose a locally private LDA training algorithm ( LP-LDA ) on crowdsourced data to provide local differential privacy for individual data contributors. Furthermore, we extend LP-LDA to an online version as OLP-LDA to achieve LDA training on locally private mini-batches in a streaming setting. Extensive analysis and experiment results validate both the effectiveness and efficiency of our proposed privacy-preserving LDA training algorithms.

Journal ArticleDOI
TL;DR: In this article, a topic document sentence (TDS) model is proposed based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques to discover sentiment polarity not only at the document level but also at the word level.
Abstract: Customer reviews on the Internet reflect users’ sentiments about the product, service, and social events. As sentiments can be divided into positive, negative, and neutral forms, sentiment analysis processes identify the polarity of information in the source materials toward an entity. Most studies have focused on document-level sentiment classification. In this study, we apply an unsupervised machine learning approach to discover sentiment polarity not only at the document level but also at the word level. The proposed topic document sentence (TDS) model is based on joint sentiment topic (JST) and latent Dirichlet allocation (LDA) topic modeling techniques. The IMDB dataset, comprising user reviews, was used for data analysis. First, we applied the LDA model to discover topics from the reviews; then, the TDS model was implemented to identify the polarity of the sentiment from topic to document, and from document to word levels. The LDAvis tool was used for data visualization. The experimental results show that the analysis not only obtained good topic partitioning results, but also achieved high sentiment analysis accuracy in document- and word-level sentiment classifications.

Journal ArticleDOI
TL;DR: The research findings revealed the appearance of conflicting topics throughout the two Coronavirus pandemic periods, and the expectations and interests of all individuals regarding the various topics were well represented.
Abstract: The incessant Coronavirus pandemic has had a detrimental impact on nations across the globe. The essence of this research is to demystify social media's sentiments regarding Coronavirus. The paper specifically focuses on Twitter and extracts the most discussed topics during and after the first wave of the Coronavirus pandemic. The extraction was based on a dataset of English tweets pertinent to COVID-19. The research study focuses on two main periods, the first starting from March 01, 2020 to April 30, 2020 and the second starting from September 01, 2020 to October 31, 2020. Latent Dirichlet Allocation (LDA) was adopted for topic extraction, whereas a lexicon-based approach was adopted for sentiment analysis. Regarding implementation, the paper utilized the Spark platform with Python to enhance the speed and efficiency of analyzing and processing large-scale social data. The research findings revealed the appearance of conflicting topics throughout the two Coronavirus pandemic periods. Besides, the expectations and interests of all individuals regarding the various topics were well represented.