
Showing papers on "Probabilistic latent semantic analysis" published in 2021


Journal ArticleDOI
TL;DR: A Context-aware Sparse Check-in Venue Prediction (CSCVP) scheme, inspired by natural language processing techniques, predicts venue category information and exploits similarity between users to address the data-sparsity challenge by significantly reducing the prediction space.
Abstract: The proliferation of online Location-Based Social Networks (LBSNs) has offered unprecedented opportunities for understanding the fine-grained spatio-temporal behaviors of users and developing new location-aware applications. In this article, we focus on the problem of “Sparse User Check-in Venue Prediction,” where the goal is to predict the next venue LBSN users will visit by exploiting their sparse online check-in traces and the latent decision contexts. While efforts have been made to predict users’ check-in traces on an LBSN, several important challenges still exist. First, check-in traces contributed by LBSN users are often too sparse to provide sufficient evidence for a reliable prediction, especially when the prediction space is huge (e.g., hundreds of thousands of venues in large cities). Second, the user's decision context for which venue to visit next is often latent and has not been incorporated by current venue prediction models. Third, the dynamic and non-deterministic dependency between check-ins is either ignored or replaced by a simplified “consecutiveness” assumption in existing solutions, leading to sub-optimal prediction results. In this article, we develop a Context-aware Sparse Check-in Venue Prediction (CSCVP) scheme inspired by natural language processing techniques to address the above challenges. In particular, CSCVP predicts the venue category information and explores the similarity between users to address the data-sparsity challenge by significantly reducing the prediction space. It also leverages the Probabilistic Latent Semantic Analysis (PLSA) model to incorporate the user decision context into the prediction model. Finally, we develop a novel Temporal Adaptive Ngram (TA-Ngram) model in CSCVP to capture the dynamic and non-deterministic dependency between check-ins. We evaluate CSCVP using three real-world LBSN datasets. The results show that our scheme improves the accuracy of state-of-the-art user check-in venue prediction solutions by 30.9 percent.
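Since this abstract leans on PLSA to model the latent decision context, a minimal sketch of PLSA fitted by expectation-maximization may help; this is a generic implementation, not the authors' code, and the toy count matrix below is an assumption.

```python
import numpy as np

def plsa(N, K, iters=100, seed=0):
    """Fit PLSA by EM on a document-term count matrix N (D x W).

    Model: P(w|d) = sum_z P(z|d) P(w|z). Returns P(z|d) and P(w|z).
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)  # P(z|d)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w|z)
    for _ in range(iters):
        # E-step: responsibilities P(z|d,w) proportional to P(z|d) P(w|z)
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]        # shape (D, W, K)
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from count-weighted responsibilities
        weighted = N[:, :, None] * post                        # n(d,w) P(z|d,w)
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy check: two blocks of co-occurring "venue categories" recovered as topics.
N = np.array([[4, 3, 0, 0], [5, 2, 0, 1], [0, 0, 3, 4], [1, 0, 2, 5]])
p_z_d, p_w_z = plsa(N, K=2)
```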

10 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present experiments with multiple topic modeling approaches, namely Latent Semantic Analysis (LSA), Probabilistic LSA (PLSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), on 0.8 million Urdu tweets.
Abstract: Understanding and analyzing the content available on social media platforms such as Twitter and Facebook through topic modeling is an unsupervised task. Despite several existing conventional techniques, they have had limited success when applied directly to the filtering and quick comprehension of short texts, due to text sparseness and noise. It has thus always been challenging to discover reliable latent topics from online discussion texts, which exhibit low word co-occurrence, even when large social media benchmark datasets are available and even for resource-rich languages. The existing literature lacks such work for Urdu text, even with conventional topic models, mainly due to the lack of benchmark datasets, the limited availability of pre-processing tools and algorithms, and time and compute limitations on large datasets. This work presents experiments with multiple topic modeling approaches, namely Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), on 0.8 million Urdu tweets. These tweets were collected through the Twitter API using various hashtags as queries, to avoid the dominance of a single topic in the dataset. In addition, we pre-processed the text of the tweets, prepared three variants of the collected dataset, and extracted multiple features to represent documents over different n-grams. All these techniques are compared and evaluated on the dataset variants using both qualitative and quantitative measures. We also present the results through visualization methods, graphs depicting tweet counts per topic, word clouds, and hashtag analysis, giving insight into the algorithms' performance on the final topics. The results reveal that NMF with TF-IDF feature vectors outperformed the other techniques on Urdu tweet text, while LDA performed best when short texts were merged into long pseudo-documents.
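As a rough sketch of how such a comparison can be run: scikit-learn provides LSA (via truncated SVD), LDA, and NMF, but no PLSA, so that method is omitted here; the corpus and topic count are placeholders standing in for the pre-processed tweets.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation, NMF

# Placeholder corpus standing in for the pre-processed tweet variants.
docs = [
    "cricket match team win series",
    "cricket team series score",
    "election vote campaign party",
    "election campaign party leader",
    "mobile internet package price",
    "internet package network price",
]
k = 2  # number of topics; the paper tunes this per dataset variant

tv, cv = TfidfVectorizer(), CountVectorizer()  # same tokenizer -> same vocab
tfidf, counts = tv.fit_transform(docs), cv.fit_transform(docs)

models = {
    "LSA": TruncatedSVD(n_components=k).fit(tfidf),               # on TF-IDF
    "NMF": NMF(n_components=k, init="nndsvd", max_iter=500).fit(tfidf),
    "LDA": LatentDirichletAllocation(n_components=k, random_state=0).fit(counts),
}
vocab = tv.get_feature_names_out()
for name, model in models.items():
    for t, comp in enumerate(model.components_):
        print(name, t, [vocab[i] for i in comp.argsort()[-4:][::-1]])
```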

9 citations


Journal ArticleDOI
TL;DR: Online customer feedback is mined for sentiment analysis to enable a more customized shopping experience, resulting in higher retention rates, and to support forecasting the scale of e-commerce transactions.

9 citations


Journal ArticleDOI
TL;DR: In this paper, a mathematical comparison between NMF, PLSA, and LDA for the analysis of mass spectrometry imaging (MSI) data is presented, including the first detailed evaluation of Kullback-Leibler NMF (KL-NMF) for MSI.
Abstract: RATIONALE: Non-negative matrix factorization (NMF) has been used extensively for the analysis of mass spectrometry imaging (MSI) data, visualizing simultaneously the spatial and spectral distributions present in a slice of tissue. The statistical framework offers two related NMF methods: probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA), a generative model. This work offers a mathematical comparison between NMF, PLSA, and LDA, and includes a detailed evaluation of Kullback-Leibler NMF (KL-NMF) for MSI for the first time. We inspect the results for MSI data analysis, as these different mathematical approaches impose different characteristics on the data and the resulting decomposition. METHODS: The four methods (NMF, KL-NMF, PLSA, and LDA) are compared on seven samples: three from mouse pancreas and four from human lymph-node tissue, all obtained using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). RESULTS: While matrix factorization methods are often used for the analysis of MSI data, we find that each method has different implications for the exactness and interpretability of the results. We discovered promising results using KL-NMF, which has only rarely been used for MSI so far, improving on both NMF and PLSA, and have shown that the KL-NMF and PLSA algorithms, hitherto stated to be equivalent, do differ in the case of MSI data analysis. LDA, assumed to be the better method in the field of text mining, is shown to be outperformed by PLSA in the setting of MALDI-MSI. Additionally, the molecular results of the human lymph-node data have been thoroughly analyzed for better assessment of the methods under investigation. CONCLUSIONS: We present an in-depth comparison of multiple NMF-related factorization methods for MSI. We aim to provide fellow researchers in the field of MSI a clear understanding of the mathematical implications of each of these analytical techniques, which might affect the exactness and interpretation of the results.
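For reference, the KL-NMF variant evaluated here corresponds to NMF optimized under a Kullback-Leibler divergence. A minimal sketch using scikit-learn's multiplicative-update solver follows; the synthetic matrix is a stand-in for a real MSI pixels-by-m/z matrix, not the paper's data.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in for an MSI data matrix: rows = pixels, columns = m/z bins.
rng = np.random.default_rng(0)
X = rng.poisson(lam=rng.gamma(2.0, 1.0, size=(200, 50)))  # non-negative counts

# Frobenius NMF vs KL-NMF; the KL loss requires the multiplicative-update solver.
nmf_fro = NMF(n_components=5, init="nndsvda", random_state=0).fit(X)
nmf_kl = NMF(n_components=5, init="nndsvda", solver="mu",
             beta_loss="kullback-leibler", max_iter=500, random_state=0).fit(X)

W_kl = nmf_kl.transform(X)   # per-pixel loadings (spatial distributions)
H_kl = nmf_kl.components_    # per-component spectra
# Each error is measured in its own divergence, so the values are not
# directly comparable across the two models.
print("Frobenius err:", nmf_fro.reconstruction_err_)
print("KL err:", nmf_kl.reconstruction_err_)
```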

6 citations


Journal ArticleDOI
TL;DR: This work applies two probabilistic graphical models, Latent Dirichlet Allocation (LDA) and Probabilistic Latent Semantic Analysis (PLSA), to generate latent topic terms as candidate aspects, improving the performance of machine learning classification algorithms in aspect-based sentiment analysis (ABSA).
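A minimal sketch of the underlying idea, using scikit-learn's LDA to surface topic terms as candidate aspects; the review snippets and parameters are placeholders, and the paper's actual ABSA pipeline is not reproduced here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder review corpus; aspects such as "battery" or "delivery" should
# surface as high-probability topic terms.
reviews = ["battery life is great but the screen is dim",
           "screen quality excellent, battery drains fast",
           "delivery was slow, packaging damaged",
           "fast delivery and careful packaging"]

cv = CountVectorizer(stop_words="english")
X = cv.fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

vocab = cv.get_feature_names_out()
aspects = [[vocab[i] for i in topic.argsort()[-3:][::-1]]
           for topic in lda.components_]
print(aspects)  # top topic terms, treated as candidate aspects for ABSA
```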

6 citations


DOI
01 Oct 2021
TL;DR: In this paper, the authors apply parsing techniques to various websites to extract HTML and XML data, including the textual content, and apply preprocessing techniques to clean the data.
Abstract: Text classification and topic modelling are the backbone of text analysis over large corpora. With the increase in unstructured data around us, it is difficult to analyse such data easily, so methods are needed that can extract salient and semantic information from a corpus. Text classification is the organised categorisation of text for the interpretation of salient information, while topic modelling finds abstract topics for a collection of texts or documents and is frequently used to extract semantic information from textual data. In this paper we apply parsing techniques to various websites to extract HTML and XML data, including the textual content, and apply preprocessing techniques to clean it. For text classification, the machine learning classifiers used in our experiments are Naive Bayes and Logistic Regression. Document models are built using three topic modelling methods: Latent Semantic Analysis, Probabilistic Latent Semantic Analysis, and Latent Dirichlet Allocation. We then analyse and compare the performance of the models and classifiers on the processed textual data.
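A hedged sketch of the described scrape-then-classify pipeline; the paper does not name its tools, so requests, BeautifulSoup, and scikit-learn below are assumptions, and the labelled corpus is a placeholder.

```python
import requests
from bs4 import BeautifulSoup
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def page_text(url):
    """Fetch a page and keep only its textual content."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop non-textual markup
    return " ".join(soup.get_text(separator=" ").split())

# Placeholder labelled corpus; in the paper the texts come from scraped sites.
texts = ["football match final score goal", "cpu gpu benchmark memory speed"]
labels = ["sports", "tech"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())  # Naive Bayes baseline
clf.fit(texts, labels)
print(clf.predict(["new gpu released with faster memory"]))
```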

3 citations


Journal ArticleDOI
TL;DR: When the latent variable is ordinal and the manifest variables are nominal, an approach is given for handling the restrictions in latent class analysis of measurement-error models using log-linear models; this reduces overall uncertainty and makes inferences more precise.
Abstract: This article deals with the latent class analysis of models with error of measurement. When the latent variable is ordinal and the manifest variables are nominal, an approach is given for handling the restrictions in latent class analysis of measurement-error models using log-linear models. In this way, the ordinal nature of the latent variable is included in the analysis; overall uncertainty is therefore decreased, and inferences become more precise. The new approach is applied to a women's liberation data set. Latent class analysis is frequently used in the social sciences and education. The main aim of the analysis is to explain the association structure between manifest variables using unobserved variables, namely latent variables; latent class analysis is the categorical analogue of factor analysis when both the latent and manifest variables are categorical. Log-linear models are widely used for the analysis of contingency tables, and a latent class model can be represented as a log-linear model using conditional response probabilities. This representation is called the log-linear parametrization and is a special case of Formann's linear logistic latent class analysis (Formann, 1992). Error-of-measurement models are probabilistic versions of the Guttman scale (Guttman, 1950) and are considered restricted latent class models. In the log-linear parametrization of latent class models, the types of the manifest and latent variables are important, because latent class models specialize according to the typology of the variables. For example, if the latent variable is metrical and the manifest variables are nominal, the appropriate analysis is latent class analysis with linear restrictions or nominal response models (Heinen, 1996). In our concerned models with error of measurement, manifest variables
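For context, the unrestricted latent class model underlying this discussion has the standard textbook form (not quoted from the article itself):

```latex
% J nominal manifest variables Y_1,...,Y_J are conditionally independent
% given a latent class variable X with T classes:
P(Y_1 = y_1, \dots, Y_J = y_J)
  = \sum_{t=1}^{T} P(X = t) \prod_{j=1}^{J} P(Y_j = y_j \mid X = t)
```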

3 citations


Journal ArticleDOI
01 Mar 2021
TL;DR: The experimental results show that the sentiment analysis method integrating PLSA and K-means obtains higher classification accuracy than the PLSA model alone.
Abstract: To address the shortage of research on fine-grained sentiment classification of micro-blog short texts, a fine-grained sentiment classification method for micro-blog short texts based on the PLSA model and K-means clustering is proposed. PLSA is used to calculate the probability matrices between documents and topics, and between words and topics, in the corpus. Using the word-topic probability distributions, the K-means algorithm clusters the distributions of words over topics and merges similar topics. Based on a sentiment ontology library, emotion recognition is carried out on the merged topics. Then, according to the merged document-topic probability matrix, each document is assigned a sentiment category. The experimental results show that the sentiment analysis method integrating PLSA and K-means obtains higher classification accuracy than the PLSA model alone.
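A minimal sketch of the topic-merging step; the word-topic matrix here is a random placeholder, though in practice it would come from a PLSA fit such as the plsa() sketch earlier in this listing, and the cluster count is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# p_w_z: (K, W) word distributions per topic, e.g. from a PLSA fit.
p_w_z = np.random.default_rng(1).dirichlet(np.ones(200), size=12)  # placeholder

merge = KMeans(n_clusters=5, random_state=0).fit(p_w_z)  # group similar topics
merged = np.vstack([p_w_z[merge.labels_ == c].mean(0)    # average each group
                    for c in range(5)])
merged /= merged.sum(1, keepdims=True)                   # renormalize P(w|z')
```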

3 citations


Proceedings ArticleDOI
24 Mar 2021
TL;DR: Convolutional neural networks (CNNs) are employed together with probabilistic latent semantic analysis (PLSA) to mine the hidden semantics of images; the resulting representation is fed into a discriminative support vector machine (SVM) to build a classification model.
Abstract: An efficient medical image classification system has gained high interest in the scientific community. This paper presents a classification algorithm that aims at a high accuracy rate by addressing some of the typical challenges involved in classifying large medical datasets. Convolutional neural networks (CNNs) are employed together with probabilistic latent semantic analysis (PLSA), which is capable of mining the hidden semantics of images. This high-level semantic representation of the images is then fed into a discriminative support vector machine (SVM) to build a classification model. An ensemble of machine learning models is also employed to exploit classification models created from different sets of data. The evaluation is based on a medical image dataset consisting of 11,000 X-ray images from 116 distinct categories. The classification accuracy rate obtained by the proposed classification model is 94.5%. The results show that the proposed classification model outperformed the methods in the literature evaluated on the same benchmark dataset.
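A hedged sketch of the CNN-features-into-SVM stage: the paper's CNN architecture is not specified, so a pretrained ResNet-18 from torchvision is an assumption, the PLSA step on the semantic representation is omitted, and the image batch and labels are random stand-ins.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pretrained CNN as a feature extractor (ResNet-18 is an assumption).
cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()  # drop the classifier head, keep 512-d features
cnn.eval()

def features(batch):
    """batch: (N, 3, 224, 224) float tensor -> (N, 512) numpy features."""
    with torch.no_grad():
        return cnn(batch).numpy()

X_train = features(torch.randn(32, 3, 224, 224))  # stand-in for X-ray images
y_train = np.random.randint(0, 4, size=32)        # stand-in category labels
svm = SVC(kernel="rbf").fit(X_train, y_train)     # discriminative classifier
```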

1 citation


Posted Content
TL;DR: Wang et al. apply the methods of bow-tie structure and Hodge decomposition to locate users in the upstream, downstream, and core of the entire crypto flow.
Abstract: How crypto flows among Bitcoin users is an important question for understanding the structure and dynamics of the cryptoasset at a global scale. We compiled all the blockchain data of Bitcoin from its genesis to the year 2020, identified users from anonymous addresses of wallets, and constructed monthly snapshots of networks by focusing on regular users as big players. We apply the methods of bow-tie structure and Hodge decomposition in order to locate the users in the upstream, downstream, and core of the entire crypto flow. Additionally, we reveal principal components hidden in the flow by using non-negative matrix factorization, which we interpret as a probabilistic model. We show that the model is equivalent to a probabilistic latent semantic analysis in natural language processing, enabling us to estimate the number of such hidden components. Moreover, we find that the bow-tie structure and the principal components are quite stable among those big players. This study can be a solid basis on which one can further investigate the temporal change of crypto flow, entry and exit of big players, and so forth.
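A minimal sketch of the NMF-as-PLSA reading described here: factorize a flow matrix and renormalize the factors into probabilities. The flow matrix is synthetic, the component count is arbitrary, and the KL loss is chosen because it matches the probabilistic correspondence.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic monthly user-to-user flow matrix; entries stand in for BTC amounts.
rng = np.random.default_rng(0)
F = rng.gamma(1.0, 1.0, size=(100, 100)) * (rng.random((100, 100)) < 0.05)

nmf = NMF(n_components=4, solver="mu", beta_loss="kullback-leibler",
          max_iter=500, random_state=0)
W = nmf.fit_transform(F)  # senders x components
H = nmf.components_       # components x receivers

# Normalize WH into the PLSA form P(s, r) = sum_z P(z) P(s|z) P(r|z).
p_s_z = W / (W.sum(0, keepdims=True) + 1e-12)  # P(sender | component)
p_r_z = H / (H.sum(1, keepdims=True) + 1e-12)  # P(receiver | component)
p_z = W.sum(0) * H.sum(1)
p_z /= p_z.sum()                               # component weights P(z)
```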

1 citation


Posted Content
Gangli Liu
TL;DR: In this article, an extension called Semantic Center of Mass (SCOM) is proposed, and used to discover the abstract "topic" of a document, under a framework model called Understanding Map Supervised Topic Model (UM-S-TM).
Abstract: Inspired by the notion of the center of mass in physics, an extension called the Semantic Center of Mass (SCOM) is proposed and used to discover the abstract "topic" of a document. The notion sits within a framework model called the Understanding Map Supervised Topic Model (UM-S-TM). The design aim of UM-S-TM is to let both the document content and a semantic network (specifically, an Understanding Map) play a role in interpreting the meaning of a document. Based on different justifications, three possible methods are devised to discover the SCOM of a document. Experiments on artificial documents and Understanding Maps are conducted to test their outcomes. In addition, its ability to vectorize documents and to capture sequential information is tested. We also compare UM-S-TM with probabilistic topic models such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Analysis (pLSA).
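The article's three SCOM constructions are not detailed in the abstract; purely as an illustration of the physics analogy, here is a hypothetical center-of-mass computation over concept vectors, with term frequencies as "masses" (all names and numbers below are invented).

```python
import numpy as np

# Illustrative only: concept nodes get positions, document term counts act as
# masses, and the nearest concept to the center of mass is read as the "topic".
concept_vecs = {"physics": np.array([1.0, 0.0]),
                "mass":    np.array([0.9, 0.3]),
                "poetry":  np.array([0.0, 1.0])}
doc_term_freq = {"physics": 5, "mass": 3, "poetry": 1}

total = sum(doc_term_freq.values())
scom = sum(doc_term_freq[w] * concept_vecs[w] for w in doc_term_freq) / total

topic = min(concept_vecs, key=lambda w: np.linalg.norm(concept_vecs[w] - scom))
print(scom, topic)
```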

Journal ArticleDOI
TL;DR: Big data analysis and semantic model analysis methods are adopted to construct semantic analysis models through PLSA calculations; the results show that the accuracy and applicability of the semantic analysis model increase and that accuracy on the data set improves.
Abstract: Due to the joint progress and interdependence of wireless sensor networks and language, Chinese semantic analysis under wireless sensor networks has become increasingly important. Although there are many research results on wireless networks and on Chinese semantics, there is little research on the influence of and relationship between the two. Wireless sensor networks are strongly application-driven, and the key technologies to be solved differ across application backgrounds. In order to reveal the basic laws and development trends of online Chinese semantic expression in the context of wireless sensor networks, this paper adopts big data analysis methods and semantic model analysis methods, constructs semantic analysis models through PLSA calculations, and studies the accuracy and applicability of the resulting models. Through word extraction over 1.05 million words of data from 1,103 documents on the Baidu Tieba, HowNet, and CiteULike websites, the material was integrated into a data set, and the PLSA model was validated on it. In addition, by constructing the wireless sensor network, semantic analysis results for Chinese behavioral expression are obtained. The results show that accuracy on the data set extracted from the 1,103 documents increases with the number of documents, and that applying the PLSA model for semantic analysis further improves accuracy. Compared with traditional semantic analysis, the model and the big data analysis framework have clear advantages. With the continuous development of Internet big data, the big data methods used to analyse Chinese semantics are constantly updated and their efficiency keeps improving. These updated semantic analysis models and statistical methods steadily reduce the uncertainty of modern online Chinese. The basic laws and development trends of statistical Chinese semantics also provide new application scenarios for online Chinese behavior and lay groundwork for subsequent scholars.


Proceedings ArticleDOI
09 Apr 2021
TL;DR: A Chinese FastText text classification method combining Term Frequency-Relevance Frequency (TF-RF) and an improved random walk model is proposed. The method applies TF-RF weighting to the N-gram-processed dictionaries at the input stage of the FastText model, performs semantic analysis with Probabilistic Latent Semantic Analysis (PLSA) to supplement the feature words, and then uses the improved random walk model to raise accuracy, making the model better suited to Chinese text classification.
Abstract: FastText is a text classification model from Facebook. As the model is simple in structure, it has the advantage of being fast and efficient. However, when the model is used for Chinese text classification, its accuracy decreases. To this end, a Chinese FastText text classification method combining Term Frequency-Relevance Frequency (TF-RF) and an improved random walk model is proposed in this paper. The method applies TF-RF weighting to the N-gram-processed dictionaries during the input stage of the FastText model, performs semantic analysis using Probabilistic Latent Semantic Analysis (PLSA), and supplements the feature words; it then utilizes the improved random walk model to raise accuracy, making the improved model better suited to Chinese text classification. Experimental results show that the improved model performs better on Chinese text classification.
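As a sketch of the TF-RF weighting used at the input stage, following the standard Lan et al. (2009) definition that this paper appears to build on; the toy tokens and labels are placeholders.

```python
import math
from collections import Counter

def tf_rf_weights(docs, labels, pos_label):
    """Supervised TF-RF term weighting: rf favours terms concentrated in the
    positive class. docs are token lists; returns per-document weight dicts."""
    a, c = Counter(), Counter()  # document frequencies per class
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            (a if y == pos_label else c)[t] += 1
    vocab = set(a) | set(c)
    rf = {t: math.log2(2 + a[t] / max(1, c[t])) for t in vocab}
    # Per-document weight: raw term frequency times the term's rf factor.
    return [{t: n * rf[t] for t, n in Counter(tokens).items()}
            for tokens in docs]

docs = [["好", "产品", "推荐"], ["差", "退货"], ["好", "满意"]]  # toy tokens
weights = tf_rf_weights(docs, labels=[1, 0, 1], pos_label=1)
print(weights)
```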