Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
Book ChapterDOI
23 Aug 2006
TL;DR: LDA is a “bag-of-words” language modeling and dimension reduction method reported to outperform related methods, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), in the Information Retrieval (IR) domain.
Abstract: We report experiments on automatic essay grading using Latent Dirichlet Allocation (LDA). LDA is a “bag-of-words” type of language modeling and dimension reduction method, reported to outperform other related methods, Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA), in the Information Retrieval (IR) domain. We introduce LDA in detail and compare its strengths and weaknesses to LSA and PLSA. We also compare empirically the performance of LDA to LSA and PLSA. The experiments were run with three essay sets consisting of 283 essays in total from different domains. Contrary to the findings in IR, LDA achieved slightly worse results than LSA and PLSA in the experiments. We state the reasons for LSA and PLSA outperforming LDA and indicate further research directions.

36 citations
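The LSA baseline discussed in the entry above is, at its core, a truncated SVD of a term-document matrix; essays can then be compared in the low-rank latent space. The sketch below uses an invented toy matrix and rank k=2 for illustration, not the paper's essay data:

```python
import numpy as np

# Toy term-document count matrix: rows = terms, columns = 4 "essays".
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 2, 2],
], dtype=float)

# LSA = truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_k = (np.diag(s[:k]) @ Vt[:k]).T   # each essay as a k-dim latent vector

print(docs_k.shape)                    # (4, 2)
```

In an essay-grading setting, cosine similarity between a student essay and model answers in this latent space could serve as a score; pLSA and LDA replace the SVD with probabilistic topic estimates.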

Journal ArticleDOI
TL;DR: The main objective of this research paper is to design a system that generates a multimodal, nonparametric Bayesian model and multilayered probabilistic latent semantic analysis (pLSA)-based visual dictionary (BM-MpLSA).
Abstract: The main objective of this research paper is to design a system that generates a multimodal, nonparametric Bayesian model and multilayered probabilistic latent semantic analysis (pLSA)-based visual dictionary (BM-MpLSA). Advances in technology and the enthusiasm of sports lovers have created a need for automatic action recognition in live sports video feeds. The fundamental requirement for such a model is the creation of a visual dictionary for each sports domain. This multimodal nonparametric model has two novel co-occurrence matrices: one for image feature vectors and the other for textual entities. This matrix provides a basic scaling parameter for the unobserved random variables, and it is an extension of multilayered pLSA-based visual dictionary creation. This paper concentrates on the creation of a visual dictionary for basketball. From the sports event images, the extracted feature vectors are based on SIFT and MPEG-7's dominant color, color layout, scalable color, and edge histogram descriptors. After quantization and analysis of these vector values, the visual vocabulary is created by integrating them into a domain-specific visual ontology for semantic understanding. The accuracy of this work is evaluated with respect to the actions depicted in the images.

36 citations

Journal ArticleDOI
TL;DR: The article discusses latent class models as an approach to categorical data analysis when some variables have missing data, with attention to models for "nonignorable nonresponse." It discusses the estimation of loglinear models using a latent class approach, shows how this framework applies to various missing data models, and indicates how Haberman's DNEWTON program can be used to estimate them.
Abstract: This article discusses latent class models as an approach to categorical data analysis when some variables have missing data. In contrast to standard latent class models, in which each variable is either latent or observed for all sample observations, our models include variables that are latent (missing) for some observations and manifest (not missing) for others. Particular attention is devoted to models in which the probability that an observation is missing on a variable depends on the level of that variable itself; in other words, it focuses on models for "nonignorable nonresponse." It discusses the estimation of loglinear models using a latent class approach. It shows how this framework applies to various missing data models and indicates how Haberman's DNEWTON program can be used to estimate missing data models.

36 citations

Journal ArticleDOI
TL;DR: A new approach is proposed, Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency.
Abstract: Topic modeling has become a popular method used for data analysis in various domains including text documents. Previous topic model approaches, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These approaches, however, do not take into account the manifold structure of the data, which is generally informative for nonlinear dimensionality reduction mapping. More recent topic model approaches, Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown resulting benefits. But they fall short of achieving the full discriminating power of manifold learning, as they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this article, we propose a new approach, Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency. We also present a novel model-fitting algorithm based on the generalized EM algorithm and the concept of Pareto improvement. We empirically demonstrate the success of DTM in terms of unsupervised clustering and semisupervised classification accuracies on text corpora and robustness to parameters compared to state-of-the-art techniques.

36 citations
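The pLSA model that DTM and the other approaches above build on can be fitted with a short EM loop. The sketch below is plain asymmetric pLSA (learning P(z|d) and P(w|z) from a count matrix), not the paper's Pareto-based generalized EM, and the counts are random toy data:

```python
import numpy as np

# Toy term counts: 6 documents, 8 words, 2 latent topics. All counts >= 1
# so that every normalization below is well defined.
rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 6, 8, 2
X = rng.integers(1, 5, size=(n_docs, n_words)).astype(float)

# Random initialization of P(z|d) and P(w|z), rows normalized to 1.
p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)

for _ in range(50):
    # E-step: responsibilities P(z|d,w) ∝ P(z|d) * P(w|z), shape (d, z, w).
    resp = p_z_d[:, :, None] * p_w_z[None, :, :]
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
    nz = X[:, None, :] * resp
    p_z_d = nz.sum(axis=2); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = nz.sum(axis=0); p_w_z /= p_w_z.sum(1, keepdims=True)
```

Each EM iteration is guaranteed not to decrease the data likelihood; LapPLSI, LTM, and DTM modify the objective this loop optimizes by adding manifold-based regularization terms.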

Journal ArticleDOI
TL;DR: Experimental results show that the performance of the proposed semantic learning scheme is excellent when compared with that of the traditional text-based semantic retrieval techniques and content-based image retrieval methods.
Abstract: In this paper, a new semantic learning method for content-based image retrieval using the analytic hierarchy process (AHP) is proposed. AHP, proposed by Saaty, offers a systematic way to solve multi-criteria preference problems involving qualitative data and has been widely applied to a great diversity of areas. In general, the interpretations of an image are multiple and hard to describe in terms of low-level features, due to the lack of a complete image understanding model. The AHP provides a good way to evaluate the fitness of a semantic description used to interpret an image. According to a predefined concept hierarchy, a semantic vector, consisting of the fitness values of semantic descriptions of a given image, is used to represent the semantic content of the image. Based on the semantic vectors, the database images are clustered. For each semantic cluster, the weightings of the low-level features (i.e. color, shape, and texture) used to represent the content of the images are calculated by analyzing the homogeneity of the class. In this paper, the weightings assigned to the three low-level feature types differ across semantic clusters for retrieval. The proposed semantic learning scheme provides a way to bridge the gap between high-level semantic concepts and low-level features for content-based image retrieval. Experimental results show that the performance of the proposed method is excellent when compared with that of traditional text-based semantic retrieval techniques and content-based image retrieval methods.

36 citations
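Saaty's eigenvector method at the core of AHP can be sketched in a few lines: priority weights are the normalized principal eigenvector of a pairwise-comparison matrix. The color/shape/texture judgments below are invented for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical pairwise-comparison matrix over three low-level features
# (color, shape, texture); entry A[i, j] says how much feature i is
# preferred over feature j, with A[j, i] = 1 / A[i, j].
A = np.array([
    [1.0, 3.0, 5.0],   # color vs (color, shape, texture)
    [1/3, 1.0, 2.0],   # shape
    [1/5, 1/2, 1.0],   # texture
])

vals, vecs = np.linalg.eig(A)
i = np.argmax(vals.real)
lam = vals[i].real                 # principal eigenvalue (>= n for such matrices)
w = np.abs(vecs[:, i].real)
w /= w.sum()                       # normalized priority weights

ci = (lam - 3) / (3 - 1)           # consistency index
cr = ci / 0.58                     # consistency ratio (random index RI = 0.58 for n = 3)
print(w, cr)
```

A consistency ratio below about 0.1 indicates the judgments are acceptably consistent; per the abstract, such weights would be computed separately for each semantic cluster.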


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58