Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Proceedings Article
15 Aug 2005
TL;DR: This paper introduces the multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and meanwhile captures the correlations between the multiple outputs and incorporates the human-annotated category information.
Abstract: Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However if the output information (i.e. category labels) is available, it is often beneficial to derive the indexing not only based on the inputs but also on the target values in the training data set. This is of particular importance in applications with multiple labels, in which each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm which preserves the information of inputs and meanwhile captures the correlations between the multiple outputs. The recovered "latent semantics" thus incorporate the human-annotated category information and can be used to greatly improve the prediction accuracy. Empirical study based on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.

253 citations

Journal Article
TL;DR: This paper summarizes three experiments that illustrate how LSA may be used in text-based research by describing methods for analyzing a subject’s essay for determining from what text a subject learned the information and for grading the quality of information cited in the essay.
Abstract: Latent semantic analysis (LSA) is a statistical model of word usage that permits comparisons of semantic similarity between pieces of textual information. This paper summarizes three experiments that illustrate how LSA may be used in text-based research. Two experiments describe methods for analyzing a subject’s essay for determining from what text a subject learned the information and for grading the quality of information cited in the essay. The third experiment describes using LSA to measure the coherence and comprehensibility of texts.

253 citations
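
The semantic-similarity comparison that LSA permits can be illustrated with a small, dependency-free sketch. The term-by-document counts below are made up for illustration, the function names are hypothetical, and power iteration with deflation stands in for the library SVD routine one would normally use. It is a sketch of the general technique, not the procedure used in the paper above.

```python
import math

def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenpairs(S, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(S)
    S = [row[:] for row in S]  # work on a copy so the caller's matrix is untouched
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start vector
        for _ in range(iters):
            w = matvec(S, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(S, v)))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                S[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_doc_vectors(term_doc, k):
    """Document coordinates in a k-dimensional latent space (rows of V * Sigma)."""
    n_docs = len(term_doc[0])
    # Gram matrix X^T X; its eigenvectors are the right singular vectors of X.
    gram = [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(n_docs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up counts: rows are terms (cat, dog, pet, stock, market), columns are documents.
X = [
    [2, 0, 0, 0],
    [0, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
]

raw_docs = [list(col) for col in zip(*X)]
docs = lsa_doc_vectors(X, k=2)

raw_sim = cosine(raw_docs[0], raw_docs[1])  # similarity on raw counts
sim_pets = cosine(docs[0], docs[1])         # same pair in the latent space
sim_cross = cosine(docs[0], docs[2])        # a pet document vs. a finance document
```

In this toy data the first two documents share only the term "pet", so their raw-count similarity is low, while in the two-dimensional latent space they should land on nearly the same direction and remain nearly orthogonal to the finance documents.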

Journal Article
TL;DR: A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented and in addition the scoring of individuals on the latent dimensions is discussed.
Abstract: In this paper we discuss a general model framework within which manifest variables with different distributions in the exponential family can be analyzed with a latent trait model. A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented. We discuss in addition the scoring of individuals on the latent dimensions. The general framework presented allows, not only the analysis of manifest variables all of one type but also the simultaneous analysis of a collection of variables with different distributions. The approach used analyzes the data as they are by making assumptions about the distribution of the manifest variables directly.

246 citations

Journal Article
TL;DR: A Dirichlet-derived multiple topic model (DMTM) is proposed to fuse heterogeneous features at a topic level for HSR imagery scene classification and is able to reduce the dimension of the features representing the HSR images, to fuse the different types of features efficiently, and to improve the performance of the scene classification over that of other scene classification algorithms based on spatial pyramid matching, probabilistic latent semantic analysis, and latent Dirichlet allocation.
Abstract: Due to the complex arrangements of the ground objects in high spatial resolution (HSR) imagery scenes, HSR imagery scene classification is a challenging task, which is aimed at bridging the semantic gap between the low-level features and the high-level semantic concepts. A combination of multiple complementary features for HSR imagery scene classification is considered a potential way to improve the performance. However, the different types of features have different characteristics, and how to fuse the different types of features is a classic problem. In this paper, a Dirichlet-derived multiple topic model (DMTM) is proposed to fuse heterogeneous features at a topic level for HSR imagery scene classification. An efficient algorithm based on a variational expectation–maximization framework is developed to infer the DMTM and estimate the parameters of the DMTM. The proposed DMTM scene classification method is able to incorporate different types of features with different characteristics, no matter whether these features are local or global, discrete or continuous. Meanwhile, the proposed DMTM can also reduce the dimension of the features representing the HSR images. In our experiments, three types of heterogeneous features, i.e., the local spectral feature, the local structural feature, and the global textural feature, were employed. The experimental results with three different HSR imagery data sets show that the three types of features are complementary. In addition, the proposed DMTM is able to reduce the dimension of the features representing the HSR images, to fuse the different types of features efficiently, and to improve the performance of the scene classification over that of other scene classification algorithms based on spatial pyramid matching, probabilistic latent semantic analysis, and latent Dirichlet allocation.

245 citations

Journal Article
TL;DR: Different models, such as topic over time (TOT), dynamic topic models (DTM), multiscale topic tomography, dynamic topic correlation detection, detecting topic evolution in scientific literature, etc. are discussed.
Abstract: Topic models provide a convenient way to analyze large volumes of unclassified text. A topic contains a cluster of words that frequently occur together. Topic modeling can connect words with similar meanings and distinguish between uses of words with multiple meanings. This paper covers two categories within the field of topic modeling. The first covers topic modeling methods, four of which are considered: Latent semantic analysis (LSA), Probabilistic latent semantic analysis (PLSA), Latent Dirichlet allocation (LDA), and Correlated topic model (CTM). The second category, topic evolution models, models topics by taking an important factor, time, into account. In the second category, different models are discussed, such as topic over time (TOT), dynamic topic models (DTM), multiscale topic tomography, dynamic topic correlation detection, detecting topic evolution in scientific literature, etc.

243 citations
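
Of the methods listed in the abstract above, PLSA is the topic of this page, so a minimal sketch of how it is commonly fitted may help. It assumes the standard asymmetric formulation, P(w|d) = sum over z of P(z|d) P(w|z), trained by expectation-maximization. The function name `plsa`, its parameters, and the toy counts are hypothetical and chosen only for illustration. None of this is taken from the paper listed above.

```python
import math
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is the number of times word w occurs in document d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random, strictly positive initialisation of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    log_liks = []
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        ll = 0.0
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: responsibilities P(z|d,w) are proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                p_w_d = sum(joint)
                ll += n * math.log(p_w_d)  # log-likelihood under the current parameters
                for z in range(n_topics):
                    r = n * joint[z] / p_w_d
                    new_z_d[d][z] += r  # expected count of topic z in document d
                    new_w_z[z][w] += r  # expected count of word w under topic z
        # M-step: renormalise the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
        log_liks.append(ll)
    return p_z_d, p_w_z, log_liks

# Made-up document-by-word counts: two documents per vocabulary block.
counts = [
    [4, 3, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 4, 3],
    [0, 0, 3, 4],
]
p_z_d, p_w_z, log_liks = plsa(counts, n_topics=2)
```

Each row of `p_z_d` is a document's topic mixture and each row of `p_w_z` is a topic's word distribution. The recorded log-likelihood should not decrease from one iteration to the next, which is a quick sanity check on any EM implementation. Results depend on the random initialisation, so in practice several restarts are usually compared.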


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58