
Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
Posted Content
TL;DR: A comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data, covering several constrained versions of the basic LM model that make it more parsimonious and allow hypotheses of interest to be included and tested.
Abstract: We provide a comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data. The main assumption behind these models is that the response variables are conditionally independent given a latent process that follows a first-order Markov chain. We first illustrate the basic LM model, in which the conditional distribution of each response variable given the corresponding latent variable and the initial and transition probabilities of the latent process are unconstrained. For this model we also illustrate in detail maximum likelihood estimation through the Expectation-Maximization algorithm, which may be efficiently implemented by recursions known in the hidden Markov literature. We then illustrate several constrained versions of the basic LM model, which make the model more parsimonious and allow us to include and test hypotheses of interest. These constraints may be put on the conditional distribution of the response variables given the latent process (measurement model) or on the distribution of the latent process (latent model). We also deal with extensions of the LM model to include individual covariates and to handle multilevel data. Covariates may affect the measurement or the latent model; we discuss the implications of these two different approaches according to the context of application. Finally, we outline methods for obtaining standard errors for the parameter estimates, for selecting the number of states and for path prediction. Models and related inference are illustrated by the description of relevant socio-economic applications available in the literature.

22 citations
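
Editor's note: the EM recursions mentioned in the abstract above are the forward-backward (Baum-Welch) recursions from the hidden Markov literature. Below is a minimal numpy sketch of one EM iteration for a basic LM model with a single categorical response; the two states, three response categories, and toy sequence are illustrative assumptions, not the paper's setup.

```python
# One EM (Baum-Welch) iteration for a basic latent Markov model with a
# single categorical response. All data and dimensions are illustrative.
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward recursions for one observed sequence.

    obs: int array of length T (observed response categories)
    pi:  (k,) initial state probabilities
    A:   (k, k) transitions, A[i, j] = P(state j at t+1 | state i at t)
    B:   (k, m) emissions, B[i, v] = P(response v | state i)
    """
    T, k = len(obs), len(pi)
    alpha, beta, scale = np.zeros((T, k)), np.zeros((T, k)), np.zeros(T)

    # Forward pass, normalizing each step to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling constants.
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta  # P(state at t | all observations)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]
          / scale[1:, None, None])  # P(states at t, t+1 | all observations)
    return gamma, xi

rng = np.random.default_rng(0)
k, m = 2, 3
obs = rng.integers(0, m, size=50)
pi = np.full(k, 1 / k)
A = np.full((k, k), 1 / k)
B = rng.dirichlet(np.ones(m), size=k)

# E-step, then the closed-form M-step updates.
gamma, xi = forward_backward(obs, pi, A, B)
pi_new = gamma[0]
A_new = xi.sum(0) / gamma[:-1].sum(0)[:, None]
B_new = np.vstack([gamma[obs == v].sum(0) for v in range(m)]).T
B_new /= B_new.sum(axis=1, keepdims=True)
```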

Book Chapter
20 Sep 2010
TL;DR: This paper proposes an efficient topic modeling framework for volatile dyadic observations where direct topic modeling is infeasible, and shows that the proposed learning method outperforms traditional LDA by capturing more persistent relations between dyadic sets of wide practical significance.
Abstract: One of the major strengths of probabilistic topic modeling is the ability to reveal hidden relations via the analysis of co-occurrence patterns on dyadic observations, such as document-term pairs. However, in many practical settings the extreme sparsity and volatility of co-occurrence patterns within the data, as when the majority of terms appear in a single document, limit the applicability of topic models. In this paper, we propose an efficient topic modeling framework for volatile dyadic observations where direct topic modeling is infeasible. We show both theoretically and empirically that often-available unstructured and semantically rich meta-data can serve as a link between dyadic sets, and can allow accurate and efficient inference. Our approach is general and can work with most latent variable models that rely on stable dyadic data, such as pLSI, LDA, and GaP. Using transactional data from a major e-commerce site, we demonstrate the effectiveness as well as the applicability of our method in a personalized recommendation system for volatile items. Our experiments show that the proposed learning method outperforms traditional LDA by capturing more persistent relations between dyadic sets of wide practical significance.

22 citations
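
Editor's note: the paper's own framework links dyadic sets through meta-data and is not reproduced here; as a point of reference, below is a minimal sketch of the LDA baseline it compares against, fitted to a toy document-term matrix with scikit-learn. The corpus and parameter values are illustrative.

```python
# Fit a plain LDA topic model to dyadic document-term observations.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "wireless bluetooth headphones noise cancelling",
    "usb charging cable fast charger",
    "bluetooth speaker portable waterproof",
    "laptop charger usb power adapter",
]

# Dyadic observations as a sparse document-term count matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)  # per-document topic mixtures

# Read off the top terms per topic from the fitted topic-term weights.
terms = vectorizer.get_feature_names_out()
for t, weights in enumerate(lda.components_):
    top = weights.argsort()[::-1][:4]
    print(f"topic {t}:", ", ".join(terms[i] for i in top))
```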

Journal Article
TL;DR: These explorations focus on the idea that the power of LSA can be amplified by considering semantic fields of text units instead of pairs of text units, showing new evidence for LSA as a mechanism for knowledge representation.
Abstract: Natural-language-based knowledge representations borrow their expressiveness from the semantics of language. One such knowledge representation technique is latent semantic analysis (LSA), a statistical, corpus-based method for representing knowledge. It has been successfully used in a variety of applications including intelligent tutoring systems, essay grading and coherence metrics. The advantage of LSA is that it represents world knowledge efficiently, without requiring manual coding of relations, and it has in fact been considered to simulate aspects of human knowledge representation. An overview of LSA applications will be given, followed by some further explorations of the use of LSA. These explorations focus on the idea that the power of LSA can be amplified by considering semantic fields of text units instead of pairs of text units. Examples are given for semantic networks, category membership, typicality, spatiality and temporality, showing new evidence for LSA as a mechanism for knowledge representation. The results of such tests show that while the mechanism behind LSA is unique, it is flexible enough to replicate results in different corpora and languages.

22 citations
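
Editor's note: as a rough illustration of the LSA machinery this overview builds on, below is a minimal scikit-learn sketch: truncated SVD over TF-IDF vectors, with text units compared by cosine similarity in the reduced space. The corpus and the two-dimensional latent space are illustrative choices.

```python
# Build a reduced LSA semantic space and compare text units in it.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "the tutor graded the student essay",
    "automatic essay grading with semantic analysis",
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
]

X = TfidfVectorizer().fit_transform(texts)  # term-document statistics
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Pairwise semantic similarity between text units in the latent space.
print(cosine_similarity(Z).round(2))
```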

Book Chapter
03 Mar 2008
TL;DR: Dimensionality reduction via LDA and pLSI yields document clusters of almost the same quality as those obtained from the original feature vectors, so the vector dimension can be reduced without degrading cluster quality.
Abstract: In this paper, we compare latent Dirichlet allocation (LDA) with probabilistic latent semantic indexing (pLSI) as a dimensionality reduction method and investigate their effectiveness in document clustering by using real-world document sets. For clustering of documents, we use a method based on multinomial mixture, which is known as an efficient framework for text mining. Clustering results are evaluated by F-measure, i.e., the harmonic mean of precision and recall. We use Japanese and Korean Web articles for evaluation and regard the category assigned to each Web article as the ground truth for the evaluation of clustering results. Our experiment shows that dimensionality reduction via LDA and pLSI results in document clusters of almost the same quality as those obtained by using the original feature vectors. Therefore, we can reduce the vector dimension without degrading cluster quality. Further, both LDA and pLSI are more effective than random projection, the baseline method in our experiment. However, our experiment reveals no meaningful difference between LDA and pLSI. This result suggests that LDA does not replace pLSI, at least for dimensionality reduction in document clustering.

22 citations
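
Editor's note: the shape of the paper's pipeline (reduce the document-term matrix, cluster the reduced vectors, score against ground-truth categories) can be sketched with scikit-learn, with two substitutions to keep the sketch self-contained: KMeans stands in for the paper's multinomial-mixture clusterer and the adjusted Rand index for its F-measure. The corpus and labels are made up.

```python
# Dimensionality reduction with LDA, then clustering and evaluation.
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import adjusted_rand_score

docs = [
    "stock market prices fell sharply today",
    "investors watch the bond market closely",
    "the team won the championship game",
    "a late goal decided the football match",
]
true_labels = [0, 0, 1, 1]  # category assigned to each article

X = CountVectorizer().fit_transform(docs)
Z = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
print("adjusted Rand:", adjusted_rand_score(true_labels, pred))
```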

01 Jan 2005
TL;DR: This chapter analyzes the values used by Latent Semantic Indexing (LSI) for information retrieval by manipulating the values in the Singular Value Decomposition (SVD) matrices, and finds that a significant fraction of the values have little effect on overall performance and can be removed.
Abstract: In this chapter we analyze the values used by Latent Semantic Indexing (LSI) for information retrieval. By manipulating the values in the Singular Value Decomposition (SVD) matrices, we find that a significant fraction of the values have little effect on overall performance, and can thus be removed (changed to zero). This allows us to convert the dense term by dimension and document by dimension matrices into sparse matrices by identifying and removing those entries. We empirically show that these entries are unimportant by presenting retrieval and runtime performance results, using seven collections, which show that removal of up to 70% of the values in the term by dimension matrix results in similar or improved retrieval performance (as compared to LSI). Removal of 90% of the values degrades retrieval performance slightly for smaller collections, but improves retrieval performance by 60% on the large collection we tested. Our approach additionally has the computational benefit of reducing memory requirements and query response time.

22 citations
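
Editor's note: below is a minimal numpy/scipy sketch of the chapter's idea: compute the rank-k LSI factors, zero the smallest-magnitude entries of the term-by-dimension matrix, and store what remains sparsely. The random matrix, rank, and 70% cutoff are illustrative, not the chapter's test collections.

```python
# Sparsify the LSI term-by-dimension matrix by zeroing small entries.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
A = rng.random((100, 30))  # toy term-by-document matrix
k = 10

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Tk = U[:, :k] * s[:k]  # term-by-dimension matrix, scaled by singular values

# Zero the 70% of entries with smallest absolute value, then sparsify.
cutoff = np.quantile(np.abs(Tk), 0.70)
Tk_sparse = csr_matrix(np.where(np.abs(Tk) < cutoff, 0.0, Tk))
print(f"kept {Tk_sparse.nnz} of {Tk.size} entries")
```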


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58