
Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic, also known as PLSA. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations.


Papers
Proceedings Article
23 Aug 2010
TL;DR: Two new LSA-based summarization algorithms are proposed, and their performance is compared against existing LSA-based approaches using ROUGE-L scores.
Abstract: Text summarization addresses the problem of extracting important information from huge amounts of text data. Various methods in the literature aim to produce well-formed summaries, and Latent Semantic Analysis (LSA) is one of the most commonly used. In this paper, different LSA-based summarization algorithms are explained and two new LSA-based summarization algorithms are proposed. The algorithms are evaluated on Turkish documents, and their performance is compared using ROUGE-L scores. One of our algorithms produces the best scores.
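
The abstract does not spell out the two proposed algorithms, so as a hedged illustration, here is a minimal sketch of the classic LSA selection step such methods build on: take the SVD of a term-sentence matrix and pick, for each top singular vector, the sentence that scores highest on it. The corpus, function name, and parameter values are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of classic LSA-based extractive summarization
# (the selection rule LSA summarizers build on; the paper's two new
# algorithms are not detailed in the abstract above).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def lsa_summarize(sentences, num_sentences=3):
    """Pick one sentence per top latent topic via SVD of the
    term-sentence matrix."""
    # Rows = terms, columns = sentences (transpose of vectorizer output).
    tfidf = TfidfVectorizer().fit_transform(sentences)  # (n_sent, n_terms)
    A = tfidf.T.toarray()                               # (n_terms, n_sent)
    # SVD: A = U @ diag(s) @ Vt; row i of Vt scores sentences on topic i.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic_row in Vt[:num_sentences]:
        idx = int(np.argmax(np.abs(topic_row)))
        if idx not in chosen:
            chosen.append(idx)
    return [sentences[i] for i in sorted(chosen)]

docs = [
    "Latent semantic analysis uncovers topics from text.",
    "Summarization extracts the most important sentences.",
    "ROUGE-L measures overlap with reference summaries.",
    "Cats are unrelated to this document.",
]
print(lsa_summarize(docs, num_sentences=2))
```

In a real evaluation, the sentences would come from the document being summarized and the output would be scored with ROUGE-L against reference summaries.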

63 citations

Journal Article
TL;DR: LSA is shown to be an efficient method in text-based research in three areas: matching summaries to the texts read, essay grading, and the measurement of textual coherence, of which the last plays the key role.
Abstract: Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text. It has proved to be an efficient method in text-based research in three areas: matching summaries to the texts read, essay grading, and the measurement of textual coherence, of which the last plays the key role. LSA can measure the amount of semantic overlap between adjoining sections of text to calculate coherence.
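
As a hedged illustration of the coherence measurement described above, the sketch below projects adjoining sections into a low-dimensional LSA space and averages the cosine similarity of neighbouring pairs. The dimensionality and example text are assumptions, not values from the paper.

```python
# Minimal sketch of LSA-based coherence: semantic overlap between
# adjoining sections, measured as cosine similarity in LSA space.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def lsa_coherence(sections, n_components=2):
    """Mean cosine similarity between LSA vectors of adjacent sections."""
    X = TfidfVectorizer().fit_transform(sections)
    Z = TruncatedSVD(n_components=n_components).fit_transform(X)
    sims = [cosine_similarity(Z[i:i + 1], Z[i + 1:i + 2])[0, 0]
            for i in range(len(sections) - 1)]
    return float(np.mean(sims))

text = [
    "The cell membrane controls what enters the cell.",
    "Transport proteins in the membrane move molecules across it.",
    "Unrelatedly, the stock market closed higher today.",
]
print(lsa_coherence(text))  # low adjacent overlap lowers the score
```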

63 citations

Proceedings ArticleDOI
12 Sep 2010
TL;DR: A series of Latent Dirichlet Allocation models with varying topic counts is generated to evaluate each model's ability to identify related source code blocks and to demonstrate the consequences of choosing too few or too many latent topics.
Abstract: The optimal number of latent topics required to model the most accurate latent substructure for a source code corpus is an open question in source code analysis. Most estimates about the number of latent topics that exist in a software corpus are based on the assumption that the data is similar to natural language, but there is little empirical evidence to support this. In order to help determine the appropriate number of topics needed to accurately represent the source code, we generate a series of Latent Dirichlet Allocation models with varying topic counts. We use a heuristic to evaluate the ability of the model to identify related source code blocks, and demonstrate the consequences of choosing too few or too many latent topics.
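
A minimal sketch of the topic-count sweep described above, under stated assumptions: it fits LDA models with varying numbers of topics and compares perplexity as a stand-in for the paper's heuristic (which scores the ability to identify related source code blocks). The toy corpus and topic counts are illustrative.

```python
# Sweep over LDA topic counts and compare model fit.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "parse token stream into abstract syntax tree",
    "allocate buffer and copy bytes into memory",
    "render widget layout on screen",
    "tokenize source file and build syntax tree",
    "free memory buffer after use",
    "draw button widget in the layout",
]
X = CountVectorizer().fit_transform(corpus)

for k in (2, 3, 5):
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    # Lower perplexity = better fit; held-out data would be used in
    # practice, since too many topics tends to overfit.
    print(k, round(lda.perplexity(X), 1))
```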

63 citations

Journal ArticleDOI
TL;DR: An approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros is presented, along with an algorithm that collapses a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation.
Abstract: In multivariate categorical data, models based on conditional independence assumptions, such as latent class models, offer efficient estimation of complex dependencies. However, Bayesian versions of latent structure models for categorical data typically do not appropriately handle impossible combinations of variables, also known as structural zeros. Allowing nonzero probability for impossible combinations results in inaccurate estimates of joint and conditional probabilities, even for feasible combinations. We present an approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros. The basic idea is to treat the observed data as a truncated sample from an augmented dataset, thereby allowing us to exploit the conditional independence assumptions for computational expediency. As part of the approach, we develop an algorithm for collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions, which speeds up computation.
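
As a hedged illustration of the truncation idea (observed data as a truncated sample from an augmented dataset), the sketch below forward-samples from a toy latent class model and discards draws that land in structural-zero cells. It shows only the data-augmentation view; the paper's posterior sampler and collapsing algorithm are not reproduced, and all probabilities here are made up.

```python
# Forward-sample from an augmented latent class model, rejecting
# impossible combinations (structural zeros).
import numpy as np

rng = np.random.default_rng(0)

# Two latent classes; two categorical variables, each with 2 levels.
class_probs = np.array([0.6, 0.4])
# cond[c][v] = P(levels of variable v | class c), conditionally independent.
cond = np.array([
    [[0.9, 0.1], [0.7, 0.3]],   # class 0
    [[0.2, 0.8], [0.1, 0.9]],   # class 1
])
# Structural zero: the combination (x0=0, x1=1) is impossible,
# e.g. "age under 15" with "has a doctorate".
structural_zeros = {(0, 1)}

def sample_truncated(n):
    """Sample from the augmented (unconstrained) model, rejecting
    impossible combinations, until n records remain."""
    records = []
    while len(records) < n:
        c = rng.choice(2, p=class_probs)
        x = tuple(rng.choice(2, p=cond[c][v]) for v in range(2))
        if x not in structural_zeros:   # truncation step
            records.append(x)
    return records

print(sample_truncated(5))
```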

63 citations

Journal ArticleDOI
TL;DR: It is shown that even when restricted to binary trees, HLC models of comparable quality to Zhang's solutions are obtained while being generally faster to compute, and it is demonstrated that the methods can estimate interpretable latent structures on real-world data with a large number of variables.
Abstract: Inferring latent structures from observations helps to model and possibly also understand underlying data generating processes. A rich class of latent structures is the latent trees, i.e., tree-structured distributions involving latent variables where the visible variables are leaves. These are also called hierarchical latent class (HLC) models. Zhang and Kocka proposed a search algorithm for learning such models in the spirit of Bayesian network structure learning. While such an approach can find good solutions, it can be computationally expensive. As an alternative, we investigate two greedy procedures: The BIN-G algorithm determines both the structure of the tree and the cardinality of the latent variables in a bottom-up fashion. The BIN-A algorithm first determines the tree structure using agglomerative hierarchical clustering, and then determines the cardinality of the latent variables as for BIN-G. We show that even when restricting ourselves to binary trees, we obtain HLC models of comparable quality to Zhang's solutions (in terms of cross-validated log-likelihood), while being generally faster to compute. This claim is validated by a comprehensive comparison on several data sets. Furthermore, we demonstrate that our methods are able to estimate interpretable latent structures on real-world data with a large number of variables. By applying our method to a restricted version of the 20 newsgroups data, these models turn out to be related to topic models, and on data from the PASCAL Visual Object Classes (VOC) 2007 challenge, we show how such tree-structured models help us understand how objects co-occur in images. For reproducibility of all experiments in this paper, all code and data sets (or links to data) are available at http://people.kyb.tuebingen.mpg.de/harmeling/code/ltt-1.4.tar.
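
As a hedged sketch of BIN-A's first step (determining the tree structure by agglomerative hierarchical clustering), the code below clusters observed binary variables using 1 minus absolute correlation as the distance. The distance choice and toy data are illustrative assumptions, and the second step (choosing latent-variable cardinalities) is not shown.

```python
# Build a binary merge tree over observed variables via agglomerative
# hierarchical clustering; each internal node corresponds to a latent
# variable in the HLC model.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# Toy data: variables 0-1 are coupled, and so are variables 2-3.
z1, z2 = rng.integers(0, 2, (2, 200))
X = np.column_stack([z1, z1 ^ (rng.random(200) < 0.1),
                     z2, z2 ^ (rng.random(200) < 0.1)])

corr = np.corrcoef(X, rowvar=False)     # (4, 4) pairwise correlations
dist = 1.0 - np.abs(corr)
np.fill_diagonal(dist, 0.0)
# Complete linkage on the condensed distance matrix yields a binary tree.
tree = linkage(squareform(dist, checks=False), method="complete")
print(tree)  # rows: (child a, child b, merge distance, leaf count)
```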

63 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58