Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.
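For readers new to the topic, the sketch below is a minimal EM fit of the pLSA aspect model P(w|d) = sum_z P(z|d) P(w|z) on a toy term-document count matrix. The matrix, topic count, and iteration budget are illustrative assumptions, not taken from any paper listed below.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit the pLSA aspect model P(w|d) = sum_z P(z|d) P(w|z) by EM.

    counts: (n_docs, n_words) term-frequency matrix.
    Returns (P(z|d), P(w|z)).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation of the two conditional distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)            # P(z|d)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)            # P(w|z)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) proportional to P(z|d) * P(w|z).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]     # (docs, topics, words)
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * resp               # n(d,w) * P(z|d,w)
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy example: 4 documents over a 6-word vocabulary, 2 latent topics.
X = np.array([[4, 3, 2, 0, 0, 0],
              [3, 4, 1, 0, 1, 0],
              [0, 0, 1, 4, 3, 2],
              [0, 1, 0, 3, 4, 3]], dtype=float)
p_z_d, p_w_z = plsa(X, n_topics=2)
print(np.round(p_w_z, 2))   # topic-word distributions
```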


Papers
Journal ArticleDOI
TL;DR: Proposes hierarchical latent class models as a framework in which the local dependence problem can be addressed in a principled manner, and develops a search-based algorithm for learning such models from data.
Abstract: Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.

235 citations
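The latent class model described above assumes the observed variables are mutually independent given the class. As a hedged illustration of that assumption, the sketch below fits a plain (non-hierarchical) latent class model on binary items by EM; it does not implement the paper's hierarchical models or its search-based learning algorithm, and the simulated data are an assumption.

```python
import numpy as np

def latent_class_em(X, n_classes, n_iter=200, seed=0):
    """EM for a latent class model on binary items.

    Assumes items are conditionally independent given the latent class:
    P(x) = sum_c pi[c] * prod_j theta[c, j]**x_j * (1 - theta[c, j])**(1 - x_j).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)            # class priors
    theta = rng.uniform(0.25, 0.75, (n_classes, d))     # per-class item probabilities

    for _ in range(n_iter):
        # E-step: posterior P(class | x_i) for every respondent.
        log_lik = (X @ np.log(theta).T
                   + (1 - X) @ np.log(1 - theta).T
                   + np.log(pi))                          # (n, n_classes)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update priors and item probabilities from the posteriors.
        pi = post.mean(axis=0)
        theta = (post.T @ X) / (post.sum(axis=0)[:, None] + 1e-12)
        theta = np.clip(theta, 1e-6, 1 - 1e-6)
    return pi, theta, post

# Simulate two classes with different response profiles (illustrative only).
rng = np.random.default_rng(1)
true_theta = np.array([[0.9, 0.8, 0.8, 0.1, 0.1],
                       [0.1, 0.2, 0.1, 0.9, 0.8]])
labels = rng.integers(0, 2, size=600)
X = (rng.random((600, 5)) < true_theta[labels]).astype(float)
pi, theta, post = latent_class_em(X, n_classes=2)
print(np.round(theta, 2))   # should roughly recover the profiles, up to class relabelling
```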

Proceedings ArticleDOI
13 Sep 2000
TL;DR: A semantics-based algorithm for learning morphology that proposes an affix only when the stem and stem-plus-affix are sufficiently similar semantically; this semantics-only approach is shown to provide morphology induction results that rival a current state-of-the-art system.
Abstract: Morphology induction is a subproblem of important tasks like automatic learning of machine-readable dictionaries and grammar induction. Previous morphology induction approaches have relied solely on statistics of hypothesized stems and affixes to choose which affixes to consider legitimate. Relying on stem-and-affix statistics rather than semantic knowledge leads to a number of problems, such as the inappropriate use of valid affixes ("ally" stemming to "all"). We introduce a semantic-based algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically. We implement our approach using Latent Semantic Analysis and show that our semantics-only approach provides morphology induction results that rival a current state-of-the-art system.

233 citations
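As a rough illustration of the semantic test the paper relies on, the sketch below builds LSA word vectors with scikit-learn's TruncatedSVD over a tiny toy corpus and compares candidate stem / stem-plus-affix pairs by cosine similarity. The corpus, dimensionality, and acceptance threshold are illustrative assumptions, not the authors' setup, and a toy corpus will not reproduce their similarity scores.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative corpus; a real run would use a large corpus.
docs = [
    "the allies formed an alliance during the war",
    "the ally supported the alliance and the allies",
    "all of the students passed all of the exams",
    "nearly all cars were parked on the street",
]

# Term-document matrix, then LSA via truncated SVD on its transpose,
# so that each row of word_vectors is a word in the latent semantic space.
vec = CountVectorizer()
X = vec.fit_transform(docs)                 # (docs, words)
svd = TruncatedSVD(n_components=2, random_state=0)
word_vectors = svd.fit_transform(X.T)       # (words, components)
vocab = {w: i for i, w in enumerate(vec.get_feature_names_out())}

def semantic_similarity(a, b):
    """Cosine similarity between two words' LSA vectors."""
    va = word_vectors[vocab[a]].reshape(1, -1)
    vb = word_vectors[vocab[b]].reshape(1, -1)
    return float(cosine_similarity(va, vb)[0, 0])

# Accept an affix only if stem and stem-plus-affix are semantically close;
# on a realistic corpus, related forms should score markedly higher.
print(semantic_similarity("ally", "allies"))   # related forms
print(semantic_similarity("all", "allies"))    # "all" + "-ies": should look unrelated
```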

Proceedings ArticleDOI
15 Oct 2008
TL;DR: In this article, the authors present a static technique for bug localization built on latent Dirichlet allocation (LDA), a modular and extensible IR model with significant advantages over both LSI and probabilistic LSI.
Abstract: In bug localization, a developer uses information about a bug to locate the portion of the source code to modify to correct the bug. Developers expend considerable effort performing this task. Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI); however, latent Dirichlet allocation (LDA), a modular and extensible IR model, has significant advantages over both LSI and probabilistic LSI (pLSI). In this paper we present an LDA-based static technique for automating bug localization. We describe the implementation of our technique and three case studies that measure its effectiveness. For two of the case studies we directly compare our results to those from similar studies performed using LSI. The results demonstrate our LDA-based technique performs at least as well as the LSI-based techniques for all bugs and performs better, often significantly so, than the LSI-based techniques for most bugs.

232 citations
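A hedged sketch of the general LDA-based retrieval idea, not the authors' implementation: fit an LDA topic model over word bags extracted from source files with scikit-learn's LatentDirichletAllocation, infer the topic mixture of a bug report, and rank files by similarity. The file names, word bags, and topic count are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Illustrative "source files": identifiers and comments flattened to word bags.
source_files = {
    "net/http_client.py": "socket connect timeout retry http request response header",
    "ui/render_table.py": "render table row column cell paint widget layout",
    "db/query_cache.py":  "cache query invalidate key expire database connection pool",
}
bug_report = "http request times out and is never retried"

# Fit an LDA topic model over the corpus of source files.
vec = CountVectorizer()
X = vec.fit_transform(source_files.values())
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                  # per-file topic mixtures

# Infer the bug report's topic mixture and rank files by cosine similarity.
q = lda.transform(vec.transform([bug_report]))
sims = (doc_topics @ q.T).ravel() / (
    np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(q) + 1e-12)
for name, score in sorted(zip(source_files, sims), key=lambda t: -t[1]):
    print(f"{score:.3f}  {name}")
```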

Proceedings ArticleDOI
17 Jun 2007
TL;DR: Presents a novel generative model that extends probabilistic latent semantic analysis (pLSA) to capture both semantic and structural information for motion category recognition, and shows it outperforms existing unsupervised methods in both motion localisation and recognition.
Abstract: Current approaches to motion category recognition typically focus on either full spatiotemporal volume analysis (holistic approach) or analysis of the content of spatiotemporal interest points (part-based approach). Holistic approaches tend to be more sensitive to noise e.g. geometric variations, while part-based approaches usually ignore structural dependencies between parts. This paper presents a novel generative model, which extends probabilistic latent semantic analysis (pLSA), to capture both semantic (content of parts) and structural (connection between parts) information for motion category recognition. The structural information learnt can also be used to infer the location of motion for the purpose of motion detection. We test our algorithm on challenging datasets involving human actions, facial expressions and hand gestures and show its performance is better than existing unsupervised methods in both tasks of motion localisation and recognition.

232 citations

Journal ArticleDOI
TL;DR: Studies the problem of learning a latent tree graphical model when samples are available only from a subset of variables, and proposes two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes.
Abstract: We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set.

231 citations
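The sketch below illustrates only the pre-processing idea behind CLGrouping as described above: compute pairwise information distances (for Gaussian variables, d_ij = -log |corr(X_i, X_j)|) and build a spanning tree over the observed variables. The recursive-grouping refinement that introduces hidden nodes is not implemented here, and the toy data are an assumption.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def information_distances(samples):
    """Pairwise information distances d_ij = -log |corr(X_i, X_j)|
    for jointly Gaussian variables (one column of `samples` per variable)."""
    corr = np.corrcoef(samples, rowvar=False)
    return -np.log(np.clip(np.abs(corr), 1e-12, 1.0))

# Toy data: 5 observed variables generated from a single hidden common cause.
rng = np.random.default_rng(0)
h = rng.normal(size=(2000, 1))
X = h @ rng.uniform(0.5, 1.0, size=(1, 5)) + 0.5 * rng.normal(size=(2000, 5))

D = information_distances(X)
np.fill_diagonal(D, 0.0)

# Pre-processing step: a minimum spanning tree over the observed variables
# under the information distances; recursive grouping would then refine
# small neighbourhoods of this tree to introduce latent nodes.
mst = minimum_spanning_tree(D)
rows, cols = mst.nonzero()
for i, j in zip(rows, cols):
    print(f"edge {i} -- {j}  (distance {D[i, j]:.3f})")
```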


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58