SciSpace - Formally Typeset
Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2884 publications have been published within this topic, receiving 198341 citations. The topic is also known as: PLSA.
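As background on the topic itself: PLSA models each word occurrence through latent topics, P(w|d) = Σ_z P(z|d) P(w|z), with parameters fit by EM on a term-document count matrix. A minimal toy sketch follows; the dimensions and random data are illustrative assumptions, not taken from any paper listed on this page.

```python
import numpy as np

# Toy PLSA fit by EM. Data and sizes are illustrative only.
rng = np.random.default_rng(0)
n_docs, n_words, n_topics = 6, 8, 2
X = rng.integers(0, 5, size=(n_docs, n_words)).astype(float)  # counts n(d, w)

# Random initialization of P(z|d) and P(w|z), rows normalized.
p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(z|d) P(w|z)
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (d, z, w)
    p_z_dw = joint / joint.sum(1, keepdims=True)
    # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
    ec = X[:, None, :] * p_z_dw                        # shape (d, z, w)
    p_w_z = ec.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = ec.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)

# Reconstructed P(w|d) = sum_z P(z|d) P(w|z); each row is a distribution.
p_w_d = p_z_d @ p_w_z
```

Each row of `p_w_d` sums to one by construction, which is a quick sanity check on the EM updates.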


Papers
Journal ArticleDOI
TL;DR: Latent class analysis (LCA) is an extremely useful and flexible technique for the analysis of categorical data, measured at the nominal, ordinal, or interval level as discussed by the authors.
Abstract: Latent class analysis (LCA) is an extremely useful and flexible technique for the analysis of categorical data, measured at the nominal, ordinal, or interval level (the latter with fixed or estimat...

47 citations

Journal ArticleDOI
TL;DR: A novel probabilistic technique, time-delay Gaussian-process factor analysis (TD-GPFA), which performs dimensionality reduction in the presence of a different time delay between each pair of latent and observed variables.
Abstract: Noisy, high-dimensional time series observations can often be described by a set of low-dimensional latent variables. Commonly used methods to extract these latent variables typically assume instantaneous relationships between the latent and observed variables. In many physical systems, however, changes in the latent variables manifest as changes in the observed variables only after time delays. Techniques that do not account for these delays can recover a larger number of latent variables than are present in the system, making the latent representation more difficult to interpret. In this work, we introduce a novel probabilistic technique, time-delay Gaussian-process factor analysis (TD-GPFA), that performs dimensionality reduction in the presence of a different time delay between each pair of latent and observed variables. We demonstrate how using a Gaussian process to model the evolution of each latent variable allows us to tractably learn these delays over a continuous domain. Additionally, we show how TD-GPFA combines temporal smoothing and dimensionality reduction into a common probabilistic framework. We present an expectation/conditional maximization either (ECME) algorithm to learn the model parameters. Our simulations demonstrate that when time delays are present, TD-GPFA correctly identifies these delays and recovers the latent space. We then applied TD-GPFA to the activity of tens of neurons recorded simultaneously in the macaque motor cortex during a reaching task. TD-GPFA describes the neural activity using a more parsimonious latent space than GPFA, a method that has been used to interpret motor cortex data but does not account for time delays. More broadly, TD-GPFA can help unravel the mechanisms underlying high-dimensional time series data by taking into account physical delays in the system.

47 citations
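The core premise of the paper above, that a shared latent signal reaches each observed channel only after a channel-specific delay, can be illustrated on synthetic data. The sketch below recovers per-channel delays by plain cross-correlation; it is an illustrative toy, not TD-GPFA itself (no Gaussian-process prior and no ECME fitting), and all names and values are assumptions.

```python
import numpy as np

# Toy: one smooth latent trajectory, observed on three channels after
# channel-specific delays, plus a little noise.
rng = np.random.default_rng(1)
T = 400
latent = np.sin(2 * np.pi * np.arange(T) / 50.0)   # 1-D latent trajectory

true_delays = [0, 5, 12]                           # per-channel delays (samples)
obs = np.stack([np.roll(latent, d) + 0.01 * rng.standard_normal(T)
                for d in true_delays])             # shape (channels, T)

def estimate_delay(x, ref, max_lag=30):
    """Circular lag of x relative to ref that maximizes correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: float(np.dot(np.roll(ref, lag), x)))

# Recover each channel's delay against the latent signal.
est = [estimate_delay(ch, latent) for ch in obs]
```

A method that ignored these delays would need extra latent dimensions to explain the shifted copies, which is exactly the interpretability problem the abstract describes.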

Proceedings ArticleDOI
29 Sep 2014
TL;DR: A novel latent discriminative model for human activity recognition that outperforms the state-of-the-art approach by over 5% in both precision and recall, while the model is more efficient in computation.
Abstract: We present a novel latent discriminative model for human activity recognition. Unlike approaches that require conditional independence assumptions, our model is flexible in encoding the full connectivity among observations, latent states, and activity states, and can capture a richer class of contextual information in both state-state and observation-state pairs. Although loops are present in the model, we can treat the graphical model as a linear-chain structure in which exact inference is tractable, so the model is efficient in both inference and learning. The parameters of the graphical model are learned with the Structured Support Vector Machine (Structured-SVM). A data-driven approach is used to initialize the latent variables, so no hand labeling of the latent states is required. Experimental results on the CAD-120 benchmark dataset show that our model outperforms the state-of-the-art approach by over 5% in both precision and recall, while being more efficient in computation.

47 citations

Book ChapterDOI
14 Apr 2003
TL;DR: It is shown that link information can be useful when the document collection has a sufficiently high link density and links are of sufficiently high quality, but is detrimental to performance at low link densities or if the quality of the links is degraded.
Abstract: Link analysis methods have become popular for information access tasks, especially information retrieval, where the link information in a document collection is used to complement the traditionally used content information. However, there has been little firm evidence to confirm the utility of link information. We show that link information can be useful when the document collection has a sufficiently high link density and links are of sufficiently high quality. We report experiments on text classification of the Cora and WebKB data sets using Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic Selection. Comparison with manually assigned classes shows that link information enhances classification in data with sufficiently high link density, but is detrimental to performance at low link densities or if the quality of the links is degraded. We introduce a new frequency-based method for selecting the most useful citations from a document collection for use in the model.

47 citations

Proceedings ArticleDOI
Shengbo Guo1, Scott Sanner1
19 Jul 2010
TL;DR: This novel derivation presents a formal probabilistic latent view of MMR (PLMMR) that removes the need to manually balance relevance and diversity parameters, and formally derives variants of latent semantic indexing (LSI) similarity metrics for use in PLMMR.
Abstract: Diversity has been heavily motivated in the information retrieval literature as an objective criterion for result sets in search and recommender systems. Perhaps the best-known and most widely used algorithm for result set diversification is Maximal Marginal Relevance (MMR). In this paper, we show that while MMR is somewhat ad hoc and motivated from a purely pragmatic perspective, we can derive a more principled variant via probabilistic inference in a latent variable graphical model. This novel derivation presents a formal probabilistic latent view of MMR (PLMMR) that (a) removes the need to manually balance relevance and diversity parameters, (b) shows that specific definitions of relevance and diversity metrics appropriate to MMR emerge naturally, and (c) formally derives variants of latent semantic indexing (LSI) similarity metrics for use in PLMMR. Empirically, PLMMR outperforms MMR with standard term frequency based similarity and diversity metrics, since PLMMR maximizes latent diversity in the results.

47 citations
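The MMR objective that the paper above reworks is simple to state: greedily pick the next document d maximizing lambda * sim(d, q) - (1 - lambda) * max over selected s of sim(d, s). The lambda trade-off parameter is exactly what PLMMR derives away. A toy sketch with cosine similarity over term-frequency vectors; the vectors and the lambda value are illustrative assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity; 0.0 for zero vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def mmr(query, docs, k, lam):
    """Greedy MMR re-ranking: balance relevance to the query against
    redundancy with already-selected documents."""
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            rel = cosine(docs[i], query)
            red = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = np.array([1.0, 1.0, 0.0])
docs = [np.array([1.0, 1.0, 0.0]),   # very relevant
        np.array([1.0, 0.9, 0.0]),   # near-duplicate of doc 0
        np.array([0.0, 1.0, 1.0])]   # less relevant but diverse

# With a diversity-heavy lambda, MMR skips the near-duplicate.
order = mmr(query, docs, k=2, lam=0.3)
```

With lam=0.3 the second pick is the diverse document rather than the near-duplicate, which is the behavior the diversification literature is after; choosing lam well is the manual balancing act that PLMMR removes.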


Network Information
Related Topics (5)

Topic                       Papers    Citations  Relatedness
Feature extraction          111.8K    2.1M       84%
Feature (computer vision)   128.2K    1.7M       84%
Support vector machine      73.6K     1.7M       84%
Deep learning               79.8K     2.1M       83%
Object detection            46.1K     1.3M       82%
Performance
Metric: number of papers in the topic in previous years

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58