Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
Proceedings ArticleDOI
09 Sep 2010
TL;DR: This paper proposes a two-stage feature selection algorithm that first applies a conventional feature selection method to reduce the term dimensionality and then constructs a reduced semantic space over the remaining terms using latent semantic indexing.
Abstract: Feature selection for text classification is a well-studied problem whose goals are improving classification effectiveness, computational efficiency, or both. In this paper, we propose a two-stage feature selection algorithm that combines a conventional feature selection method with latent semantic indexing. Traditional word-matching based text categorization systems use the vector space model to represent documents. However, this model needs a high-dimensional space to represent documents and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic indexing can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers important associative relationships between terms. Because constructing a new semantic space is computationally expensive, our algorithm first applies a feature selection method to reduce the number of term dimensions, and then constructs a reduced semantic space over the remaining terms using latent semantic indexing. In applications involving spam database categorization, we find that our two-stage feature selection method performs better.

15 citations
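The two-stage pipeline described in the abstract above (feature selection, then latent semantic indexing) can be sketched in a few lines. This is a minimal illustration on hypothetical data: a simple class-separation score stands in for stage one (the paper's exact selection criterion is not specified here), and a truncated SVD implements the LSI step.

```python
import numpy as np

# Hypothetical term-document count matrix: 6 documents x 8 terms,
# two classes (e.g. spam / ham). All values are illustrative.
X = np.array([
    [3, 0, 1, 0, 2, 0, 0, 1],
    [2, 1, 0, 0, 3, 0, 1, 0],
    [4, 0, 2, 1, 1, 0, 0, 0],
    [0, 3, 0, 2, 0, 4, 1, 0],
    [1, 2, 0, 3, 0, 2, 0, 1],
    [0, 4, 1, 2, 0, 3, 0, 0],
], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])  # class labels

# Stage 1: score each term by a simple class-separation statistic
# (absolute difference of within-class mean frequencies), keep top k.
k = 4
mean_diff = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
keep = np.argsort(mean_diff)[-k:]
X_red = X[:, keep]

# Stage 2: latent semantic indexing on the reduced matrix via truncated
# SVD, projecting documents into an r-dimensional concept space.
r = 2
U, s, Vt = np.linalg.svd(X_red, full_matrices=False)
docs_lsi = U[:, :r] * s[:r]  # document coordinates in concept space

print(X_red.shape, docs_lsi.shape)
```

Because the SVD now runs on a 6x4 matrix instead of 6x8, the expensive LSI step operates in the reduced space, which is the point of the two-stage design.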

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A new low-level analysis based on latent silhouette cues, particularly suited to low-texture and outdoor datasets, is proposed; it uses an EM framework to simultaneously update a set of volumetric voxel occupancy probabilities and retrieve a best estimate of the dense 3D motion field from the last consecutively observed multi-view frame set.
Abstract: In this paper we investigate shape and motion retrieval in the context of multi-camera systems. We propose a new low-level analysis based on latent silhouette cues, particularly suited for low-texture and outdoor datasets. Our analysis does not rely on explicit surface representations, instead using an EM framework to simultaneously update a set of volumetric voxel occupancy probabilities and retrieve a best estimate of the dense 3D motion field from the last consecutively observed multi-view frame set. As the framework uses only latent, probabilistic silhouette information, the method yields a promising 3D scene analysis method robust to many sources of noise and arbitrary scene objects. It can be used as input for higher level shape modeling and structural inference tasks. We validate the approach and demonstrate its practical use for shape and motion analysis experimentally.

15 citations
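The abstract above fuses probabilistic silhouette cues into voxel occupancy probabilities. The sketch below shows only the simplest version of that idea: a per-voxel Bayesian update from noisy binary "projects inside the silhouette" observations across cameras. The paper's full EM framework, which also estimates a dense 3D motion field, is much richer; all rates and priors here are illustrative assumptions.

```python
# Toy per-voxel occupancy fusion from noisy silhouette observations.
# The detector rates below are made-up numbers, not from the paper.
P_IN_GIVEN_OCC = 0.9   # chance an occupied voxel projects inside the silhouette
P_IN_GIVEN_FREE = 0.2  # chance a free voxel does (false positive)

def update_occupancy(prior, observations):
    """Fuse independent per-camera silhouette observations with Bayes' rule."""
    p_occ, p_free = prior, 1.0 - prior
    for inside in observations:
        if inside:
            p_occ *= P_IN_GIVEN_OCC
            p_free *= P_IN_GIVEN_FREE
        else:
            p_occ *= 1.0 - P_IN_GIVEN_OCC
            p_free *= 1.0 - P_IN_GIVEN_FREE
    return p_occ / (p_occ + p_free)

# A voxel seen "inside" by 3 of 4 cameras becomes very likely occupied.
p = update_occupancy(0.5, [True, True, True, False])
print(round(p, 3))  # -> 0.919
```

Because the update never thresholds the silhouettes, a voxel missed by one camera is downweighted rather than carved away outright, which is what makes latent (probabilistic) silhouette cues robust to noise.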

Proceedings ArticleDOI
21 Oct 1985
TL;DR: The proof of the lower bound differs fundamentally from all known lower bounds for LSA's or PLSA's, because it does not reduce the problem to a combinatorial one but argues extensively about, e.g., a non-discrete measure for the similarity of sets in R^n.
Abstract: The "component counting lower bound" known for deterministic linear search algorithms (LSA's) also holds for their probabilistic versions (PLSA's) for many problems, even if two-sided error is allowed, and if one does not charge for probabilistic choice. This implies lower bounds on PLSA's, e.g. Ω(n log n) for the element distinctness problem and Ω(n²) for the knapsack problem. These results yield the first separations between probabilistic and non-deterministic LSA's, because the above problems are non-deterministically much easier. Previous lower bounds for PLSA's either only worked for one-sided error "on the nice side", i.e. on the side where the problems are even non-deterministically hard, or only for probabilistic comparison trees. The proof of the lower bound differs fundamentally from all known lower bounds for LSA's or PLSA's, because it does not reduce the problem to a combinatorial one but argues extensively about, e.g., a non-discrete measure for the similarity of sets in R^n. This lower bound result solves an open problem posed by Manber and Tompa as well as by Snir. Furthermore, a PLSA for n input variables with two-sided error and expected runtime T can be simulated by a (deterministic) LSA in T²n steps. This proves that the gaps between probabilistic and deterministic LSA's shown by Snir cannot be too large. As this simulation even holds for algebraic computation trees, we show that probabilistic and deterministic versions of this model are polynomially related. This is a weaker version of a result due to the author which shows that, in the case of LSA's, even the non-deterministic and deterministic versions are polynomially related.

15 citations

Journal ArticleDOI
Haiyu Song, Pengjie Wang, Jian Yun, Wei Li, Bo Xue, Gang Wu
TL;DR: A novel annotation method based on a topic model, namely local learning-based probabilistic latent semantic analysis (LL-PLSA), is proposed; it significantly outperforms the state of the art, especially in terms of overall metrics.
Abstract: Automatic image annotation plays a significant role in image understanding, retrieval, classification, and indexing. Today, it is becoming increasingly important for annotating large-scale social media images from content-sharing websites and social networks. These social images are usually annotated with user-provided low-quality tags. Topic models are considered a promising way to describe such weakly labeled images by learning latent representations of the training samples. Recent annotation methods based on topic models have two shortcomings: they are difficult to scale to large image datasets, and they cannot be applied to online image repositories, where new images and new tags are continuously added. In this paper, we propose a novel annotation method based on a topic model, namely local learning-based probabilistic latent semantic analysis (LL-PLSA), to address these problems. The key idea is to train a weighted topic model for a given test image on its semantic neighborhood, which consists of a fixed number of semantically and visually similar images. This method scales to large image databases, as the training samples involved in modeling are a few nearest neighbors rather than the entire database. Moreover, the proposed topic model, customized online for each test image, naturally handles the continuous addition of new images and new tags to a database. Extensive experiments on three benchmark datasets demonstrate that the proposed method significantly outperforms the state of the art, especially in terms of overall metrics.

15 citations
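LL-PLSA builds on standard pLSA, which fits P(z|d) and P(w|z) by expectation-maximization on a term-document count matrix. A minimal pure-NumPy pLSA EM loop on a toy matrix might look like the following (in LL-PLSA the "documents" would be a test image's nearest neighbors rather than a whole corpus):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy term-document count matrix (docs x words); values are illustrative.
N = np.array([
    [5, 3, 0, 0],
    [4, 4, 1, 0],
    [0, 1, 4, 5],
    [0, 0, 3, 4],
], dtype=float)
D, W, Z = N.shape[0], N.shape[1], 2  # docs, words, topics

# Random initialisation of P(z|d) and P(w|z), each row normalised.
p_z_d = rng.random((D, Z)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
p_w_z = rng.random((Z, W)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: responsibilities P(z|d,w) proportional to P(z|d) * P(w|z).
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # shape (D, Z, W)
    resp = joint / joint.sum(axis=1, keepdims=True)
    # M-step: re-estimate both distributions from expected counts.
    nz = N[:, None, :] * resp                       # expected counts (D, Z, W)
    p_w_z = nz.sum(axis=0); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = nz.sum(axis=2); p_z_d /= p_z_d.sum(axis=1, keepdims=True)

# Each row of p_z_d is the fitted topic mixture for one document.
print(np.round(p_z_d, 2))
```

The per-image weighting that LL-PLSA adds would enter the M-step as document weights on the expected counts; the EM skeleton itself stays the same.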

Journal ArticleDOI
TL;DR: This article proposes a novel method for image annotation that combines feature-word distributions, which map from visual space to word space, and word-topic distributions, which form a structure to capture label relationships for annotation.
Abstract: Image annotation is the process of finding appropriate semantic labels for images in order to provide a more convenient way to index and search images on the Web. This article proposes a novel method for image annotation based on combining feature-word distributions, which map from visual space to word space, and word-topic distributions, which form a structure to capture label relationships for annotation. We refer to this type of model as a Feature-Word-Topic model. The introduction of topics allows us to efficiently take word associations, such as {ocean, fish, coral} or {desert, sand, cactus}, into account for image annotation. Unlike previous topic-based methods, we do not model topics as joint distributions of words and visual features, but as distributions of words only. The feature-word distributions are used to define weights in the computation of topic distributions for annotation. By doing so, topic models from text mining can be applied directly in our method. Our Feature-Word-Topic model, which uses Gaussian mixtures for the feature-word distributions and probabilistic latent semantic analysis (pLSA) for the word-topic distributions, obtains promising results in image annotation and retrieval.

15 citations
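The combination described above, feature-word distributions weighting word-topic distributions, can be illustrated with a toy example. For brevity, each word gets a single hypothetical Gaussian feature model instead of a full mixture, and P(w|z) is a made-up fitted pLSA table; none of the numbers come from the paper.

```python
import numpy as np

# Hypothetical per-word Gaussian feature models over 2-D visual features:
# p(feature | word) scores how well an image feature matches each word.
words = ["ocean", "fish", "desert"]
means = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]])
var = 0.05  # shared isotropic variance (illustrative)

def word_weights(feature):
    """Feature-word distribution: Gaussian likelihood per word, normalised."""
    d2 = ((means - feature) ** 2).sum(axis=1)
    lik = np.exp(-d2 / (2 * var))
    return lik / lik.sum()

# Made-up word-topic table P(w|z): topic 0 = sea scene, topic 1 = arid scene.
p_w_z = np.array([[0.50, 0.45, 0.05],
                  [0.05, 0.05, 0.90]])

w = word_weights(np.array([0.15, 0.85]))  # a "sea-like" visual feature
topic_scores = p_w_z @ w                  # topics weighted by matched words
print(int(np.argmax(topic_scores)))       # -> 0 (the sea-scene topic wins)
```

The visual feature never touches the topic model directly; it only reweights words, which is exactly the division of labour the abstract describes (topics remain distributions over words only).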


Network Information
Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations (84% related)
- Feature (computer vision): 128.2K papers, 1.7M citations (84% related)
- Support vector machine: 73.6K papers, 1.7M citations (84% related)
- Deep learning: 79.8K papers, 2.1M citations (83% related)
- Object detection: 46.1K papers, 1.3M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year | Papers
2023 | 19
2022 | 77
2021 | 14
2020 | 36
2019 | 27
2018 | 58