Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as: PLSA.


Papers
Journal ArticleDOI
TL;DR: An interesting application of SVD to text documents is described, where words of similar meaning get mapped to similar low-dimensional locations by taking the top K singular values/vectors.
Abstract: We now describe an interesting application of SVD to text documents. Suppose we represent documents as a bag of words, so X_ij is the number of times word j occurs in document i, for j = 1 : W and i = 1 : D, where W is the number of words and D is the number of documents. To find a document that contains a given word, we can use standard search procedures, but these can get confused by synonymy (different words with the same meaning) and polysemy (same word with different meanings). An alternative approach is to assume that X was generated by some low-dimensional latent representation X̂ ∈ IR^{D×K}, where K is the number of latent dimensions. If we compare documents in the latent space, we should get improved retrieval performance, because words of similar meaning get mapped to similar low-dimensional locations. We can compute a low-dimensional representation of X by computing the SVD and then taking the top K singular values/vectors: X̂ = U_K S_K V_K^T.
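A minimal sketch of this truncated-SVD (latent semantic analysis) construction in NumPy; the toy count matrix and the choice K = 2 are invented for illustration:

import numpy as np

# Toy document-term count matrix X: D=4 documents, W=6 words
# (X[i, j] = count of word j in document i; values invented).
X = np.array([[2, 1, 0, 0, 1, 0],
              [1, 2, 0, 0, 0, 1],
              [0, 0, 3, 1, 0, 0],
              [0, 0, 1, 2, 0, 0]], dtype=float)

K = 2  # number of latent dimensions

# SVD: X = U @ diag(s) @ Vt; keep the top K singular values/vectors.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]  # rank-K approximation

# Latent document coordinates are the rows of U_K S_K; retrieval
# compares documents (and folded-in queries) here, e.g. by cosine.
doc_latent = U[:, :K] * s[:K]
a, b = doc_latent[0], doc_latent[1]
print(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))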

93 citations

Proceedings ArticleDOI
24 Aug 2008
TL;DR: This work proposes a visualization method based on a topic model for discrete data such as documents; a visualization is obtained by fitting the model to a given set of documents with the EM algorithm, so that documents with similar topics are embedded close together.
Abstract: We propose a visualization method based on a topic model for discrete data such as documents. Unlike conventional visualization methods based on pairwise distances such as multi-dimensional scaling, we consider a mapping from the visualization space into the space of documents as a generative process of documents. In the model, both documents and topics are assumed to have latent coordinates in a two- or three-dimensional Euclidean space, or visualization space. The topic proportions of a document are determined by the distances between the document and the topics in the visualization space, and each word is drawn from one of the topics according to its topic proportions. A visualization, i.e. latent coordinates of documents, can be obtained by fitting the model to a given set of documents using the EM algorithm, resulting in documents with similar topics being embedded close together. We demonstrate the effectiveness of the proposed model by visualizing document and movie data sets, and quantitatively compare it with conventional visualization methods.
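As a rough sketch of the generative step described above (not the authors' code): one natural way, assumed here, to turn visualization-space distances into topic proportions is a softmax of negative squared distances. All sizes and parameters below are invented:

import numpy as np

rng = np.random.default_rng(0)

D, Z, V = 5, 3, 50   # documents, topics, vocabulary size

x = rng.normal(size=(D, 2))               # latent document coordinates (2-D)
phi = rng.normal(size=(Z, 2))             # latent topic coordinates
beta = rng.dirichlet(np.ones(V), size=Z)  # per-topic word distributions

# Topic proportions from distances in the visualization space:
# topics closer to a document get higher probability.
sq_dist = ((x[:, None, :] - phi[None, :, :]) ** 2).sum(-1)  # (D, Z)
logits = -0.5 * sq_dist
theta = np.exp(logits - logits.max(axis=1, keepdims=True))
theta /= theta.sum(axis=1, keepdims=True)                   # (D, Z)

# Generate one word for document d: pick a topic, then a word.
d = 0
z = rng.choice(Z, p=theta[d])
w = rng.choice(V, p=beta[z])
print(theta[d], z, w)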

93 citations

Journal ArticleDOI
TL;DR: The use of latent semantic indexing (LSI) in conjunction with normalization and term weighting for content-based image retrieval is examined, using two different approaches to image feature representation.

92 citations

Proceedings ArticleDOI
01 Jun 2015
TL;DR: An unsupervised topic model for short texts that performs soft clustering over distributed representations of words, using Gaussian mixture models whose components capture the notion of latent topics; it outperforms LDA on short texts in both subjective and objective evaluation.
Abstract: We present an unsupervised topic model for short texts that performs soft clustering over distributed representations of words. We model the low-dimensional semantic vector space represented by the dense distributed representations of words using Gaussian mixture models (GMMs) whose components capture the notion of latent topics. While conventional topic modeling schemes such as probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) need aggregation of short messages to avoid data sparsity in short documents, our framework works on large amounts of raw short texts (billions of words). In contrast with other topic modeling frameworks that use word co-occurrence statistics, our framework uses a vector space model that overcomes the issue of sparse word co-occurrence patterns. We demonstrate that our framework outperforms LDA on short texts through both subjective and objective evaluation. We also show the utility of our framework in learning topics and classifying short texts on Twitter data for English, Spanish, French, Portuguese and Russian.
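A minimal sketch of this soft-clustering idea, assuming scikit-learn's GaussianMixture as the GMM and random vectors standing in for real pre-trained word embeddings; the tiny vocabulary and the word-averaging step for a short text are illustrative assumptions, not the paper's exact pipeline:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for pre-trained word embeddings (e.g. word2vec vectors);
# real usage would load vectors for a large vocabulary.
vocab = ["goal", "match", "team", "stock", "market", "shares"]
vectors = rng.normal(size=(len(vocab), 20))

# Each Gaussian component plays the role of a latent topic.
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(vectors)

# Soft topic assignments per word (component responsibilities).
resp = gmm.predict_proba(vectors)  # shape (n_words, n_topics)

# A short text's topic distribution: average its words' responsibilities.
text = ["stock", "market", "team"]
idx = [vocab.index(w) for w in text]
print(resp[idx].mean(axis=0))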

92 citations

Proceedings Article
04 Nov 2016
TL;DR: PixelVAE as discussed by the authors is a VAE model with an autoregressive decoder based on PixelCNN, which achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
Abstract: Natural image modeling is a landmark challenge of unsupervised learning. Variational Autoencoders (VAEs) learn a useful latent representation and model global structure well but have difficulty capturing small details. PixelCNN models details very well, but lacks a latent code and is difficult to scale for capturing large structures. We present PixelVAE, a VAE model with an autoregressive decoder based on PixelCNN. Our model requires very few expensive autoregressive layers compared to PixelCNN and learns latent codes that are more compressed than a standard VAE while still capturing most non-trivial structure. Finally, we extend our model to a hierarchy of latent variables at different scales. Our model achieves state-of-the-art performance on binarized MNIST, competitive performance on 64x64 ImageNet, and high-quality samples on the LSUN bedrooms dataset.
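A hypothetical minimal sketch of the general idea in PyTorch, not the authors' architecture: a small convolutional VAE whose decoder ends in a few masked (PixelCNN-style, autoregressive) convolutions conditioned on a feature map decoded from the latent code. All layer sizes are invented, and the 28x28 single-channel setup loosely mirrors binarized MNIST:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    # PixelCNN-style convolution: each output pixel depends only on
    # pixels above it and to its left. Mask type 'A' also hides the
    # centre pixel (used in the first autoregressive layer).
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones_like(self.weight)
        mask[:, :, kh // 2, kw // 2 + (mask_type == 'B'):] = 0
        mask[:, :, kh // 2 + 1:] = 0
        self.register_buffer('mask', mask)

    def forward(self, x):
        return F.conv2d(x, self.weight * self.mask, self.bias,
                        self.stride, self.padding)

class TinyPixelVAE(nn.Module):
    # Encoder -> latent z -> deconv decoder -> a few cheap masked
    # convolutions that combine the decoded feature map with the input.
    def __init__(self, zdim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),    # 28 -> 14
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 14 -> 7
            nn.Flatten(), nn.Linear(64 * 7 * 7, 2 * zdim))
        self.dec = nn.Sequential(
            nn.Linear(zdim, 64 * 7 * 7), nn.Unflatten(1, (64, 7, 7)),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),  # 7 -> 14
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU())  # 14 -> 28
        self.ar_in = MaskedConv2d('A', 1, 16, 7, padding=3)
        self.ar_out = MaskedConv2d('B', 16, 1, 7, padding=3)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        cond = self.dec(z)                        # (B, 16, 28, 28)
        h = F.relu(self.ar_in(x) + cond)          # autoregressive + latent
        return self.ar_out(h), mu, logvar         # Bernoulli logits

# Usage: ELBO = reconstruction + KL, on a fake binarized batch.
model = TinyPixelVAE()
x = torch.rand(8, 1, 28, 28).bernoulli()
logits, mu, logvar = model(x)
rec = F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
loss = (rec + kl) / x.size(0)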

92 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations (84% related)
Feature (computer vision): 128.2K papers, 1.7M citations (84% related)
Support vector machine: 73.6K papers, 1.7M citations (84% related)
Deep learning: 79.8K papers, 2.1M citations (83% related)
Object detection: 46.1K papers, 1.3M citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58