Open AccessProceedings Article
Probabilistic latent semantic analysis
Thomas Hofmann
- Vol. 15, pp 289-296
Reads0
Chats0
TLDR
This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model which results in a more principled approach which has a solid foundation in statistics.Abstract:
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.read more
Citations
More filters
Proceedings ArticleDOI
A comparative evaluation of data-driven models in translation selection of machine translation
TL;DR: A comparative evaluation of two data-driven models used in translation selection of English-Korean machine translation using k-nearest neighbor (k-NN) learning to select an appropriate translation of the unseen instances in the dictionary.
Journal ArticleDOI
Affinity Regularized Non-Negative Matrix Factorization for Lifelong Topic Modeling
TL;DR: A novel lifelong topic model based on non-negative matrix factorization ( NMF), called Affinity Regularized NMF for LTM (NMF-LTM), which to the best knowledge is distinctive from the popular LDA-based LTMs.
Proceedings ArticleDOI
Detection of sign-language content in video through polar motion profiles
TL;DR: The approach uses an ensemble of Haar-based face detectors to define regions of interest (ROI), and a background model to segment movements in the ROI and achieves 81% precision and 94% recall on a dataset of user-contributed YouTube videos.
Patent
System and method for image annotation and multi-modal image retrieval using probabilistic semantic models comprising at least one joint probability distribution
Ruofei Zhang,Zhongfei Zhang +1 more
TL;DR: In this article, an Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class.
Pachinko Allocation: Scalable Mixture Models of Topic Correlations
Wei Li,Andrew McCallum +1 more
TL;DR: This paper proposes the pachinko allocation model (PAM), which captures arbitrary topic correlations using a directed acyclic graph (DAG), and develops a highly-scalable inference algorithm for PAM.
References
More filters
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal ArticleDOI
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book
Introduction to Modern Information Retrieval
Gerard Salton,Michael J. McGill +1 more
TL;DR: Reading is a need and a hobby at once and this condition is the on that will make you feel that you must read.
Journal ArticleDOI
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Journal ArticleDOI
Probabilistic latent semantic indexing
TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.