Open Access
Proceedings Article
Probabilistic latent semantic analysis
Thomas Hofmann
Vol. 15, pp. 289-296
TLDR
This work proposes Probabilistic Latent Semantic Analysis, a technique based on a mixture decomposition derived from a latent class model, which gives it a solid foundation in statistics. To avoid overfitting, the model is fitted by tempered EM, a widely applicable generalization of maximum likelihood model fitting.
Abstract:
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
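The mixture decomposition and tempered EM fitting named in the abstract can be illustrated with a minimal sketch. The abstract itself gives no equations, so the sketch assumes the commonly cited symmetric parameterization P(d, w) = sum_z P(z) P(d|z) P(w|z) and an E-step in which the likelihood term is raised to a power beta (beta = 1 being ordinary EM). The function names, the `beta` parameter, and the toy count matrix in the usage note are illustrative choices, not taken from the paper.

```python
import math
import random


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) for P(d, w) = sum_z P(z) P(d|z) P(w|z).

    counts[d][w] is the number of times word w occurs in document d.
    beta = 1 is ordinary EM; beta < 1 tempers (flattens) the posterior.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    total = float(sum(sum(row) for row in counts))

    def random_dist(n):
        # Random strictly positive distribution used for initialization.
        vals = [rng.random() + 1e-3 for _ in range(n)]
        s = sum(vals)
        return [v / s for v in vals]

    p_z = [1.0 / n_topics] * n_topics
    p_d_z = [random_dist(n_docs) for _ in range(n_topics)]
    p_w_z = [random_dist(n_words) for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # Tempered E-step (assumed form): P(z|d,w) is proportional
                # to P(z) * [P(d|z) * P(w|z)] ** beta.
                post = [p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                        for z in range(n_topics)]
                norm = sum(post)
                if norm == 0.0:
                    continue  # numerical underflow guard
                for z in range(n_topics):
                    r = n_dw * post[z] / norm
                    acc_d[z][d] += r
                    acc_w[z][w] += r
        # M-step: re-estimate all parameters from the expected counts.
        for z in range(n_topics):
            mass = sum(acc_d[z])
            p_z[z] = mass / total
            if mass > 0.0:
                p_d_z[z] = [v / mass for v in acc_d[z]]
                p_w_z[z] = [v / mass for v in acc_w[z]]
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Log-likelihood of the observed counts under the fitted mixture."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw:
                p = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                        for z in range(len(p_z)))
                ll += n_dw * math.log(p)
    return ll
```

As a usage example, `plsa_tempered_em([[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]], n_topics=2, beta=0.9)` returns the three fitted distributions for a toy four-document, four-word table. How `beta` is chosen (for instance, by monitoring performance on held-out data) is deliberately left out of the sketch.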
Citations
Journal Article
Comparing grounded theory and topic modeling: Extreme divergence or unlikely convergence?
TL;DR: This article describes the application of analysis methods from two distinct fields, one method from interpretive social science and one method from statistical machine learning, to the same survey data, and suggests ways that such methods might be combined in novel and compelling ways.
Journal Article
Short Text Classification: A Survey
TL;DR: The characteristics of short text and the difficulty of short text classification are discussed, and the existing popular works on short text classifiers and models, including short text classification using semantic analysis, semi-supervised short text classification, ensemble short text classification, and real-time classification, are introduced.
Journal Article
Music Information Retrieval Using Social Tags and Audio
Mark Levy, Mark Sandler
TL;DR: A novel approach to applying text-based information retrieval techniques to music collections that represents tracks with a joint vocabulary consisting of both conventional words, drawn from social tags, and audio muswords, representing characteristics of automatically-identified regions of interest within the signal.
Proceedings Article
Emerging topic detection using dictionary learning
TL;DR: This work addresses the problem of identifying emerging topics through the use of dictionary learning, proposing a two-stage approach based, respectively, on detection and clustering of novel user-generated content, and derives a scalable approach by using the alternating directions method to solve the resulting optimization problems.
Journal Article
The thematic and citation landscape of Data and Knowledge Engineering (1985-2007)
TL;DR: The thematic and citation structures of Data and Knowledge Engineering (DKE) (1985-2007) are identified based on text analysis and citation analysis of the bibliographic records of full papers published in the journal.
References
Journal Article
Maximum likelihood from incomplete data via the EM algorithm
Journal Article
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book
Introduction to Modern Information Retrieval
Gerard Salton, Michael J. McGill
Journal Article
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Journal Article
Probabilistic latent semantic indexing
TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.