Open Access · Proceedings Article
Probabilistic latent semantic analysis
Thomas Hofmann
Vol. 15, pp. 289–296
TL;DR: This work proposes a widely applicable generalization of maximum likelihood model fitting by tempered EM, based on a mixture decomposition derived from a latent class model; the result is a more principled approach with a solid foundation in statistics.
Abstract:
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
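The mixture decomposition and tempered EM fitting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the paper's original code: the function name `plsa` and all parameter names are assumptions, and the tempering exponent `beta` damps the E-step posteriors as the abstract's tempered-EM generalization suggests (with `beta=1` recovering standard EM).

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, beta=1.0, seed=0):
    """Fit a pLSA-style latent class model by (tempered) EM.

    counts : (D, W) array of document-word co-occurrence counts n(d, w)
    beta   : tempering exponent; beta = 1 gives standard EM,
             beta < 1 flattens the E-step posteriors to curb overfitting
    Returns P(z|d) with shape (D, K) and P(w|z) with shape (K, W).
    """
    rng = np.random.default_rng(seed)
    D, W = counts.shape
    p_z_d = rng.random((D, n_topics))           # P(z|d), rows sum to 1
    p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, W))           # P(w|z), rows sum to 1
    p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to [P(z|d) P(w|z)]^beta, shape (D, W, K)
        post = (p_z_d[:, None, :] * p_w_z.T[None, :, :]) ** beta
        post /= post.sum(2, keepdims=True) + 1e-12
        # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w)
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(0).T
        p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = weighted.sum(1)
        p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z
```

On a toy count matrix with two disjoint vocabulary blocks, the fitted `P(z|d)` separates the two document groups into distinct latent classes.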
Citations
Journal ArticleDOI
LIMTopic: A Framework of Incorporating Link Based Importance into Topic Modeling
TL;DR: A framework, LIMTopic, is proposed to incorporate link-based importance into topic modeling, along with a novel measure, the log-likelihood of the ranking-integrated document-word matrix, to investigate the document-network summarization performance of topic models.
Patent
Learning word embedding using morphological knowledge
Bin Gao,Tie-Yan Liu +1 more
TL;DR: In this paper, a machine learning system may use morphological knowledge to enhance a deep learning framework for learning word embedding, and the system may consider morphological similarities between and among words in a learning process so as to handle new or rare words.
Proceedings Article
Hidden Markov tree models for semantic class induction
TL;DR: This paper proposes a generative model of sentences, based on dependency trees and which takes into account homonymy, and describes an efficient algorithm to perform inference and learning in this model.
Journal ArticleDOI
Engineering Deep Representations for Modeling Aesthetic Perception
TL;DR: This work uses a bag-of-words representation of frequent Flickr image tags to discover a compact set of textual attributes (each a sparse, linear representation of those frequent tags) for each Flickr image.
Proceedings Article
Authorless Topic Models: Biasing Models Away from Known Structure
Laure Thompson,David Mimno +1 more
TL;DR: This work finds that it is possible to predict which words cause topic-metadata correlation, and that selectively subsampling these words dramatically reduces topic-metadata correlation, improves topic stability, and maintains or even improves model quality.
References
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal ArticleDOI
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.
Book
Introduction to Modern Information Retrieval
Gerard Salton,Michael J. McGill +1 more
Journal ArticleDOI
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Journal ArticleDOI
Probabilistic latent semantic indexing
TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.