Probabilistic latent semantic analysis

Open Access Proceedings Article

Thomas Hofmann

Vol. 15, pp. 289-296

TL;DR

This work introduces Probabilistic Latent Semantic Analysis, a technique for analyzing co-occurrence data that is based on a mixture decomposition derived from a latent class model and therefore has a solid foundation in statistics. To avoid overfitting, it proposes tempered EM, a widely applicable generalization of maximum likelihood model fitting.

Abstract:

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
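The mixture decomposition and tempered EM fitting described in the abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the paper's implementation: it assumes the symmetric latent class form P(d, w) = sum_z P(z) P(d|z) P(w|z) and a tempered E-step in which the likelihood term is raised to a power `beta` (with `beta = 1` giving ordinary EM). The function name `plsa_tempered_em`, its parameters, and the fixed `beta` (rather than a schedule tuned on held-out data) are hypothetical choices made for illustration.

```python
import numpy as np


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit a small PLSA model to a document-word count matrix.

    counts:   array of shape (n_docs, n_words) with non-negative counts n(d, w)
    n_topics: number of latent classes z
    beta:     tempering exponent (assumed fixed here); beta=1.0 is plain EM
    Returns P(z), P(d|z), P(w|z) as arrays of shape (K,), (K, D), (K, W).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised so each row is a distribution.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Tempered E-step: P(z|d,w) proportional to P(z) * [P(d|z) P(w|z)]^beta.
        joint = p_z[:, None, None] * (p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
        post = joint / (joint.sum(axis=0, keepdims=True) + eps)  # shape (K, D, W)

        # M-step: re-estimate parameters from expected counts n(d,w) * P(z|d,w).
        weighted = counts[None, :, :] * post
        p_w_z = weighted.sum(axis=1)       # sum over documents -> (K, W)
        p_d_z = weighted.sum(axis=2)       # sum over words     -> (K, D)
        p_z = weighted.sum(axis=(1, 2))    # sum over both      -> (K,)

        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_d_z /= p_d_z.sum(axis=1, keepdims=True) + eps
        p_z /= p_z.sum() + eps

    return p_z, p_d_z, p_w_z
```

Calling `plsa_tempered_em(counts, n_topics=2, beta=0.9)` on a small count matrix returns the three factors; the fitted co-occurrence table can then be reconstructed with `np.einsum("k,kd,kw->dw", p_z, p_d_z, p_w_z)`. The dense `(K, D, W)` posterior array keeps the sketch short but is only practical for toy data.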

Citations

Journal Article

Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules

Mei-Yuh Hwang, +5 more

01 Sep 2009

IEEE Transactions on Audio, Speech, and ...

TL;DR: A system for highly accurate large-vocabulary Mandarin speech recognition that comprises two sets of acoustic models designed to be complementary in terms of errors but with similar overall accuracy by using different phone sets and different combinations of discriminative learning.

Proceedings Article

Mining user interests from personal photos

Pengtao Xie, +3 more

TL;DR: A User Image Latent Space Model is proposed to jointly model user interests and image contents; variational inference is used to approximate the posteriors of the latent variables and to learn the model parameters.

Proceedings Article

Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space

Wenye Li, +2 more

TL;DR: A generalized form of the representer theorem is investigated for kernel-based regularized learning, using a generalized regularizer that leaves part of the space unregularized.

Book Chapter

A Semantic Web Pragmatic Approach to Develop Clinical Ontologies, and Thus Semantic Interoperability, Based in HL7 v2.XML Messaging

David Mendes, +1 more

TL;DR: The inadequacy of the HL7 RIM for ontology mapping and how to circumvent it, NLP techniques for semi-automated ontology population, and current trends in knowledge representation and reasoning are presented, and their applicability is discussed.

Proceedings Article

Protein Sequence Classification Using Feature Hashing

Cornelia Caragea, +2 more

TL;DR: In this article, the authors compared feature hashing with the bag of k-grams and feature selection approaches and showed that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.

References

Journal Article

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

01 Sep 1977

Journal of the royal statistical society...

Journal Article

Indexing by Latent Semantic Analysis

Scott Deerwester, +4 more

01 Sep 1990

Journal of the Association for Informati...

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.

Book

Introduction to Modern Information Retrieval

Gerard Salton, +1 more

Journal Article

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

Thomas K. Landauer, +1 more

01 Apr 1997

Psychological Review

TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.

Journal Article

Probabilistic latent semantic indexing

Thomas Hofmann

TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.
