Open Access Proceedings Article

Probabilistic latent semantic analysis

TLDR
This work presents Probabilistic Latent Semantic Analysis, a statistical technique based on a mixture decomposition derived from a latent class model, and proposes tempered EM, a widely applicable generalization of maximum likelihood model fitting, to avoid overfitting.
Abstract
Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach with a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
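The mixture decomposition and tempered EM fitting mentioned in the abstract can be illustrated with a minimal sketch. It assumes the symmetric latent class parameterization P(d, w) = sum over z of P(z) P(d|z) P(w|z), and one common form of the tempered E-step in which the posterior P(z|d, w) is proportional to P(z) [P(d|z) P(w|z)]^beta. The abstract states neither formula nor how beta is chosen, so the function names, the fixed `beta`, the iteration count, and the toy count matrix in the usage note are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def _normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa_tempered_em(counts, n_topics, beta=1.0, n_iter=50, seed=0):
    """Fit P(z), P(d|z), P(w|z) to counts[d][w] with a fixed-beta tempered EM.

    beta = 1.0 gives standard EM; beta < 1.0 flattens the posteriors.
    Keeping beta fixed is a simplification assumed for this sketch.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z = [1.0 / n_topics] * n_topics
    # Random strictly positive initial distributions, one row per topic.
    p_d_z = [_normalize([rng.random() + 0.1 for _ in range(n_docs)])
             for _ in range(n_topics)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z = [0.0] * n_topics
        acc_d = [[0.0] * n_docs for _ in range(n_topics)]
        acc_w = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # Tempered E-step: posterior over topics for this (d, w) pair.
                post = _normalize([
                    p_z[z] * (p_d_z[z][d] * p_w_z[z][w]) ** beta
                    for z in range(n_topics)
                ])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    weight = n_dw * post[z]
                    acc_z[z] += weight
                    acc_d[z][d] += weight
                    acc_w[z][w] += weight
        # M-step: renormalize the expected counts into distributions.
        p_z = _normalize(acc_z)
        p_d_z = [_normalize(row) for row in acc_d]
        p_w_z = [_normalize(row) for row in acc_w]
    return p_z, p_d_z, p_w_z


def log_likelihood(counts, p_z, p_d_z, p_w_z):
    """Log-likelihood of the counts under the mixture model."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw:
                mix = sum(p_z[z] * p_d_z[z][d] * p_w_z[z][w]
                          for z in range(len(p_z)))
                total += n_dw * math.log(mix)
    return total
```

As a usage example, calling `plsa_tempered_em([[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]], n_topics=2, beta=0.8)` returns three sets of distributions that each sum to one. In practice beta would be tuned against held-out data rather than fixed in advance.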


Citations
Journal Article

Unconstrained Video Monitoring of Breathing Behavior and Application to Diagnosis of Sleep Apnea

TL;DR: A novel motion model to detect subtle, cyclical breathing signals from video, a new 3-D unsupervised self-adaptive breathing template to learn individuals' normal breathing patterns online, and a robust action classification method to recognize abnormal breathing activities and limb movements are introduced.
Journal Article

Learning Parts-Based Representations of Data

TL;DR: This work proposes a form of generative latent factor model, in which each data dimension is allowed to select a different factor or part as its explanation for the appearance of a part, and provides the details for two such models: a discrete and a continuous one.
Proceedings Article

Enhanced vector space models for content-based recommender systems

TL;DR: Two approaches are introduced: the first, based on a technique called Random Indexing, reduces the impact of two classical VSM problems, namely high dimensionality and the inability to manage the semantics of documents; the second extends the first by integrating a negation operator implemented in the Semantic Vectors open-source package.
Proceedings Article

A Participant-based Approach for Event Summarization Using Twitter Streams

TL;DR: A participant-based event summarization approach that “zooms-in” the Twitter event streams to the participant level, detects the important sub-events associated with each participant using a novel mixture model that combines the “burstiness” and “cohesiveness” properties of the event tweets, and generates the event summaries progressively.
Proceedings Article

SSHLDA: A Semi-Supervised Hierarchical Topic Model

TL;DR: A semi-supervised hierarchical topic model, called Semi-Supervised Hierarchical Latent Dirichlet Allocation (SSHLDA), is proposed; it aims to explore new topics automatically in the data space while incorporating information from observed hierarchical labels into the modeling process.
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Introduction to Modern Information Retrieval

Journal Article

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Journal Article

Probabilistic latent semantic indexing

TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.