scispace - formally typeset

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
Journal ArticleDOI
TL;DR: In this article, a probabilistic clustering model for mixed data is proposed, which allows analysis of variables of mixed type: the variables may be nominal, ordinal and/or quantitative.
Abstract: This paper develops a probabilistic clustering model for mixed data. The model allows analysis of variables of mixed type: the variables may be nominal, ordinal, and/or quantitative. The model contains the well-known models of latent class analysis as submodels. As in latent class analysis, local independence of the variables is assumed. The parameters of the model are estimated by the EM algorithm. Test statistics and goodness-of-fit measures are proposed for model selection. Two artificial data sets show the usefulness of these tests. An empirical example completes the presentation.

44 citations
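The EM estimation for the nominal-variable (latent class) part of such a model can be sketched as follows. This is a simplified illustration under local independence only; the function name, smoothing constant, and initialisation are our own assumptions, and the ordinal/quantitative extensions of the paper are omitted:

```python
import numpy as np

def latent_class_em(X, n_classes, n_cats, n_iter=100, seed=0):
    """EM for a latent class model on nominal data.

    X[i, j] is an integer category in {0, ..., n_cats[j] - 1}; the
    variables are assumed locally independent given the latent class.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)            # class weights
    theta = [rng.dirichlet(np.ones(k), size=n_classes)  # P(x_j | class)
             for k in n_cats]
    for _ in range(n_iter):
        # E-step: posterior memberships r[i, c] = P(class c | x_i)
        log_r = np.tile(np.log(pi), (n, 1))
        for j in range(p):
            log_r += np.log(theta[j][:, X[:, j]]).T
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reestimate weights and conditional probabilities
        pi = r.mean(axis=0)
        for j in range(p):
            for c in range(n_cats[j]):
                theta[j][:, c] = r[X[:, j] == c].sum(axis=0)
            theta[j] = theta[j] + 1e-9        # avoid log(0) next E-step
            theta[j] /= theta[j].sum(axis=1, keepdims=True)
    return pi, theta, r
```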

Book ChapterDOI
01 Jan 2010
TL;DR: Integrated Nested Laplace Approximation (INLA) is a new approach to implementing Bayesian inference for latent Gaussian models that provides approximations of the posterior marginals of the latent variables which are both very accurate and extremely fast to compute.

Abstract: Latent Gaussian models are a common construct in statistical applications where a latent Gaussian field, indirectly observed through data, is used to model, for instance, time and space dependence or the smooth effect of covariates. Many well-known statistical models, such as smoothing-spline models, space-time models, semiparametric regression, spatial and spatio-temporal models, log-Gaussian Cox models, and geostatistical models, are latent Gaussian models. Integrated Nested Laplace Approximation (INLA) is a new approach to implementing Bayesian inference for such models. It provides approximations of the posterior marginals of the latent variables that are both very accurate and extremely fast to compute. Moreover, INLA treats latent Gaussian models in a general way, thus allowing for a great deal of automation in the inferential procedure. The inla program, bundled in the R library INLA, is a prototype of such a black box for inference on latent Gaussian models that is both flexible and user-friendly. It is meant to make latent Gaussian models applicable, useful, and appealing to a larger class of users.

44 citations
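INLA itself ships as an R package, but its core building block, the Laplace approximation, can be illustrated in a few lines of NumPy. This is a hedged one-dimensional sketch, not the INLA algorithm: it simply fits a Gaussian at the posterior mode using finite-difference curvature, with a grid and example likelihood chosen by us:

```python
import numpy as np

def laplace_approx(log_post, grid):
    """Gaussian (Laplace) approximation to a 1-D posterior.

    log_post: unnormalised log-posterior; grid: dense, evenly spaced points.
    Returns the posterior mode and the variance implied by the curvature
    (negative inverse second derivative) at the mode.
    """
    lp = log_post(grid)
    i = int(np.argmax(lp))
    h = grid[1] - grid[0]
    # central-difference second derivative at the mode
    d2 = (lp[i + 1] - 2.0 * lp[i] + lp[i - 1]) / h ** 2
    return grid[i], -1.0 / d2

# Example: binomial likelihood, 3 successes in 10 trials, flat prior.
grid = np.linspace(0.01, 0.99, 981)
mode, var = laplace_approx(lambda t: 3 * np.log(t) + 7 * np.log(1 - t), grid)
```

For this example the mode lands at 3/10 and the Laplace variance is close to the usual mode*(1 - mode)/n value of about 0.021.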

Journal ArticleDOI
TL;DR: The models used in this article are secondary dimension mixture models with the potential to explain differential item functioning (DIF) between latent classes, called latent DIF.
Abstract: The models used in this article are secondary dimension mixture models with the potential to explain differential item functioning (DIF) between latent classes, called latent DIF. The focus is on models with a secondary dimension that is at the same time specific to the DIF latent class and linked to an item property. A description of the models is provided along with a means of estimating model parameters using easily available software and a description of how the models behave in two applications. One application concerns a test that is sensitive to speededness and the other is based on an arithmetic operations test where the division items show latent DIF.

44 citations

Journal ArticleDOI
01 Jan 2012
TL;DR: An unsupervised learning approach is introduced that employs the Scale Invariant Feature Transform (SIFT) to extract local image features, together with the probabilistic latent semantic analysis (pLSA) model from linguistic content analysis for data clustering.

Abstract: Since wireless capsule endoscopy (WCE) is a novel technology for recording videos of the digestive tract of a patient, the problem of segmenting a WCE video into subvideos corresponding to the entrance, stomach, small intestine, and large intestine regions is not well addressed in the literature. The few papers addressing this problem follow supervised learning approaches that presume the availability of a large database of correctly labeled training samples. Considering the difficulty of procuring the sizable WCE training data sets needed to achieve high classification accuracy, we introduce in this paper an unsupervised learning approach that employs the Scale Invariant Feature Transform (SIFT) to extract local image features and the probabilistic latent semantic analysis (pLSA) model, originally used in linguistic content analysis, for data clustering. Experimental results indicate that this method compares well in classification accuracy with state-of-the-art supervised classification approaches to WCE video segmentation.

44 citations
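A pipeline like the one above feeds pLSA with bag-of-visual-words counts: local descriptors are quantised against a learned vocabulary, and each frame becomes a "document" of visual-word counts. A rough sketch of that front end (synthetic descriptors stand in for SIFT, which would normally come from OpenCV; the tiny k-means and the function names are our own):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Tiny Lloyd's k-means: learns a visual vocabulary from descriptors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):        # leave empty clusters unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(descriptors, centers):
    """Quantise one frame's descriptors into visual-word counts."""
    labels = np.argmin(
        ((descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    return np.bincount(labels, minlength=len(centers))
```

Stacking one such histogram per frame yields the document-word count matrix that pLSA then clusters into digestive-tract regions.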

Book ChapterDOI
20 May 2008
TL;DR: A mixture model based on Probabilistic Latent Semantic Analysis (PLSA) is proposed to estimate a hidden semantic theme layer between the terms and the support documents of candidate experts to capture the semantic relevance between the query and the experts.
Abstract: This paper addresses the issue of identifying persons with expert knowledge on a given topic. Traditional methods usually estimate the relevance between the query and the support documents of candidate experts using, for example, a language model. However, the language model cannot identify semantic knowledge, so relevant experts may be missed when the query terms do not occur in their support documents. In this paper, we propose a mixture model based on Probabilistic Latent Semantic Analysis (PLSA) to estimate a hidden semantic theme layer between the terms and the support documents. The hidden themes are used to capture the semantic relevance between the query and the experts. We evaluate our mixture model in a real-world system, ArnetMiner. Experimental results indicate that the proposed model outperforms the language-model baselines.

44 citations
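Once theme distributions have been fitted, scoring experts through a hidden theme layer reduces to a matrix product. A toy sketch (the matrix shapes, smoothing constant, and function name are our assumptions, not the paper's implementation):

```python
import numpy as np

def rank_experts(p_w_z, p_z_e, query_ids):
    """Rank candidate experts by P(query | expert) via hidden themes.

    p_w_z: (n_themes, n_words) term distributions P(w | z)
    p_z_e: (n_themes, n_experts) theme mixtures P(z | e)
    Score(e) = prod over query terms w of sum_z P(w | z) P(z | e).
    """
    p_w_e = p_w_z.T @ p_z_e                        # (n_words, n_experts)
    log_score = np.log(p_w_e[query_ids] + 1e-12).sum(axis=0)
    return np.argsort(-log_score)                  # best expert first
```

Because the score routes through themes, an expert can match a query term that never appears verbatim in their support documents, which is exactly the gap in plain language models that the abstract describes.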


Network Information
Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations, 84% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
- Support vector machine: 73.6K papers, 1.7M citations, 84% related
- Deep learning: 79.8K papers, 2.1M citations, 83% related
- Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58