Topic
Probabilistic latent semantic analysis
About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as: PLSA.
Papers published on a yearly basis
Papers
TL;DR: It is proven that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation, and it is shown that by altering the LSA term self-correlations the authors gain a substantial increase in precision while also reducing the computation required during the information retrieval process.
Abstract: Latent semantic analysis (LSA) is a generalized vector space method that uses dimension reduction to generate term correlations for use during the information retrieval process. We hypothesized that even though the dimension reduction establishes correlations between terms, it also degrades the correlation of a term to itself (self-correlation). In this article, we have proven that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation. We have also shown that by altering the LSA term self-correlations we gain a substantial increase in precision, while also reducing the computation required during the information retrieval process.
22 citations
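The self-correlation effect described in this abstract can be sketched with a toy truncated SVD; the term-document matrix and dimensions below are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical term-document count matrix (rows = terms, cols = documents).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
])

def lsa_term_correlations(A, k):
    """Rank-k LSA: place terms in the latent space via truncated SVD,
    then form the term-term correlation matrix T_k T_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Tk = U[:, :k] * s[:k]          # term coordinates in the k-dim latent space
    return Tk @ Tk.T

full = lsa_term_correlations(A, k=4)     # no reduction: equals A A^T
reduced = lsa_term_correlations(A, k=2)  # rank-2 reduction

# With no reduction, self-correlations (the diagonal) match A A^T exactly;
# dimension reduction can only shrink the diagonal entries.
print(np.allclose(full, A @ A.T))                         # True
print(np.all(np.diag(reduced) <= np.diag(full) + 1e-9))   # True
```

The shrinking diagonal is the "self-correlation degradation" the paper ties to the amount of dimension reduction.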
09 Aug 2015
TL;DR: A Stochastic Gradient Descent-based optimization procedure is developed to fit the model by jointly learning the weight of each context and the latent factors; the resulting model significantly outperforms not only the base model but also representative context-aware recommendation models.
Abstract: In this paper, we propose a generic framework to learn context-aware latent representations for context-aware collaborative filtering. Contextual contents are combined via a function to produce the context influence factor, which is then combined with each latent factor to derive latent representations. We instantiate the generic framework using biased Matrix Factorization as the base model. A Stochastic Gradient Descent (SGD) based optimization procedure is developed to fit the model by jointly learning the weight of each context and latent factors. Experiments conducted over three real-world datasets demonstrate that our model significantly outperforms not only the base model but also the representative context-aware recommendation models.
22 citations
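A minimal sketch of the kind of model this abstract describes, assuming biased matrix factorization with a per-context multiplicative influence factor fit jointly by SGD; the data, dimensions, and learning rate are all illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy (user, item, context-id, rating) tuples; purely illustrative.
ratings = [(0, 0, 0, 5.0), (0, 1, 1, 3.0), (1, 0, 1, 4.0), (1, 1, 0, 2.0)]
n_users, n_items, n_ctx, k = 2, 2, 2, 4

P = rng.normal(0, 0.1, (n_users, k))    # user latent factors
Q = rng.normal(0, 0.1, (n_items, k))    # item latent factors
w = np.zeros(n_ctx)                     # learned per-context weight
mu = np.mean([r for *_, r in ratings])  # global bias

lr, reg = 0.05, 0.01
for epoch in range(200):
    for u, i, c, r in ratings:
        g = 1.0 + w[c]                  # context influence factor
        e = r - (mu + g * (P[u] @ Q[i]))
        # Joint SGD updates: factors and context weight move together.
        P[u] += lr * (e * g * Q[i] - reg * P[u])
        Q[i] += lr * (e * g * P[u] - reg * Q[i])
        w[c] += lr * (e * (P[u] @ Q[i]) - reg * w[c])

rmse = np.sqrt(np.mean([(r - (mu + (1 + w[c]) * (P[u] @ Q[i]))) ** 2
                        for u, i, c, r in ratings]))
```

The context weight `w[c]` modulating the factor dot product stands in for the paper's "context influence factor"; the exact combining function in the paper may differ.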
14 May 2006
TL;DR: This paper proposes using a "dynamic key term lexicon" automatically extracted from the ever-changing document archives as an extra feature set in the retrieval task, which can retrieve relevant documents more efficiently.
Abstract: Spoken document retrieval will be very important in the future network era. In this paper, we propose using a "dynamic key term lexicon" automatically extracted from the ever-changing document archives as an extra feature set in the retrieval task. This lexicon is much more compact but semantically rich, thus it can retrieve relevant documents more efficiently. The key terms include named entities and others selected by a new metric, referred to here as term entropy, derived from probabilistic latent semantic analysis (PLSA). Various configurations of retrieval models were tested with a broadcast news archive in Mandarin Chinese and significant performance improvements were obtained, especially with the new version of PLSA models based on a key term lexicon rather than the full lexicon.
22 citations
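The term-entropy idea can be illustrated with a toy PLSA output; the topic-term probabilities and vocabulary below are hypothetical, and the exact definition in the paper may differ:

```python
import numpy as np

# Hypothetical PLSA parameters: P(t|z), rows = topics, cols = terms.
p_t_given_z = np.array([
    [0.70, 0.10, 0.20],   # topic 0
    [0.05, 0.80, 0.15],   # topic 1
])
p_z = np.array([0.5, 0.5])           # topic prior
terms = ["court", "election", "today"]

# P(z|t) by Bayes' rule, then the entropy of each term over topics.
joint = p_t_given_z * p_z[:, None]        # P(t, z)
p_z_given_t = joint / joint.sum(axis=0)   # normalize over topics
entropy = -(p_z_given_t * np.log2(p_z_given_t)).sum(axis=0)

# Low entropy => the term concentrates in few topics => good key term;
# high entropy => the term spreads across topics => poor key term.
ranked = [terms[i] for i in np.argsort(entropy)]
print(ranked)   # → ['court', 'election', 'today']
```

Topic-concentrated terms like "court" rank ahead of broadly distributed filler terms like "today", which is the intuition behind selecting key terms by this metric.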
08 Sep 2008
TL;DR: A new measure of semantic relatedness between any pair of English terms, using WordNet as the knowledge base, is proposed, and a new WSD method based on this measure is introduced.
Abstract: Word sense disambiguation (WSD) methods evolve towards exploring all of the available semantic information that word thesauri provide. In this scope, the use of semantic graphs and new measures of semantic relatedness may offer better WSD solutions. In this paper we propose a new measure of semantic relatedness between any pair of terms for the English language, using WordNet as our knowledge base. Furthermore, we introduce a new WSD method based on the proposed measure. Experimental evaluation of the proposed method in benchmark data shows that our method matches or surpasses state of the art results. Moreover, we evaluate the proposed measure of semantic relatedness in pairs of terms ranked by human subjects. Results reveal that our measure of semantic relatedness produces a ranking that is more similar to the human generated one, compared to rankings generated by other related measures of semantic relatedness proposed in the past.
22 citations
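The paper's own relatedness measure is not detailed here, but the family of measures it builds on can be sketched as a generic path-based relatedness over a WordNet-style is-a graph; the toy graph below is hypothetical:

```python
from collections import deque

# Tiny hypothetical is-a hierarchy, stored as an undirected adjacency list.
edges = {
    "entity": ["animal", "artifact"],
    "animal": ["entity", "dog", "cat"],
    "artifact": ["entity", "car"],
    "dog": ["animal"], "cat": ["animal"], "car": ["artifact"],
}

def path_relatedness(a, b):
    """Path-based relatedness: 1 / (1 + shortest-path length), found by BFS."""
    if a == b:
        return 1.0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in edges.get(node, []):
            if nxt == b:
                return 1.0 / (1 + d + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return 0.0

# Siblings under "animal" are closer than terms in different subtrees.
print(path_relatedness("dog", "car") < path_relatedness("dog", "cat"))  # True
```

A WSD method in this family would score each candidate sense of an ambiguous word by its relatedness to the senses of surrounding context words and pick the highest-scoring sense.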
TL;DR: In this paper, a multivariate generalized latent variable model is proposed to investigate the effects of observable and latent explanatory variables on multiple responses of interest, such as continuous, count, ordinal, and nominal variables.
Abstract: We consider a multivariate generalized latent variable model to investigate the effects of observable and latent explanatory variables on multiple responses of interest. Various types of correlated responses, such as continuous, count, ordinal, and nominal variables, are considered in the regression. A generalized confirmatory factor analysis model that is capable of managing mixed-type data is proposed to characterize latent variables via correlated observed indicators. In addressing the complicated structure of the proposed model, we introduce continuous underlying measurements to provide a unified model framework for mixed-type data. We develop a multivariate version of the Bayesian adaptive least absolute shrinkage and selection operator procedure, which is implemented with a Markov chain Monte Carlo (MCMC) algorithm in a full Bayesian context, to simultaneously conduct estimation and model selection. The empirical performance of the proposed methodology is demonstrated through a simulation study. An ...
22 citations
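The "continuous underlying measurements" device for mixed-type data can be sketched for an ordinal response: a latent normal variable is cut at fixed thresholds into categories. All coefficients, thresholds, and sample sizes below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# A latent explanatory variable driving two mixed-type responses.
n = 5000
xi = rng.normal(size=n)                              # latent variable
y_cont = 0.8 * xi + rng.normal(scale=0.5, size=n)    # continuous response

# Ordinal response via an underlying measurement y* = 0.8*xi + eps,
# cut at thresholds into three ordered categories 0, 1, 2.
y_star = 0.8 * xi + rng.normal(size=n)
cuts = [-0.5, 0.5]
y_ord = np.digitize(y_star, cuts)

# The underlying-measurement view preserves the latent association:
# higher ordinal categories correspond to higher mean latent scores.
means = [xi[y_ord == c].mean() for c in (0, 1, 2)]
print(means[0] < means[1] < means[2])   # True
```

Treating ordinal (and nominal) observations as thresholded versions of continuous underlying variables is what lets continuous, count, ordinal, and nominal responses share one regression framework, as the abstract describes.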