Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have appeared within this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Proceedings Article
01 Dec 2012
TL;DR: The proposed CLIR framework consists of deep belief networks for each language and employs canonical correlation analysis to construct a shared semantic space; cross-lingual semantic analysis with DBN and CCA is shown to improve on state-of-the-art keyword-based CLIR performance.
Abstract: This paper introduces a cross-language information retrieval (CLIR) framework that combines the state-of-the-art keyword-based approach with a latent semantic-based retrieval model. To capture and analyze the hidden semantics in cross-lingual settings, we construct latent semantic models that map text in different languages into a shared semantic space. Our proposed framework consists of deep belief networks (DBN) for each language and we employ canonical correlation analysis (CCA) to construct a shared semantic space. We evaluated the proposed CLIR approach on a standard ad hoc CLIR dataset, and we show that the cross-lingual semantic analysis with DBN and CCA improves the state-of-the-art keyword-based CLIR performance.

34 citations

Proceedings ArticleDOI
19 Apr 2009
TL;DR: An EM algorithm is developed to learn the PMTF model, showing its equivalence to multiplicative updates derived by an algebraic approach, and the useful behavior of PMTF is demonstrated in a task of document clustering.
Abstract: Nonnegative matrix tri-factorization (NMTF) is a 3-factor decomposition of a nonnegative data matrix, X ≈ USV^T, where the factor matrices U, S, and V are restricted to be nonnegative as well. Motivated by the aspect model used for dyadic data analysis as well as in probabilistic latent semantic analysis (PLSA), we present a probabilistic model with two dependent latent variables for NMTF, referred to as probabilistic matrix tri-factorization (PMTF). Each latent variable in the model is associated with the cluster variable for the corresponding object in the dyad, making the model suited to co-clustering. We develop an EM algorithm to learn the PMTF model, showing its equivalence to multiplicative updates derived by an algebraic approach. We demonstrate the useful behavior of PMTF in a task of document clustering. Moreover, we incorporate the likelihood in the PMTF model into existing information criteria so that the number of clusters can be detected, which the algebraic NMTF cannot do.

34 citations
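The tri-factorization X ≈ USV^T described in the abstract above can be illustrated with a small sketch. This is not the paper's EM algorithm for PMTF: it assumes the common squared-error (Frobenius) objective and the standard multiplicative updates for it, and the function names (`nmtf`, `mult_update`, `frob_error`) and the toy matrix are hypothetical, chosen only for illustration.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def mult_update(F, numer, denom, eps=1e-9):
    """Elementwise F * numer / denom; keeps every entry nonnegative."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
            for fr, nr, dr in zip(F, numer, denom)]

def frob_error(X, U, S, V):
    """Squared Frobenius norm of X - U S V^T."""
    R = matmul(matmul(U, S), transpose(V))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))

def nmtf(X, k_rows, k_cols, n_iter=200, seed=0):
    """Sketch of NMTF under a squared-error objective (an assumption,
    not the KL/EM formulation of the PMTF paper)."""
    rng = random.Random(seed)

    def rand(r, c):
        # Strictly positive random initialisation.
        return [[rng.random() + 0.1 for _ in range(c)] for _ in range(r)]

    n, m = len(X), len(X[0])
    U, S, V = rand(n, k_rows), rand(k_rows, k_cols), rand(m, k_cols)
    Xt = transpose(X)
    for _ in range(n_iter):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        SVt = matmul(S, transpose(V))
        U = mult_update(U, matmul(X, transpose(SVt)),
                        matmul(U, matmul(SVt, transpose(SVt))))
        # V <- V * (X^T U S) / (V S^T U^T U S)
        US = matmul(U, S)
        V = mult_update(V, matmul(Xt, US),
                        matmul(V, matmul(transpose(US), US)))
        # S <- S * (U^T X V) / (U^T U S V^T V)
        Ut = transpose(U)
        S = mult_update(S, matmul(matmul(Ut, X), V),
                        matmul(matmul(matmul(Ut, U), S),
                               matmul(transpose(V), V)))
    return U, S, V

# Toy 4x4 matrix with two row blocks and two column blocks.
X = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]
U, S, V = nmtf(X, k_rows=2, k_cols=2)
print("reconstruction error:", frob_error(X, U, S, V))
```

In a co-clustering reading, the largest entry in each row of U would suggest a row cluster and the largest entry in each row of V a column cluster; the sketch does not select the number of clusters, which the abstract attributes to the likelihood-based PMTF model.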

Journal ArticleDOI
TL;DR: A novel discriminative joint-feature topic model (DJTM) with dual constraints is proposed to classify multiple abnormalities in WCE images, and comparisons demonstrate that this method outperforms existing multiple-abnormality classification methods for WCE images.
Abstract: Wireless capsule endoscopy (WCE) enables clinicians to examine the digestive tract without any surgical operations, at the cost of a large number of images to be analyzed. The main challenge for automatic computer-aided diagnosis arises from the difficulty of robust characterization of these images. To tackle this problem, a novel discriminative joint-feature topic model (DJTM) with dual constraints is proposed to classify multiple abnormalities in WCE images. We first propose a joint-feature probabilistic latent semantic analysis (PLSA) model, where color and texture descriptors extracted from the same image patches are jointly modeled with their conditional distributions. Then the proposed dual constraints, visual word importance and local image manifold, are embedded into the joint-feature PLSA model simultaneously to obtain discriminative latent semantic topics. The visual word importance constraint is proposed in our DJTM to guarantee that visual words with similar importance come from close latent topics, while the local image manifold constraint enforces that images within the same category share similar latent topics. Finally, each image is characterized by its distribution of latent semantic topics instead of low-level features. Our proposed DJTM achieved an overall recognition accuracy of 90.78%. Comprehensive comparison results demonstrate that our method outperforms existing multiple-abnormality classification methods for WCE images.

34 citations

Proceedings ArticleDOI
24 Oct 2011
TL;DR: A cross-domain topic indexing (CDTI) method is presented, in which a common semantic space is found from the prior between-domain term correspondences and the term co-occurrences in the cross-domain documents; experiments show that CDTI outperforms the state-of-the-art domain adaptation method and the traditional latent semantic indexing method.
Abstract: Sentiment classification has become attractive in recent years because of its potential commercial applications. It exploits supervised learning methods to learn classifiers from annotated training documents. The challenge in sentiment classification is that sentiment domains are diverse, heterogeneous and fast-growing. Classifiers trained on one domain (the source domain) may fail to classify a document from another domain (the target domain). Domain adaptation addresses this problem by making use of labeled samples in the source domain and unlabeled samples in the target domain. This paper presents a new solution, a cross-domain topic indexing (CDTI) method, with which a common semantic space is found from the prior between-domain term correspondences and the term co-occurrences in the cross-domain documents. These observations are characterized with the mixture model in CDTI, with each component being a possible topic shared by the source and target domains. Such common topics are found to index the cross-domain content. We evaluate the algorithms on a multi-domain sentiment classification task, which shows that CDTI outperforms the state-of-the-art domain adaptation method, i.e. spectral feature alignment (SFA), and the traditional latent semantic indexing method.

34 citations

Journal ArticleDOI
TL;DR: The BTM (Biterm Topic Model) topic model is employed to process short texts (micro-blog data) to alleviate the sparsity problem, and the K-means clustering algorithm is integrated into BTM for further topic discovery.
Abstract: The development of micro-blogs, which generate large-scale short texts, provides people with convenient communication. Meanwhile, discovering topics from short texts has become a genuinely intractable problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), struggle to model short texts: they suffer from severe data sparsity when processing them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is intensive and the difference among topic documents is distinct. In this paper, the BTM topic model is employed to process short texts (micro-blog data) to alleviate the sparsity problem. At the same time, we integrate the K-means clustering algorithm into BTM (Biterm Topic Model) for further topic discovery. Experimental results on Sina micro-blog short text collections demonstrate that our method can discover topics effectively.

33 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58