Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Journal ArticleDOI
TL;DR: A multi-type clustering-based recommendation framework is proposed that systematically considers trust-based user clustering, similarity-based user clustering, and similarity-based item clustering to further improve recommendation accuracy.

27 citations

Journal ArticleDOI
TL;DR: A semantic annotation model is presented which employs continuous PLSA and standard PLSA to model visual features and textual words respectively and can predict semantic annotation precisely for unseen images.

27 citations

01 Oct 1994
TL;DR: Using the proposed merge strategies, LSI is shown to be able to retrieve relevant documents from either language (Greek or English) without requiring any translation of a user's query.
Abstract: In this thesis, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow the use of natural language for searching (queries). LSI enables the user to form queries using natural expressions in the user's own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively while duplicating a minimum of the entire database.

27 citations
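
The cross-language LSI idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the vocabulary, the five toy documents, the helper names (`lsi_fit`, `fold_in`, `cosine_scores`), and the choice of `k = 2` are all assumptions made for illustration. The point it tries to show is that one merged (duplicated) document is enough to link English and Greek terms in the latent space, so an English-only query can score a Greek-only document highly.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Rank-k truncated SVD of a (terms x documents) count matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term vectors, singular values, and one k-dimensional row per document.
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(query_counts, U_k, s_k):
    """Project a query term-count vector into the latent space."""
    return (query_counts @ U_k) / s_k

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between the query and every document vector."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)

# Hypothetical toy vocabulary: English terms and transliterated Greek terms.
terms = ["light", "world", "phos", "kosmos", "bread", "artos"]

# Columns are documents (toy data, not from the thesis):
#   d0: merged English+Greek passage about light/world (the duplicated part)
#   d1: English-only passage about light/world
#   d2: Greek-only passage about light/world
#   d3: merged English+Greek passage about bread
#   d4: English-only passage about bread
A = np.array([
    [1, 1, 0, 0, 0],  # light
    [1, 1, 0, 0, 0],  # world
    [1, 0, 1, 0, 0],  # phos
    [1, 0, 1, 0, 0],  # kosmos
    [0, 0, 0, 1, 1],  # bread
    [0, 0, 0, 1, 0],  # artos
], dtype=float)

U_k, s_k, doc_vecs = lsi_fit(A, k=2)

# English-only query containing just the term "light".
query = np.zeros(len(terms))
query[terms.index("light")] = 1.0

sims = cosine_scores(fold_in(query, U_k, s_k), doc_vecs)
order = np.argsort(-sims)  # document indices, best match first
```

In this toy setup the Greek-only document `d2` shares no term with the query, yet it lands among the top matches because the merged document `d0` ties "light" and "phos" to the same latent dimension. Without `d0`, the English and Greek vocabularies would have no co-occurrences to connect them.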

Proceedings ArticleDOI
22 Oct 2013
TL;DR: A visual vocabulary pruning technique is presented that enormously reduces the amount of required words to describe a medical image dataset with no significant effect on the accuracy.
Abstract: Content-based medical image retrieval has been proposed as a technique that allows not only for easy access to images from the relevant literature and electronic health records but also for training physicians, for research and clinical decision support. The bag-of-visual-words approach is a widely used technique that tries to shorten the semantic gap by learning meaningful features from the dataset and describing documents and images in terms of the histogram of these features. Visual vocabularies are often redundant, over-complete and noisy. Larger-than-required vocabularies lead to high-dimensional feature spaces, which present important disadvantages, with the curse of dimensionality and computational cost being the most obvious ones. In this work a visual vocabulary pruning technique is presented. It enormously reduces the amount of required words to describe a medical image dataset with no significant effect on the accuracy. Results show that a reduction of up to 90% can be achieved without impact on the system performance. Obtaining a more compact representation of a document enables multimodal description as well as using classifiers requiring low-dimensional representations.

27 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: This work builds on the Gaussian process latent variable model (GPLVM) to learn non-linear, non-parametric mapping functions that transform heterogeneous data into a shared latent space, and proposes the multi-modal Similarity Gaussian Process latent variable model (m-SimGP), which learns the non-linear mapping functions between the intra-modal similarities and the latent representation.
Abstract: Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects. However, relations among heterogeneous modalities are simply treated as observation-to-fit by existing work, and the parameterized cross-modal mapping functions lack flexibility in directly adapting to the content divergence and semantic complicacy of multi-modal data. In this paper, we build our work based on Gaussian process latent variable model (GPLVM) to learn the non-linear non-parametric mapping functions and transform heterogeneous data into a shared latent space. We propose multi-modal Similarity Gaussian Process latent variable model (m-SimGP), which learns the non-linear mapping functions between the intra-modal similarities and latent representation. We further propose multi-modal regularized similarity GPLVM (m-RSimGP) by encouraging similar/dissimilar points to be similar/dissimilar in the output space. The overall objective functions are solved by simple and scalable gradient descent techniques. The proposed models are robust to content divergence and high-dimensionality in multi-modal representation. They can be applied to various tasks to discover the non-linear correlations and obtain the comparable low-dimensional representation for heterogeneous modalities. On two widely used real-world datasets, we outperform previous approaches for cross-modal content retrieval and cross-modal classification.

27 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58