Topic
Probabilistic latent semantic analysis
About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2884 publications have been published on this topic, receiving 198341 citations. The topic is also known as: PLSA.
Papers
TL;DR: A multi-type clustering-based recommendation framework is proposed that systematically combines trust-based user clustering, similarity-based user clustering and similarity-based item clustering to further improve recommendation accuracy.
27 citations
TL;DR: A semantic annotation model is presented which employs continuous PLSA and standard PLSA to model visual features and textual words respectively and can predict semantic annotation precisely for unseen images.
27 citations
01 Oct 1994
TL;DR: Using the proposed merge strategies, LSI is shown to be able to retrieve relevant documents from either language (Greek or English) without requiring any translation of a user's query.
Abstract: In this thesis, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow the use of natural language for searching (queries). LSI enables the user to form queries using natural expressions in the user's own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively while duplicating a minimum of the entire database.
27 citations
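The cross-language LSI retrieval described in the thesis abstract above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy corpus, the helper names (`build_term_doc_matrix`, `lsi_fit`, `fold_in`, `cosine_scores`) and the choice of k = 2 are assumptions made for illustration. A few documents carry both English and Greek text (the "merged" portion), the rest are monolingual, and an English query is folded into the truncated-SVD space and ranked against all documents by cosine similarity.

```python
import numpy as np

def build_term_doc_matrix(docs, vocab=None):
    """Return (A, vocab) where A[i, j] counts occurrences of term i in document j."""
    if vocab is None:
        vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            if w in index:  # terms outside the vocabulary are ignored
                A[index[w], j] += 1.0
    return A, vocab

def lsi_fit(A, k):
    """Rank-k truncated SVD: A ~ U_k diag(s_k) V_k^T. Rows of V_k are document coordinates."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(q, U_k, s_k):
    """Project a raw term-count vector into the k-dimensional LSI space."""
    return (q @ U_k) / s_k

def cosine_scores(q_hat, doc_coords):
    """Cosine similarity between the folded-in query and every document."""
    num = doc_coords @ q_hat
    den = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_hat)
    return num / np.where(den == 0, 1.0, den)

# Toy corpus (hypothetical). Documents 0 and 1 are "merged": the same passage
# in English with its Greek counterpart appended. The rest are monolingual.
docs = [
    "light life φως ζωή",     # 0: merged English + Greek
    "bread water ψωμί νερό",  # 1: merged English + Greek
    "light light life",       # 2: English only
    "ψωμί ψωμί νερό",         # 3: Greek only
    "φως ζωή",                # 4: Greek only
    "bread bread water",      # 5: English only
]

A, vocab = build_term_doc_matrix(docs)
U_k, s_k, doc_coords = lsi_fit(A, k=2)

# English-only query; no translation step is applied.
q, _ = build_term_doc_matrix(["light"], vocab)
scores = cosine_scores(fold_in(q[:, 0], U_k, s_k), doc_coords)

for j in np.argsort(-scores):
    print(f"doc {j}: cosine = {scores[j]:.3f}  ({docs[j]})")
```

In this toy setting the merged documents are what link the two vocabularies: the Greek-only document about light and life scores close to the English query even though they share no terms, while the Greek-only document about bread and water does not. With only two latent dimensions each topic collapses to a single axis, so documents within a topic tie; a real collection would need a larger k and term weighting.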
22 Oct 2013
TL;DR: A visual vocabulary pruning technique is presented that enormously reduces the amount of required words to describe a medical image dataset with no significant effect on the accuracy.
Abstract: Content-based medical image retrieval has been proposed as a technique that allows not only for easy access to images from the relevant literature and electronic health records but also for training physicians, for research and clinical decision support. The bag-of-visual-words approach is a widely used technique that tries to shorten the semantic gap by learning meaningful features from the dataset and describing documents and images in terms of the histogram of these features. Visual vocabularies are often redundant, over-complete and noisy. Larger-than-required vocabularies lead to high-dimensional feature spaces, which present important disadvantages, with the curse of dimensionality and computational cost being the most obvious ones. In this work a visual vocabulary pruning technique is presented. It enormously reduces the amount of required words to describe a medical image dataset with no significant effect on the accuracy. Results show that a reduction of up to 90% can be achieved without impact on the system performance. Obtaining a more compact representation of a document enables multimodal description as well as using classifiers requiring low-dimensional representations.
27 citations
01 Dec 2015
TL;DR: This work builds on the Gaussian process latent variable model (GPLVM) to learn non-linear, non-parametric mapping functions and transform heterogeneous data into a shared latent space, and proposes the multi-modal Similarity Gaussian Process latent variable model (m-SimGP), which learns the non-linear mapping functions between the intra-modal similarities and the latent representation.
Abstract: Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects. However, relations among heterogeneous modalities are simply treated as observation-to-fit by existing work, and the parameterized cross-modal mapping functions lack flexibility in directly adapting to the content divergence and semantic complicacy of multi-modal data. In this paper, we build on the Gaussian process latent variable model (GPLVM) to learn non-linear, non-parametric mapping functions and transform heterogeneous data into a shared latent space. We propose the multi-modal Similarity Gaussian Process latent variable model (m-SimGP), which learns the non-linear mapping functions between the intra-modal similarities and the latent representation. We further propose the multi-modal regularized similarity GPLVM (m-RSimGP), which encourages similar/dissimilar points to be similar/dissimilar in the output space. The overall objective functions are solved by simple and scalable gradient descent techniques. The proposed models are robust to content divergence and high dimensionality in multi-modal representation. They can be applied to various tasks to discover non-linear correlations and obtain comparable low-dimensional representations for heterogeneous modalities. On two widely used real-world datasets, we outperform previous approaches for cross-modal content retrieval and cross-modal classification.
27 citations