Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis (PLSA) is a research topic. Over its lifetime, 2884 publications have appeared within this topic, receiving 198341 citations.

Papers

Proceedings Article

Latent variable perceptron algorithm for structured classification

Xu Sun¹, Takuya Matsuzaki¹, Daisuke Okanohara¹, Jun'ichi Tsujii²

University of Tokyo¹, University of Manchester²

11 Jul 2009

TL;DR: Compared to existing probabilistic models of latent variables, the proposed perceptron-style algorithm lowers the training cost significantly yet with comparable or even superior classification accuracy.

Abstract: We propose a perceptron-style algorithm for fast discriminative training of structured latent variable models and analyze its convergence properties. Our method extends the perceptron algorithm to learning tasks with latent dependencies, which may not be captured by traditional models. It relies on Viterbi decoding over latent variables, combined with simple additive updates. Compared to existing probabilistic models of latent variables, our method lowers the training cost significantly while achieving comparable or even superior classification accuracy.

60 citations

Journal Article

Mixture of latent trait analyzers for model-based clustering of categorical data

Isabella Gollini¹, Thomas Brendan Murphy²

Maynooth University¹, University College Dublin²

01 Jul 2014-Statistics and Computing

TL;DR: A variational approach for fitting the mixture of latent trait analyzers model is developed; the model is shown to yield intuitive clustering results and to give a much better fit than either latent class analysis or latent trait analysis alone.

Abstract: Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait models and this provides an efficient model fitting strategy. The mixture of latent trait analyzers model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and voting in the U.S. Congress. The model is shown to yield intuitive clustering results and it gives a much better fit than either latent class analysis or latent trait analysis alone.

60 citations

Journal Article

Medical Image Retrieval: A Multimodal Approach.

Yu Cao¹, Shawn Steffey¹, Jianbiao He², Degui Xiao³, Cui Tao⁴, Ping Chen⁵, Henning Müller⁶

University of Massachusetts Lowell¹, Central South University², Hunan University³, University of Texas Health Science Center at Houston⁴, University of Massachusetts Boston⁵, University of Applied Sciences Western Switzerland⁶

01 Jan 2014-Cancer Informatics

TL;DR: A new multimodal medical image retrieval approach based on recent advances in the statistical graphic model and deep learning is developed. It first investigates a new extended probabilistic Latent Semantic Analysis model that integrates the visual and textual information from medical images to bridge the semantic gap.

Abstract: Medical imaging is becoming a vital component of the war on cancer. Tremendous amounts of medical image data are captured and recorded in a digital format during cancer care and cancer research. Facing such an unprecedented volume of image data with heterogeneous image modalities, it is necessary to develop effective and efficient content-based medical image retrieval systems for cancer clinical practice and research. While substantial progress has been made in different areas of content-based image retrieval (CBIR) research, direct applications of existing CBIR techniques to medical images have produced unsatisfactory results because of the unique characteristics of medical images. In this paper, we develop a new multimodal medical image retrieval approach based on the recent advances in the statistical graphic model and deep learning. Specifically, we first investigate a new extended probabilistic Latent Semantic Analysis model to integrate the visual and textual information from medical images to bridge the semantic gap. We then develop a new deep Boltzmann machine-based multimodal learning model to learn the joint density model from multimodal information in order to derive the missing modality. Experimental results with a large volume of real-world medical images have shown that our new approach is a promising solution for the next-generation medical imaging indexing and retrieval system.

60 citations

Journal Article

How useful are corpus-based methods for extrapolating psycholinguistic variables?

Paweł Mandera¹, Emmanuel Keuleers¹, Marc Brysbaert¹

Ghent University¹

19 Feb 2015-Quarterly Journal of Experimental Psychology

TL;DR: A systematic comparison of two extrapolation techniques (k-nearest neighbours and random forest), in combination with semantic spaces built using latent semantic analysis, a topic model, a hyperspace analogue to language (HAL)-like model, and a skip-gram model, finds that at least some of the extrapolation methods may introduce artefacts into the data and produce results that could lead to different conclusions than would be reached based on the human ratings.

Abstract: Subjective ratings for age of acquisition, concreteness, affective valence, and many other variables are an important element of psycholinguistic research. However, even for well-studied languages, ratings usually cover just a small part of the vocabulary. A possible solution involves using corpora to build a semantic similarity space and to apply machine learning techniques to extrapolate existing ratings to previously unrated words. We conduct a systematic comparison of two extrapolation techniques: k-nearest neighbours, and random forest, in combination with semantic spaces built using latent semantic analysis, topic model, a hyperspace analogue to language (HAL)-like model, and a skip-gram model. A variant of the k-nearest neighbours method used with skip-gram word vectors gives the most accurate predictions but the random forest method has an advantage of being able to easily incorporate additional predictors. We evaluate the usefulness of the methods by exploring how much of the human performance in...

59 citations

Journal Article

A probabilistic model for Latent Semantic Indexing

Chris Ding¹

Lawrence Berkeley National Laboratory¹

01 Apr 2005-Journal of the Association for Information Science and Technology

TL;DR: A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of LSI, indicating that LSI dimensions represent latent concepts.

Abstract: Latent Semantic Indexing (LSI), when applied to a semantic space built on text collections, improves information retrieval, information filtering, and word sense disambiguation. A new dual probability model based on the similarity concepts is introduced to provide deeper understanding of LSI. Semantic associations can be quantitatively characterized by their statistical significance, the likelihood. Semantic dimensions containing redundant and noisy information can be separated out and should be ignored because of their negative contribution to the overall statistical significance. LSI is the optimal solution of the model. The peak in the likelihood curve indicates the existence of an intrinsic semantic dimension. The importance of LSI dimensions follows the Zipf distribution, indicating that LSI dimensions represent latent concepts. Document frequency of words follows the Zipf distribution, and the number of distinct words follows a log-normal distribution. Experiments on five standard document collections confirm and illustrate the analysis.

59 citations

Performance

Metrics

Papers: 2,984

Citations: 212,744

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	77
2021	14
2020	36
2019	27
2018	58