Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications on this topic have received 198,341 citations. The topic is also known as PLSA.


Papers
Proceedings Article
01 Jan 2002
TL;DR: It is argued that the success of existing accounts of semantic representation comes as a result of indirectly addressing this problem, and that a closer correspondence to human data can be obtained by taking a probabilistic approach that explicitly models the generative structure of language.
Abstract: We explore the consequences of viewing semantic association as the result of attempting to predict the concepts likely to arise in a particular context. We argue that the success of existing accounts of semantic representation comes as a result of indirectly addressing this problem, and show that a closer correspondence to human data can be obtained by taking a probabilistic approach that explicitly models the generative structure of language.

99 citations
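
The predictive view this paper takes can be illustrated with a small sketch: under a topic-mixture model such as pLSA, the association of a word with a context is its probability under the topic mixture inferred from that context. The arrays below are hypothetical toy values, not the authors' model.

```python
import numpy as np

# Toy topic model: P(w|z) for V words and K topics (hypothetical values).
p_w_given_z = np.array([
    [0.40, 0.05],   # "doctor"
    [0.30, 0.05],   # "nurse"
    [0.10, 0.10],   # "visit"
    [0.10, 0.40],   # "ball"
    [0.10, 0.40],   # "game"
])  # each column sums to 1 over the vocabulary

# Topic mixture inferred from a particular context, P(z|context).
p_z_given_context = np.array([0.9, 0.1])

# Predictive association: P(w|context) = sum_z P(w|z) P(z|context).
p_w_given_context = p_w_given_z @ p_z_given_context
print(p_w_given_context)  # words from the dominant topic score highest
```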

Proceedings Article
01 Jan 1998
TL;DR: It is shown that modifying the dynamic range, applying a per-word confidence metric, and using geometric rather than linear combinations with N-grams produces a more robust language model which has a lower perplexity on a Wall Street Journal test set than a baseline N-gram model.
Abstract: We introduce a number of techniques designed to help integrate semantic knowledge with N-gram language models for automatic speech recognition. Our techniques allow us to integrate Latent Semantic Analysis (LSA), a word-similarity algorithm based on word co-occurrence information, with N-gram models. While LSA is good at predicting content words which are coherent with the rest of a text, it is a bad predictor of frequent words, has a low dynamic range, and is inaccurate when combined linearly with N-grams. We show that modifying the dynamic range, applying a per-word confidence metric, and using geometric rather than linear combinations with N-grams produces a more robust language model which has a lower perplexity on a Wall Street Journal test set than a baseline N-gram model.

99 citations
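
A minimal sketch of the two combination schemes the abstract contrasts, assuming next-word probabilities are already available from an N-gram model and an LSA-based model; the distributions and the weight `lam` are illustrative only, and the per-word confidence metric is omitted.

```python
import numpy as np

def combine_linear(p_ngram, p_lsa, lam=0.5):
    """Linear interpolation: lam * P_ngram + (1 - lam) * P_lsa."""
    return lam * p_ngram + (1.0 - lam) * p_lsa

def combine_geometric(p_ngram, p_lsa, lam=0.5):
    """Geometric interpolation: P_ngram^lam * P_lsa^(1-lam),
    renormalized over the vocabulary so it stays a distribution."""
    p = p_ngram ** lam * p_lsa ** (1.0 - lam)
    return p / p.sum()

# Hypothetical next-word distributions over a 4-word vocabulary.
p_ngram = np.array([0.5, 0.3, 0.1, 0.1])
p_lsa = np.array([0.1, 0.1, 0.2, 0.6])  # LSA favors topically coherent words

print(combine_linear(p_ngram, p_lsa))
print(combine_geometric(p_ngram, p_lsa))
```

The geometric form lets a near-zero probability from either model veto a word, which is one intuition for why it combines more robustly than linear mixing.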

Proceedings ArticleDOI
08 Jul 2009
TL;DR: It is shown that the best variant of the proposed mm-pLSA system outperforms the unimodal systems by approximately 19% in the authors' query-by-example task.
Abstract: It is the current state of knowledge that our neocortex consists of six layers [10]. We take this knowledge from neuroscience as an inspiration to extend the standard single-layer probabilistic Latent Semantic Analysis (pLSA) [13] to multiple layers. As multiple layers should naturally handle multiple modalities and a hierarchy of abstractions, we denote this new approach multilayer multimodal probabilistic Latent Semantic Analysis (mm-pLSA). We derive the training and inference rules for the smallest possible non-degenerate mm-pLSA model: a model with two leaf-pLSAs (here from two different data modalities: image tags and visual image features) and a single top-level pLSA node merging the two leaf-pLSAs. From this derivation it is obvious how to extend the learning and inference rules to more modalities and more layers. We also propose a fast and strictly stepwise forward procedure to initialize the mm-pLSA model bottom-up, which in turn can then be post-optimized by the general mm-pLSA learning algorithm. We evaluate the proposed approach experimentally in a query-by-example retrieval task using 50-dimensional topic vectors as image models. We compare various variants of our mm-pLSA system to systems relying solely on visual features or tag features and analyze possible pitfalls of the mm-pLSA training. It is shown that the best variant of the proposed mm-pLSA system outperforms the unimodal systems by approximately 19% in our query-by-example task.

98 citations
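
mm-pLSA extends standard single-layer pLSA, whose parameters P(z|d) and P(w|z) are fit by EM. A minimal sketch of that base model on toy counts (not the paper's multilayer implementation) is:

```python
import numpy as np

rng = np.random.default_rng(0)

def plsa(counts, K, iters=50):
    """EM for single-layer pLSA on a document-word count matrix.

    counts: (D, V) array of n(d, w); returns P(z|d) and P(w|z)."""
    D, V = counts.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((K, V)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(z|d) P(w|z), shape (D, V, K)
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(2, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        weighted = counts[:, :, None] * post        # n(d,w) P(z|d,w)
        p_z_d = weighted.sum(1)                     # (D, K)
        p_z_d /= p_z_d.sum(1, keepdims=True)
        p_w_z = weighted.sum(0).T                   # (K, V)
        p_w_z /= p_w_z.sum(1, keepdims=True)
    return p_z_d, p_w_z

# Toy corpus: 4 documents over a 6-word vocabulary.
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 2, 1, 0, 0],
    [0, 0, 1, 4, 5, 3],
    [0, 1, 0, 3, 4, 5],
])
p_z_d, p_w_z = plsa(counts, K=2)
print(np.round(p_z_d, 2))  # documents separate into two topics
```

In the paper's multilayer setting, topic vectors like P(z|d) from each modality's leaf-pLSA become the observations for the top-level pLSA node.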

Posted Content
TL;DR: This survey conducts a comprehensive review of various short text topic modeling techniques proposed in the literature, and presents three categories of methods based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and analysis of their performance on various tasks.
Abstract: Inferring discriminative and coherent latent topics from short texts is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long-text topic modeling algorithms (e.g., PLSA and LDA) based on word co-occurrences cannot solve this problem very well, since only very limited word co-occurrence information is available in short texts. Short text topic modeling, which aims at overcoming the sparseness of short texts, has therefore attracted much attention from the machine learning research community in recent years. In this survey, we conduct a comprehensive review of various short text topic modeling techniques proposed in the literature. We present three categories of methods, based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with examples of representative approaches in each category and analysis of their performance on various tasks. We develop the first comprehensive open-source library, called STTM, written in Java, which integrates all surveyed algorithms within a unified interface together with benchmark datasets, to facilitate the development of new methods in this research field. Finally, we evaluate these state-of-the-art methods on many real-world datasets and compare their performance against one another and against long-text topic modeling algorithms.

98 citations
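
Of the three categories surveyed, the Dirichlet multinomial mixture family assigns a single topic to each short document. A rough collapsed-Gibbs sketch of that idea follows (simplified sampling weights, exact only when a document has no repeated words; this is not the STTM library's code):

```python
import numpy as np

rng = np.random.default_rng(1)

def dmm_gibbs(docs, V, K, alpha=0.1, beta=0.1, iters=30):
    """Collapsed Gibbs sampling for a Dirichlet multinomial mixture:
    each short document gets one topic assignment (GSDMM-style)."""
    D = len(docs)
    z = rng.integers(K, size=D)          # current topic of each document
    m_k = np.zeros(K)                     # number of documents per topic
    n_kw = np.zeros((K, V))               # per-topic word counts
    n_k = np.zeros(K)                     # per-topic total word count
    for d, doc in enumerate(docs):
        m_k[z[d]] += 1
        n_k[z[d]] += len(doc)
        for w in doc:
            n_kw[z[d], w] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            k = z[d]
            # remove document d's counts from its current topic
            m_k[k] -= 1
            n_k[k] -= len(doc)
            for w in doc:
                n_kw[k, w] -= 1
            # log-weight of each topic for this document
            log_p = np.log(m_k + alpha)
            for k2 in range(K):
                for i, w in enumerate(doc):
                    log_p[k2] += np.log(n_kw[k2, w] + beta)
                    log_p[k2] -= np.log(n_k[k2] + V * beta + i)
            p = np.exp(log_p - log_p.max())
            k = rng.choice(K, p=p / p.sum())
            z[d] = k
            m_k[k] += 1
            n_k[k] += len(doc)
            for w in doc:
                n_kw[k, w] += 1
    return z

# Toy short texts as lists of word ids over a 6-word vocabulary.
docs = [[0, 1], [0, 2], [1, 2], [3, 4], [3, 5], [4, 5]]
print(dmm_gibbs(docs, V=6, K=2))  # the two word clusters should separate
```

The one-topic-per-document assumption is what makes this family suited to short texts: it pools each document's few words into a single draw instead of estimating a full per-document topic mixture from sparse counts.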

Journal ArticleDOI
TL;DR: Latent Semantic Analysis (LSA) as mentioned in this paper is a theory of how word meaning is derived from statistics of experience, and of how passage meaning is represented by combinations of words.
Abstract: Latent semantic analysis (LSA) is a theory of how word meaning—and possibly other knowledge—is derived from statistics of experience, and of how passage meaning is represented by combinations of words. Given a large and representative sample of text, LSA combines the way thousands of words are used in thousands of contexts to map a point for each into a common semantic space. LSA goes beyond pair-wise co-occurrence or correlation to find latent dimensions of meaning that best relate every word and passage to every other. After learning from comparable bodies of text, LSA has scored almost as well as humans on vocabulary and subject-matter tests, accurately simulated many aspects of human judgment and behavior based on verbal meaning, and been successfully applied to measure the coherence and conceptual content of text. The surprising success of LSA has implications for the nature of generalization and language.

98 citations
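
The mapping the abstract describes is, concretely, a truncated SVD of a word-context count matrix. A minimal sketch on a toy matrix is below; real LSA uses thousands of words and contexts and applies a log-entropy weighting to the counts first, which is omitted here.

```python
import numpy as np

# Toy term-document count matrix X (rows: words, columns: passages).
X = np.array([
    [2, 1, 0, 0],   # "ship"
    [1, 2, 0, 0],   # "boat"
    [1, 1, 0, 0],   # "ocean"
    [0, 0, 2, 1],   # "tree"
    [0, 0, 1, 2],   # "leaf"
], dtype=float)

# LSA: truncated SVD maps each word to a point in a low-rank
# "semantic space"; here we keep k = 2 latent dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]     # word coordinates in the semantic space

def sim(i, j):
    """Cosine similarity between two words in the latent space."""
    a, b = word_vecs[i], word_vecs[j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

print(sim(0, 1))  # "ship" vs "boat": high similarity
print(sim(0, 3))  # "ship" vs "tree": near zero
```

This is how LSA goes beyond pair-wise co-occurrence: the low-rank factorization relates every word to every other through the shared latent dimensions, even for word pairs that never co-occur directly.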


Network Information
Related Topics (5)
Topic                        Papers     Citations    Relatedness
Feature extraction           111.8K     2.1M         84%
Feature (computer vision)    128.2K     1.7M         84%
Support vector machine       73.6K      1.7M         84%
Deep learning                79.8K      2.1M         83%
Object detection             46.1K      1.3M         82%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    19
2022    77
2021    14
2020    36
2019    27
2018    58