Home
/
Topics
/
Probabilistic latent semantic analysis

Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.

...read moreread less

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1985
1984
1983
1982
1981
1980
1979
1978
1974
1973
1971
1970
1969
1965
1963
1960
1958
1956
1954

1 / 2

Papers

PDF

Open Access

More filters

Journal Article•

Indexing by latent semantic indexing analysis

[...]

Scott Deerwester, Susan T. Dumais, George W. Furnas, T. K. Landawer, Richard A. Harshman - Show less +1 more

01 Jan 1990-Journal of the Association for Information Science and Technology

100 citations

Proceedings Article•DOI•

Supervised semantic indexing

[...]

Bing Bai¹, Jason Weston¹, David Grangier¹, Ronan Collobert¹, Kunihiko Sadamasa¹, Yanjun Qi¹, Olivier Chapelle², Kilian Q. Weinberger² - Show less +4 more•Institutions (2)

Princeton University¹, Yahoo!²

02 Nov 2009

TL;DR: This article proposes Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match and proposes several improvements to the basic model, including low rank (but diagonal preserving) representations, and correlated feature hashing (CFH).

...read moreread less

Abstract: In this article we propose Supervised Semantic Indexing (SSI), an algorithm that is trained on (query, document) pairs of text documents to predict the quality of their match. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as online advertising placement. Dealing with models on all pairs of words features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low rank (but diagonal preserving) representations, and correlated feature hashing (CFH). We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.

...read moreread less

100 citations

Linear Discriminant Analysis in Document Classification

[...]

Kari Torkkola¹•Institutions (1)

Motorola¹

01 Jan 2007

TL;DR: This paper points out that LSI ignores discrimination while concentrating on representation, and recommends supervised linear discriminative transforms, and reports good classification results applying these to the Reuters-21578 database.

...read moreread less

Abstract: Document representation using the bag-of-words approach may require bringing the dimensionality of the representation down in order to be able to make effective use of various statistical classification methods. Latent Semantic Indexing (LSI) is one such method that is based on eigendecomposition of the covariance of the document-term matrix. Another often used approach is to select a small number of most important features out of the whole set according to some relevant criterion. This paper points out that LSI ignores discrimination while concentrating on representation. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results applying these to the Reuters-21578 database.

...read moreread less

99 citations

Proceedings Article•

Priors for Diversity in Generative Latent Variable Models

[...]

James T. Kwok¹, Ryan P. Adams¹•Institutions (1)

Harvard University¹

03 Dec 2012

TL;DR: This work revisits independence assumptions for probabilistic latent variable models with a determinantal point process (DPP), leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.

...read moreread less

Abstract: Probabilistic latent variable models are one of the cornerstones of machine learning. They offer a convenient and coherent way to specify prior distributions over unobserved structure in data, so that these unknown properties can be inferred via posterior inference. Such models are useful for exploratory analysis and visualization, for building density models of data, and for providing features that can be used for later discriminative tasks. A significant limitation of these models, however, is that draws from the prior are often highly redundant due to i.i.d. assumptions on internal parameters. For example, there is no preference in the prior of a mixture model to make components non-overlapping, or in topic model to ensure that co-occurring words only appear in a small number of topics. In this work, we revisit these independence assumptions for probabilistic latent variable models, replacing the underlying i.i.d. prior with a determinantal point process (DPP). The DPP allows us to specify a preference for diversity in our latent variables using a positive definite kernel function. Using a kernel between probability distributions, we are able to define a DPP on probability measures. We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.

...read moreread less

99 citations

Journal Article•DOI•

Bayesian dynamic modeling of latent trait distributions

[...]

David B. Dunson¹•Institutions (1)

National Institutes of Health¹

01 Oct 2006-Biostatistics

TL;DR: A flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor is proposed.

...read moreread less

Abstract: SUMMARY Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.

...read moreread less

99 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
…
45
46
47
48
49
50
51
…
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

Network Information

Performance

Metrics

2,984

Papers

212,744

Citations

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	77
2021	14
2020	36
2019	27
2018	58

Probabilistic latent semantic analysis

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics