Home
/
Topics
/
Probabilistic latent semantic analysis

Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.

...read moreread less

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1985
1984
1983
1982
1981
1980
1979
1978
1974
1973
1971
1970
1969
1965
1963
1960
1958
1956
1954

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•

Shallow semantics for relation extraction

[...]

Sanda M. Harabagiu¹, Cosmin Adrian Bejan¹, Paul Morarescu¹•Institutions (1)

University of Texas at Dallas¹

30 Jul 2005

TL;DR: A new method for extracting meaningful relations from unstructured natural language sources based on information made available by shallow semantic parsers that surpassed the results of kernel-based models employing only semantic class information.

...read moreread less

Abstract: This paper presents a new method for extracting meaningful relations from unstructured natural language sources. The method is based on information made available by shallow semantic parsers. Semantic information was used (1) to enhance a dependency tree kernel; and (2) to build semantic dependency structures used for enhanced relation extraction for several semantic classifiers. In our experiments the quality of the extracted relations surpassed the results of kernel-based models employing only semantic class information.

...read moreread less

63 citations

Journal Article•DOI•

phishGILLNET—phishing detection methodology using probabilistic latent semantic analysis, AdaBoost, and co-training

[...]

Venkatesh Ramanathan¹, Harry Wechsler¹•Institutions (1)

George Mason University¹

26 Mar 2012-Eurasip Journal on Information Security

TL;DR: A robust server side methodology to detect phishing attacks, called phishGILLNET, which incorporates the power of natural language processing and machine learning techniques, and outperforms state of the art phishing detection methods.

...read moreread less

Abstract: Identity theft is one of the most profitable crimes committed by felons. In the cyber space, this is commonly achieved using phishing. We propose here robust server side methodology to detect phishing attacks, called phishGILLNET, which incorporates the power of natural language processing and machine learning techniques. phishGILLNET is a multi-layered approach to detect phishing attacks. The first layer (phishGILLNET1) employs Probabilistic Latent Semantic Analysis (PLSA) to build a topic model. The topic model handles synonym (multiple words with similar meaning), polysemy (words with multiple meanings), and other linguistic variations found in phishing. Intentional misspelled words found in phishing are handled using Levenshtein editing and Google APIs for correction. Based on term document frequency matrix as input PLSA finds phishing and non-phishing topics using tempered expectation maximization. The performance of phishGILLNET1 is evaluated using PLSA fold in technique and the classification is achieved using Fisher similarity. The second layer of phishGILLNET (phishGILLNET2) employs AdaBoost to build a robust classifier. Using probability distributions of the best PLSA topics as features the classifier is built using AdaBoost. The third layer (phishGILLNET3) further expands phishGILLNET2 by building a classifier from labeled and unlabeled examples by employing Co-Training. Experiments were conducted using one of the largest public corpus of email data containing 400,000 emails. Results show that phishGILLNET3 outperforms state of the art phishing detection methods and achieves F-measure of 100%. Moreover, phishGILLNET3 requires only a small percentage (10%) of data be annotated thus saving significant time, labor, and avoiding errors incurred in human annotation.

...read moreread less

63 citations

Book•

Statistics and neural networks: advances at the interface

[...]

Jim Kay, D. M. Titterington

06 Apr 2000

TL;DR: Flexible discriminant and mixture models Neural networks for unsupervised learning based on information theory Radial basis function networks and statistics Robust prediction in many-parameter models and data visualisation.

...read moreread less

Abstract: Flexible discriminant and mixture models Neural networks for unsupervised learning based on information theory Radial basis function networks and statistics Robust prediction in many-parameter models Density networks Latent variable models and data visualisation Analysis of latent structure models with multidimensional latent variables Artificial neural networks and multivariate statistics

...read moreread less

62 citations

Proceedings Article•DOI•

FLSA: Extending Latent Semantic Analysis with Features for Dialogue Act Classification

[...]

Riccardo Serafin, Barbara Di Eugenio¹•Institutions (1)

University of Illinois at Chicago¹

21 Jul 2004

TL;DR: This paper applied Feature Latent Semantic Analysis (FLSA) to dialogue act classification with excellent results on three corpora: CallHome Spanish, MapTask, and their own corpus of tutoring dialogues.

...read moreread less

Abstract: We discuss Feature Latent Semantic Analysis (FLSA), an extension to Latent Semantic Analysis (LSA). LSA is a statistical method that is ordinarily trained on words only; FLSA adds to LSA the richness of the many other linguistic features that a corpus may be labeled with. We applied FLSA to dialogue act classification with excellent results. We report results on three corpora: CallHome Spanish, MapTask, and our own corpus of tutoring dialogues.

...read moreread less

62 citations

Book Chapter•DOI•

[...]

Vasile Rus¹, Nobal B. Niraula¹, Rajendra Banjade¹•Institutions (1)

University of Memphis¹

24 Mar 2013

TL;DR: The investigation on semantic similarity measures at word- and sentence-level based on two fully-automated approaches to deriving meaning from large corpora: Latent Dirichlet Allocation, a probabilistic approach, and Latent Semantic Analysis, an algebraic approach are presented.

...read moreread less

Abstract: We present in this paper the results of our investigation on semantic similarity measures at word- and sentence-level based on two fully-automated approaches to deriving meaning from large corpora: Latent Dirichlet Allocation, a probabilistic approach, and Latent Semantic Analysis, an algebraic approach. The focus is on similarity measures based on Latent Dirichlet Allocation, due to its novelty aspects, while the Latent Semantic Analysis measures are used for comparison purposes. We explore two types of measures based on Latent Dirichlet Allocation: measures based on distances between probability distribution that can be applied directly to larger texts such as sentences and a word-to-word similarity measure that is then expanded to work at sentence-level. We present results using paraphrase identification data in the Microsoft Research Paraphrase corpus.

...read moreread less

62 citations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
…
69
70
71
72
73
74
75
…
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

Network Information

Performance

Metrics

2,984

Papers

212,744

Citations

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	77
2021	14
2020	36
2019	27
2018	58

Probabilistic latent semantic analysis

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics