Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as PLSA.

Papers published on a yearly basis

Papers

Proceedings Article•DOI•

Learning semantic visual vocabularies using diffusion distance

Jingen Liu¹, Yang Yang¹, Mubarak Shah¹•Institutions (1)

University of Central Florida¹

20 Jun 2009

TL;DR: A novel approach to automatically learn a semantic visual vocabulary from abundant quantized midlevel features by using diffusion maps to capture the local intrinsic geometric relations between the midlevel feature points on the manifold.

Abstract: In this paper, we propose a novel approach for learning generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by the vector of pointwise mutual information (PMI). In this midlevel feature space, we believe the features produced by similar sources must lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into a semantic lower-dimensional space. Our goal is to construct a compact yet discriminative semantic visual vocabulary. Although the conventional approach using k-means is good for vocabulary construction, its performance is sensitive to the size of the visual vocabulary. In addition, the learnt visual words are not semantically meaningful since the clustering criterion is based on appearance similarity only. Our proposed approach can effectively overcome these problems by capturing the semantic and geometric relations of the feature space using diffusion maps. Unlike some of the supervised vocabulary construction approaches, and the unsupervised methods such as pLSA and LDA, diffusion maps can capture the local intrinsic geometric relations between the midlevel feature points on the manifold. We have tested our approach on the KTH action dataset, our own YouTube action dataset and the fifteen scene dataset, and have obtained very promising results.
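The PMI representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical count matrix whose rows are midlevel features and whose columns are images or videos; the function name `pmi_matrix` and the convention of setting PMI to 0 for zero counts are assumptions made here, not details taken from the paper.

```python
import math

def pmi_matrix(counts):
    """Pointwise mutual information for a (features x items) count matrix.

    counts[i][j] is how often hypothetical midlevel feature i occurs in
    image/video j. Entry (i, j) of the result is
    log( p(i, j) / (p(i) * p(j)) ); cells with a zero count are set to 0.0
    (an assumed convention, since log(0) is undefined).
    """
    total = sum(sum(row) for row in counts)
    n_cols = len(counts[0])
    # Marginal probabilities of each feature (row) and each item (column).
    row_p = [sum(row) / total for row in counts]
    col_p = [sum(row[j] for row in counts) / total for j in range(n_cols)]

    pmi = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(0.0)
            else:
                out.append(math.log((c / total) / (row_p[i] * col_p[j])))
        pmi.append(out)
    return pmi
```

Each row of the returned matrix is the PMI vector of one feature. In a pipeline like the one the abstract describes, such vectors would be the input to the diffusion-map embedding, which is not sketched here.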

173 citations

Proceedings Article•DOI•

Modeling hidden topics on document manifold

Deng Cai¹, Qiaozhu Mei¹, Jiawei Han¹, ChengXiang Zhai¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

26 Oct 2008

TL;DR: This paper proposes a novel algorithm called Laplacian Probabilistic Latent Semantic Indexing (LapPLSI) for topic modeling, which models the document space as a submanifold embedded in the ambient space and directly performs the topic modeling on this document manifold in question.

Abstract: Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document on the hidden topics independently and the number of parameters in the model grows linearly with the size of the corpus, which leads to serious problems with overfitting. Latent Dirichlet Allocation (LDA) is proposed to overcome this problem by treating the probability distribution of each document over topics as a hidden random variable. Both of these two methods discover the hidden topics in the Euclidean space. However, there is no convincing evidence that the document space is Euclidean, or flat. Therefore, it is more natural and reasonable to assume that the document space is a manifold, either linear or nonlinear. In this paper, we consider the problem of topic modeling on intrinsic document manifold. Specifically, we propose a novel algorithm called Laplacian Probabilistic Latent Semantic Indexing (LapPLSI) for topic modeling. LapPLSI models the document space as a submanifold embedded in the ambient space and directly performs the topic modeling on this document manifold in question. We compare the proposed LapPLSI approach with PLSI and LDA on three text data sets. Experimental results show that LapPLSI provides better representation in the sense of semantic structure.
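For orientation, the baseline PLSI model that this abstract builds on can be fitted with a short EM loop. The sketch below is the standard PLSA/PLSI procedure, not the LapPLSI algorithm from the paper; the function names `plsa` and `log_likelihood`, the parameters, and the random initialization are assumptions chosen for illustration.

```python
import math
import random

def _normalize(row):
    """Scale a non-negative list to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]

def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n_dw * math.log(p)
    return ll

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z) with p_z_d[d][z] = P(z|d), p_w_z[z][w] = P(w|z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialization, normalized into distributions.
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z|d) P(w|z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][w]
                                   for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += n_dw * post[z]
                    new_w_z[z][w] += n_dw * post[z]
        # M-step: renormalize expected counts into distributions.
        p_z_d = [_normalize(row) for row in new_z_d]
        p_w_z = [_normalize(row) for row in new_w_z]
    return p_z_d, p_w_z
```

Note that `p_z_d` holds one topic distribution per document, so the number of parameters is `n_docs * n_topics + n_topics * n_words`. This is the linear growth with corpus size that the abstract identifies as a source of overfitting.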

173 citations

Proceedings Article•DOI•

Efficient topic-based unsupervised name disambiguation

Yang Song¹, Jian Huang¹, Isaac G. Councill¹, Jia Li¹, C. Lee Giles¹•Institutions (1)

Pennsylvania State University¹

18 Jun 2007

TL;DR: This paper presents an efficient and effective two-stage approach to disambiguate person names within web pages and scientific documents, and empirically addresses scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

Abstract: Name ambiguity is a special case of identity uncertainty where one person can be referenced by multiple name variations in different situations or even share the same name with other people. In this paper, we focus on the problem of disambiguating person names within web pages and scientific documents. We present an efficient and effective two-stage approach to disambiguate names. In the first stage, two novel topic-based models are proposed by extending two hierarchical Bayesian text models, namely Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. After learning an initial model, the topic distributions are treated as feature sets and names are disambiguated by leveraging a hierarchical agglomerative clustering method. Experiments on web data and scientific documents from CiteSeer indicate that our approach consistently outperforms other unsupervised learning methods such as spectral clustering and DBSCAN clustering and could be extended to other research fields. We empirically addressed the issue of scalability by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.

172 citations

Proceedings Article•

Using latent semantic analysis to improve information retrieval

Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Scott Deerwester

01 Jan 1988

171 citations

Journal Article•DOI•

Multilabel Image Classification With Regional Latent Semantic Dependencies

Junjie Zhang¹, Qi Wu, Chunhua Shen, Jian Zhang², Jianfeng Lu¹•Institutions (2)

Nanjing University of Science and Technology¹, University of Technology, Sydney²

09 Mar 2018-IEEE Transactions on Multimedia

TL;DR: Zhang et al. propose a regional latent semantic dependencies model (RLSD) that localizes regions likely to contain multiple highly dependent labels and sends those regions to recurrent neural networks to characterize the label dependencies at the regional level.

Abstract: Deep convolution neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and various progress also has been made to apply CNN methods on multilabel image classification, which requires annotating objects, attributes, scene categories, etc., in a single shot. Recent state-of-the-art approaches to the multilabel image classification exploit the label dependencies in an image, at the global level, largely improving the labeling capacity. However, predicting small objects and visual concepts is still challenging due to the limited discrimination of the global visual features. In this paper, we propose a regional latent semantic dependencies model (RLSD) to address this problem. The utilized model includes a fully convolutional localization architecture to localize the regions that may contain multiple highly dependent labels. The localized regions are further sent to the recurrent neural networks to characterize the latent semantic dependencies at the regional level. Experimental results on several benchmark datasets show that our proposed model achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images. Also, we set up an upper bound model (RLSD+ft-RPN) using bounding-box coordinates during training, and the experimental results also show that our RLSD can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.

169 citations

Network Information

Performance

Metrics

2,984

Papers

212,744

Citations

No. of papers in the topic in previous years
Year	Papers
2023	19
2022	77
2021	14
2020	36
2019	27
2018	58
