Topic

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as PLSA.


Papers
Journal Article
TL;DR: The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models, and have successfully analyzed the CoIL Challenge 2000 data set using HLC models.
Abstract: The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables due to the lack of efficient learning algorithms. Significant progress has been made since then on algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting to know that such a large and meaningful latent structure can be automatically inferred from data.

20 citations

Journal Article
TL;DR: An extension of latent component analysis to fuzzy data that follows the possibilistic approach, widely used in both the cluster and regression frameworks, and involves a non-linear programming problem in which the fuzziness of the model is minimized.

20 citations

Proceedings Article
26 Jul 2011
TL;DR: A method of text classification based on the LDA model is briefly described, which uses the LDA model as a text representation method; the evaluation parameters of the classification system using LDA with SVM are higher than those of the other two methods, LSI with SVM and VSM with SVM.
Abstract: This paper introduces three classic statistical topic models: Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI) and Latent Dirichlet Allocation (LDA). A method of text classification based on the LDA model is then briefly described, which uses the LDA model as a text representation method. Each document is represented as a probability distribution over a fixed set of latent topics. Next, a Support Vector Machine (SVM) is chosen as the classification algorithm. Finally, the evaluation parameters of the classification system using LDA with SVM are higher than those of the other two methods, LSI with SVM and VSM with SVM, showing better classification performance.

20 citations

01 Jan 2001
TL;DR: Latent semantic indexing in conjunction with two different ordination techniques is employed to construct a semantic Reuters news wire space and topological information helps to identify the appropriate levels of granularity at which the information space can be visually explored.
Abstract: The geographic concepts of region and scale can be preserved in semantic information spaces and depicted cartographically. Region and scale are fundamental to geographical analysis, and are also associated with cognitive and experiential properties of the real world. Scale is important when graphically representing a spatialization, as it affects the amount of detail that can be shown. Latent semantic indexing in conjunction with two different ordination techniques is employed to construct a semantic Reuters news wire space. Intramax, a hierarchical clustering algorithm, is applied to delineate semantic regions in the Reuters database based on a functional distance measure. This topological information helps to identify the appropriate levels of granularity at which the information space can be visually explored. Amplification of ordination techniques with the Intramax procedure is a useful strategy for creating scale-dependent information spaces that facilitate the exploration of abstract, complex data archives.

19 citations

Patent
Milind Naphade, John R. Smith
03 Jan 2007
TL;DR: A method for applying semantic concept detection over multiple related audio-video sources, scoring the sources on the basis of the presence or absence of specific semantics, and aggregating the scores using combination functions to achieve a semantic super-resolution.
Abstract: An embodiment of the present invention relates to the combining of multiple semantic analyses of audio-visual data in order to resolve a higher fidelity description of the semantic content and more specifically to a method for applying semantic concept detection over multiple related audio-video sources, scoring the sources on the basis of presence or absence of specific semantics and aggregating the scores using combination functions to achieve a semantic super-resolution.

19 citations


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 84% related
Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 84% related
Deep learning: 79.8K papers, 2.1M citations, 83% related
Object detection: 46.1K papers, 1.3M citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  19
2022  77
2021  14
2020  36
2019  27
2018  58