Topic
Probabilistic latent semantic analysis
About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2884 publications have appeared within this topic, receiving 198341 citations. The topic is also known as: PLSA.
Papers
TL;DR: The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models, and have successfully analyzed the CoIL Challenge 2000 data set using HLC models.
Abstract: The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables due to the lack of efficient learning algorithms. Significant progress has been made since then on algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting to know that such a large and meaningful latent structure can be automatically inferred from data.
20 citations
TL;DR: An extension of latent component analysis to fuzzy data that follows the possibilistic approach, widely used in both the clustering and regression frameworks, and leads to a non-linear programming problem in which the fuzziness of the model is minimized.
20 citations
26 Jul 2011
TL;DR: A method of text classification based on the LDA model is briefly described, which uses the LDA model as a text representation method; the evaluation parameters of the classification system using LDA with SVM are higher than those of the other two methods, LSI with SVM and VSM with SVM.
Abstract: This paper introduces three classic statistical topic models: Latent Semantic Indexing (LSI), Probabilistic Latent Semantic Indexing (PLSI) and Latent Dirichlet Allocation (LDA). A method of text classification based on the LDA model is then briefly described, which uses the LDA model as a text representation method. Each document is represented as a probability distribution over a fixed set of latent topics. Next, a Support Vector Machine (SVM) is chosen as the classification algorithm. Finally, the evaluation parameters of the classification system using LDA with SVM are higher than those of the other two methods, LSI with SVM and VSM with SVM, showing better classification performance.
20 citations
01 Jan 2001
TL;DR: Latent semantic indexing in conjunction with two different ordination techniques is employed to construct a semantic Reuters news wire space and topological information helps to identify the appropriate levels of granularity at which the information space can be visually explored.
Abstract: The geographic concepts of region and scale can be preserved in semantic information spaces and depicted cartographically. Region and scale are fundamental to geographical analysis, and are also associated with cognitive and experiential properties of the real world. Scale is important when graphically representing a spatialization, as it affects the amount of detail that can be shown. Latent semantic indexing in conjunction with two different ordination techniques is employed to construct a semantic Reuters news wire space. Intramax, a hierarchical clustering algorithm, is applied to delineate semantic regions in the Reuters database based on a functional distance measure. This topological information helps to identify the appropriate levels of granularity at which the information space can be visually explored. Amplification of ordination techniques with the Intramax procedure is a useful strategy for creating scale-dependent information spaces that facilitate the exploration of abstract, complex data archives.
19 citations
IBM
TL;DR: A method for applying semantic concept detection over multiple related audio-video sources, scoring the sources on the basis of the presence or absence of specific semantics, and aggregating the scores using combination functions to achieve a semantic super-resolution.
Abstract: An embodiment of the present invention relates to the combining of multiple semantic analyses of audio-visual data in order to resolve a higher fidelity description of the semantic content and more specifically to a method for applying semantic concept detection over multiple related audio-video sources, scoring the sources on the basis of presence or absence of specific semantics and aggregating the scores using combination functions to achieve a semantic super-resolution.
19 citations