scispace - formally typeset

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over its lifetime, 2,884 publications have been published within this topic, receiving 198,341 citations. The topic is also known as: PLSA.
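For orientation, the core PLSA model factors the joint probability of a document d and word w through latent topics z, p(d, w) = p(d) Σ_z p(z|d) p(w|z), and is fit by EM. Below is a minimal illustrative sketch of that EM loop (toy code, not taken from any paper listed here; all names are our own):

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a document-term count matrix.

    counts: (n_docs, n_words) array of term frequencies.
    Returns p_z_d (topic given document) and p_w_z (word given topic).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation of the two conditional distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities p(z | d, w), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight responsibilities by the observed counts n(d, w).
        weighted = counts[:, None, :] * joint
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

On a small count matrix with two clearly separated word blocks, the fitted conditionals each sum to one and the reconstruction p(w|d) = p(z|d) p(w|z) concentrates on each document's own block.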


Papers
Proceedings Article
07 Aug 2011
TL;DR: A methodology for integrating non-parametric tree methods into probabilistic latent variable models by extending functional gradient boosting is presented in the context of occupancy-detection modeling, where the goal is to model the distribution of a species from imperfect detections.
Abstract: Important ecological phenomena are often observed indirectly. Consequently, probabilistic latent variable models provide an important tool, because they can include explicit models of the ecological phenomenon of interest and the process by which it is observed. However, existing latent variable methods rely on hand-formulated parametric models, which are expensive to design and require extensive preprocessing of the data. Nonparametric methods (such as regression trees) automate these decisions and produce highly accurate models. However, existing tree methods learn direct mappings from inputs to outputs—they cannot be applied to latent variable models. This paper describes a methodology for integrating non-parametric tree methods into probabilistic latent variable models by extending functional gradient boosting. The approach is presented in the context of occupancy-detection (OD) modeling, where the goal is to model the distribution of a species from imperfect detections. Experiments on 12 real and 3 synthetic bird species compare standard and tree-boosted OD models (latent variable models) with standard and tree-boosted logistic regression models (without latent structure). All methods perform similarly when predicting the observed variables, but the OD models learn better representations of the latent process. Most importantly, tree-boosted OD models learn the best latent representations when non-linearities and interactions are present.

51 citations
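The latent structure this paper boosts is the standard occupancy-detection likelihood: a site is occupied with some probability, and an occupied site is detected on each visit with some probability, so an all-zero detection history is ambiguous between "occupied but missed" and "truly unoccupied". A generic sketch of that per-site likelihood (constant probabilities, no false positives; this is the plain OD building block, not the paper's tree-boosted model):

```python
import math

def site_log_likelihood(psi, p, detections):
    """Log-likelihood of one site's 0/1 detection history under a
    basic occupancy-detection model: occupied with probability `psi`,
    detected per visit (if occupied) with probability `p`.
    False positives are assumed impossible.
    """
    # Probability of the observed history given the site is occupied.
    occ = 1.0
    for y in detections:
        occ *= p if y else (1.0 - p)
    if any(detections):
        # At least one detection: the site must be occupied.
        return math.log(psi * occ)
    # All-zero history: occupied-but-missed, or truly unoccupied.
    return math.log(psi * occ + (1.0 - psi))
```

For example, with psi = p = 0.5 a two-visit history [0, 0] has likelihood 0.5 * 0.25 + 0.5 = 0.625, while [1, 0] has likelihood 0.5 * 0.25 = 0.125.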

Journal ArticleDOI
TL;DR: A flexible approach to modeling developmental relations among two or more discrete, multidimensional latent variables, called associative latent transition analysis (ALTA), is presented, based on the general framework of loglinear modeling with latent variables.
Abstract: To understand one developmental process, it is often helpful to investigate its relations with other developmental processes. Statistical methods that model development in multiple processes simultaneously over time include latent growth curve models with time-varying covariates, multivariate latent growth curve models, and dual trajectory models. These models are designed for growth represented by continuous, unidimensional trajectories. The purpose of this article is to present a flexible approach to modeling relations in development among two or more discrete, multidimensional latent variables based on the general framework of loglinear modeling with latent variables called associative latent transition analysis (ALTA). Focus is given to the substantive interpretation of different associative latent transition models, and exactly what hypotheses are expressed in each model. An empirical demonstration of ALTA is presented to examine the association between the development of alcohol use and sexual risk ...

51 citations

13 Dec 2007
TL;DR: This paper presents Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors and demonstrates that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test.
Abstract: Document indexing and the representation of term-document relations are central issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence-based measures of semantic similarity obtained from very large corpora. Our experiments demonstrate that GLSA term vectors efficiently capture semantic relations between terms and outperform related approaches on the synonymy test.

50 citations
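The general GLSA idea, deriving low-dimensional term vectors from a pairwise term-similarity matrix rather than from the term-document matrix directly, can be sketched with a PMI matrix and its leading eigenvectors. This is an illustrative simplification under our own naming, not the paper's exact pipeline:

```python
import numpy as np

def glsa_term_vectors(cooc, k=2):
    """Toy GLSA-style term vectors: build a pointwise-mutual-information
    (PMI) matrix from raw term co-occurrence counts, then keep the k
    leading eigenvectors as low-dimensional term coordinates.
    """
    total = cooc.sum()
    p_ij = cooc / total
    p_i = p_ij.sum(axis=1)
    with np.errstate(divide="ignore"):
        pmi = np.log(p_ij / np.outer(p_i, p_i))
    pmi[~np.isfinite(pmi)] = 0.0          # zero out log(0) cells
    # Symmetric eigendecomposition; keep the k largest eigenvalues.
    vals, vecs = np.linalg.eigh(pmi)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.abs(vals[order]))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

On a co-occurrence matrix with two blocks of mutually co-occurring terms, cosine similarity between the resulting vectors is high within a block and low across blocks, which is the behavior the synonymy test probes.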

Journal ArticleDOI
TL;DR: The proposed two-phase algorithm evaluates the semantic similarity for two or more sentences via a semantic vector space and has outstanding performance in handling long sentences with complex syntax.
Abstract: Research highlights:
- This research takes advantage of corpus-based ontology and information-retrieval technologies to evaluate the semantic similarity between irregular sentences.
- The part-of-speech concept was taken into account and integrated into the proposed semantic-VSM measure.
- This research tries to quantify the semantic similarity of natural-language sentences.

A novel sentence similarity measure for semantics-based expert systems is presented. A well-known problem in the field of semantic processing, for example in QA systems, is evaluating the semantic similarity between irregular sentences. This paper takes advantage of a corpus-based ontology to overcome this problem, introducing a transformed vector space model. The proposed two-phase algorithm evaluates the semantic similarity of two or more sentences via a semantic vector space: the first phase builds part-of-speech (POS) based subspaces from the raw data, and the second carries out a cosine evaluation and adopts the WordNet ontology to construct the semantic vectors. Unlike related work that focused only on short sentences, the algorithm is applicable to short (4-5 words), medium (8-12 words), and even long sentences (over 12 words). The experiment demonstrates that the proposed algorithm performs particularly well on long sentences with complex syntax. The significance of this research lies in extracting the semantic similarity of sentences with arbitrary structures.

50 citations
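The basic vector-space step that the two-phase algorithm above builds on is cosine similarity between sentence vectors over a shared vocabulary. A minimal sketch of just that step (the paper additionally splits vectors by part of speech and enriches them with WordNet; this sketch omits both):

```python
import math
from collections import Counter

def cosine_similarity(sent_a, sent_b):
    """Cosine similarity of two sentences as bag-of-words vectors
    over their shared vocabulary."""
    va, vb = Counter(sent_a.lower().split()), Counter(sent_b.lower().split())
    vocab = set(va) | set(vb)
    dot = sum(va[w] * vb[w] for w in vocab)   # Counter gives 0 for missing words
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

For instance, "the cat sat" and "the cat ran" share two of three terms, giving a cosine of 2/3; identical sentences score 1.0.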

Proceedings ArticleDOI
28 Jun 1992
TL;DR: This article proposes a Generalized Probabilistic Semantic Model (GPSM) for preference assignment in natural language processing, integrating lexical, syntactic, and semantic preference under a uniform formulation and showing substantial improvement in structural disambiguation over a syntax-based approach.
Abstract: In natural language processing, ambiguity resolution is a central issue and can be regarded as a preference assignment problem. In this paper, a Generalized Probabilistic Semantic Model (GPSM) is proposed for preference computation. An effective semantic tagging procedure is proposed for tagging semantic features, and a semantic score is derived from a generalized score function that integrates lexical, syntactic, and semantic preference under a uniform formulation. The semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach.

50 citations


Network Information
Related Topics (5)
- Feature extraction: 111.8K papers, 2.1M citations, 84% related
- Feature (computer vision): 128.2K papers, 1.7M citations, 84% related
- Support vector machine: 73.6K papers, 1.7M citations, 84% related
- Deep learning: 79.8K papers, 2.1M citations, 83% related
- Object detection: 46.1K papers, 1.3M citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
Year | Papers
2023 | 19
2022 | 77
2021 | 14
2020 | 36
2019 | 27
2018 | 58