scispace - formally typeset

Probabilistic latent semantic analysis

About: Probabilistic latent semantic analysis is a research topic. Over the lifetime, 2884 publications have been published within this topic receiving 198341 citations. The topic is also known as: PLSA.


Papers
01 Jan 2009
TL;DR: This model extends previously proposed models such as probabilistic latent semantic analysis (PLSA) by merging both user-item as well as item-tag observations into a unified representation and finds that bringing tags into play reduces the risk of overfitting and increases overall recommendation quality.
Abstract: We investigate the problem of item recommendation during the first months of the collaborative tagging community CiteULike. CiteULike is a so-called folksonomy where users have the possibility to organize publications through annotations (tags). Making reliable recommendations during the initial phase of a folksonomy is a difficult task, since information about user preferences is meager. In order to improve recommendation results during this cold start period, we present a probabilistic approach to item recommendation. Our model extends previously proposed models such as probabilistic latent semantic analysis (PLSA) by merging both user-item as well as item-tag observations into a unified representation. We find that bringing tags into play reduces the risk of overfitting and increases overall recommendation quality. Experiments show that our approach outperforms other types of recommenders.
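The PLSA core this abstract builds on can be sketched with plain EM updates on a user-item co-occurrence count matrix; merging item-tag observations, as the paper proposes, would add a second count matrix sharing the item factors. This is a minimal illustrative sketch under assumed names, not the authors' implementation:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Plain PLSA via EM on a user-item count matrix (illustrative sketch).

    Model: P(u, i) = sum_z P(z) P(u|z) P(i|z).
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    pz = np.full(n_topics, 1.0 / n_topics)                    # P(z)
    puz = rng.random((n_users, n_topics)); puz /= puz.sum(0)  # P(u|z)
    piz = rng.random((n_items, n_topics)); piz /= piz.sum(0)  # P(i|z)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|u,i) for every (user, item) pair
        post = puz[:, None, :] * piz[None, :, :] * pz         # (U, I, Z)
        post /= post.sum(-1, keepdims=True) + 1e-12
        # M-step: re-estimate the factors from expected counts
        weighted = counts[:, :, None] * post                  # n(u,i) P(z|u,i)
        pz = weighted.sum((0, 1)); pz /= pz.sum()
        puz = weighted.sum(1); puz /= puz.sum(0) + 1e-12
        piz = weighted.sum(0); piz /= piz.sum(0) + 1e-12
    return pz, puz, piz
```

The tag extension would add a term-by-term analogous E/M update for an item-tag count matrix, tied to the same `piz` factor, which is what reduces overfitting when user-item data is sparse.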

29 citations

Posted Content
TL;DR: A novel and fast traffic sign recognition system, an important component of advanced driver assistance systems and autonomous driving, with accuracy close to existing state-of-the-art techniques.
Abstract: In this work we developed a novel and fast traffic sign recognition system, an important component of advanced driver assistance systems and of autonomous driving. Traffic signs play a vital role in safe driving and accident avoidance. We used image processing and the topic discovery model pLSA to tackle this challenging multiclass classification problem. Our algorithm consists of two parts, shape classification and sign classification, for improved accuracy. For processing and representing images we used a bag-of-features model with the SIFT local descriptor, where a visual vocabulary of 300 words is formed using the k-means codebook formation algorithm. We exploited the idea that every image is a collection of visual topics and that images sharing the same topics belong to the same category. Our algorithm was tested on the German Traffic Sign Recognition Benchmark (GTSRB) and gives very promising results, close to existing state-of-the-art techniques.
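The bag-of-features step described here (k-means codebook over local descriptors, then one histogram per image) can be sketched as below. SIFT extraction itself (e.g. via OpenCV) is not shown; the toy k-means and all names are illustrative, with the paper's vocabulary size of 300 as the default:

```python
import numpy as np

def build_codebook(descriptors, k=300, n_iter=20, seed=0):
    """k-means codebook over local descriptors (e.g. 128-d SIFT); sketch only."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(n_iter):
        # assign each descriptor to its nearest visual word
        d2 = ((descriptors[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)  # move word to the mean of its cluster
    return centers

def bow_histogram(descriptors, centers):
    """Represent one image as a normalized histogram over the vocabulary."""
    d2 = ((descriptors[:, None, :] - centers[None]) ** 2).sum(-1)
    counts = np.bincount(d2.argmin(1), minlength=len(centers))
    return counts / counts.sum()
```

These per-image histograms are the "documents" that pLSA then decomposes into visual topics for classification.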

29 citations

Proceedings ArticleDOI
25 Jul 2010
TL;DR: A Discriminative Topic Model (DTM) is proposed that separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving the local consistency.
Abstract: Topic modeling has been popularly used for data analysis in various domains including text documents. Previous topic models, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These models, however, do not take into account the manifold structure of data, which is generally informative for the non-linear dimensionality reduction mapping. More recent models, namely Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown the resulting benefits. But these approaches fall short of the full discriminating power of manifold learning as they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this paper, we propose Discriminative Topic Model (DTM) that separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving the local consistency. We also present a novel model fitting algorithm based on the generalized EM and the concept of Pareto improvement. As a result, DTM achieves higher classification performance in a semi-supervised setting by effectively exposing the manifold structure of data. We provide empirical evidence on text corpora to demonstrate the success of DTM in terms of classification accuracy and robustness to parameters compared to state-of-the-art techniques.
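The neighbor/non-neighbor idea behind DTM can be illustrated with a toy penalty on the low-rank representations: neighboring pairs are pulled together by their squared distance, while non-neighboring pairs incur a hinge penalty if they fall inside a margin. This is only a sketch of the general principle; DTM's actual objective and its generalized-EM / Pareto-improvement fitting differ, and the pair lists, margin form, and names here are assumptions:

```python
import numpy as np

def manifold_regularizer(Z, neighbors, non_neighbors, margin=1.0):
    """Pull neighboring rows of Z together, push non-neighboring rows apart.

    Z: (n_docs, n_topics) low-rank representations.
    neighbors / non_neighbors: lists of (i, j) index pairs.
    """
    # squared distance keeps neighboring representations close
    pull = sum(((Z[i] - Z[j]) ** 2).sum() for i, j in neighbors)
    # hinge: penalize non-neighbors only when closer than the margin
    push = sum(max(0.0, margin - np.sqrt(((Z[i] - Z[j]) ** 2).sum())) ** 2
               for i, j in non_neighbors)
    return pull + push
```

Models like LapPLSI keep only the `pull` term; the `push` term is what adds the discriminative separation the abstract describes.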

29 citations

01 Jan 2006
TL;DR: Investigation of whether LSA cosine values are adequate substitutes for human semantic similarity ratings of word pairs indicates that LSA cosines provide inadequate estimates of similarity ratings.
Abstract: Using Latent Semantic Analysis to Estimate Similarity. Sabrina Simmons (s.g.simmons@warwick.ac.uk) and Zachary Estes (z.estes@warwick.ac.uk), Department of Psychology, University of Warwick, Coventry CV4 7AL, UK.

In three studies we investigated whether LSA cosine values estimate human similarity ratings of word pairs. In study 1 we found that LSA can distinguish between highly similar and dissimilar matches to a target word, but that it does not reliably distinguish between highly similar and less similar matches. In study 2 we showed that, using an expanded item set, the correlation between LSA ratings and human similarity ratings is both quite low and inconsistent. Study 3 demonstrates that, while people distinguish between taxonomic and thematic word pairs, LSA cosines do not. Although people rate taxonomically related items to be more similar than thematically related items, LSA cosine values are equivalent across stimulus types. Our results indicate that LSA cosines provide inadequate estimates of similarity ratings.

Latent Semantic Analysis (LSA) is a statistical model of language learning and representation that uses vector averages in semantic space to assess the similarity between words or texts (Landauer & Dumais, 1997). LSA has been used to model a number of cognitive phenomena and correlates well with many human behaviors relating to language use. Owing to the model's success, LSA ratings are now being used in place of human ratings as measures of semantic similarity. The purpose of this paper is to investigate whether LSA cosine values are adequate substitutes for human semantic similarity ratings.

Overview of LSA. The basic theory behind LSA is that the "psychological similarity between any two words is reflected in the way they co-occur in small sub-samples of language" (Landauer & Dumais, 1997, p. 215). The model begins with a matrix taking words as rows and contexts as columns, although, theoretically, many other types of information could also be used. The contexts may be anything, for example newspaper articles, textbooks, or student essays, and the words are simply those that appear in the training set. Importantly, the contexts with which the model is provided will determine what types of words it has experience with, so the training set should be relevant to the task the model is to perform. The first step is to associate each word with the contexts in which it is likely to appear. In addition to recording the frequency with which a given word occurs in particular texts, the model weights entries to reflect the diagnosticity of a word for a given context. For example, a word that appears in a large number of very different contexts is not as diagnostic as a word that occurs less often and only in a small set of similar contexts. The next steps in the process are essential to LSA's ability to uncover higher-order associations between words. Through a combination of singular value decomposition (SVD) and dimension reduction, the representations of words that occur in the same or similar contexts become, themselves, more similar (Kwantes, 2005). A word's representation is then a vector in semantic space that summarizes information about the contexts in which that word is found, and the similarity between two words can be determined by the cosine between their vectors (although other methods are sometimes used; see Rehder, Schreiner, Wolfe, Laham, Landauer, & Kintsch, 1998). Thus, through a process that contains no information about semantic features, the definitions of words, word order, or parts of speech, LSA is able to capture subtle relationships between words that might never have occurred together (Landauer & Dumais, 1997).

LSA in Application. LSA is able to model several human cognitive abilities and has a number of potential practical applications. To give a few examples, LSA can imitate the vocabulary growth rate of a school-age child (Landauer & Dumais, 1997), is able to recognize synonyms about as accurately as prospective college students who speak English as a second language (Landauer & Dumais, 1994), can pass a college-level multiple-choice test in psychology (Landauer, Foltz, & Laham, 1998), can successfully simulate semantic priming (Landauer & Dumais, 1997), assesses essay quality in a manner consistent with human graders (Landauer et al., 1998), and is able to determine what text a student should use in order to optimize learning (Wolfe, Schreiner, Rehder, Laham, Foltz, Kintsch, & Landauer, 1998). Because of its success at modeling human performance on such a range of semantic tasks, LSA is sometimes used as a tool for stimulus norming and construction. For example, …
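The LSA pipeline the abstract walks through (word-by-context matrix, entry weighting, SVD with dimension reduction, cosine similarity) can be sketched as follows. The `log1p` weighting stands in for the log-entropy weighting LSA typically applies, and real applications keep a few hundred dimensions rather than two; names are illustrative:

```python
import numpy as np

def lsa_word_vectors(counts, k=2):
    """Truncated-SVD word vectors from a word-by-context count matrix.

    Minimal sketch: real LSA weights entries by log-entropy and keeps
    a few hundred dimensions; log1p is a simple stand-in weighting.
    """
    weighted = np.log1p(counts)  # damp raw frequencies before factoring
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    return U[:, :k] * s[:k]      # each row is one word's semantic vector

def cosine(a, b):
    """Cosine similarity between two word vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```

Words that occur in the same or similar contexts end up with nearby vectors, so their cosine is high even if they never co-occurred directly, which is the "higher-order association" the paper describes.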

29 citations

Proceedings ArticleDOI
26 May 2009
TL;DR: This paper proposes a hybrid recommender system that utilizes latent features from items represented by a multi-attributed record using a probabilistic model and calculates the similarity of users from their ratings.
Abstract: This paper proposes a hybrid recommender system that utilizes latent features. The main problem discussed in this paper is the cold-start problem. To handle it, the proposed system first extracts latent features from items, each represented by a multi-attributed record, using a probabilistic model. Then, it calculates the similarity of users from their ratings. Both item and user similarities are used for predicting the unknown rating of a user for an item. We evaluate the proposed method using a movie data set and show that it achieves good performance when rating information is scarce.
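The prediction step this abstract describes, blending an item-similarity estimate with a user-similarity estimate, can be sketched as two similarity-weighted averages. The equal-weight blend and all names are assumptions for illustration, not the paper's exact scheme:

```python
import numpy as np

def predict_rating(R, user, item, item_sim, user_sim):
    """Blend two neighbour predictions for R[user, item] (0 = unrated).

    item_sim comes from latent item features, user_sim from rating overlap.
    """
    def weighted(scores, weights):
        # similarity-weighted average over the rated (nonzero) entries
        w, s = weights[scores != 0], scores[scores != 0]
        return (w @ s) / w.sum() if w.sum() > 0 else np.nan
    from_items = weighted(R[user, :], item_sim[item])  # user's other ratings
    from_users = weighted(R[:, item], user_sim[user])  # item's other ratings
    parts = [p for p in (from_items, from_users) if not np.isnan(p)]
    return sum(parts) / len(parts) if parts else np.nan
```

In a cold-start setting the user-side average has little data, so the item-side average, driven by the latent features, carries most of the prediction.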

29 citations


Network Information
Related Topics (5)
Feature extraction
111.8K papers, 2.1M citations
84% related
Feature (computer vision)
128.2K papers, 1.7M citations
84% related
Support vector machine
73.6K papers, 1.7M citations
84% related
Deep learning
79.8K papers, 2.1M citations
83% related
Object detection
46.1K papers, 1.3M citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023      19
2022      77
2021      14
2020      36
2019      27
2018      58