
Showing papers on "Probabilistic latent semantic analysis" published in 1990


Journal ArticleDOI
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.
Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.

12,443 citations
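
The retrieval steps in the abstract above (truncated singular-value decomposition, pseudo-document queries, cosine thresholding) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lsi_fit`, `fold_in_query`, and `retrieve` are hypothetical, the matrix is dense, and documents are represented by rows of the right singular vectors.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document matrix (assumed dense).

    Returns the k leading left singular vectors, the k leading singular
    values, and one k-dimensional factor vector per document.
    """
    U, s, Vt = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T  # row j holds the factor weights of document j
    return U_k, s_k, doc_vecs

def fold_in_query(query_terms, U_k, s_k):
    """Project a term-weight vector into factor space as a pseudo-document.

    Assumes the k retained singular values are nonzero.
    """
    return (np.asarray(query_terms, dtype=float) @ U_k) / s_k

def retrieve(query_vec, doc_vecs, threshold):
    """Return indices of documents whose cosine with the query exceeds threshold."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    cos = (doc_vecs @ query_vec) / np.where(norms == 0, 1.0, norms)
    hits = [int(i) for i in np.nonzero(cos > threshold)[0]]
    return hits, cos
```

With this convention, folding in a column of the original matrix reproduces that document's own factor vector, which is why queries and documents can be compared in the same space.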


Journal ArticleDOI
TL;DR: A model is proposed that combines the theoretical strength of the Rasch model with the heuristic power of latent class analysis and gives conditional maximum likelihood estimates of item parameters for each class.
Abstract: A model is proposed that combines the theoretical strength of the Rasch model with the heuristic power of latent class analysis. It assumes that the Rasch model holds for all persons within a latent class, but it allows for different sets of item parameters between the latent classes. An estimation algorithm is outlined that gives conditional maximum likelihood estimates of item parameters for each class. No a priori assumption about the item order in the latent classes or the class sizes is required. Application of the model is illustrated, both for simulated data and for real data.

512 citations
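
The structure described in the abstract above can be written out as follows. The notation is an assumption chosen for illustration (person v, item i, latent class g, class sizes \pi_g) and may differ from the paper's own symbols.

```latex
% Rasch model holding within latent class g (assumed notation)
\[
P(X_{vi} = 1 \mid g)
  = \frac{\exp(\theta_{vg} - \beta_{ig})}{1 + \exp(\theta_{vg} - \beta_{ig})}
\]

% Marginal response probability: a mixture over G classes,
% each with its own set of item parameters \beta_{ig}
\[
P(X_{vi} = 1)
  = \sum_{g=1}^{G} \pi_g \,
    \frac{\exp(\theta_{vg} - \beta_{ig})}{1 + \exp(\theta_{vg} - \beta_{ig})},
\qquad \sum_{g=1}^{G} \pi_g = 1
\]
```

With G = 1 the mixture reduces to the ordinary Rasch model.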



Proceedings ArticleDOI
TL;DR: LSI improved prediction performance over keyword matching by an average of 13% and showed a 26% improvement in precision over presenting articles in the order received; the results indicate that user preferences for articles tend to cluster based on the semantic similarities between articles.
Abstract: Latent Semantic Indexing (LSI) is an information retrieval method that organizes information into a semantic structure that takes advantage of some of the implicit higher-order associations of words with text objects. The resulting structure reflects the major associative patterns in the data while ignoring some of the smaller variations that may be due to idiosyncrasies in the word usage of individual documents. This permits retrieval based on the “latent” semantic content of the documents rather than just on keyword matches. This paper evaluates using LSI for filtering information such as Netnews articles based on a model of user preferences for articles. Users judged articles on how interesting they were and based on these judgements, LSI predicted whether new articles would be judged interesting. LSI improved prediction performance over keyword matching an average of 13% and showed a 26% improvement in precision over presenting articles in the order received. The results indicate that user preferences for articles tend to cluster based on the semantic similarities between articles.

138 citations



01 Jan 1990
TL;DR: This article proposed an extension of the vector retrieval method called "Latent Semantic Indexing" (LSI) to improve users' access to many kinds of textual materials or to objects for which textual descriptions are available.
Abstract: We have previously described an extension of the vector retrieval method called "Latent Semantic Indexing" (LSI) (Deerwester, et al., 1990; Dumais, et al., 1988; Furnas, et al., 1988). The LSI approach partially overcomes the problem of variability in human word choice by automatically organizing objects into a "semantic" structure more appropriate for information retrieval. This is done by modeling the implicit higher-order structure in the association of terms with objects. Initial tests find this completely automatic method to be a promising way to improve users’ access to many kinds of textual materials or to objects for which textual descriptions are available. This paper describes some enhancements to the basic LSI method, including differential term weighting and relevance feedback. Appropriate term weighting improves performance by an average of 40%, and feedback based on 3 relevant documents improves performance by an average of 67%.

80 citations
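
The abstract above mentions differential term weighting and relevance feedback without giving details. As a rough sketch only: the code below uses log-entropy weighting, one commonly used scheme, and forms a feedback query as the centroid of the relevant documents' vectors. Both choices are assumptions made for illustration, and the function names `log_entropy_weight` and `feedback_query` are hypothetical.

```python
import numpy as np

def log_entropy_weight(tf):
    """Weight a term-by-document count matrix (assumes at least two documents).

    Local weight: log(1 + tf). Global weight: 1 + sum_j p_ij log p_ij / log n,
    which is 0 for a term spread evenly over all documents and 1 for a term
    that occurs in a single document.
    """
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)  # total count of each term
    p = np.divide(tf, gf, out=np.zeros_like(tf), where=gf > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)
    return np.log1p(tf) * global_w

def feedback_query(doc_vecs, relevant_ids):
    """Form a new query as the centroid of the relevant documents' vectors."""
    return np.asarray(doc_vecs, dtype=float)[list(relevant_ids)].mean(axis=0)
```

The weighted matrix would be passed to the decomposition step in place of raw counts, and the feedback query would replace the original query vector in the cosine comparison.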




Journal ArticleDOI
TL;DR: A model for the analysis of time–budgets using a property that rows of this data matrix add up to one is discussed and compared with logcontrast principal component analysis.
Abstract: Time–budgets summarize how the time of objects is distributed over a number of categories. Usually they are collected in object by category matrices with the property that rows of this data matrix add up to one. In this paper we discuss a model for the analysis of time–budgets that used this property. The model approximates the observed time–budgets by weighted sums of a number of latent time–budgets. These latent time–budgets determine the behavior of all objects. Special attention is given to the identification of the model. The model is compared with logcontrast principal component analysis.

29 citations
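
The approximation described in the abstract above, observed budgets as weighted sums of latent budgets, can be written as follows. The symbols are assumptions for illustration: \pi_{ij} is the expected proportion of time object i spends in category j, \alpha_{ik} are mixing weights, and \beta_{jk} are the K latent budgets.

```latex
\[
\pi_{ij} = \sum_{k=1}^{K} \alpha_{ik}\, \beta_{jk},
\qquad
\alpha_{ik} \ge 0,\ \sum_{k=1}^{K} \alpha_{ik} = 1,
\qquad
\beta_{jk} \ge 0,\ \sum_{j=1}^{J} \beta_{jk} = 1
\]

% The row-sum property of the data is preserved by construction:
\[
\sum_{j=1}^{J} \pi_{ij}
  = \sum_{k=1}^{K} \alpha_{ik} \sum_{j=1}^{J} \beta_{jk}
  = \sum_{k=1}^{K} \alpha_{ik}
  = 1
\]
```

The identification issue mentioned in the abstract arises because different pairs of weights and latent budgets can produce the same \pi_{ij}.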



Book ChapterDOI
Geert De Soete
01 Jan 1990
TL;DR: A latent class approach is proposed that avoids the a priori grouping of the N subjects into T groups, which is especially useful when one wants to estimate market segments from preferential choice data.
Abstract: Several probabilistic models have been proposed for representing two-way and three-way replicated paired comparisons data. Such data are usually obtained by having N subjects judge all n(n−1)/2 possible pairs of n stimuli. If the N subjects are grouped into T (T ≪ N) homogeneous groups, an n × n × T replicated paired comparisons data array is obtained. Examples of models for representing such three-way paired comparisons data are the wandering vector model and the wandering ideal point model. These models predict a different n(n−1)/2-variate Bernoulli distribution for each level t (1 ≤ t ≤ T) of the third way of the data array. By applying a latent class approach, the a priori grouping of the N subjects into T groups can be avoided. This is especially useful when one wants to estimate market segments from the preferential choice data. In the latent class formulation, a different multivariate Bernoulli distribution is predicted for each class and the data of an individual subject are assumed to be sampled from a finite mixture of these multivariate Bernoulli distributions. An EM algorithm for simultaneously estimating the mixture parameters and the choice model parameters is developed. The algorithm as well as a Monte Carlo significance test for the number of latent classes are illustrated on some real data.
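
The finite-mixture idea in the abstract above can be illustrated with a simplified EM sketch. It fits an unconstrained mixture of independent Bernoulli variables, so the per-class choice probabilities are free parameters rather than being derived from a wandering vector or wandering ideal point model as in the chapter. The function name `em_bernoulli_mixture` and its defaults are hypothetical.

```python
import numpy as np

def em_bernoulli_mixture(X, n_classes, n_iter=100, seed=0):
    """EM for a mixture of independent Bernoullis (simplified stand-in model).

    X is an (n_subjects, n_pairs) 0/1 array of paired-comparison outcomes.
    Returns class weights, per-class success probabilities, and posterior
    class memberships for each subject.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, d))  # random start
    for _ in range(n_iter):
        # E-step: posterior class probabilities, computed in log space
        log_post = (X @ np.log(probs).T
                    + (1.0 - X) @ np.log(1.0 - probs).T
                    + np.log(weights))
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update class sizes and per-class probabilities
        class_mass = resp.sum(axis=0) + 1e-12  # guard against empty classes
        weights = class_mass / class_mass.sum()
        probs = np.clip((resp.T @ X) / class_mass[:, None], 1e-6, 1.0 - 1e-6)
    return weights, probs, resp
```

Hard segment assignments can be read off with `resp.argmax(axis=1)`. Choosing the number of classes, for example with the Monte Carlo significance test the abstract mentions, is outside this sketch.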

Journal ArticleDOI
TL;DR: It is shown that attractor neural networks can be successfully used to model higher-level cognitive phenomena than standard content-addressable pattern recognition, based on the original semantic network models of Collins and Quillian.
Abstract: This paper presents an attractor neural network model of semantic fact retrieval, based on the original semantic network models of Collins and Quillian. In the context of modelling a semantic network, a distinction is made between associations linking together objects belonging to hierarchically related semantic classes, and associations linking together objects and their attributes. Using a distributed representation leads to some generalization properties that have computational advantage. Simulations performed demonstrate that it is feasible to get reasonable response performance regarding various semantic queries, and that the temporal pattern of retrieval times obtained in simulations is consistent with psychological experimental data. Therefore, it is shown that attractor neural networks can be successfully used to model higher-level cognitive phenomena than standard content-addressable pattern recognition.


Journal ArticleDOI
TL;DR: A latent class model is proposed for assessing the learning structure of acquiring two kinds of skill; the model explains prerequisite and transference relations between the skills.
Abstract: In the present paper, we discuss a latent structure analysis for assessing learning structures of acquiring two kinds of skill. This discussion presents a “pairwise” assessment procedure for explaining the learning structure of acquiring the skills concerned. We propose a latent class model for this objective. This model explains prerequisite and transference relations between the skills. A parameter estimation procedure is derived by use of the EM algorithm. A numerical example is also included to illustrate the estimation procedure.


Book ChapterDOI
P. A. Golder
03 Apr 1990
TL;DR: The nature of knowledge needed to implement a range of bivariate statistical tests is examined and the features of the necessary software to validate these tests in a standard statistical package are described.
Abstract: Various methods have been proposed for making statistical packages more intelligent. One method discussed in detail in this paper is to enrich the description of the data with the relevant semantic knowledge and equip the package to make use of this knowledge. This paper reports on a research project which explored in some detail the structure of this metadata and the requirements of the processing modules. In particular the nature of knowledge needed to implement a range of bivariate statistical tests is examined and the features of the necessary software to validate these tests in a standard statistical package are described. The results of a prototype application of the method are also discussed.