
Showing papers on "Probabilistic latent semantic analysis" published in 1988


Proceedings Article
01 May 1988
TL;DR: Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
Abstract: This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term by text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50 to 150 dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.

638 citations
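
The singular-value decomposition step described in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the toy vocabulary and counts, the helper names `lsi_fit`, `fold_in_query` and `cosine`, the choice of k = 2 factors (the abstract reports about 50 to 150 on real collections), and the fold-in formula q_k = qᵀ U_k S_k⁻¹, which is a common convention rather than something the abstract specifies.

```python
import numpy as np

# Toy term-by-text-object count matrix (rows: terms, columns: text objects).
# Vocabulary and counts are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
X = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)


def lsi_fit(X, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rows of the last return value are k-dimensional text-object vectors.
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in_query(q, U_k, s_k):
    """Project a term-count query into the k-dimensional space (assumed convention)."""
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


U_k, s_k, docs_k = lsi_fit(X, k=2)

# Query containing only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)
q_k = fold_in_query(query, U_k, s_k)

# Similarity of the query to each text object in the reduced space.
scores = [cosine(q_k, d) for d in docs_k]
```

In this toy collection the second text object contains "automobile" and "engine" but never "car", so a purely lexical match on "car" would miss it. In the two-factor space it scores close to the objects that do contain "car", while the object about flowers scores near zero.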


Patent
15 Sep 1988
TL;DR: A methodology for retrieving textual data objects is disclosed in which the information is treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in the data objects.
Abstract: A methodology for retrieving textual data objects is disclosed. The information is treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in the data objects. Estimates of this latent structure are utilized to represent and retrieve objects. A user query is recouched in the new statistical domain and then processed in the computer system to extract the underlying meaning to respond to the query.

536 citations




Journal Article
TL;DR: Starting from perfectly discriminating nonmonotone dichotomous items, this paper describes a class of probabilistic latent class models with or without response errors and with or without intrinsically unscalable respondents.
Abstract: Starting from perfectly discriminating nonmonotone dichotomous items, a class of probabilistic models with or without response errors and with or without intrinsically unscalable respondents is described. All these models can be understood as simply restricted latent class analysis. Thus, the estimation and identifiability of the parameters (class sizes and item latent probabilities) as well as the chi-squared goodness-of-fit tests (Pearson and likelihood-ratio) are free of problems. The applicability of the proposed variants of latent class models is demonstrated on real attitudinal data.

15 citations
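
For orientation, the general (unrestricted) latent class model for dichotomous items that the abstract above refers to can be written as follows. The notation is assumed here rather than taken from the paper: \pi_c is the size of class c, \rho_{ic} is the latent probability of a positive response to item i in class c, and x = (x_1, \dots, x_I) is a response pattern. The restricted variants the abstract describes would place constraints on the \rho_{ic}; those constraints are not reproduced here.

```latex
\[
P(\mathbf{x}) \;=\; \sum_{c=1}^{C} \pi_c \prod_{i=1}^{I}
  \rho_{ic}^{\,x_i}\,(1-\rho_{ic})^{\,1-x_i},
\qquad \sum_{c=1}^{C} \pi_c = 1 .
\]
```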


Journal Article
TL;DR: The Rasch model is presented as a basic model for representing the relationship of subject and treatment parameters; the latent variable approach provides a theoretical framework for specifying dependencies exactly and a base for considering more complicated relationships between repeated measures variables.
Abstract: Consideration of within-subject dependencies is a key issue in modelling binary repeated measures medical data. Borrowing from recent developments in sociology and psychology, we demonstrate the applicability of a latent variable approach to the analysis of such data. In particular we present the Rasch model as a basic model for representing the relationship of subject and treatment parameters. The latent variable approach is useful in providing a theoretical framework for specifying dependencies exactly and also as a base for considering more complicated relationships between repeated measures variables.

9 citations
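
As a reminder of what the Rasch model in the abstract above looks like, its standard form for a binary response is given below. The symbols are assumed notation, not taken from the paper: \theta_v is the parameter of subject v, \beta_i is the parameter of treatment (or occasion) i, and X_{vi} is the binary outcome.

```latex
\[
P(X_{vi} = 1 \mid \theta_v, \beta_i)
  \;=\; \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)} .
\]
```

Under this form, repeated responses from the same subject are independent given \theta_v, so the within-subject dependence arises from the shared latent subject parameter.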


01 Jan 1988
TL;DR: The analysis of such data should clearly depend on the substantive questions posed by the researcher involved, although in many cases these questions will be rather vague, and it will often be left to the statistician to clarify what is meant by concepts such as "pattern" and "structure" and whether they are present in the investigator's data.
Abstract: Data collected by social and behavioral scientists very often consist of large multidimensional tables of subjects cross-classified according to the values or states of several categorical variables. For example, Table 1 shows a set of data on suicide victims in which the method of committing suicide is cross-classified by sex and age group (Van der Heijden & de Leeuw, 1985) and Table 2 shows counts of subjects resulting from a survey of the political attitudes of a sample from the British electorate (Butler & Stokes, 1974). The analysis of such data should clearly depend on the substantive questions posed by the researcher involved, although in many cases these questions will be rather vague. The research worker may be interested in such notions as “pattern” and “structure” but it will often be left to the statistician to clarify what is meant by such concepts and whether they are present in the investigator’s data. Finally, the statistician has the often difficult task of explaining the results.

5 citations