Showing papers on "Probabilistic latent semantic analysis" published in 1991


Patent
17 Jul 1991
TL;DR: A methodology for retrieving textual data objects in a multiplicity of languages is disclosed, in which data objects are treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in each language under consideration.
Abstract: A methodology for retrieving textual data objects in a multiplicity of languages is disclosed. The data objects are treated in the statistical domain by presuming that there is an underlying, latent semantic structure in the usage of words in each language under consideration. Estimates of this latent structure are utilized to represent and retrieve objects. A user query is recouched in the new statistical domain and then processed in the computer system to extract the underlying meaning to respond to the query.

393 citations
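
The retrieval scheme in the patent abstract above (estimate a latent structure, represent objects in it, then recast the query in the same domain) can be illustrated with a minimal latent semantic indexing sketch. It assumes a single language and a truncated singular value decomposition as the estimator; the abstract does not say how the multilingual training data are assembled, so that part is left out. The toy matrix and the function names `lsi_fit`, `fold_in_query`, and `cosine_rank` are hypothetical, not taken from the patent.

```python
import numpy as np

def lsi_fit(term_doc, k):
    """Truncated SVD of a term-by-document count matrix, keeping k factors."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Term vectors (rows of U_k), singular values, document vectors (rows of V_k).
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in_query(query_counts, U_k, s_k):
    """Project a query's term-count vector into the k-dimensional latent space."""
    return (query_counts @ U_k) / s_k

def cosine_rank(query_vec, doc_vecs):
    """Return document indices by decreasing cosine similarity, plus the similarities."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / (norms + 1e-12)  # small constant avoids 0/0
    return np.argsort(-sims), sims

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["car", "auto", "engine", "flower", "petal"]
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # auto
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # flower
    [0, 0, 0, 1],   # petal
], dtype=float)

U_k, s_k, doc_vecs = lsi_fit(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)   # the single word "car"
order, sims = cosine_rank(fold_in_query(query, U_k, s_k), doc_vecs)
print(order, np.round(sims, 3))
```

In this toy corpus the second document never contains the word "car", yet it scores highly for that query because it shares a latent factor with the documents that do; the flower document scores near zero.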


Journal ArticleDOI
TL;DR: A method is proposed for investigating whether a set of items satisfies the conditions of a ‘latent scale’, based on an adaptation of the latent class model in which the requirement of ‘double monotony’ is translated into systems of inequalities on the item response probabilities.
Abstract: A method for investigating whether a set of items satisfies the conditions of a ‘latent scale’ is proposed. It is based on an adaptation of the latent class model in which the requirement of ‘double monotony’ is translated into systems of inequalities on the item response probabilities. The problem of obtaining the maximum likelihood estimates of the model parameters is discussed. Finally, this method is applied to real data from a large survey.

54 citations
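
The idea of translating ‘double monotony’ into inequalities on item response probabilities can be illustrated with a small check. The abstract above does not give the paper's exact inequality system, so this sketch assumes the usual reading from Mokken scaling: with latent classes ordered from lowest to highest and items ordered from hardest to easiest, each item's response probability must not decrease across classes, and the item ordering must be the same within every class. The function name `is_doubly_monotone` and the example matrices are hypothetical.

```python
import numpy as np

def is_doubly_monotone(p, tol=1e-12):
    """Check the ASSUMED double-monotony inequalities on a probability matrix.

    p[t, i] is the probability of a positive response to item i in latent class t.
    Rows are assumed ordered from lowest to highest class,
    columns from hardest to easiest item.
    """
    p = np.asarray(p, dtype=float)
    # Condition 1: within each item, probabilities do not decrease across classes.
    monotone_in_class = np.all(np.diff(p, axis=0) >= -tol)
    # Condition 2: within each class, items keep the same difficulty ordering.
    non_intersecting = np.all(np.diff(p, axis=1) >= -tol)
    return bool(monotone_in_class and non_intersecting)

# Hypothetical three-class, three-item example that satisfies both conditions.
ok = [[0.1, 0.2, 0.4],
      [0.3, 0.5, 0.7],
      [0.6, 0.8, 0.9]]

# Hypothetical example in which the item ordering flips in the second class.
crossing = [[0.10, 0.50],
            [0.60, 0.55]]

print(is_doubly_monotone(ok), is_doubly_monotone(crossing))
```

This only verifies a given set of probabilities; estimating them by maximum likelihood under these constraints, as the abstract describes, is a separate constrained-optimization problem not attempted here.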


Journal ArticleDOI
TL;DR: A reparameterization of a latent class model is presented to simultaneously classify and scale nominal and ordered categorical choice data to generate a graphical, multidimensional representation of the classification results.
Abstract: A reparameterization of a latent class model is presented to simultaneously classify and scale nominal and ordered categorical choice data. Latent class-specific probabilities are constrained to be equal to the preference probabilities from a probabilistic ideal-point or vector model that yields a graphical, multidimensional representation of the classification results. In addition, background variables can be incorporated as an aid to interpreting the latent class-specific response probabilities. The analyses of synthetic and real data sets illustrate the proposed method.

39 citations


Proceedings Article
01 Jan 1991
TL;DR: The relational files within the UMLS Metathesaurus contain rich semantic associations to main concepts; Latent Semantic Indexing was used to generate information matrices from these relationships and to create "semantic vectors" via singular value decomposition.
Abstract: The relational files within the UMLS Metathesaurus contain rich semantic associations to main concepts. We invoked the technique of Latent Semantic Indexing to generate information matrices based on these relationships and created "semantic vectors" using singular value decomposition. Evaluations were made on the complete set and subsets of Metathesaurus main concepts with the semantic type "Disease or Syndrome." Real number matrices were created with main concepts, lexical variants, synonyms, and associated expressions. Ancestors, children, siblings, and related terms were added to alternative matrices, preserving the hierarchical direction of the relation as the imaginary component of a complex number. Preliminary evaluation suggests that this technique is robust. A major advantage is the exploitation of semantic features which derive from a statistical decomposition of UMLS structures, possibly reducing dependence on the tedious construction of semantic frames by humans.

34 citations
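
The complex-valued matrices described in the abstract above can be illustrated with a toy sketch. The real part holds lexical associations and the imaginary part carries the direction of a hierarchical relation. The sign convention (+1 for a term contributed by a child concept, -1 for one contributed by an ancestor), the toy concepts and terms, and the helper `complex_cosine` are all assumptions made for illustration; none of them comes from the paper or from UMLS itself.

```python
import numpy as np

# Hypothetical toy concepts (rows) and terms (columns); not real UMLS content.
concepts = ["heart disease", "myocardial infarction", "lung disease"]
terms = ["heart", "cardiac", "infarction", "lung", "pulmonary"]

# Real part: lexical association between each concept and each term.
real = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

# Imaginary part: ASSUMED encoding of hierarchical direction.
imag = np.array([
    [0, 0, 1, 0, 0],    # term contributed by a child concept: +1
    [0, -1, 0, 0, 0],   # term contributed by an ancestor concept: -1
    [0, 0, 0, 0, 0],
], dtype=float)

A = real + 1j * imag

# NumPy's SVD accepts complex input; singular values are real and non-negative.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = 2
concept_vecs = U[:, :k] * s[:k]   # complex "semantic vectors", one row per concept

def complex_cosine(x, y):
    """Magnitude of the normalized Hermitian inner product of two complex vectors."""
    return abs(np.vdot(x, y)) / (np.linalg.norm(x) * np.linalg.norm(y))

print(np.round(s, 3))
print(round(complex_cosine(concept_vecs[0], concept_vecs[1]), 3))
print(round(complex_cosine(concept_vecs[0], concept_vecs[2]), 3))
```

Taking the magnitude of the Hermitian inner product is one possible similarity measure for complex vectors; the abstract does not say which measure the authors used.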


Journal ArticleDOI
TL;DR: The qualitative characterization of individual performance that is central to modern psychological theory is not adequately modeled by traditional psychometric theory, which assumes, among other things, unidimensionality.
Abstract: The qualitative characterization of individual performance that is central to modern psychological theory is not adequately modeled by traditional psychometric theory that assumes, among other things, unidimensionality. In the present study, data are presented that are more adequately modeled by HYBRID, a model that incorporates both latent trait and latent class components. The latent classes were defined by a cognitive analysis of the understanding that individuals have for a circumscribed domain. In addition to providing a better statistical fit, the analysis also improves the amount of diagnostic information available for a given individual.

5 citations


Proceedings ArticleDOI
Christopher G. Chute
31 Oct 1991
TL;DR: This work constructed information matrices of complex number values from UMLS entries, to create principal components via singular value decomposition, and preliminary evaluations show the technique to be promising.
Abstract: Natural language is the dominant mode of information representation in clinical practice. The National Library of Medicine's Unified Medical Language System (UMLS) provides a structured lexicon, enabling the application of latent semantic analysis for the classification and retrieval of patient diagnoses. We constructed information matrices of complex number values from UMLS entries, to create principal components via singular value decomposition. Natural language diagnosis entries or inquiries can be projected into the resultant N-dimension concept space, and evaluated by cosine deviation from the compressed concept components. Our preliminary evaluations show the technique to be promising. A major advantage is the avoidance of manually constructed semantic network data schemes; semantic properties derive from statistical decomposition.

5 citations


Journal ArticleDOI
TL;DR: To bring quantitative methods to bear in the empiric analysis of clinical episodes, they must be classified into reasonably homogenous categories that sustain inference and generalization.
Abstract: Clinical information is dominated by natural language representation of data and knowledge. To bring quantitative methods to bear in the empiric analysis of clinical episodes, they must be classified into reasonably homogenous categories that sustain inference and generalization. A tangible, if trivial, example of a classification requirement is the retrieval of patient cases relevant to the testing of a clinical hypothesis, so that they can be further scrutinized. Reliance on text word retrieval alone, drawn from natural language summaries, is fraught with contextual ambiguity and defeated by an expressively rich sub-language.

2 citations



Journal ArticleDOI
TL;DR: Guttman's scalogram analysis is developed into latent scalogram analysis covering both linear and branching hierarchical structures, with a method for evaluating the proportions of latent scales and a latent space, constructed by a technique similar to canonical analysis, for locating the extracted latent classes.
Abstract: In the present paper, scalogram analysis proposed by Guttman (1950) is developed into latent scalogram analysis, and a general discussion of the analysis is presented. The present paper deals with not only linear hierarchical structures but also branching hierarchical structures. In order to select a hierarchical structure which best fits the data set, model selection procedures are considered in both exploratory and confirmatory contexts. Concerning latent scales which exist in the population under consideration, a dynamic interpretation of latent scales is discussed from a mathematical viewpoint, and a method for evaluating the proportions of latent scales is proposed. Moreover, in order to compare the latent scales, a latent space for locating the extracted latent classes is constructed by a technique similar to canonical analysis. Numerical examples are also presented to illustrate the present analysis.