scispace - formally typeset

Showing papers on "Probabilistic latent semantic analysis published in 1994"


Proceedings ArticleDOI
01 Aug 1994
TL;DR: This paper applies LSI to the routing task, which operates under the assumption that a sample of relevant and non-relevant documents is available for constructing the query, and finds that when LSI is used in conjunction with statistical classification there is a dramatic improvement in performance.
Abstract: Latent Semantic Indexing (LSI) is a novel approach to information retrieval that attempts to model the underlying structure of term associations by transforming the traditional representation of documents as vectors of weighted term frequencies to a new coordinate space where both documents and terms are represented as linear combinations of underlying semantic factors. In previous research, LSI has produced a small improvement in retrieval performance. In this paper, we apply LSI to the routing task, which operates under the assumption that a sample of relevant and non-relevant documents is available to use in constructing the query. Once again, LSI slightly improves performance. However, when LSI is used in conjunction with statistical classification, there is a dramatic improvement in performance.
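The factor decomposition the abstract describes can be sketched with a truncated SVD; the toy term-document matrix, the query fold-in, and the helper names below are illustrative assumptions, not the paper's actual data or setup:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents),
# holding weighted term frequencies. The values are made up.
A = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

# LSI: a truncated SVD re-expresses terms and documents as linear
# combinations of k underlying semantic factors.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Document coordinates in the k-dimensional latent space.
docs_k = (sk[:, None] * Vtk).T          # shape (n_docs, k)

# Fold a query (a bag-of-terms vector) into the same space and rank
# documents by cosine similarity.
q = np.array([1.0, 1.0, 0.0, 0.0])
q_k = q @ Uk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

ranking = sorted(range(docs_k.shape[0]),
                 key=lambda j: cosine(q_k, docs_k[j]), reverse=True)
```

Because similar terms load on the same factors, a query can match a document that shares no literal terms with it, which is the effect LSI exploits.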

197 citations


01 Oct 1994
TL;DR: Using the proposed merge strategies, LSI is shown to be able to retrieve relevant documents from either language (Greek or English) without requiring any translation of a user's query.
Abstract: In this thesis, a method for indexing cross-language databases for conceptual query matching is presented. Two languages (Greek and English) are combined by appending a small portion of documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database (made up of different translations of the Gospels). Previous strategies duplicated up to 34% of the initial database in order to perform the merger. The proposed method retrieves a larger number of relevant documents for both languages with higher cosine rankings when Latent Semantic Indexing (LSI) is employed. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow the use of natural language for searching (queries). LSI enables the user to form queries using natural expressions in the user's own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively while duplicating only a minimum of the database.
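The low-duplication merge the abstract reports might look like the following sketch; the tiny corpus, the exact 7% fraction, and the function name are illustrative assumptions, and real indexing would then run LSI over the merged corpus:

```python
# Hypothetical parallel corpus: each Greek document is paired with its
# English translation. The texts here are illustrative stand-ins.
greek_docs = ["kyrios poimen mou", "en arche en ho logos", "agape makrothymei"]
english_docs = ["the lord is my shepherd", "in the beginning was the word",
                "love is patient"]

def merge_corpus(docs_a, docs_b, fraction=0.07):
    """Append the parallel translation to a small fraction of documents,
    leaving the rest monolingual. The dual-language documents act as
    anchors that let LSI align the two vocabularies, while only that
    small fraction of text is duplicated."""
    n_dual = max(1, round(fraction * len(docs_a)))
    merged = []
    for i, (a, b) in enumerate(zip(docs_a, docs_b)):
        if i < n_dual:
            merged.append(a + " " + b)   # dual-language anchor document
        else:
            merged.append(a)             # stays Greek-only
            merged.append(b)             # stays English-only
    return merged

corpus = merge_corpus(greek_docs, english_docs)
```

After LSI is trained on such a corpus, Greek and English terms that co-occur in the anchor documents land near each other in the latent space, so a query in either language can retrieve documents in both.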

27 citations




Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new model of speech understanding is presented, based on cooperation between the speech recognizer and the language analyzer; it interacts with the knowledge sources while keeping its modularity, thereby realizing robust understanding.
Abstract: We present a new model of speech understanding, based on the cooperation of the speech recognizer and language analyzer, which interacts with the knowledge sources while keeping its modularity. The semantic analyzer is realized with a semantic network that represents the possible concepts in a task. The speech recognizer, based on an LR parser, interacts with the semantic analyzer to eliminate invalid hypotheses at an early stage. The coupling of a loose grammar and interactive semantic analysis accepts ill-formed sentences while filtering out nonsense ones, thus realizing robust understanding. Dialog-level knowledge is also incorporated to constrain both the syntactic and the semantic knowledge sources. The key to guiding the search efficiently is powerful heuristics. The relationship between heuristic power and search efficiency is examined experimentally. The stochastic word bigram is derived from the probabilistic LR grammar as an A*-admissible heuristic.
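One way to read the last sentence: a word bigram gives an optimistic per-word cost that A* can use without overestimating the true remaining cost. A minimal sketch, with invented probabilities standing in for those the paper derives from the probabilistic LR grammar:

```python
import math

# Hypothetical bigram probabilities P(next | current); the numbers
# are made up for illustration.
bigram = {
    "show": {"me": 0.6, "flights": 0.4},
    "me": {"flights": 0.9, "fares": 0.1},
    "flights": {"</s>": 1.0},
    "fares": {"</s>": 1.0},
}

def heuristic(word):
    """Optimistic one-step cost after `word`: the negative log of the
    most probable continuation. No actual continuation can cost less,
    so the estimate never overestimates, which is the A*-admissibility
    property the abstract refers to."""
    successors = bigram.get(word)
    if not successors:
        return 0.0
    return -math.log(max(successors.values()))
```

An admissible heuristic lets A* prune hypotheses aggressively while still guaranteeing that the best-scoring sentence hypothesis is found.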

11 citations


Proceedings ArticleDOI
01 Jan 1994
TL;DR: In this paper, a text representation and searching technique labeled as "Semantic Vector Space Model" (SVSM) is described, which combines Salton's VSM (1991) with distributed representation of semantic case structures of natural language text.
Abstract: This paper describes a text representation and searching technique labeled the "Semantic Vector Space Model" (SVSM). The proposed technique combines Salton's VSM (1991) with a distributed representation of the semantic case structures of natural language text. It promises a way of abstracting and encoding richer semantic information from natural language text, and therefore better precision in IR, without involving sophisticated semantic processing.

8 citations