
Showing papers presented at "International ACM SIGIR Conference on Research and Development in Information Retrieval in 1988"


Journal ArticleDOI
01 May 1988
TL;DR: In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries.
Abstract: In a new method for automatic indexing and retrieval, implicit higher-order structure in the association of terms with documents is modeled to improve estimates of term-document association, and therefore the detection of relevant documents on the basis of terms found in queries. Singular-value decomposition is used to decompose a large term-by-document matrix into 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination; both documents and terms are represented as vectors in a 50- to 150-dimensional space. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents are ordered by their similarity to the query. Initial tests find this automatic method very promising.

411 citations
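The SVD machinery described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the matrix, the query, and the choice of k = 2 factors are made up, and the query is folded in with the usual pseudo-document mapping q · U_k · S_k⁻¹.

```python
import numpy as np

# Toy term-by-document matrix (rows = terms, columns = documents).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2  # retained factors (the paper uses 50 to 150)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Documents as k-dimensional vectors (rows of V_k).
doc_vecs = Vtk.T

# Fold a query in as a pseudo-document: q_hat = q U_k S_k^-1.
q = np.array([1, 1, 0, 0], dtype=float)  # query containing terms 0 and 1
q_vec = q @ Uk @ np.diag(1.0 / sk)

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Order documents by similarity to the query in the reduced space.
ranking = sorted(range(A.shape[1]), key=lambda j: -cosine(q_vec, doc_vecs[j]))
```

Because similarity is computed in the reduced factor space, a document can score well against a query even when they share no terms, which is the point of modeling the higher-order structure.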


Journal ArticleDOI
01 May 1988
TL;DR: A series of experiments were run using the Cranfield test collection to discover techniques to select terms for lists of suggested terms gathered from feedback, nearest neighbors, and term variants of original query terms that would be effective for further retrieval.
Abstract: In an era of online retrieval, it is appropriate to offer guidance to users wishing to improve their initial queries. One form of such guidance could be short lists of suggested terms gathered from feedback, nearest neighbors, and term variants of original query terms. To verify this approach, a series of experiments were run using the Cranfield test collection to discover techniques to select terms for these lists that would be effective for further retrieval. The results show that significant improvement can be expected from this approach to query expansion.

206 citations
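One of the simplest selection criteria for such suggested-term lists, ranking candidate terms by how often they occur in documents the user has judged relevant, can be sketched as below. The function name, the toy documents, and the frequency criterion are illustrative assumptions; the paper compares a range of selection techniques.

```python
from collections import Counter

def suggest_terms(query_terms, relevant_docs, k=5):
    """Rank candidate expansion terms by frequency in judged-relevant
    documents, excluding the original query terms."""
    counts = Counter(t for doc in relevant_docs for t in doc)
    for t in query_terms:
        counts.pop(t, None)  # never suggest a term the query already has
    return [term for term, _ in counts.most_common(k)]

# Toy feedback documents, tokenized into index terms.
docs = [["wing", "lift", "flow"],
        ["wing", "flow", "boundary"],
        ["lift", "flow"]]
suggestions = suggest_terms({"wing"}, docs, k=2)
```

In an interactive setting the returned list would be shown to the user, who picks terms to add to the query rather than having them added automatically.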


Proceedings Article
01 Apr 1988
TL;DR: In this paper, spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets, reminiscent of earlier associative indexing and retrieval systems.
Abstract: Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.

205 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems and is recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets.
Abstract: Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.

109 citations
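A minimal sketch of the spreading-activation idea: activation starts at query nodes and propagates along weighted association links, accumulating at related terms and documents. The graph, decay factor, and step count here are invented for illustration; the procedures evaluated in the paper differ in detail.

```python
# Hypothetical term-document association graph; link weights in [0, 1].
graph = {
    "query_term": {"docA": 0.9, "related_term": 0.6},
    "related_term": {"docB": 0.8},
    "docA": {},
    "docB": {},
}

def spread(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes through weighted links,
    attenuating by `decay` at each hop."""
    activation = dict(seeds)
    for _ in range(steps):
        nxt = dict(activation)
        for node, act in activation.items():
            for neighbor, weight in graph.get(node, {}).items():
                nxt[neighbor] = nxt.get(neighbor, 0.0) + decay * act * weight
        activation = nxt
    return activation

act = spread(graph, {"query_term": 1.0})
```

Note how docB, which shares no direct link with the query term, still receives activation through the associated term, which is how the strategy complements the directly retrieved document set.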


Proceedings ArticleDOI
01 May 1988
TL;DR: An appropriate indexing approach and the corresponding structure of the AIR/PHYS system are described, and the conditions of the application as well as problems of further development are discussed.
Abstract: Since October 1985, the automatic indexing system AIR/PHYS has been used in the input production of the physics database of the Fachinformationszentrum Karlsruhe, West Germany. The texts to be indexed are abstracts written in English. The system of descriptors is prescribed. For the application of the AIR/PHYS system, a large-scale dictionary containing more than 600,000 word-descriptor and phrase-descriptor relations has been developed. Most of these relations have been obtained by means of statistical and heuristic methods. In consequence, the relation system is rather imperfect. Therefore, the indexing system needs some fault-tolerating features. An appropriate indexing approach and the corresponding structure of the AIR/PHYS system are described. Finally, the conditions of the application as well as problems of further development are discussed.

83 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm, which has been implemented and applied to two document collections.
Abstract: The importance of a thesaurus in the successful operation of an information retrieval system is well recognized. Yet techniques which support the automatic generation of thesauri remain largely undiscovered. This paper describes one approach to the automatic generation of global thesauri, based on the discrimination value model of Salton, Yang, and Yu and on an appropriate clustering algorithm. This method has been implemented and applied to two document collections. Preliminary results indicate that this method, which produces improvements in retrieval performance in excess of 10 and 15 percent in the test collections, is viable and worthy of continued investigation.

80 citations


Journal ArticleDOI
01 Sep 1988
TL;DR: A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system; in particular, a novel term-similarity function could be defined.
Abstract: Information retrieval can be defined as the extraction of specific information out of a great number of stored information items. Information retrieval systems, used for the retrieval of documents, try to answer more or less precise questions about interesting topics with a number of suitable documents or references to documents. Such systems should contain 'knowledge' about the meaning of questions, about the content of the stored information, and about the particular user's needs for information. Knowledge-based systems claim to be able to store knowledge and draw conclusions from it. The goal of this thesis is to investigate the use of knowledge-based methods and technologies for information retrieval. A knowledge-based information retrieval system should represent its information structures, as well as knowledge, in a common knowledge representation formalism. The retrieval process of the system should employ the inferential methods of the chosen knowledge representation formalism. A subset of first-order logic is chosen for this thesis to represent knowledge. Specially designed retrieval rules represent knowledge for the purpose of retrieval. Retrieval rules capture knowledge about the user's vocabulary, his working domain, and his way of performing document retrieval. The problem of recall and precision of the answers of an information retrieval system is approached by an explicit representation of control knowledge. A theoretical model of a knowledge-based information retrieval system developed in this thesis specifies the requirements and properties of such a system. In particular, a novel term-similarity function could be defined. Properties such as completeness and termination could be derived, and bounds on the overhead of false control strategies could be investigated. The proposed model is implemented in a prototype of a knowledge-based information retrieval system, called KIR. KIR is a single-user system for personal document and knowledge retrieval running on computer workstations. It is implemented using Prolog and Modula-2.

68 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: An investigation of whether linguistic processes can be used as part of a document retrieval strategy, by predefining a level of syntactic analysis of user queries only, suggests that the approach of using linguistic processing in retrieval is valid.
Abstract: Traditional information retrieval has relied on the extensive use of statistical parameters in the implementation of retrieval strategies. This paper sets out to investigate whether linguistic processes can be used as part of a document retrieval strategy. This is done by predefining a level of syntactic analysis of user queries only, to be used as part of the retrieval process. A large series of experiments on an experimental test collection is reported, using a parser for noun phrases as part of the retrieval strategy. The results obtained from the experiments do yield improvements in the level of retrieval effectiveness; given the crude linguistic process used, and the fact that it was applied to queries and not to document texts, this suggests that the approach of using linguistic processing in retrieval is valid.

65 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: The experimental results seem to demonstrate that the model provides a useful framework for the design of an adaptive system and a practical procedure to determine the linear decision function.
Abstract: Based on the concept of user preference, we investigate the linear structure in information retrieval. We also discuss a practical procedure to determine the linear decision function and present an analysis of term weighting. Our experimental results seem to demonstrate that our model provides a useful framework for the design of an adaptive system.

60 citations


Journal ArticleDOI
01 May 1988
TL;DR: An information retrieval system specifically designed for storing and retrieving information about software components is described; it makes use of developments in natural language research to represent component information in a form which encodes semantics as well as syntax.
Abstract: This paper describes an information retrieval system which is specifically designed to be used for storing and retrieving information about software components. Rather than use a retrieval mechanism which is simply based on keyword descriptions, we have made use of developments in natural language research to represent component information in a form which encodes semantics as well as syntax. We call this the component descriptor frame. The paper describes the basic ideas which underlie our system and describes how it can be used for component information retrieval. An example of the system in use is presented. The version of the system described here has been fully implemented and is now being developed as part of a more general reuse support system.

59 citations


Proceedings ArticleDOI
01 May 1988
TL;DR: Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type, and the role of links is shown to be especially beneficial.
Abstract: This report considers combining information to improve retrieval. The vector space model has been extended so different classes of data are associated with distinct concept types and their respective subvectors. Two collections with multiple concept types are described, ISI-1460 and CACM-3204. Experiments indicate that regression methods can help predict relevance, given query-document similarity values for each concept type. After sampling and transformation of data, the coefficient of determination for the best model was .48 (.66) for ISI (CACM). Average precision for the two collections was 11% (31%) better for probabilistic feedback with all types versus with terms only. These findings may be of particular interest to designers of document retrieval or hypertext systems since the role of links is shown to be especially beneficial.

Proceedings ArticleDOI
01 May 1988
TL;DR: The implicit basis of all information retrieval systems is considered to be a logical implication, and the measure of correspondence between a document and a query is transformed into an estimation of the strength (or certainty) of that logical implication.
Abstract: This paper is a contribution to the construction of a general model for information retrieval. As in the paper of Van Rijsbergen ([RIJ86]), the implicit basis of all information retrieval systems is considered to be a logical implication. The measure of correspondence between a document and a query is transformed into an estimation of the strength (or certainty) of this logical implication. Modal logic is shown to be suitable for representing the behavior of information retrieval systems. In existing information retrieval models, several aspects are often mixed; part of this paper is devoted to separating these aspects to give a clearer view of information retrieval systems. The general model is also compared with some existing models to show its generality.

Proceedings ArticleDOI
01 May 1988
TL;DR: This paper describes some aspects of a project with the aim of developing a user-friendly interface to a classical Information Retrieval (IR) System in order to improve the effectiveness of retrieval.
Abstract: This paper describes some aspects of a project whose aim is to develop a user-friendly interface to a classical Information Retrieval (IR) system in order to improve the effectiveness of retrieval. The character-by-character approach to IR has been abandoned in favor of an approach based on the meaning of both the queries and the texts containing the information to be sought. The concept space, locally derived from a thesaurus, is used to represent a query, as well as retrieved documents, in atomic concept units. Dependencies between the search terms are taken into account. The meanings of the query and the retrieved documents (results of Elementary Logical Conjuncts (ELCs)) are compared. The ranking method on the semantic level is used in connection with existing data of a classical IR system. The user enters queries without using complex Boolean expressions.

Proceedings ArticleDOI
01 May 1988
TL;DR: The theory of rough sets, which allows us to classify objects into sets of equivalent members based on their attributes, is introduced and compared to the Boolean, vector and fuzzy models of information retrieval.
Abstract: The theory of rough sets was introduced in [PAWLAK82]. It allows us to classify objects into sets of equivalent members based on their attributes. We may then examine any combination of the same objects (or even their attributes) using the resultant classification. The theory has direct applications in the design and evaluation of classification schemes and the selection of discriminating attributes; Pawlak's papers discuss its application in the domain of medical diagnostic systems. Here we apply it to the design of information retrieval systems accessing collections of documents. Advantages offered by the theory are: the implicit inclusion of Boolean logic; term weighting; and the ability to rank retrieved documents. In the first section we describe the theory, derived from [PAWLAK84, PAWLAK82] and including only its most relevant aspects. In the second section we apply it to information retrieval: we design the approximation space and search strategies, and illustrate the application of relevance feedback to improve document indexing. In section three we compare the rough set formalism to the Boolean, vector, and fuzzy models of information retrieval. Finally, we present a small-scale evaluation of rough sets which indicates its potential in information retrieval.
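The core construction, partitioning documents into indiscernibility classes by their attribute descriptions and then bounding a target set by lower and upper approximations, can be sketched as follows. The documents and index terms are invented for illustration.

```python
# Documents described by their assigned index terms; documents with
# identical descriptions are indiscernible and form one equivalence class.
docs = {
    "d1": frozenset({"ir", "logic"}),
    "d2": frozenset({"ir", "logic"}),
    "d3": frozenset({"ir"}),
    "d4": frozenset({"db"}),
}

def classes(descriptions):
    """Partition objects into equivalence classes of identical descriptions."""
    by_desc = {}
    for doc, desc in descriptions.items():
        by_desc.setdefault(desc, set()).add(doc)
    return list(by_desc.values())

def approximations(target, eq_classes):
    """Lower approximation: classes wholly inside the target set.
    Upper approximation: classes that intersect the target set."""
    lower = set().union(*([c for c in eq_classes if c <= target] or [set()]))
    upper = set().union(*([c for c in eq_classes if c & target] or [set()]))
    return lower, upper

relevant = {"d1", "d3"}
lower, upper = approximations(relevant, classes(docs))
```

Here d1 is judged relevant but is indiscernible from d2, so d1 falls only in the upper approximation: the indexing cannot certainly distinguish it, which is exactly the situation relevance feedback on the indexing is meant to repair.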

Proceedings ArticleDOI
01 May 1988
TL;DR: Several methods are presented which efficiently compress concordances of large full-text retrieval systems and yield savings of up to 49% relative to the non-compressed file.
Abstract: The concordance of a full-text information retrieval system contains, for every different word W of the database, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only for the savings in storage space, but also in order to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented which efficiently compress concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the non-compressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
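Since the coordinates in each list are sorted, a standard way to exploit that (offered as background, not necessarily one of the paper's methods) is to store the gaps between successive coordinates in a variable-length byte code, so that the small gaps which dominate a frequent word's list take a single byte each:

```python
def vbyte_encode(coords):
    """Gap-encode a sorted coordinate list, then emit each gap in a
    variable-byte code: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    prev = 0
    for c in coords:
        gap, prev = c - prev, c
        chunk = []
        while True:
            chunk.insert(0, gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        chunk[-1] |= 0x80  # terminator flag on the final byte of each gap
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data):
    coords, cur, prev = [], 0, 0
    for b in data:
        cur = (cur << 7) | (b & 0x7F)
        if b & 0x80:
            prev += cur
            coords.append(prev)
            cur = 0
    return coords

coords = [3, 7, 11, 900000]
encoded = vbyte_encode(coords)
```

The four coordinates above occupy 6 bytes instead of 16 as fixed 32-bit integers; decoding is a single sequential pass, which matters when the concordance lives in secondary memory.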

Proceedings ArticleDOI
01 May 1988
TL;DR: This work provides an analysis of areas in which natural language and information retrieval come together, and describes a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.
Abstract: Neither natural language processing nor information retrieval is any longer a young field, but the two areas have yet to achieve a graceful interaction. Mainly, the reason for this incompatibility is that information retrieval technology depends upon relatively simple but robust methods, while natural language processing involves complex knowledge-based systems that have never approached robustness. We provide an analysis of areas in which natural language and information retrieval come together, and describe a system that joins the two fields by combining technology, choice of application area, and knowledge acquisition techniques.

Proceedings ArticleDOI
01 May 1988
TL;DR: The approach to plausible inference for retrieval is explained, and preliminary experiments that test it using a spreading activation search to implement the plausible inference process show that significant effectiveness improvements are possible.
Abstract: Choosing an appropriate document representation and search strategy for document retrieval has been largely guided by achieving good average performance instead of optimizing the results for each individual query. A model of retrieval based on plausible inference gives us a different perspective and suggests that techniques should be found for combining multiple sources of evidence (or search strategies) into an overall assessment of a document's relevance, rather than attempting to pick a single strategy. In this paper, we explain our approach to plausible inference for retrieval and describe some preliminary experiments designed to test this approach. The experiments use a spreading activation search to implement the plausible inference process. The results show that significant effectiveness improvements are possible using this approach.

Proceedings ArticleDOI
01 May 1988
TL;DR: A number of experiments are reported on in which a connectionist simulator is used to support similarity-based reasoning in a frame representation to draw some tentative, mixed conclusions on the potential for a union of KR, IR, and connectionism.
Abstract: Knowledge Representation (KR) systems provide support for Artificial Intelligence systems that reason about relationships between objects in their domains of expertise. Because of their support for inference, KR systems appear to have potential to enrich the kind of retrievals that IR systems might make. Ironically, however, the most useful KR systems are limited to reasoning based on a rigid notion of validity, and thus are awkward to use when relevant but inexact retrievals are desired. We have been exploring the potential of a “connectionist” model—the Boltzmann Machine—to overcome this limitation. We report on a number of experiments in which we use a connectionist simulator to support similarity-based reasoning in a frame representation. We draw some tentative, mixed conclusions on the potential for a union of KR, IR, and connectionism.

Journal ArticleDOI
01 May 1988
TL;DR: This paper is in two parts, following the suggestion that I first comment on my own past experience in information retrieval, and then present my views on the present and future.
Abstract: This paper is in two parts, following the suggestion that I first comment on my own past experience in information retrieval, and then present my views on the present and future.

Proceedings ArticleDOI
E. Wilson1
01 May 1988
TL;DR: A prototype information retrieval system for lawyers, Justus, has been developed on a Sun workstation to run in a Guide hypertext environment and incorporates primary legal sources and secondary sources, such as textbooks and a dictionary.
Abstract: A prototype information retrieval system for lawyers, Justus, has been developed on a Sun workstation to run in a Guide hypertext environment. The hypertext database is created automatically by Justus from machine readable versions of the ordinary printed texts, ideally the publisher's typesetting tapes. The database incorporates primary legal sources, such as statutes and cases, and secondary sources, such as textbooks and a dictionary. Initially, the lawyer may select any document in the system. From this initial document, he may access any other document, or part of any other document, to which reference is made. Reference selection is by a pointing device, such as a mouse. There is no limit on the number of selections that can be made, and no restrictions on the path through the system.

Proceedings ArticleDOI
Nick Belkin1
01 May 1988
TL;DR: A general model of clarity in human-computer systems, of which explanation is one component, is proposed, and a model for explanation by the computer intermediary in information retrieval is proposed.
Abstract: We discuss the complexity of explanation activity in human-human goal-directed dialogue, and suggest that this complexity ought to be taken account of in the design of explanation in human-computer interaction. We propose a general model of clarity in human-computer systems, of which explanation is one component. On the basis of this model, of a model of human-intermediary interaction in the document retrieval situation as one of cooperative model-building for the purpose of developing an appropriate search formulation, and of the results of empirical observation of human user-human intermediary interaction in information systems, we propose a model for explanation by the computer intermediary in information retrieval.

Proceedings ArticleDOI
01 May 1988
TL;DR: The overall organization of the IR-NLI II system is presented, together with a short description of the two main modules implemented so far, namely the Information Retrieval Expert Subsystem and the User Modeling Subsystem.
Abstract: This paper addresses the problem of building expert interfaces to information retrieval systems. In particular, the problem of augmenting the capabilities of such interfaces with user modeling features is discussed and the main benefits of this approach are outlined. The paper presents a prototype system called IR-NLI II, devoted to model by means of artificial intelligence techniques the human intermediary to information retrieval systems. The overall organization of the IR-NLI II system is presented, together with a short description of the two main modules implemented so far, namely the Information Retrieval Expert Subsystem and the User Modeling Subsystem. An example of interaction with IR-NLI II is described. Perspectives and future research directions are finally outlined.

Proceedings ArticleDOI
01 May 1988
TL;DR: An overview of the ongoing research in the Active Data Bases project at the Vrije Universiteit, Amsterdam is given, which is specifying and building a system that helps a user in his search for useful and interesting information in large, complex information systems.
Abstract: This paper gives an overview of the ongoing research in the Active Data Bases project at the Vrije Universiteit, Amsterdam. In this project we are specifying and building a system that helps a user in his search for useful and interesting information in large, complex information systems. The system is able to do this, because it learns from the interaction about the users and the data it contains. The indications of the users are expressed in terms of interests in the data, which serve as building blocks for user and data models. These models are then used to improve the search for interesting data.

Journal ArticleDOI
01 May 1988
TL;DR: IRX as mentioned in this paper is a text retrieval system designed to be a testbed for conducting information retrieval research on statistically-based retrieval strategies in either batch or interactive modes, and is used at the Johns Hopkins University and the Lister Hill Center providing access to databases in human and molecular genetics.
Abstract: IRX is a text retrieval system designed to be a testbed for conducting information retrieval research on statistically-based retrieval strategies in either batch or interactive modes. The modular structure of IRX has permitted major changes in components of the system (e.g., ranking algorithms, parsers, interfaces) without redesign. As an interactive system, IRX is in use at the Johns Hopkins University and the Lister Hill Center, providing access to databases in human and molecular genetics.

Proceedings ArticleDOI
01 May 1988
TL;DR: This paper examines retrieval system evaluation from the perspective of the user and presents evaluation procedures which are appropriate to this perspective and which can be used to isolate the effect of variation in the user interface to the system.
Abstract: Planning the evaluation of an information retrieval system involves two steps: first, a determination of performance descriptors and measures appropriate to the system objectives and, secondly, a development of an evaluation design which ensures the effect of variation in components of interest will be isolated and assessed in an unbiased fashion. This paper examines the question of retrieval system evaluation from the perspective of the user. It presents evaluation procedures which are appropriate to this perspective and which can be used to isolate the effect of variation in the user interface to the system. The general procedure is exemplified by an application to evaluation of an experimental OPAC interface.

Proceedings ArticleDOI
01 May 1988
TL;DR: The basic principles of this approach are presented and compared to more conventional solutions providing only limited extensions and the implementation aspects related to the approach are discussed to show that reasonable performances can be expected.
Abstract: Up to now, most retrieval systems have been founded on a Boolean selection mechanism. It appears that this approach is not powerful enough to deal with some applications, especially when the size (number) of the results must be controlled. In that case, some kind of flexibility is needed in query expression. In this paper, we suggest the use of a fuzzy-sets-based approach. The basic principles of this approach are presented and compared to more conventional solutions providing only limited extensions. Moreover, the implementation aspects related to our approach are discussed to show that reasonable performance can be expected.
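A minimal sketch of fuzzy query evaluation, using the standard connectives min for AND, max for OR, and 1 − x for NOT, shows how graded term weights turn a Boolean query into a ranking. The tiny index and query are made up; the paper's proposal goes beyond these basic connectives.

```python
# Toy index: membership degree of each document in each term's fuzzy set.
index = {
    "d1": {"database": 0.9, "fuzzy": 0.2},
    "d2": {"database": 0.4, "fuzzy": 0.8},
}

def evaluate(query, weights):
    """Evaluate a small query tree against one document's term weights."""
    op = query[0]
    if op == "term":
        return weights.get(query[1], 0.0)
    if op == "and":
        return min(evaluate(sub, weights) for sub in query[1:])
    if op == "or":
        return max(evaluate(sub, weights) for sub in query[1:])
    if op == "not":
        return 1.0 - evaluate(query[1], weights)
    raise ValueError(f"unknown operator: {op}")

q = ("and", ("term", "database"), ("term", "fuzzy"))
ranked = sorted(index, key=lambda d: -evaluate(q, index[d]))
```

Unlike crisp Boolean retrieval, every document gets a score in [0, 1], so the result set can be cut at any desired size, which is precisely the control over result size the abstract motivates.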

Proceedings ArticleDOI
01 May 1988
TL;DR: Two methods are given to improve weighting schemes by using relevance information of a set of queries to estimate parameter values of two independence models in information retrieval — the binary independence model and the non-binary independence model.
Abstract: Two methods are given to improve weighting schemes by using relevance information of a set of queries. The first method is to estimate parameter values of two independence models in information retrieval — the binary independence model and the non-binary independence model. The parameters estimated here are used to calculate optimal weights for terms in a different set of queries. Performance of this estimation is compared to the inverse document frequency method, the cosine measure, and the statistical similarity measure. The second method is to learn optimal weights of the non-binary independence model adaptively by a learning formula. Experiments are performed on three different document collections CISI, MEDLARS, and CRN4NUL for both methods, and results are reported. Both methods show improvements compared to the existing weighting schemes. Experimental results show that the second method gives slightly better performance than the first one, and has simpler implementation.
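For the binary independence model, the standard relevance weight estimated from feedback data (with the usual 0.5 point correction) has the Robertson/Sparck Jones form below. This is the textbook formula, offered as context; the paper's exact estimation procedures may differ.

```python
import math

def rsj_weight(r, R, n, N, c=0.5):
    """Term relevance weight under the binary independence model.
    r = relevant docs containing the term, R = total relevant docs,
    n = docs containing the term, N = collection size, c = correction."""
    return math.log(((r + c) * (N - n - R + r + c)) /
                    ((R - r + c) * (n - r + c)))
```

With no relevance information (r = R = 0) the weight reduces to an idf-like quantity, log((N − n + c)/(n + c)), which is why such weights are naturally compared against the inverse document frequency baseline mentioned in the abstract.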

Proceedings ArticleDOI
P. Simpson1
01 May 1988
TL;DR: A normal form for expression of queries is defined, it is shown that such queries can be automatically produced, if necessary, from a natural-language request for information, and algorithms for translating such queries into equivalent queries on both Boolean and term-vector type retrieval systems are given.
Abstract: The concept of a large-scale information retrieval network incorporating heterogeneous retrieval systems and users is introduced, and the necessary components for enabling term-based searching of any database by untrained end-users are outlined. We define a normal form for expression of queries, show that such queries can be automatically produced, if necessary, from a natural-language request for information, and give algorithms for translating such queries, with little or no loss of expressiveness, into equivalent queries on both Boolean and term-vector type retrieval systems. We conclude with a proposal for extending this approach to arbitrary database models.

Proceedings ArticleDOI
F. Hirabayashi1, H. Matoba1, Y. Kasahara1
01 May 1988
TL;DR: An image retrieval experiment shows that the proposed internal representation and mapping method and the user interface design provide effective tools for information retrieval.
Abstract: Proposed here is an internal representation and mapping method for multimedia information in which retrieval is based on the impression documents are intended to make. A user interface design for a system using this method is also proposed. The proposed internal representation and mapping method represents each desired document impression as an axis in a semantic space. Documents are represented as points in the space. Queries are represented as subspaces. The proposed user interface design employs a method of visual presentation of the semantic space. For evaluation purposes, a prototype system has been developed. An image retrieval experiment shows that the proposed internal representation and mapping method and the user interface design provide effective tools for information retrieval.

Proceedings ArticleDOI
01 May 1988
TL;DR: A mathematical framework for phonographic correction is proposed by defining a similarity relation between phonetically related substrings and a dissimilarity index between strings and providing a simple and efficient algorithm for recognizing words in dictionaries from misspelt inputs including both typographical and phonographic errors.
Abstract: In this paper, we point out that, in applications available to the general public, and/or natural language interfaces, the correction of phonographic errors (which are competence errors) is far more important than the correction of typographical errors (which are simply performance errors). Many studies aimed at the correction of typographical errors have been carried out, but relatively few tackle the problem of phonographic correction, and they are generally based on more or less ad hoc methods. We propose a mathematical framework for phonographic correction by defining a similarity relation between phonetically related substrings and a dissimilarity index between strings. We also provide a simple and efficient algorithm for recognizing words in dictionaries from misspelt inputs including both typographical and phonographic errors.
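The flavor of such a correction scheme can be sketched as a weighted edit distance in which substitutions between phonetically related characters cost less than arbitrary ones. The similarity table and costs here are invented toy values, and the paper defines the relation over substrings rather than single characters.

```python
# Toy table of phonetically related character pairs (illustrative only).
RELATED = {frozenset(p) for p in [("f", "p"), ("c", "k"), ("s", "z")]}

def sub_cost(a, b):
    """Substitutions between related characters are cheap; others cost 1."""
    if a == b:
        return 0.0
    return 0.3 if frozenset((a, b)) in RELATED else 1.0

def distance(s, t):
    """Dynamic-programming edit distance with phonetically weighted
    substitutions; insertions and deletions cost 1."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]))
    return d[m][n]
```

Under this measure a phonographic misspelling like "kat" for "cat" lands much closer to the dictionary entry than an arbitrary typo, so dictionary lookup by smallest distance recovers the intended word.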