Showing papers on "Document retrieval" published in 1981


01 Jan 1981
TL;DR: This chapter sets the scene by introducing the basic ideas, sketching some of the main problem areas, and preparing the reader for the more specific chapters that follow.
Abstract: Information retrieval systems have been the subject of experimental testing for some twenty years now. Like any field in this position, a fair amount of know-how has accumulated about the proper conduct of such investigations. The object of this book is to distil this know-how; the object of this chapter is to set the scene. Thus I will be introducing the basic ideas, sketching in some of the main problem areas, and generally preparing the reader for the more specific or concrete chapters that follow. Van Rijsbergen takes up the question of the evaluation of retrieval effectiveness in Chapter 3; in Chapter 4, Belkin considers information retrieval in a wider context; and in Chapter 5, Tague gets down to the detail of conducting experiments.

105 citations


Journal ArticleDOI
TL;DR: This work looks at the weights from an entirely different approach involving thresholds, and generates an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms.
Abstract: There has been a good deal of work on information retrieval systems that have continuous weights assigned to the index terms that describe the records in the database, and/or to the query terms that describe the user queries. Recent articles have analyzed retrieval systems with continuous weights of either type and/or with a Boolean structure for the queries. They have also suggested criteria which such systems ought to satisfy and record evaluation mechanisms which partially satisfy these criteria. We offer a more careful analysis, based on a generalization of the discrete weights. We also look at the weights from an entirely different approach involving thresholds, and we generate an improved evaluation mechanism which seems to fulfill a larger subset of the desired criteria than previous mechanisms. This new mechanism allows the user to attach a “threshold” to the query term.

102 citations


Journal ArticleDOI
TL;DR: It is shown that the concept of threshold values resolves the problems inherent in relevance weights, and possible evaluation mechanisms for document retrieval, based on fuzzy-set-theoretic considerations, are explored.
Abstract: Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.

99 citations




Journal ArticleDOI
TL;DR: This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights and indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.
Abstract: Probabilistic models of retrieval have provided insights into the document retrieval process and contain the basis for very effective search strategies. A major limitation of these models is that they assume that documents are represented by binary index terms. In many cases the index terms will be assigned weights, such as within-document frequency weights, which are derived from the content of the documents by the indexing process. These weights, which are referred to here as term significance weights, indicate the relative importance of the terms in individual documents. This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights. Comparison with other research indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.

51 citations


Journal ArticleDOI
TL;DR: The various query-processing methods are discussed and compared and some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms.
Abstract: Most current document retrieval systems require that user queries be specified in the form of Boolean expressions. Although Boolean queries work, they have flaws. Some of the attempts to overcome these flaws have involved “partial-match” retrieval or the use of fuzzy-subset theory. Recently, some generalizations of fuzzy-subset theory have been suggested that would allow the user to specify queries with relevance weights or thresholds attached to terms. The various query-processing methods are discussed and compared.

36 citations
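
As a minimal sketch of the fuzzy-subset approach mentioned in the abstract above (not the authors' own formulation), a Boolean query can be scored against weighted index terms with the standard fuzzy-set operators: AND as minimum, OR as maximum, NOT as complement. The nested-tuple query encoding, the function name `fuzzy_eval`, and the sample weights are assumptions made for illustration; the relevance-weight and threshold generalizations discussed in these papers are not modeled here.

```python
def fuzzy_eval(query, weights):
    """Score a Boolean query against one document's term weights in [0, 1].

    `query` is either a term (str) or a tuple ("and" | "or" | "not", *operands).
    This encoding is hypothetical, chosen only to keep the sketch short.
    """
    if isinstance(query, str):
        # A term absent from the document has membership 0.
        return weights.get(query, 0.0)
    op, *args = query
    if op == "and":
        return min(fuzzy_eval(a, weights) for a in args)
    if op == "or":
        return max(fuzzy_eval(a, weights) for a in args)
    if op == "not":
        return 1.0 - fuzzy_eval(args[0], weights)
    raise ValueError(f"unknown operator: {op}")


# Illustrative document: index terms with made-up weights.
doc = {"retrieval": 0.8, "fuzzy": 0.3}
query = ("and", "retrieval", ("or", "fuzzy", ("not", "boolean")))
score = fuzzy_eval(query, doc)
```

With unweighted (0 or 1) index terms the same function reduces to ordinary Boolean evaluation, which is why this scheme is usually described as a generalization of Boolean retrieval.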


Journal ArticleDOI
TL;DR: The capabilities of the Mistral/11 document retrieval system, which is based on the relational data model and designed for general document retrieval applications, are outlined.

30 citations


Journal ArticleDOI
31 May 1981
TL;DR: The system's objective is to perform the information consultant's job of helping a user select the right vocabulary terms for a query; it is particularly useful for a novice user of a controlled-vocabulary, index-based retrieval system who is not familiar with the vocabulary and the system thesaurus.
Abstract: This paper describes a development and implementation of an expert/consultation system for a retrieval data-base, that interfaces between the user and a retrieval system. The system's objective is to perform the information consultant's job in assisting a user to select the right vocabulary terms for his query. It is particularly useful for a novice user of a controlled-vocabulary, index-based retrieval system, who is not familiar with the vocabulary and the system Thesaurus. The user will enter his terms/keywords, that represent his information need, and the system will apply search procedures on its knowledge-base, and will find relevant concepts to be used as query-terms. The system is interactive; it can explain to the user why/how a concept was discovered/suggested, and it can back-track and try to find alternatives in case the user rejects a suggested concept. Two versions of the system were developed, utilizing two search and interaction strategies. Experiments will be conducted with the two alternatives in order to find out user preference and to compare performance. Performance will also be compared with an alternative "conventional" approach, which is an On-Line-Thesaurus developed as part of this study.

27 citations


Dissertation
01 Jan 1981

22 citations


Journal ArticleDOI
31 May 1981
TL;DR: The design of an adaptive document retrieval system that chooses the best search strategy for a particular situation and user is outlined and a general network representation of the documents and terms in the database is proposed.
Abstract: Many effective search strategies derived from different models are available for document retrieval systems. However, it does not appear that there is a single most effective strategy. Instead, different strategies perform optimally under different conditions. This paper outlines the design of an adaptive document retrieval system that chooses the best search strategy for a particular situation and user. In order to be able to support a variety of search strategies, a general network representation of the documents and terms in the database is proposed. This network representation leads to efficient methods of generating and using document and term classifications. One of the most desirable features of an adaptive system would be the ability to learn from experience. A method of incorporating this learning ability into the system is described. The adaptive control strategy for choosing search strategies enables the system to base its actions on a number of factors, including a model of the current user. Finally, some ideas for a flexible interface for casual users are suggested. Part of this interface is the heuristic search, which is used when searches based on formal models have failed. The heuristic search provides a browsing capability for the user.

Journal ArticleDOI
TL;DR: The ideas presented in the paper may prove to be essential for reducing operating costs of information retrieval systems with documents indexed by weighted descriptors and with query search patterns represented by Boolean formulae.

Journal ArticleDOI
TL;DR: A new method of document retrieval is presented on the basis of fundamental fuzzy set theory operations and the notion of a semantic disjunctive normal form, and an algorithm for allocating documents to particular queries is described.
Abstract: A new method of document retrieval is presented on the basis of fundamental fuzzy set theory operations and the notion of semantic disjunctive normal form. Concepts of semantic normal forms are defined, i.e. semantic disjunctive normal form and semantic conjunctive normal form and their elementary properties are presented. Syntax and semantics of the proposed document retrieval language is given and the algorithm of allocating documents to particular queries described. Strategy of document retrieval based on semantic disjunctive normal form is exemplified.



Journal ArticleDOI
TL;DR: This paper demonstrates the property of inclusiveness of document retrieval systems where documents are indexed by unweighted descriptors, and in which query search patterns are Boolean functions of descriptors (systems using the Inverted File Method, the Canonical Structure File Method or the Sequential File Method).
Abstract: One of the means of reducing the information retrieval time is by taking advantage of the property of inclusiveness of information retrieval systems. When one knows the system response to a query which is more general in relation to another query, then in an inclusive retrieval system in order to retrieve the response to the more specific query it suffices to limit the information retrieval process to the search of the system response to the more general query. This paper demonstrates the property of inclusiveness of document retrieval systems where documents are indexed by unweighted descriptors, and in which query search patterns are Boolean functions of descriptors (systems using the Inverted File Method, the Canonical Structure File Method, or the Sequential File Method). The paper presents three methods for determining a partial ordering relation on a set of Boolean search patterns of queries, implying a partial ordering on the set of the system responses to these queries and discusses the adequacy of each method depending on the information retrieval method used. The paper also proves the general theorems concerning the property of inclusiveness with regard to the class of information retrieval systems considered. Furthermore, the methods for calculating the degree of generality/specificity between queries are given. The utilization of the property of inclusiveness may considerably reduce the operating costs of information retrieval systems (particularly systems using the Sequential File Method, e.g. the SDI systems). Moreover, the results of the studies presented in the paper give the possibility of gradual narrowing or broadening in an automatic way of a given Boolean search pattern of a query, which is of vital importance for on-line information retrieval systems.
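
The inclusiveness property described in the abstract above can be illustrated with a small sketch. It assumes the simplest case only: documents indexed by unweighted descriptors (sets of terms) and purely conjunctive queries, so that one query is more specific than another whenever its required descriptors are a superset. The paper's three ordering methods for general Boolean search patterns are not reproduced; the names `search`, `docs`, `general`, and `specific` and the sample data are hypothetical.

```python
def search(docs, required, candidates=None):
    """Sequential scan for documents containing every descriptor in `required`.

    If `candidates` is given, only those document ids are examined; this is
    where a cached response to a more general query saves work.
    """
    ids = docs.keys() if candidates is None else candidates
    return {doc_id for doc_id in ids if required <= docs[doc_id]}


# Illustrative collection: document id -> set of unweighted descriptors.
docs = {
    1: {"fuzzy", "retrieval"},
    2: {"retrieval", "boolean"},
    3: {"fuzzy", "retrieval", "boolean"},
    4: {"indexing"},
}

general = frozenset({"retrieval"})
specific = frozenset({"retrieval", "fuzzy"})  # superset of descriptors => more specific

general_response = search(docs, general)
# Inclusiveness: the specific response is contained in the general one,
# so only the general response needs to be scanned.
specific_response = search(docs, specific, candidates=general_response)
```

In this sketch the restricted scan examines three documents instead of four; the saving grows with the size of the collection relative to the general response, which is the cost argument the abstract makes for sequential-file systems.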

Journal ArticleDOI
TL;DR: It is shown that if one approaches the computer handling of bibliographic records from the point of view of what a computer is capable of doing, rather than adapting and simulating manual methods, it is possible to dispense with virtually all skilled preparation of formalised and highly structured records.
Abstract: The conventional methods of composing bibliographic records, developed over a long period for manual filing systems, are discussed from the point of view of what can be done in a computer system. The tendency has been to adapt the highly skilled and time-consuming process of composing formalised bibliographic records in bibliographic language by introducing a further degree of formalisation and structure in order to enable the computer to interpret the records in a way which simulates manual searching. It is shown that if one approaches the computer handling of bibliographic records from the point of view of what a computer is capable of doing, rather than adapting and simulating manual methods, it is possible to dispense with virtually all skilled preparation of formalised and highly structured records. A working system on this basis is briefly described. A data-base containing over 40,000 bibliographic references has been compiled by simply transcribing the descriptive data found on the title pages and elsewhere on the documents, using clerical personnel with minimal professional supervision. This data-base has been in regular use for 1 1/2 years as an on-line library catalogue for document retrieval, as well as a retrieval system for subject enquiries. The system is based on the Status-2 software developed by AERE, Harwell. Retrieval from natural language text is not new as a means of interrogating data-bases from the point of view of subject matter. The principle is here extended to cover the purely bibliographic elements as well, and to serve the purpose of library cataloguing. The wider implications of the use of natural language bibliographic descriptions, transcribed without modification, beyond the application to individual libraries, are discussed.


Journal ArticleDOI
31 May 1981
TL;DR: A self-tuning adaptive information retrieval system is outlined as an extension of the concept of a "classical" document retrieval system; its valuation subsystem outputs an effectiveness value and an efficiency value, which together measure the quality of an information retrieval system.
Abstract: A self-tuning adaptive information retrieval system as an extension of the concept of a "classical" document retrieval system, is outlined. This system accepts documents and search requests in natural language, as well as the system-proposals previously produced by the system itself or prepared by the system operator. It produces a system-proposal that consists of a list of documents ranked according to their relevance to the query. Incorporated into the system is a system valuation subsystem that uses weighted relevance judgements. This subsystem gives as output an effectiveness value and an efficiency value: both together measure the quality of an information retrieval system. The computation of the quality values and the values themselves are independent of a specific implementation. The retrieval process in this system consists of two parts, namely a query-document match and a query-query match.

Journal ArticleDOI
TL;DR: This article surveyed music technical services librarians to determine how these retrieval problems are dealt with in the confines of individual online catalogs, while noting parallels with their earlier study of the same problem from the perspective of online catalog vendors.
Abstract: The symbols for sharp and flat in music notation present special problems for the retrieval of music materials from online catalogs. The authors surveyed music technical services librarians to determine how these retrieval problems are dealt with in the confines of individual online catalogs, while noting parallels with their earlier study of the same problem from the perspective of online catalog vendors. The results of the study revealed that there is much variation in the music symbol retrieval capabilities of individual online systems, and that music technical services librarians are only somewhat satisfied with the manner in which different online systems deal with this problem.


Journal ArticleDOI
31 May 1981
TL;DR: This report identifies limitations of sequencing rules that use a probability ranking technique, with particular attention to the ranking algorithm appropriate for users who present the same request but have different information needs.
Abstract: A document retrieval system should rank documents in order of their usefulness or satisfaction to the users. This principle was first explicated in the classic paper by Maron and Kuhns (1). Additional considerations concerning document ranking have been suggested by other researchers (2,3). Particular attention will be given here to the ranking algorithm appropriate for those presenting the same request, but having different information needs. The research on which this report is based identifies limitations associated with sequencing rules that use a probability ranking technique (4). Three basic and somewhat interdependent limitations will be discussed.

Journal ArticleDOI
TL;DR: A set of experiments was conducted to determine the suitability of the Colon Classification as a foundation for the automated analysis, representation and retrieval of primary information from the full text of documents.
Abstract: A set of experiments was conducted to determine the suitability of the Colon Classification as a foundation for the automated analysis, representation and retrieval of primary information from the full text of documents. Primary information is that information embodied in the text of a document, as opposed to secondary information which is generally in such forms as: an abstract, a table of contents or an index.


01 Jan 1981