scispace - formally typeset

Showing papers on "Document retrieval" published in 1974


Book
01 Jan 1974

76 citations


Journal Article
TL;DR: A framework for the evaluation of cluster-based retrieval strategies is constructed and these strategies are shown to be dependent on the method of cluster representation (cluster profile) adopted.

38 citations


Journal Article
TL;DR: To investigate the applicability of an automatic classification technique to information retrieval, a modified version of the Schiminovich algorithm was used to classify articles in the data base, using the citations found in their bibliographies and a “triggering file” of bibliographically related papers.

20 citations


Journal Article
TL;DR: Some observations about boundary conditions are added to Ghosh's study of the consecutive retrieval (CR) property, the property of a query that can be answered by retrieving consecutive records in a file.
Abstract: The consecutive retrieval (CR) property has been defined as the property of a query that can be answered by the retrieval of consecutive records in a file. In this correspondence we add some observations about boundary conditions to Ghosh's study of the CR property.
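To make the CR property concrete, here is a minimal brute-force sketch in Python, assuming each record is a hashable label and each query is represented by the set of records that answer it. The function names `is_consecutive` and `find_cr_ordering` are hypothetical; the code only tests whether some linear ordering of the file places every query's answer set in consecutive positions, and it is not the construction or the boundary-condition analysis from the paper.

```python
from itertools import permutations


def is_consecutive(order, query):
    """True if the records answering `query` form one contiguous run in `order`."""
    if not query:
        return True  # an empty answer set is trivially consecutive
    positions = sorted(order.index(r) for r in query)
    return positions[-1] - positions[0] == len(positions) - 1


def find_cr_ordering(records, queries):
    """Brute-force search for a linear ordering of `records` in which every
    query's answer set is consecutive. Returns the first such ordering found,
    or None if none exists. Tries all n! orderings, so it is only a toy."""
    for order in permutations(records):
        if all(is_consecutive(order, q) for q in queries):
            return list(order)
    return None


chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
star = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
print(find_cr_ordering(["a", "b", "c", "d"], chain))  # ['a', 'b', 'c', 'd']
print(find_cr_ordering(["a", "b", "c", "d"], star))   # None
```

The second query set fails because record `a` would need to be adjacent to three different records in a linear file. Trying every permutation is exponential, so this is useful only for checking small examples by hand.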

16 citations




Journal Article
TL;DR: The paper draws attention to work in document retrieval that parallels, and in some cases is in advance of, work in data retrieval; in both fields a file structure is imposed on the information to reduce the number of comparisons needed to achieve the desired result.
Abstract: In a recent paper Burkhard and Keller [1] discuss the best-match problem, the problem "of searching the set of keys in a file to find a key which is closest to a given query." Taking my cue from their paper, I present some work which I have done on the same problem in a related field. The aim of this paper is to draw attention to work going on in document retrieval which parallels and in some cases is in advance of the work in data retrieval. In both cases retrieval is based on a file structure imposed on the information, whether keys or documents, aimed at reducing the number of comparisons needed to achieve the desired result. In the case of keys, Burkhard and Keller recommend for their more sophisticated file structure (they recommend several simpler ones) a minimal cover of cliques C such that (1) every key is in at least one element of C, and (2) no smaller set C' satisfies (1). Unfortunately, finding C requires the generation of (almost) all cliques on the set of keys. It is well known that the computation time to generate all cliques can be excessive. The only known bound on this time is so high, of order O(k^n) for n keys, that it amounts to no bound at all. The number of cliques in a graph can increase dramatically with the number of nodes in the graph. This in itself has been found to be a hurdle in applications to document retrieval (see e.g. Minker et al. [6]). So, for applications in document retrieval, where the number of documents to be clustered may be of the order of hundreds of thousands, clique generation is just too slow. Nevertheless, related clustering approaches have been attempted by Salton [8], Litofsky [5], Crouch [2], and Van Rijsbergen [10], who have called in the techniques of cluster analysis to classify the documents so that the search time may be reduced. In both data and document retrieval, search time is reduced by the selection of a good clique or cluster representative. Burkhard and Keller proposed a method for selecting clique representatives. In document retrieval, one of the cluster representatives selected for use on heuristic grounds in [4] has proved to have an interesting theoretical basis, and I describe it here.
Copyright © 1974, Association for Computing Machinery, Inc. Author's address: Department of Information Science, Monash University, Clayton, Victoria, 3168, Australia.
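The idea of reducing comparisons by searching cluster representatives first can be sketched as follows. This is an illustrative Python sketch, not the method of the paper: documents are assumed to be sparse term-weight dictionaries, similarity is assumed to be cosine, the clusters are given in advance, and the representative is assumed to be the centroid (mean vector), which may differ from the representative described in [4]. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean term-weight vector: one possible cluster representative."""
    rep = {}
    for vec in vectors:
        for t, w in vec.items():
            rep[t] = rep.get(t, 0.0) + w / len(vectors)
    return rep


def full_search(query, docs):
    """Exhaustive best match: one comparison per document."""
    best = max(range(len(docs)), key=lambda i: cosine(query, docs[i]))
    return best, len(docs)


def cluster_search(query, docs, clusters, reps):
    """Compare the query with each cluster representative, then only with the
    members of the best-matching cluster. Returns (best doc, comparisons)."""
    best_c = max(range(len(clusters)), key=lambda c: cosine(query, reps[c]))
    members = clusters[best_c]
    best = max(members, key=lambda i: cosine(query, docs[i]))
    return best, len(reps) + len(members)


docs = [{"a": 1}, {"a": 1, "b": 1}, {"b": 1},
        {"c": 1}, {"c": 1, "d": 1}, {"d": 1}]
clusters = [[0, 1, 2], [3, 4, 5]]
# Representatives are built once, when the file is organized, not per query.
reps = [centroid([docs[i] for i in m]) for m in clusters]
query = {"c": 1, "d": 1}
print(full_search(query, docs))                     # (4, 6)
print(cluster_search(query, docs, clusters, reps))  # (4, 5)
```

Here the query is compared with 2 representatives and 3 members instead of all 6 documents. The saving grows with file size, but the search is heuristic: if the true best match lies in a cluster whose representative scores lower, `cluster_search` will miss it, whereas `full_search` cannot.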

9 citations


Journal Article
TL;DR: The paper describes the main features of the computer programs as implemented on the IBM 370/165 at Harwell, and includes results of typical search enquiries.

7 citations


Book
01 Jan 1974

6 citations


Journal Article
TL;DR: A novel method for the computer storage of an inverted index file that not only offers the possibility of reducing storage requirements for an index but also affords more rapid processing of query statements expressed in Boolean logic.
Abstract: The inverted index file is a frequently used file structure for the storage of indexing information in a document retrieval system. This paper describes a novel method for the computer storage of such an index. The method not only offers the possibility of reducing storage requirements for an index but also affords more rapid processing of query statements expressed in Boolean logic.
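The abstract does not describe the novel storage method itself, so the sketch below shows only the conventional baseline it improves on: an inverted index whose posting lists are sorted document IDs, with Boolean AND, OR and AND NOT evaluated by list operations, plus gap (difference) encoding as one common, generic way to shrink stored lists. This is a swapped-in textbook technique, not the paper's method, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of IDs of documents indexed by it."""
    postings = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """AND: linear-time merge of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a, b):
    """OR: all IDs appearing in either list, kept sorted."""
    return sorted(set(a) | set(b))


def difference(a, b):
    """AND NOT: IDs in a that are absent from b."""
    excluded = set(b)
    return [d for d in a if d not in excluded]


def to_gaps(ids):
    """Store each ID as its distance from the previous one (the first from 0)."""
    return [cur - prev for prev, cur in zip([0] + ids[:-1], ids)]


def from_gaps(gaps):
    """Invert to_gaps by taking running sums."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


docs = [{"file", "index"}, {"index", "boolean"},
        {"file", "boolean", "query"}, {"query"}]
index = build_index(docs)
print(intersect(index["file"], index["boolean"]))   # [2]
print(union(index["index"], index["query"]))        # [0, 1, 2, 3]
print(difference(index["boolean"], index["file"]))  # [1]
print(to_gaps([3, 7, 8, 15]))                       # [3, 4, 1, 7]
```

Sorted lists let AND run as a single linear merge, and because the gaps between neighbouring IDs are smaller than the IDs themselves, they can be stored in fewer bits with a variable-length code (not shown).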

5 citations


Journal Article
TL;DR: Some elementary mathematical properties of term matching document retrieval systems are developed and are used as a basis for a new file organization technique that allows the use of various matching functions and thresholds.
Abstract: Some elementary mathematical properties of term matching document retrieval systems are developed. These properties are used as a basis for a new file organization technique. Some of the advantages of this new method are (1) the key-to-address transformation is easily determined; (2) the documentary information is stored only once in the file; (3) the file organization allows the use of various matching functions and thresholds; and (4) the dimensionality of the transform is easily expanded to accommodate various sized data bases.
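The abstract does not give the key-to-address transformation, so the sketch below illustrates only what choosing among "various matching functions and thresholds" means in a term matching system. It assumes documents and queries are sets of index terms and uses two familiar matching functions, coordination level (count of shared terms) and the Dice coefficient; both choices, and all names, are assumptions, and the sequential scan stands in for the paper's file organization.

```python
def coordination(query, doc):
    """Coordination level: number of query terms that also index the document."""
    return len(query & doc)


def dice(query, doc):
    """Dice coefficient 2|Q & D| / (|Q| + |D|), in [0, 1]."""
    total = len(query) + len(doc)
    return 2 * len(query & doc) / total if total else 0.0


def match(query, docs, func, threshold):
    """IDs of documents whose score under `func` reaches `threshold`."""
    return [i for i, doc in enumerate(docs) if func(query, doc) >= threshold]


docs = [{"file", "organization", "retrieval"},
        {"matching", "function"},
        {"retrieval", "threshold"}]
query = {"retrieval", "threshold"}
print(match(query, docs, coordination, 1))  # [0, 2]
print(match(query, docs, coordination, 2))  # [2]
print(match(query, docs, dice, 0.5))        # [2]
```

Swapping the matching function or raising the threshold changes which documents are retrieved without changing how the documents are stored, which is the flexibility the abstract attributes to its file organization.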

Journal Article
01 Feb 1974-Infor
TL;DR: The relative efficiency of search is shown to depend on the structure of the information file, the form of the queries posed to it, and the interaction between them.
Abstract: Inverted lists and tree structures are compared in terms of their efficiency as indexes for document retrieval. The relative efficiency of search is shown to depend on the structure of the information file, the form of queries posed to it and the interaction between them. Formulae are developed for computing the amount of search required in each case, and examples are provided.


Book Chapter
01 Jan 1974
TL;DR: The generalized model represents a flexible, comprehensive, modular structure of a retrieval system, as opposed to previous models, which generally emphasize only a specific aspect of the retrieval problem.
Abstract: Several document retrieval systems are analyzed in relation to their relevance to a generalized document retrieval system and in terms of their purposes, strong points, and inadequacies. The generalized model represents a flexible, comprehensive, modular structure of a retrieval system, as opposed to previous models, which generally emphasize only a specific aspect of the retrieval problem.

Journal Article
TL;DR: There appears to be a definite relationship between increased browsing capability (browsability) and user acceptance of lower relevance; the effect of this new capability on indexing criteria should be explored further.
Abstract: Based on empirical and anecdotal evidence, there appears to be a definite relationship between increased browsing capability (browsability) and user acceptance of lower relevance. New automated microform retrieval devices are providing near-instantaneous retrieval of documents. The effect of this new capability on indexing criteria should be further explored. New indexing philosophies should be developed in order to optimize the total document retrieval system for the user's constraints of time and comprehension.

Journal Article
TL;DR: A brief guide is given to some information sources considered to be of most use to anaesthetists that should be available in most university libraries.
Abstract: A brief guide is given to some information sources considered to be of most use to anaesthetists. The majority of the systems described should be available in most university libraries.